STOCHASTIC AND STATISTICAL METHODS IN HYDROLOGY AND ENVIRONMENTAL ENGINEERING VOLUME 3

TIME SERIES ANALYSIS IN HYDROLOGY AND ENVIRONMENTAL ENGINEERING

Water Science and Technology Library VOLUME 10/3

Series Editor: V. P. Singh, Louisiana State University, Baton Rouge, U.S.A.

Editorial Advisory Board:
S. Chandra, Roorkee, U.P., India
J. C. van Dam, Pijnacker, The Netherlands
M. Fiorentino, Potenza, Italy
W. H. Hager, Zurich, Switzerland
N. Harmancioglu, Izmir, Turkey
V. V. N. Murty, Bangkok, Thailand
J. Nemec, Genthod/Geneva, Switzerland
A. R. Rao, West Lafayette, Ind., U.S.A.
Shan Xu Wang, Wuhan, Hubei, P.R. China

The titles published in this series are listed at the end of this volume.

STOCHASTIC AND STATISTICAL METHODS IN HYDROLOGY AND ENVIRONMENTAL ENGINEERING Volume 3

TIME SERIES ANALYSIS IN HYDROLOGY AND ENVIRONMENTAL ENGINEERING edited by

KEITH W. HIPEL Departments of Systems Design Engineering and Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada

A. IAN McLEOD Department of Statistical and Actuarial Sciences, The University of Western Ontario, London, Ontario, Canada and Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada

U. S. PANU Department of Civil Engineering, Lakehead University, Thunder Bay, Ontario, Canada

VIJAY P. SINGH Department of Civil Engineering, Louisiana State University, Baton Rouge, Louisiana, U.S.A.

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

Library of Congress Cataloging-in-Publication Data

Stochastic and statistical methods in hydrology and environmental engineering. p. cm. -- (Water science and technology library; v. 10) Papers presented at an international conference held at the University of Waterloo, Canada, June 21-23, 1993. Includes index. Contents: v. 1. Extreme values: floods and droughts / edited by Keith W. Hipel -- v. 2. Stochastic and statistical modelling with groundwater and surface water applications / edited by Keith W. Hipel -- v. 3. Time series analysis in hydrology and environmental engineering / edited by Keith W. Hipel ... [et al.] -- v. 4. Effective environmental management for sustainable development / edited by Keith W. Hipel and Liping Fang. ISBN 978-90-481-4379-5 ISBN 978-94-017-3083-9 (eBook) DOI 10.1007/978-94-017-3083-9

1. Hydrology--Statistical methods--Congresses. 2. Stochastic processes--Congresses. I. Series. GB656.2.S7SS15 1994 551.48'01'5195--dc20 94-27708

ISBN 978-90-481-4379-5

Printed on acid-free paper

All Rights Reserved © 1994 Springer Science+Business Media Dordrecht Originally published by Kluwer Academic Publishers in 1994 No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

In Memory of Professor T.E. Unny (1929 - 1991)

Professor Unny is shown examining posters for the International Conference on Stochastic and Statistical Methods in Hydrology and Environmental Engineering held in his honour June 21 to 23, 1993. The photograph was taken at the University of Waterloo on December 20, 1991, eight days before Professor Unny's untimely death.

TABLE OF CONTENTS

PREFACE

AN INTERNATIONAL CELEBRATION

ACKNOWLEDGEMENTS

PART I: CLIMATIC CHANGE

Applications of Stochastic Modeling in Climate Change Impact Assessment
D. P. LETTENMAIER

Knowledge Based Classification of Circulation Patterns for Stochastic Precipitation Modeling
A. BARDOSSY, H. MUSTER, L. DUCKSTEIN and I. BOGARDI

Grey Theory Approach to Quantifying the Risks Associated with General Circulation Models
B. BASS, G. HUANG, Y. YIN and S. J. COHEN

A Nonparametric Renewal Model for Modeling Daily Precipitation
B. RAJAGOPALAN, U. LALL and D. G. TARBOTON

PART II: FORECASTING

Forecasting B.C. Hydro's Operation of Williston Lake - How Much Uncertainty is Enough
D. J. DRUCE

Evaluation of Streamflow Forecasting Models
T. TAO and W. C. LENNOX

Application of a Transfer Function Model to a Storage-Runoff Process
P.-S. YU, C.-L. LIU and T.-Y. LEE

Seeking User Input in Inflow Forecasting
T. TAO, I. CORBU, R. PENN, F. BENZAQUEN and L. LAI

Linear Procedures for Time Series Analysis in Hydrology
P. R. H. SALES, B. de B. PEREIRA and A. M. VIEIRA

PART III: ENTROPY

Application of Probability and Entropy Concepts in Hydraulics
C.-L. CHIU

Assessment of the Entropy Principle as Applied to Water Quality Monitoring Network Design
N. B. HARMANCIOGLU, N. ALPASLAN and V. P. SINGH

Comparisons between Bayesian and Entropic Methods for Statistical Inference
J. N. KAPUR, H. K. KESAVAN and G. BACIU

An Entropy-Based Approach to Station Discontinuance
N. B. HARMANCIOGLU

Assessment of Treatment Plant Efficiencies by the Entropy Principle
N. ALPASLAN

Infilling Missing Monthly Streamflow Data Using a Multivariate Approach
C. GOODIER and U. PANU

PART IV: NEURAL NETWORKS

Application of Neural Networks to Runoff Prediction
M.-L. ZHU, M. FUJITA and N. HASHIMOTO

Prediction of Daily Water Demands by Neural Networks
S. P. ZHANG, H. WATANABE and R. YAMADA

Backpropagation in Hydrological Time Series Forecasting
G. LACHTERMACHER and J. D. FULLER

PART V: TREND ASSESSMENT

Tests for Monotonic Trend
A. I. MCLEOD and K. W. HIPEL

Analysis of Water Quality Time Series Obtained for Mass Discharge Estimation
B. A. BODO, A. I. MCLEOD and K. W. HIPEL

De-Acidification Trends in Clearwater Lake near Sudbury, Ontario 1973-1992
B. A. BODO and P. J. DILLON

PART VI: SPATIAL ANALYSIS

Multivariate Kernel Estimation of Functions of Space and Time Hydrologic Data
U. LALL and K. BOSWORTH

Comparing Spatial Estimation Techniques for Precipitation Analysis
J. SATAGOPAN and B. RAJAGOPALAN

PART VII: SPECTRAL ANALYSIS

Exploratory Spectral Analysis of Time Series
A. LEWANDOWSKI

On the Simulation of Rainfall Based on the Characteristics of Fourier Spectrum of Rainfall
U. MATSUBAYASHI, S. HAYASHI and F. TAKAGI

PART VIII: TOPICS IN STREAMFLOW MODELLING

Cluster Based Pattern Recognition and Analysis of Streamflows
T. KOJIRI, T. E. UNNY and U. S. PANU

ReMus, Software for Missing Data Recovery
H. PERRON, P. BRUNEAU, B. BOBEE and L. PERREAULT

Seasonality of Flows and its Effect on Reservoir Size
R. M. PHATARFOD and R. SRIKANTHAN

Estimation of the Hurst Exponent h and Geos Diagrams for a Non-Stationary Stochastic Process
G. POVEDA and O. J. MESA

Optimal Parameter Estimation of Conceptually-Based Streamflow Models by Time Series Aggregation
P. CLAPS and F. MURRONE

On Identification of Cascade Systems by Nonparametric Techniques with Applications to Pollution Spread Modeling in River Systems
A. KRZYZAK

Patching Monthly Streamflow Data - A Case Study Using the EM Algorithm and Kalman Filtering
G. G. S. PEGRAM

Runoff Analysis by the Quasi Channel Network Model in the Toyohira River Basin
H. SAGA, T. NISHIMURA and M. FUJITA

Author Index

Subject Index

PREFACE

Objectives

To understand how hydrological and environmental systems behave dynamically, scientists and engineers take measurements over time. In time series modelling and analysis, time series models are fitted to one or more sequences of observations describing the system for purposes such as environmental impact assessment, forecasting, simulation and reservoir operation. When applied to a natural system, time series modelling furnishes an enhanced appreciation of how the system functions, especially one that is heavily affected by land use changes. This in turn means that better decisions can ultimately be made so that human beings can properly manage their activities in order to live in harmony with their natural environment. The major objective of this edited volume is to present some of the latest and most promising approaches to time series analysis as practiced in hydrology and environmental engineering.

Contents

As listed in the Table of Contents, the book is divided into the following main parts:

PART I: CLIMATIC CHANGE
PART II: FORECASTING
PART III: ENTROPY
PART IV: NEURAL NETWORKS
PART V: TREND ASSESSMENT
PART VI: SPATIAL ANALYSIS
PART VII: SPECTRAL ANALYSIS
PART VIII: TOPICS IN STREAMFLOW MODELLING

An important topic of widespread public concern in which time series analysis has a crucial role to play is the systematic study of climatic change. In Part I, significant contributions to climatic change are described in an interesting set of papers. For instance, the first paper in this part is a keynote paper by Dr. D. P. Lettenmaier that focuses upon time series or stochastic models of precipitation that account for climatic driving variables. These models furnish a mechanism for transcending the spatial scales between general circulation models and the much smaller spatial scale at which water resources effects have to be studied and interpreted. The contributions contained in Part II provide useful results in hydrological forecasting. A range of intriguing applications in hydrological forecasting is given for case studies involving reservoir operation in British Columbia, Canada, Guangdong Province in China, Taiwan, the Canadian Province of Ontario, and Brazil.


Within Part III, new developments in entropy are described and entropy concepts are applied to problems in hydraulics, water quality monitoring, discontinuance of hydrologic measurement stations, treatment plant efficiency and estimating missing monthly streamflow data. Neural networks are employed in Part IV for forecasting runoff and water demand. Trend assessment techniques have widespread applicability to environmental impact assessment studies. In Part V, a number of trend assessment techniques are evaluated and graphical, nonparametric and parametric trend methods are applied to water quality data. In Part VI, nonparametric and parametric approaches to spatial analysis are described and applied to practical hydrological problems. Next, some unique findings in spectral analysis are given in Part VII. Finally, Part VIII is concerned with a variety of interesting topics in streamflow modelling.

Audience

This book should be of direct interest to anyone who is concerned with the latest developments in time series modelling and analysis. Accordingly, the types of professionals who may wish to use this book include:

Water Resources Engineers
Environmental Scientists
Civil Engineers
Earth Scientists
Hydrologists
Geographers
Planners
Statisticians
Systems Engineers
Management Scientists

Within each professional group, the book should provide useful information for:

Researchers
Teachers
Students
Practitioners and Consultants


When utilized for teaching purposes, the book could serve as a complementary text at the upper undergraduate and graduate levels. The recent environmetrics book by K. W. Hipel and A. I. McLeod entitled Time Series Modelling of Water Resources and Environmental Systems (published by Elsevier, Amsterdam, 1994, ISBN 0 444 89270-2) contains an extensive list of time series analysis books (see Section 1.6.3) that could be used in combination with this current volume in university courses. Researchers should obtain guidance and background material for carrying out worthwhile research projects in time series analysis in hydrology and environmental engineering. Consultants who wish to keep their companies at the leading edge of activities in time series analysis and thereby serve their clients in the best possible ways will find this book to be an indispensable resource.

AN INTERNATIONAL CELEBRATION

Dedication

The papers contained in this book were originally presented at the international conference on Stochastic and Statistical Methods in Hydrology and Environmental Engineering that took place at the University of Waterloo, Waterloo, Ontario, Canada, from June 21 to 23, 1993. This international gathering was held in honour and memory of the late Professor T.E. Unny in order to celebrate his lifelong accomplishments in many of the important environmental topics falling within the overall conference theme. When he passed away in late December, 1991, Professor T.E. Unny was Professor of Systems Design Engineering at the University of Waterloo and Editor-in-Chief of the international journal entitled Stochastic Hydrology and Hydraulics. About 250 scientists from around the world attended the Waterloo conference in June, 1993. At the conference, each participant was given a Pre-Conference Proceedings, published by the University of Waterloo and edited by K.W. Hipel. This 584-page volume contains the detailed conference program as well as the refereed extended abstracts for the 234 papers presented at the conference. Subsequent to the conference, full length papers submitted for publication by presenters were mailed to international experts who kindly carried out thorough reviews. Accepted papers were returned to authors for revisions and the final manuscripts were then published by Kluwer according to topics in the following four volumes:

STOCHASTIC AND STATISTICAL MODELLING WITH GROUNDWATER AND SURFACE WATER APPLICATIONS
edited by Keith W. Hipel

EFFECTIVE ENVIRONMENTAL MANAGEMENT FOR SUSTAINABLE DEVELOPMENT
edited by Keith W. Hipel and Liping Fang

EXTREME VALUES: FLOODS AND DROUGHTS
edited by Keith W. Hipel

as well as the current book on:

TIME SERIES ANALYSIS IN HYDROLOGY AND ENVIRONMENTAL ENGINEERING
edited by Keith W. Hipel, A. Ian McLeod, U. S. Panu and Vijay P. Singh


The Editors of the volumes as well as Professor Unny's many friends and colleagues from around the globe who wrote excellent research papers for publication in these four volumes, would like to dedicate their work as a lasting memorial to Professor T. E. Unny. In addition to his intellectual accomplishments, Professor Unny will be fondly remembered for his warmth, humour and thoughtful consideration of others.

Conference Organization and Sponsorships

The many colleagues and sponsors who took part in the planning and execution of the international conference on Stochastic and Statistical Methods in Hydrology and Environmental Engineering are given below.

Organizing Committee

K. W. Hipel (Chairman)
A. I. McLeod
U. S. Panu
V. P. Singh

International Programme Committee

Z. Kundzewicz (Poland), S. Al-Nassri (Malaysia), Gwo-Fong Lin (Taiwan), H. Bergmann (Austria), C. Lemarechal (France), J. Bernier (France), I. Logan (Canada), B. Bobee (Canada), D. P. Loucks (U.S.A.), B. Bodo (Canada), I. B. MacNeill (Canada), D. S. Bowles (U.S.A.), A. Musy (Switzerland), W. P. Budgell (Norway), P. Nachtnebel (Austria), S. J. Burges (U.S.A.), D. J. Noakes (Canada), F. Camacho (Canada), N. Okada (Japan), S. Chandra (India), R. M. Phatarfod (Australia), C-L. Chiu (U.S.A.), V. Privalsky (U.S.S.R.), J. Ding (China), D. Rosbjerg (Denmark), L. Duckstein (U.S.A.), A. H. El-Shaarawi (Canada), J. D. Salas (U.S.A.), G. A. Schultz (Germany), M. Fiorentino (Italy), S. Serrano (U.S.A.), E. Foufoula (U.S.A.), U. Shamir (Israel), I. C. Goulter (Australia), S. P. Simonovic (Canada), Y. Y. Haimes (U.S.A.), S. Sorooshian (U.S.A.), N. Harmancioglu (Turkey), A. Szollosi-Nagy (France), S. Ikebuchi (Japan), C. Thirriot (France), Karmeshu (India), W. E. Watt (Canada), M. L. Kavvas (U.S.A.), S. J. Yakowitz (U.S.A.), J. Kelman (Brazil), V. Yevjevich (U.S.A.), J. Kindler (Poland), Y. C. Zhang (China), G. Kite (Canada), P. Zielinski (Canada), T. Kojiri (Japan), R. Krzysztofowicz (U.S.A.)


University of Waterloo Committee

A. Bogobowicz, S. Brown, D. Burns, C. Dufournaud, L. Fang, G. Farquhar, T. Hollands, J. D. Kalbfleisch, E. LeDrew, E. A. McBean, K. Ponnambalam, E. Sudicky

Financial Support

Conestoga/Rovers and Associates
Cumming Cockburn Limited
Department of Systems Design Engineering, University of Waterloo
Faculty of Engineering, University of Waterloo
Natural Sciences and Engineering Research Council (NSERC) of Canada

Sponsors

American Geophysical Union
American Water Resources Association
Association of State Floodplain Managers
Canadian Society for Civil Engineering
Canadian Society for Hydrological Sciences
IEEE Systems, Man and Cybernetics Society
Instituto Panamericano de Geografia e Historia
International Association for Hydraulic Research
International Association of Hydrological Sciences
International Commission of Theoretical and Applied Limnology
International Commission on Irrigation and Drainage
International Institute for Applied Systems Analysis
International Statistical Institute
International Water Resources Association
Lakehead University
Louisiana State University
North American Lake Management Society
The International Environmetrics Society
The Pattern Recognition Society
The University of Western Ontario
University of Waterloo


University of Waterloo

President James Downey, Opening and Banquet Addresses
D. Bartholomew, Graphic Services
Danny Lee, Catering and Bar Services Manager
D. E. Reynolds, Manager, Village 2 Conference Centre
T. Schmidt, Engineering Photographic Audio Visual Centre
Food Services
Graduate Students in Systems Design Engineering

Technical Assistance

Mrs. Sharon Bolender, Mr. Steve Fletcher, Mr. Kei Fukuyama, Ms. Hong Gao, Ms. Wendy Stoneman, Mr. Roy Unny

ACKNOWLEDGEMENTS

The Editors would like to sincerely thank the authors for writing such excellent papers for publication in this as well as the other three volumes. The thoughtful reviews of the many anonymous referees are also gratefully acknowledged. Moreover, the Editors appreciate the fine contributions by everyone who attended the Waterloo conference in June, 1993, and actively took part in the many interesting discussions at the paper presentations. Additionally, the Editors would like to say merci beaucoup to the committee members and sponsors of the Waterloo conference listed in the previous section. Dr. Roman Krzysztofowicz, University of Virginia, and Dr. Sidney Yakowitz, University of Arizona, kindly assisted in organizing interesting sessions at the Waterloo conference for papers contained in this volume. Furthermore, Dr. R. M. Phatarfod, Monash University in Australia, and Dr. K. Ponnambalam, University of Waterloo, were particularly helpful in suggesting reviewers as well as carrying out reviews for papers published in this book. Finally, they sincerely appreciate all the thoughtful personnel at Kluwer who assisted in the publication of the volumes, especially Dr. Petra D. Van Steenbergen, the Acquisition Editor.

Keith W. Hipel

A. Ian McLeod

Professor and Chair Department of Systems Design Engineering

Professor Department of Statistical and Actuarial Sciences The University of Western Ontario

Cross Appointed Professor to Department of Statistics and Actuarial Science University of Waterloo

U.S. Panu Professor Department of Civil Engineering Lakehead University

Adjunct Professor Department of Systems Design Engineering University of Waterloo

Vijay P. Singh Professor Department of Civil Engineering Louisiana State University

April, 1994


PART I CLIMATIC CHANGE

APPLICATIONS OF STOCHASTIC MODELING IN CLIMATE CHANGE IMPACT ASSESSMENT

DENNIS P. LETTENMAIER
Department of Civil Engineering FX-10, University of Washington, Seattle, WA 98195

The development of stochastic models of precipitation has been driven primarily by practical problems of hydrologic data simulation, particularly for water resource systems design and management in data-scarce situations, and by scientific interest in the probabilistic structure of the arrival process of precipitation events. The need for better methods of developing local climate scenarios associated with alternative climate simulations produced by global atmospheric general circulation models (GCMs) has provided another application for stochastic models of precipitation, but necessitates different model structures. Early attempts to model the stochastic structure of the precipitation arrival process are reviewed briefly. These include first order homogeneous Markov chains, as well as more advanced point process models designed to represent the clustering of precipitation events often recorded in observations of daily and shorter time scale observation series. The primary focus of this paper, however, is stochastic models of precipitation that account for climatic driving variables. Such models provide a means of transcending the spatial scales between GCMs and the much smaller spatial scale at which water resources effects need to be interpreted. The models reviewed generally make use of two types of information. The first is a set of atmospheric variables measured over the GCM grid mesh with node spacing of several degrees latitude by longitude. The second is a set of concurrent point precipitation observations, at several locations within the large-scale grid mesh, observed at the same time frequency (usually one day or less) as the large-scale atmospheric variables. A variety of methods of summarizing the atmospheric variables via subjective and objective weather typing procedures are reviewed, as are various approaches for stochastically coupling the large-scale atmospheric indicator variables with the precipitation arrival and amounts process.

INTRODUCTION

Stochastic models of the precipitation arrival process were originally developed to address practical problems of data simulation, particularly for water resource systems design and management in data-scarce situations, and to aid in understanding the probabilistic structure of precipitation. Early attempts to model the stochastic structure of the precipitation arrival process (wet/dry occurrences) were based on first-order homogeneous Markov chains (e.g., Gabriel and Neumann 1957; 1962). Various extensions of Markov models have since been
explored to accommodate inhomogeneity (such as seasonality) of the transition probabilities (e.g., Weiss, 1964; Woolhiser and Pegram, 1979; Stern and Coe, 1984) and to incorporate precipitation amounts (e.g., Khanal and Hamrick, 1974; Haan, et al., 1976). Markov models have fallen from favor, however, because they are unable to reproduce the long-term persistence of wet and dry spells and the clustering observed in rainfall occurrence series at daily or shorter time intervals (Foufoula-Georgiou, 1985). Since that time, more advanced point process models have been developed, such as those of Kavvas and Delleur (1981), Rodriguez-Iturbe, et al. (1987), Foufoula-Georgiou and Lettenmaier (1987), Smith and Karr (1985), and others. Most of this recent work, which is reviewed in Georgakakos and Kavvas (1987), is based on point process theory (e.g., LeCam, 1961). The Markov chain and point process models are similar to the extent that they are restricted to single-station applications, and are not easily generalizable to multiple station applications, at least without (in the case of Markov chain models) explosive growth in the number of parameters. In addition, all of the above models describe the precipitation process unconditionally, that is, they do not incorporate cause-effect information, such as descriptors of large-scale meteorological conditions that might give rise to wet, or dry, conditions.

Recent interest in assessments of the hydrologic effects of climate change has placed different demands on stochastic precipitation models. Much of the concern about global warming has been based on simulations of climate produced by global general circulation models of the atmosphere (GCMs). These models operate at spatial scales of several degrees latitude by several degrees longitude, and time steps usually from several minutes to several tens of minutes. The models are fully self-consistent with respect to the energy and water budgets of the atmosphere, and therefore produce predictions of both free atmosphere variables (e.g., vertical profiles of atmospheric pressure, temperature, wind, and liquid and vapor phase moisture) as well as surface fluxes (precipitation, latent and sensible heat, short and long-wave radiation, ground heat flux). In principle, the surface fluxes could be used directly to drive hydrologic models which could serve to disaggregate the GCM surface fluxes spatially to predict, for instance, streamflow. However, this approach is not at present feasible for two reasons. First, the scale mismatch between the GCM grid mesh and the much smaller catchment scale that is of interest for effects studies presents formidable obstacles. Second, GCM surface flux predictions are notoriously poor at scales much less than continental. Figure 1 shows, as an example, long-term average rainfall predicted by the CSIRO GCM (Pittock and Salinger, 1991) for present climate and CO2 doubling for a grid cell in southeastern Australia as compared with an average of several long-term precipitation gauges located in the grid cell. Although the seasonal pattern (winter-dominant precipitation) is the same in the observations and model predictions, the model underpredicts the annual precipitation by a factor of about two. Such differences in GCM predictions of precipitation are not atypical (see, for instance, Grotch, 1988), and in fact the predictions shown in Figure 1 might in some respects be considered a "success" because the seasonal pattern is correctly predicted by the GCM.
These results do highlight one of the dangers in attempting to disaggregate GCM output directly: the signal (difference between 2 x CO2 and 1 x CO2 climates) is considerably less than the bias (difference between 1 x CO2 and historical climates). Giorgi and Mearns (1991) review what they term "semi-empirical approaches" to the simulation of regional climate change. These are essentially stochastic models, which rely on the fact that the GCM predictions of free atmosphere
variables are usually better than those of surface fluxes. They therefore attempt to relate local (e.g., catchment-scale) surface variables (especially precipitation) to GCM free atmosphere variables. Among these methods are Model Output Statistics (MOS) routines, which are essentially regressions that adjust numerical weather predictions (produced on grid meshes smaller than those used for GCM climate simulations, but still "large" compared to the scale required for local assessments). MOS adjustments to precipitation are used, for instance, in quantitative precipitation forecasts (QPFs) for flood forecasting. One drawback of these routines is that they attempt to produce "best estimates" in the least squares sense. While this may be appropriate for forecasting, least squares estimates will usually underestimate the natural variability, which is a critical deficiency for climate effects assessments.

Figure 1. Comparison of monthly average precipitation simulated by the CSIRO GCM (Pittock and Salinger, 1991) for a southeastern Australia grid cell with the historical average of station data in the grid cell.

Other semi-empirical approaches have been developed to relate longer term GCM simulations of free atmosphere variables to local precipitation. Among these are the canonical correlation approach of von Storch, et al. (1993), and the regression approach of Wigley, et al. (1990). The disadvantage of these approaches is that the resulting local variable predictions are at a time scale much longer than the catchment response scale, hence there is no practical way to incorporate the predictions within a rainfall-runoff modeling framework from which water resources effects interpretations might be made. This difficulty could presumably be resolved by using shorter time steps (e.g., daily rather than
monthly, which would be more appropriate for hydrologic purposes). In the case of the regression approach of Wigley, et al. (1990), however, the input variables for local precipitation predictions include large-scale precipitation, which is responsible for much of the predictive accuracy. In their analysis, Wigley, et al. (1990) used the mean of station data for the large-scale precipitation. Unfortunately, as shown by Figure 1, GCM precipitation predictions are often badly biased, and this bias would be transmitted to the local predictions. Nonetheless, the considerable experience that has been developed over the last thirty years in developing local meteorological forecasts has largely been unexploited for local climate simulation. There is sufficient similarity between the two problems that investigation of extensions of methods such as MOS to hydrological simulation may prove fruitful.
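Before turning to externally forced models, the unconditional occurrence model mentioned at the start of this section is easy to make concrete. The following minimal sketch (in Python) simulates wet/dry days from a first-order homogeneous Markov chain of the kind used by Gabriel and Neumann (1957); the transition probabilities are illustrative placeholders, not values fitted in any of the studies cited.

    import numpy as np

    def simulate_wet_dry(n_days, p_wet_after_dry=0.25, p_wet_after_wet=0.65,
                         seed=0):
        # Two-state (wet/dry) first-order Markov chain: the probability
        # of a wet day depends only on whether the previous day was wet.
        rng = np.random.default_rng(seed)
        wet = np.zeros(n_days, dtype=bool)
        for t in range(1, n_days):
            p = p_wet_after_wet if wet[t - 1] else p_wet_after_dry
            wet[t] = rng.random() < p
        return wet

    occurrences = simulate_wet_dry(365)

As noted above, such a chain reproduces day-to-day persistence but underestimates the long-term persistence of wet and dry spells, which is part of what motivates the conditional models reviewed next.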

STOCHASTIC PRECIPITATION MODELS WITH EXTERNAL FORCING

Several investigators have recently explored stochastic precipitation models that operate at the event scale (defined here as daily or shorter) and incorporate, explicitly or implicitly, external large-area atmospheric variables. The motivation for development of these methods has been, in part, to provide stochastic sequences that could serve as input to hydrologic (e.g., precipitation-runoff) models. Most of the work in this area has utilized, either directly or via summary measures, large-scale free atmosphere variables rather than large-area surface fluxes. In this respect, their objective has been to simulate stochastically realistic precipitation sequences that incorporate large area information as external drivers. This approach is fundamentally different than disaggregation methods, such as MOS, which attempt to relate large-scale predictions directly to smaller scales.

Weather classification schemes

Weather classification schemes have been the mechanism used by several authors to summarize large-area meteorological information. The general concept of weather classification schemes (see, for example, Kalkstein, et al., 1987) is to characterize large-area atmospheric conditions by a single summary index. Externally forced stochastic precipitation models can be grouped according to whether the weather classification scheme is subjective or objective, and whether it is unconditional or conditional on the local conditions (e.g., precipitation occurrence).

Subjective classification procedures include the scheme of Baur (1944), from which a daily sequence of weather classes dating from 1881 to present has been constructed by the German Federal Weather Service (Bardossy and Caspary, 1990), and the scheme of Lamb (1972), which has formed the basis for construction of a daily sequence of weather classes for the British Isles dating to 1861. These subjective schemes are primarily based on large-scale features in the surface pressure distribution, such as the location of semipermanent pressure centers, the position and paths of frontal zones, and the existence of cyclonic and anticyclonic circulation types (Bardossy and Caspary, 1990).

Objective classification procedures utilize statistical methods, such as principal components, cluster analysis, and other multivariate methods to develop rules for classification of multivariate spatial data. For instance, McCabe, et al. (1990) utilized a combination of principal components and cluster analysis to form daily
weather classes at Philadelphia. The statistical model was compared to a subjective, conceptual model, which was found to give similar results. Wilson, et al. (1992) explored classification methods based on K-means cluster analysis, fuzzy cluster analysis, and principal components for daily classification of weather over a large area of the Pacific Northwest. Both of the above methods are unconditional on local conditions, that is, no attempt is made to classify the days in such a way that local precipitation, for instance, is well-described by the weather classes. Figure 2, taken from Wilson, et al. (1992), shows that for one of the precipitation stations considered, the unconditional classification scheme used (principal components of surface pressure, geopotential heights at 850 and 700 mb, and the east-west wind component at 850 mb, over a widely spaced grid mesh) resulted in good discrimination of precipitation for only one (winter) of the four seasons considered.

Figure 2. Cumulative distribution of precipitation by three-month seasons (JFM, AMJ, JAS, OND) and weather class for Stampede Pass, WA (from Wilson, et al., 1992).

Hughes, et al. (1993) used an alternative approach that selected the weather classes so as to maximize the discrimination of local precipitation, in terms of joint precipitation occurrences (presence/absence of precipitation at four widely separated stations throughout a region of dimensions about 1000 km). The procedure used was CART (Breiman, et al., 1984), or Classification and
Regression Trees. The large area information was principal components of sea level pressure. Figure 3 shows the discrimination of the daily precipitation distribution at one of the stations modeled, Forks, according to weather class.

Figure 3. Cumulative distributions of precipitation by weather state for Forks, WA in winter, using the CART weather classification procedure of Hughes, et al. (1993).

As expected, because the classification scheme explicitly attempts to "separate" the precipitation (albeit occurrence/absence rather than amount) by the selected classes, the resulting precipitation distributions are more distinguishable than those obtained by Wilson, et al. (1992). Hughes, et al. (1993) also simulated daily temperature minima and maxima. For this purpose, they used a Markov model conditioned on the present and previous days' rain state.

A final method of weather class identification is implicit. Zucchini and Guttorp (1991) describe the application of a set of models known as hidden Markov models to precipitation occurrences. The objective of their study was to model the (unconditional) structure of the precipitation arrival process. The properties of the hidden states, which could be (although do not necessarily need to be) interpreted as weather states, were not explicitly evaluated. Hughes, et al. (1993) explored a larger class of nonhomogeneous hidden Markov models (NHMM), of which the model of Zucchini and Guttorp is a special case. He explored models of the precipitation occurrence process in which the atmospheric states were explicit, but were inferred by the NHMM estimation procedure. In this model, therefore, the weather state and stochastic precipitation structure are completely integrated. For this reason, further comments on the NHMM model are deferred to the next section.
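An objective classification of the general kind described above can be sketched in a few lines. The fragment below is a simplified illustration, not the exact procedure of any of the studies cited; the input file name and the choices of ten principal components and six classes are assumptions. Daily pressure fields are reduced by principal components, and the component scores are clustered with K-means to yield a daily weather class sequence.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    # slp: array of shape (n_days, n_gridpoints) holding daily sea level
    # pressure over the large-area grid mesh (hypothetical input file).
    slp = np.load("daily_slp.npy")

    # Remove the long-term mean at each grid point so the classification
    # responds to circulation anomalies rather than the mean field.
    anomalies = slp - slp.mean(axis=0)

    scores = PCA(n_components=10).fit_transform(anomalies)
    weather_class = KMeans(n_clusters=6, n_init=10,
                           random_state=0).fit_predict(scores)
    # weather_class[t] is the daily weather state used to condition a
    # local precipitation model.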


Conditional stochastic precipitation models

Hay, et al. (1991) used a classification method based on wind direction and cloud cover (McCabe, 1990) which was coupled with a semi-Markov model to simulate temporal sequences of weather types at Philadelphia. Semi-Markov models (Cox and Lewis, 1978), with seasonal transition probabilities and parameters of the sojourn time distribution, were used to simulate the evolution of the weather states. This step is not strictly necessary if a lengthy sequence of variables defining the large-area weather states (the classification used required daily wind direction and cloud cover data) is available. Where such sequences are not available, which is sometimes the case for GCM simulations, fitting a stochastic model to the weather states has the advantage that it decouples the simulation of precipitation, and other local variables, from a particular GCM simulation sequence.

The method of simulating daily precipitation conditional on the weather state used by Hay, et al. (1991) was as follows. For each weather state and each of 11 weather stations in the region, the unconditional probability of precipitation was estimated from the historic record. Then, conditional on the weather state (but unconditional on precipitation occurrence and amount at the other stations and previous time) the precipitation state was selected based on the unconditional precipitation occurrence probability. Precipitation amounts were drawn from the product of a uniform and exponential distribution. Retrospective analysis of the model showed that those variables explicitly utilized for parameter estimation (conditional precipitation occurrence probabilities, mean precipitation amounts) were reproduced by the model. An analysis of dry period lengths suggested that the length of extreme dry periods was somewhat underestimated.

Bardossy and Plate (1991) also used a semi-Markov model to describe the structure of the daily circulation patterns over Europe, with circulation types based on synoptic classification. They developed a model of the corresponding rainfall occurrence process that was Markovian within a weather state (circulation type), but independent when the weather state changed. Precipitation occurrences were assumed spatially independent. Bardossy and Plate (1991) applied the model to simulate the precipitation occurrences at Essen, Germany. For this station, they found that the persistence parameter in the occurrence model was quite small, so that the model was almost conditionally independent (that is, virtually all of the persistence in the rainfall occurrence process was due to persistence in the weather states). The model reproduced the autocorrelations of the rainfall occurrences, as well as the distributions of dry and wet days, reasonably well. This is somewhat surprising, since other investigators (e.g., Hughes, et al., 1993) have found that conditionally independent models tend to underestimate the tail of the dry period duration distribution. However, this finding is likely to depend on both the structure of the weather state process, and the precipitation occurrence process, which is regionally and site-specific.

Bardossy and Plate (1992) extended the model of Bardossy and Plate (1991) to incorporate spatial persistence in the rainfall occurrences, and to model precipitation amounts explicitly. The weather state classification procedure was the same as in Bardossy and Plate (1991), and they retained the assumption of conditional independence under changes in the weather state.
Rather than modeling the occurrence process explicitly, they modeled a multivariate normal random variable W. Negative values of W corresponded to the dry state, and (a transform of) positive values were the precipitation amount. Within a run of a
weather state, W was assumed to be lag-one Markov. Spatial correlation in the occurrence process, and in the precipitation amounts, was modeled via the first two moments of W, which were weather state-dependent. The model was applied to 44 stations in the Ruhr River basin. The model was able to reproduce the first two unconditional moments of rainfall amounts, and precipitation probabilities, as well as the dry day durations, reasonably well at one of the stations (Essen, also used in the 1991 paper) selected for more detailed analysis.

Wilson, et al. (1991) developed a weather classification scheme for the Pacific Northwest based on cluster analysis of surface pressure and 850 mb temperature over a 10 degree by 10 degree grid mesh located over the North Pacific and the western coast of North America. A ten-year sequence of weather states (1975-84) was formed, and was further classified according to whether or not precipitation occurred at a station of interest. The partitioned weather state vector was then modeled as a semi-Markov process. For wet states, precipitation amounts were simulated using a mixed exponential model. The wet and dry period lengths were simulated quite well, although some of the weather state frequencies were misestimated, especially in summer. The authors suggested that a heavier tailed distribution than the geometric, used for the lengths-of-stay in the semi-Markov model, might give better performance. The model is somewhat limited in that its generalization to multiple stations results in rapid growth in the number of parameters.

Wilson, et al. (1992) explored a slightly different multiple station model, based on a Polya urn structure. Rather than explicitly incorporating the wet-dry state with the weather state, they developed a hierarchical modified model for the rainfall state conditioned on the weather state and the wet-dry state of the higher order station(s). In a Polya urn, the wet-dry state is obtained by drawing from a sample, initially of size N + M, a state, of which N are initially wet, and M are initially dry. For each wet state drawn, the sample of wet states is increased by n. Likewise, for each dry state drawn, the sample of dry states is increased by m, and in both cases the original state drawn is "replaced". Thus, the Polya urn has more persistence than a binomial process, in which the state drawn would simply be replaced, and the probability of the wet or dry state is independent of the dry or wet period length. The modification to the Polya urn (employed by others as well, e.g., Wiser, 1965) is to replace the persistent process with a binomial process once a given run (wet or dry period) length w has been reached. In addition, the parameters of the model (N, n, M, m, and w) are conditioned on the weather state and the wet-dry status of the higher stations in the hierarchy, but the memory is "lost" when the state (combination of weather state and wet-dry status of higher stations in the hierarchy) changes. The model was applied to three precipitation stations in the state of Washington, using a principal components-based weather classification scheme for a region similar to that used by Wilson, et al. (1991). The precipitation amounts were reproduced reasonably well, especially for the seasons with the most precipitation. The dry and wet period lengths were also modeled reasonably well, although there was a persistent downward bias, especially for the lowest stations in the hierarchy.
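A minimal sketch of the modified Polya urn occurrence process just described, for a single station under a fixed weather state, follows; all parameter values are hypothetical, the conditioning on weather state and higher-order stations is omitted, and the reversion to a binomial process at run length w is one plausible reading of the modification.

    import numpy as np

    def polya_occurrence(n_days, N=5, M=5, n=2, m=2, w=10, seed=0):
        # Start with N wet and M dry balls; each wet draw adds n wet
        # balls and each dry draw adds m dry balls, so runs persist.
        rng = np.random.default_rng(seed)
        wet_balls, dry_balls = N, M
        state = bool(rng.random() < N / (N + M))
        series, run = [state], 1
        for _ in range(1, n_days):
            new = bool(rng.random() < wet_balls / (wet_balls + dry_balls))
            run = run + 1 if new == state else 1
            state = new
            if run >= w:
                # Once the run length reaches w, reset the counts so the
                # draws revert to a simple binomial process.
                wet_balls, dry_balls = N, M
            elif state:
                wet_balls += n
            else:
                dry_balls += m
            series.append(state)
        return np.array(series)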
The major drawback of this model is that the number of parameters grows rapidly (a power of two) with the number of stations. Also, the model performs best for the highest stations in the hierarchy, but there may not be an obvious way of determining the ordering of stations.

All of the above models define the weather states externally, that is, the selection of the weather states does not utilize station information. Hughes, et al.
(1993) linked the selection of the weather states with observed precipitation occurrence information at a set of index stations using the CART procedure described above. Precipitation occurrences and amounts were initially modeled assuming conditional independence, by simply resampling at random from the historical observations of precipitation at a set of target stations, given the weather states. They found that this model tended to underestimate the persistence of wet and dry periods. Model performance was improved by resampling precipitation amounts conditional on the present day's weather state and the previous day's rain state. Unlike the models of Bardossy and Plate (1991; 1992) the Markovian persistence was retained regardless of shifts in the weather state. Inclusion of the previous rain state reduced the problem with simulation of wet and dry period persistence. However, by incorporating information about the previous day's rain state the number of parameters grows rapidly with the number of stations.

A somewhat different approach is the hidden Markov model (HMM), initially investigated for modeling rainfall occurrences by Zucchini and Guttorp (1991). The hidden Markov model is of the form

    P(R_t | S_1^t, R_1^{t-1}) = P(R_t | S_t)    (1a)

    P(S_t | S_1^{t-1}) = P(S_t | S_{t-1})    (1b)

where R_t is the rainfall occurrence (presence-absence) at time t, S_t is the value of the hidden state at time t, and the notation S_1^T denotes the values of the unobserved process S from time 1 to T. Essentially, the model assumptions are that the rainfall state is conditionally independent, that is, it depends only on the value of the hidden state at the present time, and the hidden states are Markov. R_t can be a vector of rainfall occurrences at multiple stations, in which case a model for its (spatial) covariance is required. The shortcoming of the HMM is that the hidden states are unknown, and even though they may well be similar to weather states, they cannot be imposed externally. Therefore, only unconditional simulations are possible, and in this respect the model is similar to the unconditional models of the precipitation arrival process discussed in Section 1. Hughes (1993) explored a class of nonhomogeneous hidden Markov models (NHMM) of the form

    P(R_t | S_1^t, R_1^{t-1}, X_1^T) = P(R_t | S_t)    (2a)

    P(S_t | S_1^{t-1}, X_1^T) = P(S_t | S_{t-1}, X_t)    (2b)

where X_t is a vector of atmospheric variables at time t. In this model, the precipitation process is treated as in the HMM, that is, it is conditionally independent given the hidden state S_t. However, the hidden states depend explicitly on a set of atmospheric variables at time t, and the previous hidden state. As for the HMM, if R_t is a vector of precipitation states at multiple locations, a model for the spatial covariances is required. Also, X_t can be (and in practice usually will be) multivariate. Hughes (1993) explored two examples in which X_t was a vector of principal components of the sea level pressure and 500 mb pressure height, and the model for P(S_t | S_{t-1}, X_t) was either Bayesian or autologistic. The Bayes and autologistic models are similar in terms of their parameterization; the structure of the autologistic model is somewhat more obvious and is used for illustrative purposes here. It is of the form


    P(S_t | S_{t-1}, X_t) = exp(a_{S_{t-1},S_t} + X_t b_{S_{t-1},S_t}) / Σ_k exp(a_{S_{t-1},k} + X_t b_{S_{t-1},k})    (3)

where s_t denotes the particular value of S_t. In this model, if there are m hidden states, and w atmospheric variables (that is, X_t is w-dimensional), the logistic model has m(m-1)(w+1) free parameters. Note that the model for the evolution of S_t conditioned on S_{t-1} and X_t is effectively a regional model, and does not depend on the precipitation stations. Hughes (1993) explored two special cases of the model:

1. a_{S_{t-1},S_t} = a_{S_t} and b_{S_{t-1},S_t} = b_{S_t}; and

2. b_{S_{t-1},S_t} = b_{S_t}.

In the first case, the Markovian property of the NHMM is dropped, and the evolution of the hidden states depends only on the present value of the atmospheric variables. In the second model, the "base" component of the hidden state transition probabilities is Markov, but the component that depends on the atmospheric variables is a function only of the present value of the hidden state, and not the previous value. In one of the two examples explored, Hughes found, using a Bayes Information Criterion to discriminate between models, that the second model was the best choice. In the other example, the full Markov dependence was retained. In the two examples, Hughes evaluated the means of the weather variables corresponding to the hidden states. He found that the large area characteristics were reasonable. For winter, the states with the most precipitation on average corresponded to a low pressure system off the north Pacific coast, and the cases with the least precipitation corresponded to a high pressure area slightly inland of the coast. In the second example, with modeled precipitation occurrences at 24 stations in western Washington, transitional states with differences in the surface and 500 mb flow patterns were shown to result in partial precipitation coverage (precipitation at some stations, and not at others). These results suggest that the NHMM may offer a reasonable structure for transmitting the effects of large-area circulation patterns to the local scale.
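Equation (3) is straightforward to evaluate numerically. The sketch below illustrates the autologistic form only; the parameter arrays are random placeholders rather than fitted values (the full arrays carry m·m·(w+1) entries, of which m(m-1)(w+1) are free after normalization, as noted above).

    import numpy as np

    m, w = 4, 3                     # hidden states, atmospheric variables
    rng = np.random.default_rng(0)
    a = rng.normal(size=(m, m))     # a[i, j] plays the role of a_{S_{t-1},S_t}
    b = rng.normal(size=(m, m, w))  # b[i, j] plays the role of b_{S_{t-1},S_t}

    def transition_probs(prev_state, x):
        # P(S_t = j | S_{t-1} = prev_state, X_t = x) for all j, eq. (3),
        # computed as a numerically stabilized softmax over hidden states.
        logits = a[prev_state] + b[prev_state] @ x
        p = np.exp(logits - logits.max())
        return p / p.sum()

    # Simulate a hidden state sequence driven by atmospheric variables
    # (random stand-ins here for, e.g., daily principal component scores).
    X = rng.normal(size=(100, w))
    states = np.zeros(len(X), dtype=int)
    for t in range(1, len(X)):
        states[t] = rng.choice(m, p=transition_probs(states[t - 1], X[t]))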

APPLICATIONS TO ALTERNATIVE CLIMATE SIMULATION

Case studies

Although the development of most of the models reviewed above has been motivated in part by the need for tools to simulate local precipitation for alternative climate scenarios, there have only been a few applications where climate model (GCM) scenarios have been downscaled using stochastic methods. Hughes, et al. (1993) estimated parameters of semi-Markov models from five-year 1 x CO2 and 2 x CO2 GFDL simulations of surface pressure and 850 mb temperature. From these five-year sequences, they computed daily weather states using algorithms developed from historical sea level pressure observations, and fit
semi-Markov models to the weather states as described above. The semi-Markov models were used to simulate 40-year weather state sequences corresponding to the 1 x CO2 and 2 x CO2 runs. Daily precipitation (and temperature maxima and minima, using the model described in Section 2.2) were then simulated for the 40-year period, and were used as input to a hydrologic model which was the basis for an assessment of shifts in flood risk that might be associated with climate change.

Zorita, et al. (1993) used a model similar to that of Hughes, et al. (1993) to simulate daily precipitation for four sites in the Columbia River basin. They found that the model performed reasonably well in winter, but they had difficulties in application of the CART procedure to determine climate states in summer. They found that when stations relatively far from the Pacific Coast were used in definition of the multistation rain states in the CART algorithm, no feasible solutions resulted. They were able to avoid this problem by restricting the index stations to be relatively close to the coast, but when they attempted to apply the model to the middle Atlantic region, they could obtain CART weather states only when the index stations were quite closely spaced, and then only in winter. The difficulty appeared to be the shorter spatial scale of summer precipitation, and weaker coupling of local precipitation with the regional circulation patterns. Summer precipitation in the middle Atlantic region is dominated by local, convective storms, the occurrence of which is not well predicted by large-scale circulation patterns. This is true also, to a lesser extent, in the inland portion of the Columbia River basin.

Some complications

One of the motivations for development of stochastic models that couple large-area atmospheric variables with local variables, such as precipitation, is to provide a means of downscaling simulations of alternative climates for effects assessments. However, as noted in the previous sections, most of the applications to date have been to historic data. For instance, local precipitation has been simulated using either an historic sequence of weather states (e.g., Hughes, et al., 1993) or via a stochastic model of the historic weather states (e.g., Hay, et al., 1991; Bardossy and Plate, 1992; Wilson, et al., 1992). For most of the models reviewed, it should be straightforward to produce a sequence of weather states corresponding to an alternative climate scenario (e.g., from a lengthy GCM simulation). There are, nonetheless, certain complications, the most obvious of which is biases in current climate (baseline) GCM simulations. For instance, Zorita, et al. (1993) found it necessary to use a weather state classification scheme based on sea level pressure anomalies to filter out biases in the mean GCM pressure fields. Otherwise the stochastic structure of the weather state sequences formed from the baseline GCM simulation was much different than that derived from the historic observations. Selection of the variables to use in the weather state classification is problematic. Wilson, et al. (1991) classified weather states using sea level pressure and 850 mb temperature. However, if this scheme is used with an alternative, warmer climate, the temperature change dominates the classification, resulting in a major change in the stochastic structure of the weather class sequence that may not be physically realistic.
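A minimal sketch of the anomaly-based filtering used by Zorita, et al. (1993) to remove mean-field bias before classification follows; the file names and array shapes are hypothetical, and a monthly climatology is one reasonable choice.

    import numpy as np

    # slp: (n_days, n_gridpoints) daily sea level pressure from the
    # baseline GCM run; month: (n_days,) month number (1-12) of each day.
    slp = np.load("gcm_baseline_slp.npy")
    month = np.load("gcm_baseline_month.npy")

    anomalies = np.empty_like(slp)
    for mo in range(1, 13):
        sel = month == mo
        # Subtracting the run's own monthly mean field removes systematic
        # bias in the mean pressure field; the historical record gets the
        # same treatment (with its own climatology) before the
        # classification rules are applied to either data set.
        anomalies[sel] = slp[sel] - slp[sel].mean(axis=0)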
Although the dominance of temperature in the classification can be avoided by using variables, such as sea level pressure, that more directly reflect large-area circulation patterns, elimination of temperature from consideration as a classifying variable is troublingly arbitrary. A related problem is the effect of the strength of
the linkage between the weather states and the local variables. In a sense, the problem is analogous to multiple regression. If the regression is weak, i.e., it doesn't explain much of the variance in the dependent variable (e.g., local precipitation), then changes in the independent variables (e.g., weather states) won't be evidenced in predictions of the local variable. Therefore, one might erroneously conclude that changes in, for instance, precipitation would be small, merely because of the absence of strong linkages between the large-scale and local conditions (see, for example, Zorita, et al., 1993).

Application of all of the models for alternative climate simulation requires that certain assumptions be made about what aspect of the model structure will be preserved under an alternative climate. All of the models have parameters that link the large-area weather states with the probability of occurrence, or amount of, local precipitation. For instance, in the model of Wilson, et al. (1992) there are parameters that control the probability of precipitation for each combination of weather state and the precipitation state at the higher order stations. In the model of Bardossy and Plate (1991) there is a Markov parameter that describes the persistence of precipitation occurrences given the weather state. These parameters, once estimated using historical data, must then be presumed to hold under a different sequence of weather states corresponding, for instance, to a GCM simulation. Likewise, many of the models (e.g., Bardossy and Plate, 1992; Hughes, 1993) have spatial covariances that are conditioned on the weather state. The historical values of these parameters likewise must be assumed to hold under an alternative climate. Essentially, the assumption required for application of the models to alternative climate simulation is that all of the nonstationarity is accounted for by the weather classes. One opportunity that has not been exploited is use of historical data to validate this assumption. For instance, Bardossy and Caspary (1990) have demonstrated that long-term changes have occurred in the probabilities of some European weather states. It should be possible, by partitioning the historical record, to determine whether conditional simulations properly preserve associated shifts in local precipitation, such as wet and dry spell lengths, and precipitation amounts.
For simulation of a CO2-doubled scenario, they incremented the conditional mean temperatures by the difference between the 1×CO2 and 2×CO2 850 mb temperatures. Bogardi, et al. (1993a;b) coupled a weather state-driven precipitation model similar to that of Bardossy and Plate (1992) with a model of daily temperature conditioned on weather state, and,


via nonparametric regression, the 500 mb pressure height. The 500 mb pressure height was found to provide a reasonable index to within-year surface temperature variations for several stations in Nebraska. Both of these models have disadvantages. The method used by Hughes, et al. (1993) to infer temperature under CO2-doubled conditions makes an arbitrary assumption that the change in the mean station temperature would be the same as the regional mean temperature change at 850 mb. The regression-based approach of Bogardi, et al. (1993a) essentially assumes that the transfer function relating within-year variations in the 500 mb pressure height to station temperature variations applies to differences between climate scenarios as well. That this may not be realistic is suggested by the fact that the simulated changes in winter station temperatures for CO2 doubling are much larger than those in summer, even though most GCMs simulate large summer surface air temperature changes in the Great Plains.

CONCLUSIONS

The coupling of weather state classification procedures, either explicitly or implicitly, with stochastic precipitation generation schemes is a promising approach for transferring large-area climate model simulations to the local scale. Most of the work reported to date has focused on the simulation of daily precipitation, conditioned in various ways on weather classes extracted from large-area atmospheric features. The approach has been shown to perform adequately in most of the studies, although there remain questions as to how best to determine the weather states. Further, no useful means has yet been proposed to determine the strength of the relationship between large-area weather classes and local precipitation, and to ensure that weak relationships do not result in spurious downward biases in inferred changes in local precipitation. This is an important concern, since at least one of the studies reviewed (Zorita, et al., 1993) found conditions under which weather classes well-related to local precipitation could not be identified.

There have been relatively few demonstrated applications of these procedures for climate effects interpretations. One of the major difficulties is accounting for biases in the GCM present climate, or "base" runs. In addition, few of the models reviewed presently simulate variables other than precipitation needed for hydrological studies. Temperature simulation is especially important for many hydrological modeling applications, but methods of preserving stochastic consistency between local and large-scale simulations are presently lacking.

ACKNOWLEDGMENTS

The assistance of James P. Hughes and Larry L. Wilson in assembling the review materials is greatly appreciated.

REFERENCES

Bardossy, A., and H. J. Caspary (1990) "Detection of climate change in Europe by analyzing European circulation patterns from 1881 to 1989", Theor. Appl. Climatol., 42(3), 155-167.
Bardossy, A., and E. J. Plate (1991) "Modeling daily rainfall using a semi-Markov


representation of circulation pattern occurrence", J. Hydrol., 122(1-4), 33-47.
Bardossy, A., and E. J. Plate (1992) "Space-time model for daily rainfall using atmospheric circulation patterns", Water Resour. Res., 28(5), 1247-1259.
Baur, F., P. Hess, and H. Nagel (1944) "Kalender der Grosswetterlagen Europas 1881-1939", Bad Homburg, 35.
Bogardi, I., I. Matyasovszky, A. Bardossy, and L. Duckstein (June 1993a) "Estimation of local climatic factors under climate change, Part 1: Methodology", in Proceedings, NATO Advanced Study Institute on Engineering Risk and Reliability in a Changing Physical Environment, Deauville, France.
Bogardi, I., I. Matyasovszky, A. Bardossy, and L. Duckstein (June 1993b) "Estimation of local climatic factors under climate change, Part 2: Application", in Proceedings, NATO Advanced Study Institute on Engineering Risk and Reliability in a Changing Physical Environment, Deauville, France.
Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone (1984) Classification and Regression Trees, Wadsworth, Monterey.
Cox, D.R., and P.A.W. Lewis (1978) The Statistical Analysis of Series of Events, Methuen, London.
Foufoula-Georgiou, E. (1985) "Discrete-time point process models for daily rainfall", Water Resources Technical Report No. 93, Univ. of Washington, Seattle.
Foufoula-Georgiou, E., and D. P. Lettenmaier (1987) "A Markov renewal model for rainfall occurrences", Water Resour. Res., 23(5), 875-884.
Gabriel, K. R., and J. Neumann (1957) "On a distribution of weather cycles by lengths", Q. J. R. Meteorol. Soc., 83, 375-380.
Gabriel, K. R., and J. Neumann (1962) "A Markov chain model for daily rainfall occurrences at Tel Aviv", Q. J. R. Meteorol. Soc., 88, 90-95.
Georgakakos, K. P., and M. L. Kavvas (1987) "Precipitation analysis, modeling, and prediction in hydrology", Rev. Geophys., 25(2), 163-178.
Giorgi, F., and L.O. Mearns (1991) "Approaches to the simulation of regional climate change: A review", Rev. Geophys., 29(2), 191-216.
Grotch, S.L. (April 1988) "Regional intercomparisons of general circulation model predictions and historical climate data", U.S. Department of Energy Report DOE/NBB-0084, Atmospheric and Geophysical Sciences Group, Lawrence Livermore National Laboratory, Livermore, CA.
Haan, T.N., D.M. Allen, and J.O. Street (1976) "A Markov chain model of daily rainfall", Water Resour. Res., 12(3), 443-449.
Hay, L. E., G. J. McCabe, Jr., D. M. Wolock, and M. A. Ayers (1991) "Simulation of precipitation by weather type analysis", Water Resour. Res., 27(4), 493-501.
Hughes, J.P., D.P. Lettenmaier, and P. Guttorp (1993) "A stochastic approach for assessing the effects of changes in regional circulation patterns on local precipitation", in press, Water Resour. Res.
Hughes, J.P. (1993) "A class of stochastic models for relating synoptic atmospheric patterns to local hydrologic phenomena", Ph.D. Dissertation, Department of Statistics, University of Washington.
Kalkstein, L.S., G. Tan, and J.A. Skindlov (1987) "An evaluation of three clustering procedures for use in synoptic climatological classification", Journal of Climate and Applied Meteorology, 26(6), 717-730.
Kavvas, M. L., and J. W. Delleur (1981) "A stochastic cluster model of daily rainfall sequences", Water Resour. Res., 17(4), 1151-1160.
Khanal, N.N., and R.L. Hamrick (1974) "A stochastic model for daily rainfall data synthesis", Proceedings, Symposium on Statistical Hydrology, Tucson, AZ, U.S.


Dept. of Agric. Publication No. 1275, 197-210.
Lamb, H.H. (1972) "British Isles weather types and a register of daily sequence of circulation patterns, 1861-1971", Geophysics Memorandum No. 110, Meteorology Office, London.
LeCam, L. (1961) "A stochastic description of precipitation", paper presented at the 4th Berkeley Symposium on Mathematics, Statistics, and Probability, University of California, Berkeley, CA.
McCabe, G. J., Jr. (1990) "A conceptual weather-type classification procedure for the Philadelphia, Pennsylvania, area", Water-Resources Investigations Report 89-4183, U.S. Geological Survey, West Trenton, NJ.
Pittock, A.B., and M.J. Salinger (1991) "Southern hemisphere climate scenarios", Climate Change, 18, 205-222.
Rodriguez-Iturbe, I., B. Febres de Power, and J. B. Valdes (1987) "Rectangular pulses point process models for rainfall: Analysis of empirical data", J. Geophys. Res., 92(D8), 9645-9656.
Smith, J. A., and A. F. Karr (1985) "Statistical inference for point process models of rainfall", Water Resour. Res., 21(1), 73-80.
Stern, R.D., and R. Coe (1984) "A model fitting analysis of daily rainfall data", J. R. Statist. Soc. A, 147, 1-34.
von Storch, H., E. Zorita, and U. Cubasch (1993) "Downscaling of climate change estimates to regional scales: Application to winter rainfall in the Iberian Peninsula", in press, Journal of Climate.
Weiss, L.L. (1964) "Sequences of wet and dry days described by a Markov chain model", Monthly Weather Review, 92, 169-176.
Wigley, T.M.L., P.D. Jones, K.R. Briffa, and G. Smith (1990) "Obtaining sub-grid-scale information from coarse-resolution general circulation model output", J. Geophys. Res., 95, 1943-1953.
Wilson, L. L., D. P. Lettenmaier, and E. F. Wood (1991) "Simulation of daily precipitation in the Pacific Northwest using a weather classification scheme", in Land Surface-Atmosphere Interactions for Climate Modeling: Observations, Models, and Analysis, E. F. Wood, ed., Surv. Geophys., 12(1-3), 127-142, Kluwer, Dordrecht, The Netherlands.
Wilson, L.L., D.P. Lettenmaier, and E. Skyllingstad (1992) "A hierarchical stochastic model of large-scale atmospheric circulation patterns and multiple station daily precipitation", J. Geophys. Res., 97(D3), 2791-2809.
Wiser, E. H. (1965) "Modified Markov probability models of sequences of precipitation events", Mon. Weath. Rev., 93(8), 511-516.
Woolhiser, D.A., and G.G.S. Pegram (1979) "Maximum likelihood estimation of Fourier coefficients to describe seasonal variations of parameters in stochastic daily precipitation models", J. Appl. Meteorol., 18(1), 34-42.
Zorita, E., J.P. Hughes, D.P. Lettenmaier, and H. von Storch (1993) "Stochastic characterization of regional circulation patterns for climate model diagnosis and estimation of local precipitation", in review, J. Climate.
Zucchini, W., and P. Guttorp (1991) "A hidden Markov model for space-time precipitation", Water Resour. Res., 27(8), 1917-1923.

KNOWLEDGE BASED CLASSIFICATION OF CIRCULATION PATTERNS FOR STOCHASTIC PRECIPITATION MODELING

A. BARDOSSY¹, H. MUSTER¹, L. DUCKSTEIN² and I. BOGARDI³

¹Institut fur Hydrologie und Wasserwirtschaft, University of Karlsruhe, Kaiserstr. 12, 76128 Karlsruhe, Germany
²Systems and Industrial Engineering Department, University of Arizona, Tucson, Arizona 85721, USA
³Department of Civil Engineering, W348, Nebraska Hall, University of Nebraska, Lincoln, NE 68588-0531, USA

A fuzzy rule-based methodology is applied to the problem of classifying daily atmospheric circulation patterns (CPs). The subjective classification of European CPs given in Hess and Brezowsky (1969) provides a basis for constructing the rules. The purpose of the approach is to produce a classification that can be used to simulate local precipitation on the basis of the 700 hPa pressure field rather than reproduce the existing subjective classification. For comparison, an artificial neural network is applied to the same problem. The performance of the fuzzy classification, as measured by any of three precipitation-related indices, is in general better than that of the neural net. The performance is about equal to that of Hess and Brezowsky. The fuzzy rule-based approach thus has potential to be applicable to the classification of Global Circulation Model (GCM) produced daily CPs for the purpose of predicting the effect of climate change on space-time precipitation over areas where no classification exists.

INTRODUCTION

The main purpose of this paper is to develop a methodology based on fuzzy rules (FR) to reproduce the precipitation generation features of an existing subjective classification of daily atmospheric circulation patterns (CPs) over Europe. This is a novel approach both from the methodological and application viewpoint: FR have been used extensively in control problems but not in modeling. One of the first applications of fuzzy rule-based modeling in hydrology (groundwater) is found in Bardossy and Disse (1993), but so far no surface hydrology or hydrometeorologic examples could be found in the open literature. Like FR, artificial neural networks (NN) provide a non-linear numerical mapping of inputs into outputs. Neither approach needs a mathematical formulation of how the output depends on the input. Unlike NN, FR needs the formulation of rules, which may be difficult even for experts, but does not necessarily require a training data set. On the other hand, unlike FR, NN may be applied to ill defined problems, but after training an NN is a pure unstructured black box and knowledge gained by the


training algorithm cannot be decoded. Because the two approaches complement each other, it seems interesting to compare their performance. The present study belongs to a long-term collaborative project directed at developing a new methodology for the disaggregation of global hydrometeorological input in regional or local hydrological systems and the prediction of climate change effects on these systems. In the first part of this approach, the CPs are classified, so that their occurrence, persistence and transition probabilities may be investigated. The CPs may be based on observation data or on the output of GCMs, in a changed climate (for example, 2×CO2) scenario case. The reason for selecting large scale CPs as input into local or regional hydrologic systems is that long records of reliable pressure measurements are available. Furthermore, the GCMs are based on weather forecasting models which can predict the future pressure condition with much higher accuracy than other parameters. Results obtained in West Germany, Nebraska and Central Arizona indicate the existence of a strong relationship between daily CP types and local hydrologic observations, such as precipitation, temperature, wind or floods (Bardossy and Caspary, 1990; Bardossy and Plate, 1991; Bogardi et al., 1992; Matyasovszky et al., 1992; Duckstein et al., 1993). This relationship was essentially described in the form of, say, daily precipitation at a given site conditioned upon the event that the type of CP over the region was i = 1, ..., I. A fundamental element of this approach is thus a phenomenologically valid classification of the CP that can generate a simulated series of precipitation events with high information content. On the other hand, CPs are elements of complex dynamic large-scale atmospheric phenomena, so that any classification scheme is fraught with uncertainty and imprecision (Yarnal and White, 1987, 1988; Bardossy and Caspary, 1990). Here we apply an FR- and an NN-based approach to account explicitly for the imprecision in the subjective classification of European CPs by Baur et al. (1944). These two approaches should be able to reproduce the classification quite well with respect to the prediction of climate change effects on local hydrological systems. The paper is organized as follows: in the next section, background information on existing CP classification schemes is presented. The following section provides a description of the fuzzy rule-based approach to modeling geophysical phenomena and describes briefly the NN approach. Then the two approaches are applied to a European case study and the results are evaluated. The final section consists of a discussion and conclusions.

DEFINITION OF CIRCULATION PATTERNS AND CLASSIFICATION APPROACHES

Definition of circulation pattern

Following Baur et al. (1944), daily atmospheric circulation patterns consisting of continent-size pressure contours (at sea level, 700 hPa or 500 hPa) are described only in terms of three large scale features, namely:

1. The location of sea level semipermanent pressure centers, such as the Azores high


or Iceland low.
2. The position and paths of frontal zones.
3. The existence of cyclonic and anticyclonic circulation types.

Furthermore, a distinction between seasons may be in order.

Classification approaches

CP classification techniques may be grouped into subjective and objective procedures. The former group has been in existence for over a century in Europe, as described in Baur et al. (1944), and about half a century in the USA. The latter type has emerged as a result of the development of high-speed computers and availability of statistical software packages, mostly principal component analysis and clustering. Subjective or manual techniques, such as the one used as a case study herein and described below, depend on the hydrometeorologist's interpretation of the major features and persistence of a given pattern. Objective techniques, in contrast, are based on statistical approaches such as hierarchical methods (Johnson, 1967), k-means methods (MacQueen, 1967), and correlation methods (Bradley et al., 1982; Yarnal, 1984). For example, in our Eastern Nebraska study, 9 types of CPs have been identified after having performed a principal components analysis coupled with the k-means method (Matyasovszky et al., 1992). Between these two groups, FR and NN are knowledge based classifications. They are subjective and phenomenological. The subjectivity is taken into account either explicitly by rules or implicitly by a training phase that uses expert knowledge. They are objective because, after model construction, the same inputs are always classified in the same way.

Subjective classification for European conditions

Using a record of 109 years, Baur et al. (1944) have developed a subjective classification of CPs for European conditions; later Hess and Brezowsky (1969) have used this classification to construct a catalogue of European CPs from 1881 to 1966. On the basis of Baur et al. (1944), Hess and Brezowsky (1969) recognize 3 groups of CPs divided into 10 major types, 29 subtypes and one additional subtype for the undetermined cases. The 10 major types are divided into subtypes primarily by adding the letter a or z at the end of the abbreviation of the major type to denote anticyclonic (a) or cyclonic (z) circulation. The three groups of CP are zonal, half-meridional and meridional. Hess and Brezowsky (1969) describe the characteristics of each circulation group in some detail. The zonal circulation group is described as follows: broad areas of high sea level pressure cover subtropical and lower middle latitudes. Low sea level pressure occurs in subarctic and higher middle latitudes. Upper air flow is west to east. Cyclone tracks run from the eastern North Atlantic Ocean to the European continent. All


circulations of the major type "West" (W) are classified as zonal circulations. As an illustration, Figure 1 shows the 500 hPa contour map for the circulation subtype "West, cyclonic" (Wz), which persisted for 8 days after November 11, 1987 (Deutscher Wetterdienst, 1948-1990).

Figure 1: Typical 500 hPa contour map of circulation type Wz.

The half-meridional circulation group corresponds to a near equilibrium between zonal and meridional components of air flow. Typical examples of half-meridional circulations are the major types "Northwest" and "Southwest". In comparison with the major type "West", the anticyclonic pressure centers are shifted northwards to about 50°N. The pressure centers are located above the eastern Atlantic Ocean in the case of "Northwest" (NW) types, over Eastern Europe for the "Southwest" (SW) types, and over Central Europe for the "Central European high" (HM). Due to the varying circulation components, the major subtype "Central European low" (TM) has been added to the half-meridional circulation types. The meridional circulation group is characterized by stationary, blocking high pressure centers at sea level. Due to the locations of the sea level pressure centers


and the resulting main flow directions to Central Europe, the major types "North" (N), "South" (S) and "East" (E) can be distinguished. In addition, all trough types with a north to south axis are classified as meridional circulations. The major types "Northeast" (NE) and "Southeast" (SE) are also included with the meridional circulation group because they normally coincide with blocking North and Eastern European highs. Further illustrations of these three types of CP are found in Bardossy and Caspary (1990). The fuzzy rule-based approach will describe CPs by assigning fuzzy quantifiers to the normalized value of pressure at grid points or pixels. Linguistic attributes such as "The pressure centers are located above the eastern Atlantic Ocean" could also have been used, but certain properties of CPs would be very difficult to express even by fuzzy sets.

KNOWLEDGE BASED CLASSIFICATION

Classification data base

In building up the FR classifier, as a first step, expert knowledge encoded in the subjective classification of Hess and Brezowsky (1969) is used. No training data set (given pressure maps and corresponding subjective circulation patterns) is needed. In applying the FR classifier, as a second step, the daily observed 700 hPa values have to be normalized in order to have a unified basis for the classification. The observed pressure maps were available from the gridded data set of the National Meteorological Center (NMC), USA. Let h(u_k, t) be the observed 700 hPa surface at location u_k and time t, T be the total time horizon, and the temporal mean h̄(u_k) be:

h̄(u_k) = (1/T) Σ_{t=1}^{T} h(u_k, t)    (1)

For each day t, the height is normed using the formula:

g(u_k, t) = { [h(u_k, t) − h̄(u_k)] − min_j [h(u_j, t) − h̄(u_j)] } / { max_j [h(u_j, t) − h̄(u_j)] − min_j [h(u_j, t) − h̄(u_j)] }    (2)

This way, for each day, the 700 hPa surface is mapped onto the interval [0,1].
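As a minimal illustration of (1)-(2), the following Python sketch normalizes a daily 700 hPa height field. The data and array shapes are purely illustrative, and the normalization follows the reconstruction of (2) given above (temporal-mean anomalies rescaled by the daily spatial minimum and maximum):

    import numpy as np

    # h has shape (T, K): T days of 700 hPa heights at K = 51 grid points
    rng = np.random.default_rng(0)
    h = 3000.0 + 50.0 * rng.standard_normal((3650, 51))

    h_bar = h.mean(axis=0)                 # temporal mean at each point, eq. (1)
    anom = h - h_bar                       # daily anomaly field
    lo = anom.min(axis=1, keepdims=True)   # spatial minimum for each day
    hi = anom.max(axis=1, keepdims=True)   # spatial maximum for each day
    g = (anom - lo) / (hi - lo)            # eq. (2): each day mapped onto [0, 1]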

Training the NN at the first step needs a properly defined input/output data set. The input data set is encoded as follows: for each "subjectively" defined CP i, the mean and the standard deviation of the corresponding normalized 700 hPa daily values are calculated:

m̄_i(u_k) = (1/|T_i|) Σ_{t∈T_i} g(u_k, t)    (3)

s_i(u_k) = [ (1/|T_i|) Σ_{t∈T_i} (g(u_k, t) − m̄_i(u_k))² ]^{1/2}    (4)

where T_i denotes the set of days classified as CP type i. The training of the NN was performed by a sequence of circulation patterns. For a given subjective circulation pattern i, the corresponding 700 hPa surface h(u_k, t)


data was obtained using a normally distributed random variable with mean m̄_i(u_k) and standard deviation s_i(u_k). The heights h(u_k, t) are normed by (2) and are used in this form as input for the NN. The output was an activation of the output neuron i corresponding to CP i.
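A sketch of this training-set generation is given below. The class statistics m_i and s_i stand for (3)-(4) and are placeholders here; clipping to [0, 1] is an added simplification standing in for the renorming by (2) described above:

    import numpy as np

    rng = np.random.default_rng(1)

    def training_batch(m_i, s_i, n_samples):
        # Draw synthetic normalized 700 hPa fields for CP type i from
        # normally distributed variables with mean m_i(u_k) and std s_i(u_k).
        fields = rng.normal(m_i, s_i, size=(n_samples, m_i.size))
        return np.clip(fields, 0.0, 1.0)   # simplified stand-in for renorming by (2)

    K = 51
    m_i = rng.uniform(0.2, 0.8, K)     # placeholder class means, eq. (3)
    s_i = rng.uniform(0.05, 0.15, K)   # placeholder class std devs, eq. (4)
    X = training_batch(m_i, s_i, 200)  # inputs; target is activation of output neuron i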

Fuzzy rule based classification

To classify CPs by the use of fuzzy rules, each CP type is first described by a set of rules, and then the classification is done by selecting the CP type for which the so-called degree of fulfillment (DOF) is highest and at least at a certain level. If this level is not reached, a transition CP can be defined as a combination of CPs, for example the two CPs with the highest DOF. Each fuzzy rule corresponding to a circulation pattern of type i consists of a set of premises B_{i,h}, given in the form of fuzzy numbers with properly chosen membership functions μ_{B_{i,h}}, h = 1, ..., H_i:

IF B_{i,1} AND B_{i,2} AND ... AND B_{i,H_i} THEN CP is i    (5)

Here H_i is the number of premises used to describe type i. As mentioned above, the premises consist of normalized pressure values at a few selected pixels. In contrast to ordinary (crisp) rules, fuzzy rules allow partial and simultaneous fulfillment of rules. This means that instead of the usual case, in which a rule is either applied or is not applied, a partial applicability becomes possible. For any vector of premises (a_1, ..., a_K), the DOF of rule i can be defined as a function of the individual fulfillment grades expressed by the corresponding values of the membership functions:

D_i = F(μ_{B_{i,1}}(a_1), ..., μ_{B_{i,H_i}}(a_{H_i}))    (6)

Finally the classifier selects, as the class i, the index i with the highest D_i value (highest DOF of the rules), provided it is at least at a given level. Four classes v of rules are defined, according to the normalized pressure values:

• very low values, class v = 1
• not high values, class v = 2
• not low values, class v = 3
• very high values, class v = 4

The combination of the fulfillment grades within a rule class (for example, v: very high values) is done by a combination of "AND" and "OR" operations. Suppose B_{i,1}, ..., B_{i,R} correspond to the same class v; then the partial DOF D_{iv} corresponding to this class v is taken as a convex combination of "OR" and "AND" fulfillment grades, using a properly selected value of γ_v as described in more detail below. With 0 ≤ γ_v ≤ 1, the value of D_{iv} is calculated as:

D_{iv} = γ_v F_o(μ_{B_{i,1}}(a_1), ..., μ_{B_{i,R}}(a_R)) + (1 − γ_v) F_a(μ_{B_{i,1}}(a_1), ..., μ_{B_{i,R}}(a_R))    (7)


with F_o being the "OR" function:

F_o(x_1, x_2) = x_1 + x_2 − x_1 x_2,  (x_1, x_2) ∈ R²    (8)

For R variables, F_o is defined recursively as:

F_o(x_1, ..., x_R) = F_o(F_o(x_1, ..., x_{R−1}), x_R)    (9)

The "AND" function F_a is defined as in Bardossy and Disse (1993) as:

F_a(x_1, ..., x_R) = ∏_{r=1}^{R} x_r    (10)

Finally the four class values D_{i1}, D_{i2}, D_{i3}, D_{i4} are combined into the DOF of rule i, D_i, as:

D_i = ∏_{v=1}^{4} D_{iv}    (11)

The FR classification is numerically very simple. November 29, 1986, has been selected to illustrate the methodology. The CP type that persisted November 26-29, 1986 is "BM", with i = 2. The membership functions of the four classes are given as above. Normalized pixel values, membership grades for the selected day and the rule defining the CP type BM are shown in Table 1. For example, in the case v = 4, "very high", γ_4 = 0.7, F_o = 1.00, F_a = 0.6375 and D_{24} = 0.7 · 1.00 + 0.3 · 0.6375 = 0.8912. The overall fulfillment grade is D_2 = 0.3944, which is the maximum value. The DOF of other CP types is less than 0.3; for Wz it is 0 and for type HM it is found to be 0.1247. This example shows that, even in the case when one pixel does not fulfill the prescribed pressure level, as is the case of pixel 1 for v = 2, the fuzzy rules can assign the day to the selected CP type.
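A short numerical sketch reproduces this example. It implements (7)-(11) directly, using the membership grades listed in Table 1 below and taking γ_v = 0.7 throughout, which reproduces the tabulated class DOFs (small deviations are rounding effects):

    from functools import reduce

    def f_or(mu):
        # recursive probabilistic OR, eqs. (8)-(9)
        return reduce(lambda a, b: a + b - a * b, mu)

    def f_and(mu):
        # product AND, eq. (10)
        return reduce(lambda a, b: a * b, mu, 1.0)

    def dof_class(mu, gamma):
        # convex combination of OR and AND fulfillment grades, eq. (7)
        return gamma * f_or(mu) + (1.0 - gamma) * f_and(mu)

    # membership grades from Table 1 (CP type BM, Nov 29, 1986)
    classes = {
        1: [0.68, 1.00, 0.62],               # very low
        2: [0.00, 0.84, 0.96, 0.63],         # medium low
        3: [0.40, 0.92, 0.68, 0.70, 0.90],   # medium high
        4: [0.85, 1.00, 0.75],               # very high
    }
    d = {v: dof_class(mu, 0.7) for v, mu in classes.items()}
    d2 = f_and(list(d.values()))             # overall fulfillment grade D_2, eq. (11)
    print(d)   # ~ {1: 0.8265, 2: 0.6983, 3: 0.7470, 4: 0.8912}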

Neural Network based classification

NN are mathematical models of the brain activity. Several network architectures have been developed for different applications (McCord Nelson and Illingworth, 1991). We have used a four-layer feedforward architecture with a "back-propagation" (Rumelhart et al., 1986) learning algorithm. The basic unit of an NN is the neuron. The function of a neuron is described by the transformation of the input signals to an output signal. Given the output o_i of neuron i, i = 1, ..., n, the input of a neuron j, inp_j, is given by

inp_j = Σ_i w_{ij} o_i + θ_j    (12)

In (12), the w_{ij} are weights between neuron i and neuron j, chosen properly by a learning algorithm described below, and θ_j is the bias of neuron j.


Table 1: Numerical example of calculation of DOF for CP type BM that occurred on Nov 29, 1986

Rule class         Pixel (Long., Lat.)   Normalized pressure g   Membership value μ   DOF D_2v
v = 1 Very low     W25° N65°             0.13                    0.68                 0.8265
                   W15° N65°             0.00                    1.00
                   W0° N70°              0.15                    0.62
v = 2 Medium low   W15° N35°             0.63                    0.00                 0.6983
                   W25° N75°             0.12                    0.84
                   W0° N80°              0.18                    0.96
                   E25° N75°             0.31                    0.63
v = 3 Medium high  W20° N40°             0.62                    0.40                 0.7473
                   W15° N45°             0.84                    0.92
                   W0° N50°              0.96                    0.68
                   E10° N50°             0.95                    0.70
                   E15° N45°             0.85                    0.90
v = 4 Very high    W5° N55°              0.94                    0.85                 0.8912
                   E5° N55°              1.00                    1.00
                   E15° N55°             0.90                    0.75

The output of neuron j, o_j, equals the activation of neuron j, a_j, which is given by applying a sigmoid transformation function on inp_j:

o_j = a_j = 1 / (1 + exp(−inp_j))    (13)

The reason for employing the sigmoid function is that it is differentiable, which is an essential condition for back-propagation. The NN utilized herein consists of a set of structured neurons with three different types of layers:

• The input layer: these are the neurons which are activated by the input signal coming from outside.
• Two hidden layers: these are the neurons which are supposed to perform the transformation of the input to an output.
• The output layer: these are the neurons which provide signals for the outside.

Each neuron of a layer is connected to each neuron of the adjacent layer. This means that a signal is sent to the next layer (feedforward). Figure 2 shows the four layers of the NN. The interconnecting weights w_{ij} have to be determined with the help of a back-propagation supervised learning procedure.


Figure 2: Structure of the Neural Network.

For this purpose, a training set consisting of measured input data and corresponding desired output data is used. Weights which minimize the squared difference between the known output of the training set and the calculated output of the NN have to be found stepwise, from the output layer to the input layer, by a gradient search method.
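For concreteness, a minimal forward pass through the four-layer network of Figure 2 can be sketched as follows. The weights are random placeholders (in the paper they are fitted by the back-propagation procedure just described), and the 51-45-40-29 layer sizes are those reported in the application section:

    import numpy as np

    def sigmoid(x):
        # eq. (13): differentiable activation, a prerequisite for back-propagation
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x, layers):
        # eq. (12) applied layer by layer: inp_j = sum_i w_ij * o_i + theta_j
        o = x
        for w, theta in layers:
            o = sigmoid(o @ w + theta)
        return o

    rng = np.random.default_rng(2)
    sizes = [51, 45, 40, 29]
    layers = [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
              for a, b in zip(sizes[:-1], sizes[1:])]

    x = rng.uniform(0.0, 1.0, 51)    # one normalized 700 hPa field
    out = forward(x, layers)         # activations of the 29 CP output neurons
    cp_index = int(np.argmax(out))   # classified CP type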

APPLICATION

CP classification by fuzzy rules and Neural Networks

The two procedures described above were used to classify the CPs over Europe. As stated earlier, the basis of the classification is the subjective Baur classes given in Hess and Brezowsky (1969). For each day the measured 700 hPa surface is taken at 51 selected points u_k. For both procedures, the data bases for model building and validation, respectively, were presented above. FR was defined encoding the expert knowledge of Hess and Brezowsky (1969) by


fuzzy rules as follows: a few (2 to 4) points are selected for each class v = 1, ..., 4 (very low to very high). The γ_v values are selected depending on the class v, with γ_{1,4} = 0.7 and γ_{2,3} = 0.1. Classes v = 1, 4 mean higher uncertainty; thus a convex combination with a higher "OR" component and a value of γ_v closer to 1 is needed. Classes v = 2, 3 are more restrictive; thus a convex combination with a lower "OR" component and a value of γ_v closer to 0 is needed. The proper selection of γ was done by trial and error; results turned out not to be very sensitive to the choice of γ, but it is necessary to use some mix of AND and OR rules, because a pure AND rule may be too weak (one zero element makes the DOF equal to zero) and an OR rule too strong (DOF too large).

The architecture of the NN used consists of an input layer of 51 neurons corresponding to the 51 data sites; the output layer consists of 29 neurons, corresponding to the 29 subjectively classified CPs of Hess and Brezowsky (1969). The first hidden layer consists of 45 neurons, the second hidden layer of 40 neurons. Concerning the procedure for building the NN architecture (number of hidden layers, number of neurons), there exist some heuristic rules, but nevertheless the answer has been found by trial and error. With an increasing number of neurons the estimation error in the training phase decreases, but the model becomes overdetermined.

Both classification schemes were applied to a measured sequence of daily 700 hPa elevations for the 10 year period 1977 to 1986. The methods were unable to reproduce exactly the subjective series, but the stated goal of the classification was to develop a semi-objective classification method which resembles the subjective one and whose quality is measured by the difference between generated and measured precipitation values, as in the next section.

Use for precipitation modeling

Parameters of the precipitation model (Bardossy and Plate, 1992) are estimated so as to obtain the conditional probability of precipitation and the mean daily precipitation at a site given the CP. To measure the quality of a classification for precipitation estimation or generation, three information measures are introduced as follows. The first one measures the squared difference between the conditional probability p_{A_t} of precipitation at day t, given that the CP is A_t, and the unconditional probability p of precipitation at a given site:

I_1 = [ (1/(T−1)) Σ_{t=1}^{T} (p_{A_t} − p)² ]^{1/2}    (14)

with

p = (1/T) Σ_{t=1}^{T} p_{A_t}    (15)

Thus the maximum value of I_1 depends on p. In the present application, the possible maximum of I_1 is about 0.5.


The second information measure describes the squared difference between conditional and unconditional mean precipitation:

I_2 = [ (1/(T−1)) Σ_{t=1}^{T} (m_{A_t} − m)² ]^{1/2}    (16)

with

m = (1/T) Σ_{t=1}^{T} m_{A_t}    (17)

where m is the unconditional mean daily precipitation amount at a given site and m_{A_t} is the mean daily precipitation conditioned on the CP being of type A_t.

The third information measure also depends on the mean precipitation and measures the relative deviation from the unconditional mean:

I_3 = (1/T) Σ_{t=1}^{T} | m_{A_t}/m − 1 |    (18)

with

0 ≤ I_3 ≤ 1    (19)
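Under the reconstructed forms of (14)-(18), the three measures can be computed directly from the daily series of conditional statistics. The sketch below assumes p_cond[t] and m_cond[t] are arrays holding the precipitation probability and mean conditioned on the CP of day t:

    import numpy as np

    def information_measures(p_cond, m_cond):
        T = len(p_cond)
        p = p_cond.mean()                                  # eq. (15)
        m = m_cond.mean()                                  # eq. (17)
        i1 = np.sqrt(np.sum((p_cond - p) ** 2) / (T - 1))  # eq. (14)
        i2 = np.sqrt(np.sum((m_cond - m) ** 2) / (T - 1))  # eq. (16)
        i3 = np.mean(np.abs(m_cond / m - 1.0))             # eq. (18)
        return i1, i2, i3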

Table 2: Average information content of three CP classification schemes (summer).

Classification Method    I_1      I_2      I_3
Hess-Brezowsky           0.243    1.996    0.609
Fuzzy rules              0.211    1.778    0.521
Neural Network           0.167    1.580    0.441

Table 3: Average information content of three CP classification schemes (winter).

Classification Method    I_1      I_2      I_3
Hess-Brezowsky           0.265    2.559    0.679
Fuzzy rules              0.217    2.251    0.629
Neural Network           0.193    2.353    0.586

In order to compare the subjective and knowledge based classifications, the mean information content of 25 stations in the Ruhr catchment was calculated for the subjective classification of Hess and Brezowsky (1969) and for the FR and NN classifications. Tables 2 and 3 show the results for the summer and the winter seasons. From these tables it is clear that the subjective classification delivers the best results. However, none of the three measures of information content I_1, I_2, I_3 varies much between the three approaches, the difference between the two seasons being larger than that between the approaches.


For the fuzzy rule based classification, the information loss compared to the subjective classification is less than 20%. The NN classification does not perform as well as the FR. Given the simplicity of the fuzzy rule-based approach, we would recommend it for future use.

DISCUSSION AND CONCLUSIONS

Why is it important to develop a fuzzy rule based classification if the subjective approach is slightly better than the fuzzy rule-based one? This question may be answered as follows:

• We do not have a subjective classification for GCM-produced CPs and thus intend to use the FR (or trained NN) classification to obtain a catalog of daily CP types corresponding to the 1×CO2 and 2×CO2 cases. The use of a stochastic linkage between daily CP types and daily local climatic factors makes it possible to predict the effect of climatic change on local/regional precipitation (Bartholy et al., 1993).
• There is no catalog of subjectively classified daily CPs over the USA or most of the regions of the world. Using FR, it is possible to obtain such a catalog of CPs for any large-scale area.

It has often been argued that there is no physical basis for types obtained by objective classification schemes such as principal component analysis and cluster analysis. The fuzzy rule-based classification has the capability of using the identification of the main weather patterns by meteorological experts and may thus be constructed on a phenomenological basis. In contrast to NN, the FR classifier has been built encoding expert knowledge without using a time series of subjectively classified daily CPs. Thus FR (but not NN) can be applied in regions where no time series of CPs exists, but both approaches need expert knowledge of CPs for these regions. For further research, the performance of an objective phenomenological classifier, taking into account explicit local hydrological parameters (precipitation, temperature, winds, floods), seems to be of interest. For this, the FR classifier may have to be modified using some features of NN. The two approaches complement each other in a very evident way, as demonstrated recently in research by Takagi and Hayashi (1991), Kosko (1992) or Goodman et al. (1992). To conclude, the fuzzy rule-based methodology appears to perform better than the neural network and almost as well as the Baur-type subjective classification. It appears to be a usable approach to construct a time series of classification where none is available.


ACKNOWLEDGMENTS

Research presented in this paper has been partially supported by the US National Science Foundation under grants #BCS 9016462/9016556, EAR 9217818/9205717 and a grant from the German Science Foundation (DFG).

REFERENCES

Bardossy, A. and Caspary, H. (1990) "Detection of climate change in Europe by analyzing European atmospheric circulation patterns from 1881 to 1989", Theoretical and Applied Climatology 42, 155-167.
Bardossy, A. and Disse, M. (1993) "Fuzzy rule-based models for infiltration", Water Resources Research 29, 2, 373-382.
Bardossy, A. and Plate, E.J. (1991) "Modeling daily rainfall using a semi-Markov representation of circulation patterns", Journal of Hydrology 122, 33-47.
Bardossy, A. and Plate, E.J. (1992) "Space-time model for daily rainfall using atmospheric circulation patterns", Water Resources Research 28, 5, 1247-1260.
Bartholy, J., Bogardi, I., Matyasovszky, I. and Bardossy, A. (1993) "Prediction of daily precipitation reflecting climate change", Session HS4/1, European Geophysical Society, XVIII General Assembly, Wiesbaden, Germany.
Baur, F., Hess, P. and Nagel, H. (1944) Kalender der Grosswetterlagen Europas 1881-1939, Bad Homburg, FRG.
Bradley, R. S., Barry, R. G. and Kiladis, G. (1982) Climatic fluctuations of the Western United States during the period of instrumental records, Final Report to the National Science Foundation, University of Massachusetts.
Bogardi, I., Duckstein, L., Matyasovszky, I. and Bardossy, A. (1992) Estimating space-time local hydrological quantities under climate change, Proceedings, Fifth International Conference on Statistical Climatology, Toronto.
Deutscher Wetterdienst (1948-1990) Die Grosswetterlagen Europas, Amtsblatt des Deutschen Wetterdienstes, 1-33, Deutscher Wetterdienst - Zentralamt, Offenbach am Main.
Duckstein, L., Bardossy, A. and Bogardi, I. (1993) "Linkage between the occurrence of daily atmospheric circulation patterns and floods: an Arizona case study", Journal of Hydrology, to appear.
Goodman, R.M., Higgins, C.M. and Miller, J.W. (1992) "Rule-based neural networks for classification and probability estimation", Neural Computation 4, 781-804.


Hess, P. and Brezowsky, H. (1969) Katalog der Grosswetterlagen Europas, Berichte des Deutschen Wetterdienstes Nr. 113, Bd. 15, 2. neu bearbeitete und ergänzte Aufl., Offenbach a. Main, Selbstverlag des Deutschen Wetterdienstes.
Johnson, S. C. (1967) "Hierarchical clustering schemes", Psychometrika 32, 261-274.
Kosko, B. (1992) Neural Networks and Fuzzy Systems, Prentice-Hall International, London.
Matyasovszky, I., Bogardi, I., Bardossy, A. and Duckstein, L. (1992) Comparing historical and global circulation model produced atmospheric circulation patterns, Working Paper 93-2, SIE, Bldg 20, University of Arizona, Tucson, AZ 85721.
MacQueen, J. (1967) "Some methods for classification and analysis of multivariate observations", Fifth Berkeley Symposium on Mathematics 1, 281-298.
McCord Nelson, M. and Illingworth, W.T. (1991) A Practical Guide to Neural Nets, Addison-Wesley, Reading.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) "Learning representations by back-propagating errors", Nature 323, 533-536.
Takagi, H. and Hayashi, I. (1991) "NN-Driven Fuzzy Reasoning", Int. Jour. of Approximate Reasoning 5, 191-212.
Yarnal, B. (1984) "A procedure for the classification of synoptic weather maps from gridded atmospheric pressure surface data", Computers and Geosciences 10, 397-410.
Yarnal, B. and White, D. (1987) "Subjectivity in a computer-assisted synoptic climatology I: classification results", J. Climatol. 7, 119-128.
Yarnal, B., White, D. and Leathers, D. J. (1988) "Subjectivity in a computer-assisted synoptic climatology II: relationships to surface climate", J. Climatol. 8, 227-239.

GREY THEORY APPROACH TO QUANTIFYING THE RISKS ASSOCIATED WITH GENERAL CIRCULATION MODELS

¹Atmospheric Environment Service, 4905 Dufferin Street, Downsview, Ontario M3H 5T4, Canada
²Dept. of Civil Engineering, McMaster University, Hamilton, Ontario L8S 4L7

Assessing the risk to water resources facilities under climate change is difficult because the uncertainty associated with 2×CO2 climate scenarios cannot be readily quantified. Grey systems theory is used to develop a grey prediction model (GPM) that provides an interval of uncertainty. The GPM is used to extrapolate a numerical interval around the decadal averages of precipitation and temperature through the year 2010 for a site in Northwestern Canada. The extrapolation is calibrated on 20 years of data and validated against observations for the 1980's. The values in the 1990's correspond to observed trends in the area. The temperature and precipitation values are used to develop a grey water balance model. The grey intervals for annual potential evapotranspiration, deficit and surplus are used to evaluate the reliability of a transient and three equilibrium climate change scenarios. The grey intervals are not coincident with the transient output, but they are trending towards the equilibrium scenario values. This suggests that this particular transient scenario is inadequate for risk assessment, and although the equilibrium scenarios appear to be within the grey interval, they represent years beyond a reliable GPM extrapolation.

INTRODUCTION

The possibility of global climate change, and the subsequent changes to climate at the local level, may alter the viability of new and existing water resource structures. One decision-making tool that has been gaining acceptance in water resource management and hydrology is risk assessment - a series of techniques that are used to evaluate decisions when the future cannot be forecast with certainty. Risk is defined as a combination of the probability of an event occurrence and the consequences associated with that event. It is partly a function of the quality of information used to define a climatological event, and the uncertainty in the observed or predicted data is strongly linked to the level of risk in a decision. Generally, the larger the uncertainty, the higher the risk in making a decision.


General circulation models (GCMs) have been used in developing scenarios because they provide a physically-based dynamic simulation of the atmosphere. GCMs can be used to simulate another climate in its equilibrium state (equilibrium model) or they can be used to simulate the transition leading up to a new climate (transient model). The scenarios of a future climate equilibrium under a doubling of atmospheric levels of carbon dioxide are an example of the former. The output of the transient model is provided as a decadal average from the present up to a decade when the climate is expected to achieve a new equilibrium. With either type of simulation, many GCM variables, in particular precipitation, are not easily verified. The GCM output is not sufficient for estimating future variability; therefore it is not amenable to standard approaches of risk estimation.

Grey theory can be used to estimate an interval within which a variable is expected to fall. This type of evaluation has been successfully applied in risk estimation in agriculture (Bass et al., 1992). Grey theory was first applied by Deng (1984) to deal with uncertainty in systems analysis. In a grey theory approach, all components of a system are divided into three categories: white (certain), grey (uncertain) and black (unknown). Unlike probability or fuzzy distributions, a grey set only has an upper and lower limit, which can approximate uncertainty when the available data are insufficient for standard stochastic approaches. In dealing with climate scenarios, a grey systems approach could incorporate the data into a grey decision-making model, or a grey model could be used to extrapolate an interval within which a variable is expected to fall - a grey prediction model (GPM). This paper is a preliminary exploration of the latter approach as a means of evaluating climate scenarios derived from general circulation models (GCMs) for decision-making and risk assessment.

A GPM is developed from an observed series of temperature and precipitation (1965-1985). The GPM is used to extrapolate the series to 2013, and the resulting grey interval is validated against observations and compared to the Goddard Institute for Space Studies Transient GCM (GISST). During the 1970's and 1980's, the GISST decadal average precipitation exceeds the observed averages, and the temperature change during the 1980's and 1990's is lower than what has been observed. The grey interval is used to adjust the GISST scenario without imputing the variability from the observed data onto the GCM scenario. The grey decadal monthly averages are input into a water budget accounting model. The grey water budget interval is interpreted as a range within which the surplus, deficit and the potential evapotranspiration are expected to fall. The results are compared with the water budgets derived from the GISST, the observed decadal monthly averages, and three 2×CO2 equilibrium climate change scenarios based on the GISS, Geophysical Fluid Dynamics Laboratory 1987 (GFDL87) and the Oregon State University (OSU) GCMs (Cohen, 1989).

Although the use of grey theory in this context appears to be similar to time series and other stochastic approaches, it can be applied without the required assumptions of these other methods. It is also appropriate in examining data that lack significant autocorrelation, and two different data sources can be incorporated into one series. The GPM based climate change scenario does not provide a


probability distribution, but the grey interval may be more appropriate, since it is difficult to make assumptions regarding the future variability of temperature and precipitation. The analysis illustrates that grey theory may be an effective means of using transient GCM scenarios and of evaluating the 2×CO2 scenarios of other GCMs. The analysis also suggests that the water balance may provide a better means of detecting climate change, although this is only a preliminary result and bears further investigation.

GREY THEORY MODELLING

Grey theory is a method for estimating and incorporating uncertainty when the data are too sparse for the use of standard stochastic approaches (Deng, 1984). It has proven to be an appropriate method in systems analysis in linear programming models (Huang and Moore, 1993). Grey theory creates a model of the data from a minimum and maximum value. Grey theory can also be used in a prediction mode to estimate and extrapolate a series of maximum and minimum values from a series of observations. In this approach the series is split into two series of high and low values, and two series are generated, thus yielding a dynamic grey interval. Let us consider a data set X^{(0)} with n elements corresponding to n time periods:

X^{(0)} = { x^{(0)}(i) | i = 1, 2, ..., n }    (1)

where x^{(0)}(i) is the ith element corresponding to period i. The problem under consideration, for the grey prediction model (GPM), is the prediction of x^{(0)}(i) for i > n when standard statistical approaches are not applicable. A GPM is introduced which can effectively approximate the uncertainty existing in X^{(0)}. First, there are some requisite definitions related to the GPM.

Definition 1. Let R denote a set of real numbers. A grey number ⊗(y) is a closed and bounded set of real numbers with a known interval but an unknown probability distribution (Huang and Moore, 1993):

⊗(y) = [⊗(y)⁻, ⊗(y)⁺] = { y' ∈ R | ⊗(y)⁻ ≤ y' ≤ ⊗(y)⁺ }    (2)

where ⊗(y)⁻ is the whitened lower bound of ⊗(y) and ⊗(y)⁺ is the whitened upper bound of ⊗(y). When ⊗(y)⁻ = ⊗(y)⁺, ⊗(y) becomes a deterministic number, y.

Definition 2. A grey vector ⊗(Y) is a tuple of grey numbers:

⊗(Y) = (⊗(y_1), ⊗(y_2), ..., ⊗(y_m))    (3)

and a grey matrix ⊗(Y) is a matrix whose elements are grey numbers:

⊗(Y) = [⊗(y_{ij})]_{m×n}    (4)


The operations for grey vectors and matrices are defined to be analogous to those for real vectors and matrices. The concepts of the accumulated generating operation (AGO) and the inverse accumulated generating operation (IAGO) are required for the GPM.

Definition 3. The rth AGO of X^{(0)} is defined as follows (Huang, 1988):

X^{(r)} = { x^{(r)}(i) | i = 1, 2, ..., n },  r ≥ 1    (5a)

where

x^{(Ω)}(k) = Σ_{i=1}^{k} x^{(ω)}(i),  k ∈ [1, n]

with Ω = (r, r−1, ..., 1) and ω = (r−1, r−2, ..., 0). The rth IAGO of X^{(t)}, α^{(r)}(X^{(t)}), is defined in a similar manner (Huang, 1988). The AGO and IAGO are defined from X^{(t)} to X^{(r)}:

α^{(r)}(X) = { α^{(r)}(x^{(t)}(i)) | i = 1, 2, ..., n; t ≥ 1 }    (5b)

where

α^{(β)}(x^{(t)}(k)) = x^{(γ)}(k),  k ∈ [1, n], t ≥ 1

and β = (1, 2, ..., r) and γ = (t−1, t−2, ..., t−r).

, t-r).

The concept of the grey derivative is introduced as follows:

d®(X)/d®(t) • where ®(t) follows:

= [k, k+I].

u(1)(JC(k+1»,

k E [l,n],

(6)

The support of ®(x) corresponding to ®(t) is defined as

Thus, given the following GPM as a differential equation:


dx^{(1)}/d⊗(t) + a x^{(1)} = u    (8)

we can convert it to

α^{(1)}(x^{(1)}(k+1)) + a S[x^{(1)}(⊗(t))] = u,  k ∈ [1, n−1]    (9)

Let:

C = [a, u]^T    (10)

Y^{(0)} = { x^{(0)}(i) | i = 2, 3, ..., n }^T    (11)

S[X^{(1)}] = { S[x^{(1)}(⊗(t))] with ⊗(t) = [k, k+1] | k = 1, 2, ..., n−1 }^T    (12)

where C^T is a vector of the parameters in (9); Y^{(0)} is a vector of x^{(0)}(i) with i = 2, 3, ..., n; and S[X^{(1)}]^T is a vector of supports of x^{(1)}(⊗(t)) corresponding to ⊗(t). Thus, we have:

Y^{(0)} = −a S[X^{(1)}] + u E = [−S[X^{(1)}], E] [a, u]^T = [−S[X^{(1)}], E] C    (13)

where E = (1, 1, ..., 1)^T. Letting:

B = [−S[X^{(1)}], E]    (14)

where B is a matrix consisting of −S[X^{(1)}] and E, we have:

C = (B^T B)^{−1} B^T Y^{(0)}    (15)

Hence, x^{(1)}(k+1), ∀k, can be obtained by solving (9). Thus, from the definition of the IAGO, x^{(0)}(k+1), ∀k, can be obtained from x^{(1)}(k+1), ∀k. Obviously, when k > n−1, the obtained x^{(0)}(k+1) provides a prediction of the x value in a future period k+1.
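A compact sketch of this estimation-prediction chain follows. It uses the standard closed-form exponential solution of the GM(1,1)-type differential equation (8), and the series values are illustrative; in the GPM of this paper, a model of this form would be fitted separately to the high and low subseries to produce the grey interval:

    import numpy as np

    def gm11_fit(x0):
        # least-squares parameters C = [a, u]^T from eqs. (9)-(15)
        x1 = np.cumsum(x0)                           # 1-AGO of the series
        s = 0.5 * (x1[1:] + x1[:-1])                 # supports S[x1], eq. (7)
        B = np.column_stack([-s, np.ones_like(s)])   # eq. (14)
        y = x0[1:]                                   # Y(0), eq. (11)
        a, u = np.linalg.lstsq(B, y, rcond=None)[0]  # eq. (15)
        return a, u

    def gm11_predict(x0, a, u, horizon):
        # closed-form solution of eq. (8), then IAGO back to the original scale
        k = np.arange(len(x0) + horizon)
        x1_hat = (x0[0] - u / a) * np.exp(-a * k) + u / a
        return np.diff(x1_hat, prepend=0.0)

    x0 = np.array([42.1, 43.0, 44.2, 45.1, 46.3])  # illustrative decadal series
    a, u = gm11_fit(x0)
    x0_hat = gm11_predict(x0, a, u, horizon=3)     # extends 3 periods ahead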


WATER BUDGET MODELLING

The Thornthwaite climatic water balance model produces a climate-based accounting of the water gains and losses at a location or for a region. The air temperature and precipitation are used to compute the water budget, estimating soil moisture deficit and runoff as residuals (Mather, 1978). Although the Thornthwaite model does not incorporate wind effects, or the direct effect of elevated levels of CO2 on transpiration, it has been found to provide reasonably reliable estimates of water balance components in most climates (Mather, 1978). The model can be used to compute daily water budgets, but the monthly water budget mode is used in order to utilize the transient GCM model and for comparison with other water balance studies in the same area (Cohen, 1991). The soil moisture capacity was assumed to be 100 mm, and the minimum mean monthly temperature required for snowmelt was 0.1°C, in order to match the assumptions used by Cohen (1991) in a GCM-based assessment of water balance in the same area. (A simplified sketch of this monthly accounting is given at the end of the DATA section below.)

STUDY LOCATION

The analysis was carried out at Cold Lake, Alberta. This location was chosen because it is adjacent to the Saskatchewan River sub-basin that Cohen (1991) examined using two GCM scenarios and the Thornthwaite water balance model. Being relatively close to the Mackenzie Basin, which is the focus of a major climate impact assessment, the results may also have bearing for future research in that area. Cold Lake is also situated half-way between two GISS-GCM grid points, which are used as a geographic "grey interval" in the water balance model. Although the Thornthwaite model may not be appropriate for locations at these latitudes (50°N - 58°N are at the northerly edge of acceptability for the Thornthwaite model), this study is only an exploratory evaluation of this technique.

DATA

The GPM is computed with 20 years of monthly temperature and precipitation data (1966-1985) for Cold Lake, Alta. The grey temperature model is validated against four years of data (1986-1989), and the precipitation grey model is tested against six years of data (1986-1991). The water budget model is run with the GPM and input from two grid points from the GISS transient GCM (GISST). The two grid points are almost equidistant, directly north (110°W, 58°N) and south (110°W, 50°N) of Cold Lake, Alta., which is situated at 110°W, 54°N. The temperature and precipitation output from the GISST are only available as decadal averages, and for this study, the first four decades (1970-2010) are compared to the GPM for the same time period. Three water budget components are compared to the GISST and the three equilibrium scenarios for the decade of the 2040's.


The three equilibrium scenarios (GFDL87, GISS, OSU) have been interpolated to 110°W, 54°N.
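The sketch referred to under WATER BUDGET MODELLING above: a highly simplified bucket version of the monthly accounting, assuming PE has already been computed from temperature by the Thornthwaite method. The full procedure additionally reduces actual evapotranspiration as the soil dries and handles snow storage via the 0.1°C melt threshold; none of that is reproduced here:

    def water_budget(precip, pe, capacity=100.0):
        # monthly bookkeeping of soil storage, deficit and surplus (all in mm)
        storage = capacity
        deficits, surpluses = [], []
        for p, e in zip(precip, pe):
            storage += p - e                        # recharge or draw-down
            deficit = min(storage, 0.0)             # unmet demand once storage is gone
            surplus = max(storage - capacity, 0.0)  # runoff once storage is full
            storage = min(max(storage, 0.0), capacity)
            deficits.append(deficit)
            surpluses.append(surplus)
        return deficits, surpluses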

RESULTS

The grey prediction model (GPM)

The GPM was developed for monthly temperature and precipitation data (1965-1985). The interval is validated against observations for the years 1986-1989 for temperature (Figure 1) and 1986-1991 for precipitation (Figure 2).


Figure 1. Observed and grey monthly temperature (°C), 1986-1989.


Figure 2. Observed and grey monthly precipitation (mm), 1986-1991.


The grey model interval appears to be a valid description of the monthly temperature and precipitation at Cold Lake, although it does not incorporate all of the summer peaks in precipitation. The decadal grey interval averages from the GPM are compared to the GISST-GCM at the two grid points for the 1970's through the first decade of the twenty-first century (Figure 3). Climatological analysis of the annual and spring temperature departures from normal, for the climate region enclosing Cold Lake, indicates that there are significant positive anomalies during the 1980's and the early part of the 1990's (Environment Canada, 1992; personal communication), which are reflected in the GPM but not in the GISST output.


Figure 3. Decadal monthly average temperature (°C).

The GISST precipitation is greater than both the observed and the GPM precipitation for the four decades, except during the summer months in the first decade of the twenty-first century (Figure 4).


Figure 4. Decadal monthly average precipitation (mm).


Water budget modelling

Three variables are extracted from the water balance modelling: the annual potential evapotranspiration (PE), the deficit (D) and the surplus (S). The water budget model is run using decadal averages derived from observations (1970's and 1980's), the GPM and the GISST-GCM (Table 1). During the 1970's and 1980's the observed PE falls within the grey interval, and for the 1980's, the observed deficit falls between the grey interval, while it is smaller than the lower grey range for the 1970's. This is most likely due to the higher monthly average rainfall between April and August in the 1970's. During the same period the water budget derived from the GISST temperature and precipitation produces the opposite result. During the next two decades the grey deficits decrease (95.9 to 25.9 mm) and a small surplus is evident (20.9 to 93.1 mm). The PE is slightly larger throughout the four decades at the high end of the grey interval (530.6 to 595.6 mm), while it remains almost constant at the lower end of the interval. This result is due to the fact that the grey temperature interval is quite small (Figure 3), while the grey precipitation interval was much larger (Figure 4).

TABLE 1. Annual water budget (mm)

TABLE 1. Annual water budget (mm)

Decade  Scenario        PE      D        S
1970    OBS             522.1   -63.1    0.0
        GREYMAX         530.6   -95.9    0.0
        GREYMIN         513.0   -148.9   0.0
        GISST 50N       467.8   -1.7     396.4
        GISST 58N       402.8   -0.2     347.9
1980    OBS             530.4   -140.5   0.0
        GREYMAX         543.6   -65.9    0.0
        GREYMIN         515.5   -230.4   0.0
        GISST 50N       478.2   -38.1    339.5
        GISST 58N       406.8   0.0      398.1
1990    GREYMAX         565.3   -35.4    20.9
        GREYMIN         513.8   -264.9   0.0
        GISST 50N       477.3   -17.1    380.3
        GISST 58N       414.9   0.0      365.3
        GISST 50N(A)    536.0   -126.6   23.2
        GISST 58N(A)    457.2   -110.5   0.0
2000    GREYMAX         595.6   -25.9    93.1
        GREYMIN         519.0   -297.8   0.0
        GISST 50N       495.8   -8.6     405.6
        GISST 58N       435.6   -2.7     329.8
        GISST 50N(A)    645.0   -153.0   16.1
        GISST 58N(A)    549.3   -173.2   0.0
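The annual bookkeeping behind Table 1 can be made concrete with a short sketch. This is a minimal illustration only, assuming a Thornthwaite-type monthly accounting (Mather, 1978) with a single soil-moisture store; the function name and the 100 mm store capacity are illustrative and not taken from the paper.

```python
def annual_water_budget(p, pe, capacity=100.0):
    """Toy monthly water-balance bookkeeping (Thornthwaite-style sketch).

    p, pe : sequences of 12 monthly precipitation and potential
    evapotranspiration totals (mm).  Returns (annual PE, D, S) in mm,
    with the deficit D reported as a negative number, as in Table 1.
    """
    store = capacity                      # assume the soil store starts full
    deficit = surplus = 0.0
    for pm, pem in zip(p, pe):
        if pm >= pem:                     # wet month: recharge, then spill
            store += pm - pem
            if store > capacity:
                surplus += store - capacity
                store = capacity
        else:                             # dry month: draw down the store
            demand = pem - pm
            supplied = min(store, demand)
            store -= supplied
            deficit -= demand - supplied  # unmet demand accumulates as deficit
    return sum(pe), deficit, surplus
```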

The GISST water budgets also exhibit patterns similar to the 1970's and 1980's over the next two decades. At 110W, 50N the surplus increases from the


1980's through the twenty-first century due to an increase in spring and summer precipitation (Figure 4). At 110W, 58N, the surplus decreases for the same period due to smaller levels of summer precipitation. Since the GPM appears to be a valid description of the monthly temperature and precipitation at Cold Lake during the 1970's and 1980's, it is used to adjust the GISST temperature upward and the GISST precipitation downward (Appendix 1). At both grid points (GISSTA) the PE increased, and at both grid points the deficits and surplus are now within the respective grey intervals. The GISSTA water budget corresponds more closely than the unadjusted GISST to the GISS, GFDL87 and OSU 2xCO2 equilibrium scenarios for the Cold Lake location (Cohen, 1989; 1991).

Analysis of Climate-Dependent Decisions

Bass et al. (1994) present a method for evaluating data quality for weather-dependent decisions. This framework presents data uncertainty as a numerical interval which a decision maker interprets as encompassing the actual or "true" value for a decision. How much risk a user is willing to accept depends on the importance of the decision and on the size of the interval that is acceptable. The annual grey deficit, surplus and PE, both the GISST and GISSTA scenarios, and the three equilibrium scenarios are plotted in Figures 5-7. The GFDL87, GISS and OSU scenarios are plotted for the 2040's, since their climates are supposed to be representative of some future equilibrium. In addition, the GISST values for the 2040's are also plotted. For each water budget component, the GISST scenario is outside of the grey interval. In each figure the grey interval is projected toward the three equilibrium scenarios, although there are obvious limits in projecting the grey interval to the 2040's. In Figure 5, the grey interval is probably too large for an effective decision (small level of risk) for the 1990's. Nevertheless, it demonstrates that the GISSTA deficit is within the grey interval, although the high end is very close to the GISST deficit as well. Figure 6 provides a more reasonable decision interval, encompassing the GISSTA surplus, for the first decade of the twenty-first century, that is clearly separated from the GISST surplus. In addition, the grey water budget points towards the general direction of the three equilibrium scenarios. Figure 7 provides a clear evaluation of the quality of both the GISST and the GISSTA PE.


Figure 5. Annual deficits (1970-2040) (Grey-Max, Grey-Min, GISST and GISST(A) at 110W 50N and 110W 58N, with the GISS, OSU and GFDL equilibrium scenarios).

Figure 6. Annual surplus (1970-2040) (Grey-Max, Grey-Min, GISST and GISST(A) at 110W 50N and 110W 58N, with the GISS, OSU and GFDL equilibrium scenarios).


Figure 7. Potential evapotranspiration (1970-2040) (Grey-Max, Grey-Min, GISST and GISST(A) at 110W 50N and 110W 58N).

In this case, most of the scenarios (all of the GISST and two of the GISSTA) fall outside of the grey interval and would most likely be rejected. The grey water budget interval also indicates that beyond the first decade of the twenty-first century, neither the GISST nor the GISSTA appears to be valid. However, the grey water budget points in the direction of the three equilibrium scenarios, and this includes the GISST for 110W, 50N.

CONCLUSIONS

The results of this analysis suggest that the grey prediction model may be an appropriate tool for evaluating the risks associated with a GCM scenario. The GPM adequately represented the decadal averages for temperature and precipitation for the 1970's and the 1980's. Preliminary analysis of temperature trends in northwestern Canada in the early 1990's suggests that spring temperatures have been anomalously warm, which is also reflected in the GPM. The grey water budget components also enclosed the water budgets based on observations for the same period. Assuming that the GPM is valid for the 1990's and the first decade of the twenty-first century, it provides a means of evaluating scenarios of the PE and the surplus. However, the grey deficit interval is most likely too large to provide a useful evaluation of deficit scenarios in the twenty-first century. In addition, the grey PE and surplus intervals also point in the general direction of the three equilibrium 2xCO2 scenarios, although it would be premature to suggest that these scenarios will remain valid for Cold Lake, Alberta in the 2040's. While the GPM


appears to be valid for the monthly temperature and precipitation data at Cold Lake, further testing at other sites and through the 1990's is required in order to generalize these results.

REFERENCES

Bass, B., Russo, J.M. and Schlegel, J.W. (1994) "Data quality in weather-dependent decisions" (in press).

Cohen, S.J., Welsh, L.E. and Louie, P.Y.T. (1989) Possible Impacts of Climatic Warming Scenarios on Water Resources in the Saskatchewan River Sub-basin. Canadian Climate Centre, Report No. 89-9. Available from Climatological Services Division, AES, Downsview, Ontario, Canada.

Cohen, S.J. (1991) "Possible impacts of climatic warming scenarios on water resources in the Saskatchewan River sub-basin, Canada", Climatic Change 19, 291-317.

Deng, J. (1984) The Theory and Methods of Socio-economic Grey Systems (in Chinese), Science Press, Beijing, China.

Environment Canada (1992) The State of Canada's Climate: Temperature Change in Canada 1895-1991. SOE Report 92-2.

Huang, G.H. (1988) "A grey systems analysis method for predicting noise pollution in urban areas", The Third National Conference on Noise Pollution Control, Chengdu, Sichuan, China (in Chinese).

Huang, G.H. and Moore, R.D. (1993) "Grey linear programming, its solving approach, and its application", International Journal of Systems Science 24, 159-172.

Mather, J.R. (1978) The Climatic Water Balance in Environmental Analysis. Lexington Books, Lexington, Mass., USA.


APPENDIX I

The GISST outputs were adjusted using the grey white mid value (WMV) and the half-width between the two GISST grid points. The WMV is analogous to a grey mean and was defined for each decade. A mean value was defined for each month,

(16)

where i represents the month. The WMV is defined as

(17)

Similarly, a mean GISST value was defined for each decade. Each monthly mean GCM precipitation value was adjusted by subtracting the difference between the GCM mean and the WMV. For the GCM temperature values, this difference was added to each monthly mean. The values for each grid point were recreated by adding the GCM half-width to the adjusted monthly mean for 110W, 50N and subtracting this value from the adjusted monthly mean for 110W, 58N. The half-width is defined as
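Because Equations (16), (17) and the half-width definition did not survive reproduction, the sketch below only illustrates the adjustment as the prose describes it: the monthly mean of the two grid-point series is recentred so that its decadal mean coincides with the WMV, and the grid-point values are then recreated with the GCM half-width. The function name, the exact form of the recentring and the half-width formula are inferred assumptions, not the paper's equations.

```python
import numpy as np

def adjust_gisst(x50, x58, wmv):
    """Recreate adjusted (GISSTA) grid-point series from GISST output.

    x50, x58 : monthly GISST values at 110W,50N and 110W,58N for a decade.
    wmv      : grey white mid value (the grey mean) for that decade.
    """
    x50, x58 = np.asarray(x50, float), np.asarray(x58, float)
    mean = 0.5 * (x50 + x58)                   # monthly mean of the two points
    half_width = 0.5 * np.abs(x50 - x58)       # assumed GCM half-width
    adj = mean - (mean.mean() - wmv)           # recentre decadal mean on WMV
    return adj + half_width, adj - half_width  # adjusted 50N and 58N series
```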

A NONPARAMETRIC RENEWAL MODEL FOR MODELING DAILY PRECIPITATION

Balaji Rajagopalan, Upmanu Lall and David G. Tarboton
Utah Water Research Laboratory, Utah State University, Logan, UT 84322-8200, USA

ABSTRACT

A nonparametric wet/dry spell model is developed for describing daily precipitation at a site. The model considers alternating sequences of wet and dry days in a given season of the year. All the probability densities of interest are estimated nonparametrically using kernel probability density estimators. The model is data adaptive, and yields stochastic realizations of daily precipitation sequences for different seasons at a site. Applications of the model to data from rain gauges in Utah indicate good performance of the model.

INTRODUCTION

Stochastic models for precipitation occurrence at a site have a long, rich history in hydrology. The description of precipitation occurrence is a challenging problem, since precipitation is an intermittent stochastic process that is usually nonstationary and can exhibit clustering, scale dependence, and persistence in time and space. Our particular interest is in developing a representation for daily precipitation in mountainous regions in the western United States. Webb et al. (1992) note that a mixture of markedly different mechanisms leads to the precipitation process in the western United States over the year and even within a given season. A rigorous attack on the problem would perhaps need to consider the classification of different precipitation regimes at different time scales, the identification of such classes from available data, and the specification of a stochastic model that can properly reproduce these at a variety of time scales. Our focus is on developing appropriate tools to analyze the raw daily data without deconvolution of the mixture based on synoptic weather classification. In most traditional stochastic models, probability distributions are assumed for the length of wet or dry spells and also for the precipitation amounts. While such distributions may fit the data reasonably well in some situations and for some data sets, it is rather disquieting to adopt them by fiat. It is our belief that hydrologic models should (a) show (rather than obscure) the interesting features of the data; (b) provide statistically consistent estimators; and (c) be robust. Consistency implies that the estimates converge in probability to the correct behaviour. The standard practice of assuming a distribution and then calibrating the model to it clearly obscures features of the data and may not lead to a consistent estimator from site to site. This is particularly relevant where the underlying process is represented by a mixture of generating processes and is inhomogeneous. The


issue of interest is not the best fit of a model, but the ability to represent a heterogeneous process in a reasonable manner. This motivates the need for a stochastic model for the generation of synthetic precipitation sequences that is conceptually simple, theoretically consistent, allows the data to determine its structure as far as possible, and accounts for clustering of precipitation events and process heterogeneity. Here we present results from a nonparametric seasonal wet/dry spell model that is capable of considering an arbitrary mixture of generating mechanisms for daily precipitation and is data adaptive. The model yields stochastic realizations of daily precipitation sequences for different seasons at a site that effectively represent a smoothed bootstrap of the data and are thus equivalent in a probabilistic sense to the single realization observed at the site. The nonparametric (kernel) probability density estimators considered in the model do not assume the form of the underlying probability density; rather, they are data driven and automatic. The model is illustrated through application to data collected at Woodruff, Utah.

MODEL FORMULATION

The random variables of interest are the wet spell length, w days, the dry spell length, d days, the daily precipitation, p inches, and the wet spell precipitation amount, p_w inches. The variables w and d are defined on the set of integers greater than or equal to 1 (and less than the season length), and p and p_w are defined as continuous, positive random variables. The year is divided into four seasons, viz., Season I (January - March), Season II (April - June), Season III (July - September) and Season IV (October - December). The precipitation process is assumed to be stationary within these seasons. Precipitation measurements are usually rounded to measurement precision (e.g., 0.01 inch increments). We do not expect the effect of such quantization of the data to be significant relative to the scale of the precipitation process, and treat precipitation as a continuous random variable. A mixed set of discrete and continuous random variables is thus considered. The precipitation process over the year is shown in Figure 1.

Figure 1. Precipitation process over the year.

The key feature of the model is the nonparametric estimation of the probability density function (using kernel density estimators) for the variables of interest, rather than fitting parametric probability densities. The reader is referred to Silverman (1986) for a pragmatic treatment of kernel density estimation and examples of applications to a number of areas. The model is applied to daily precipitation for each season. The pdf's estimated for


each season are f(w), the pdf of wet spell length; f(d), the pdf of dry spell length; and f(p), the pdf of daily precipitation amount. Kernel density estimators are used to estimate the pdfs of interest from the data set. Synthetic precipitation sequences are generated continuously from season to season, following the strategy indicated in Figure 2. A dry spell is first generated using f(d); a wet spell is next generated using f(w). Precipitation for each of the w wet days is then generated from f(p). The process is repeated with the generation of another dry spell. If a season boundary is crossed, the pdfs used for generation are switched to those for the new season. For the univariate continuous case, the random variable of interest (p) is generated from the kernel density estimate following a two-step procedure given by Devroye (1986, p. 765) and also in Silverman (1986), while the discrete variables (w and d) are generated from the cumulative mass function. The above procedure neglects correlation between sequential wet and dry spell lengths and correlation between daily rainfall amounts within a wet spell. These correlations can be incorporated through the use of conditional pdf's and the disaggregation of total wet spell precipitation into daily amounts (Lall and Rajagopalan, in preparation). For the data sets analysed here, all the correlations mentioned above were found to be insignificant. Consequently we did not use conditional pdfs and disaggregation here.

Figure 2. Structure of the renewal model for daily precipitation: a dry spell of d days generated from f(d), followed by a wet spell of w days from f(w), with independent daily precipitation for the w days from f(p).
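The alternating renewal structure of Figure 2 translates directly into a generation loop. The sketch below is illustrative only: it assumes user-supplied samplers draw_d, draw_w and draw_p that return variates from a season's estimated f(d), f(w) and f(p); these names are not from the paper.

```python
def simulate(n_days, season_of, draw_d, draw_w, draw_p):
    """Generate a daily precipitation trace from the renewal model.

    season_of(t) maps day t to a season index; draw_d, draw_w and draw_p
    sample a dry spell length, a wet spell length and a daily amount from
    that season's estimated f(d), f(w) and f(p).  The season is re-checked
    at the start of each spell, so the pdfs switch over once a season
    boundary has been crossed.
    """
    trace = []
    while len(trace) < n_days:
        trace += [0.0] * draw_d(season_of(len(trace)))      # dry spell
        s = season_of(min(len(trace), n_days - 1))
        trace += [draw_p(s) for _ in range(draw_w(s))]      # wet spell
    return trace[:n_days]
```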

Kernel estimation of continuous univariate PDF

The continuous, univariate pdf of interest is f(p), the pdf of daily precipitation for each season. The kernel density estimator (Rosenblatt, 1956) is defined as:

$$ f_n(p) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{p - p_i}{h} \right) \qquad (2.1) $$

This estimates the probability density f(p) based on n observations p_i. K(.) is a kernel function defined to be positive and symmetric, to have unit integral, and to have finite variance. These requirements ensure that the resulting kernel density estimate is a valid density. The symmetry condition is not essential, but is used to avoid bias. The subscript n emphasizes that this is an estimate based on n data points. The bandwidth parameter h controls the amount of smoothing of the data in our density estimate. An estimator with constant bandwidth h is called a fixed kernel estimator. Commonly used kernels are:

$$ K(t) = (2\pi)^{-1/2} e^{-t^2/2} \quad \text{(Gaussian kernel)} \qquad (2.2a) $$
$$ K(t) = 0.75\,(1 - t^2), \quad |t| \le 1 \quad \text{(Epanechnikov kernel)} \qquad (2.2b) $$
$$ K(t) = (15/16)\,(1 - t^2)^2, \quad |t| \le 1 \quad \text{(Bisquare kernel)} \qquad (2.2c) $$

One can see from Equation 2.1 that the kernel estimator is a convolution estimator; this is illustrated in Figure 3. The kernel density estimate can also be viewed as a smoothing of the derivative of the empirical distribution function of the data.
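Equation (2.1) amounts to only a few lines of code. Below is a minimal sketch with the bisquare kernel of Equation (2.2c), evaluating the fixed-bandwidth estimate on a grid; the helper names are illustrative.

```python
import numpy as np

def bisquare(t):
    """Bisquare kernel, Eq. (2.2c): (15/16)(1 - t^2)^2 for |t| <= 1."""
    t = np.asarray(t, float)
    return np.where(np.abs(t) <= 1.0, (15.0 / 16.0) * (1.0 - t**2) ** 2, 0.0)

def kde(grid, data, h, kernel=bisquare):
    """Fixed-bandwidth kernel density estimate, Eq. (2.1)."""
    grid = np.asarray(grid, float)[:, None]    # evaluation points
    data = np.asarray(data, float)[None, :]    # observations p_i
    return kernel((grid - data) / h).sum(axis=1) / (data.size * h)
```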

Figure 3. Example of a kernel pdf using 5 equally spaced values with the bisquare kernel and a fixed bandwidth (h = 4); x is assumed to be a continuous variable.

The choice of the bandwidth and kernel can be optimized through an analysis of the asymptotic mean square error, $MSE = E[(f(p) - f_n(p))^2]$, or the mean integrated square error (MISE), the integral of the MSE over the domain. Under the requirements that the kernel be positive and symmetric, with unit integral and finite variance, Silverman (1986, p. 41) shows that the optimal kernel in terms of minimizing MISE is the Epanechnikov kernel. However, it is only marginally better than the others listed above. Silverman (1986, Eqn. 3.21) shows that the optimal bandwidth, h_opt, is a function of the unknown density f(p). In practice a certain distribution is assumed for f(p) and the MISE is minimized to obtain the optimal bandwidth h_opt with reference to the assumed distribution. Kernel probability density estimation can also be improved by taking h to be variable, so that the smoothing is larger in the tails where data is sparse, and less where the data is dense. A number of bandwidth selection methods have historically been used, like the


cross validation methods (maximum likelihood and least squares cross validation; see Silverman (1986), Sec. 3.4). These methods are prone to undersmoothing (Silverman, 1986). This is pronounced when the data is concentrated near a boundary, as is the case with precipitation, where there is a finite lower bound (precipitation > 0) to the domain. Symmetric kernels near the boundary can violate this. One approach is to relax the symmetry constraint and use boundary kernels such as suggested by Muller (1991). Here, however, we chose to avoid the issue by working in a log transformed variable space. A fixed bandwidth kernel density estimate (Eqn. 2.1) is applied to ln(p) and the resulting probability density is back transformed, to get:

$$ f_n(p) = \frac{1}{nhp} \sum_{i=1}^{n} K\!\left( \frac{\ln p - \ln p_i}{h} \right) \qquad (2.3) $$

h was chosen as optimal with reference to the normal distribution in the log space. Epanechnikov kernels were used. The optimal bandwidth is (using Silverman 1986, Eqn. 3.1)

$$ h_p = 2.125\, \sigma\, n^{-1/5} \qquad (2.4) $$

where σ is the standard deviation of the log transformed data. This method provides adaptability of bandwidth and also gets around the boundary issue. Figure 4(a) shows this method applied to precipitation data collected at Woodruff, Utah over the years 1948-1989 for Season 1 (Jan.-Mar.). Note that it follows the data, as reflected by the histogram, well. There are differences between the kernel estimate and a fitted exponential distribution, but from Figure 4(a) it is hard to see which is better. Figure 4(b) shows the cumulative distribution functions obtained by integrating the probability density function. Both kernel and exponential cdf estimates are compared to the empirical cdf using the Weibull plotting position (i/(n+1)), with 95% confidence limits ±

(2.5)

It can be seen from Figure 4(b) that the cdf from the exponential distribution leaves the 95% confidence interval, while that from the kernel estimator lies entirely within. This suggests that the true density function of the data is different from exponential.
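The log-space estimate of Equations (2.3) and (2.4) then follows by a change of variables. A sketch reusing the kde and bisquare helpers from the earlier sketch; the reference-bandwidth constant is the one quoted in Equation (2.4).

```python
import numpy as np

def kde_log_space(grid, data, kernel=bisquare):
    """Back-transformed log-space density estimate, Eqs. (2.3)-(2.4).

    A fixed-bandwidth estimate is fitted to ln(p) with the reference
    bandwidth h = 2.125 * sigma * n**(-1/5), then divided by p to map
    the density back to the original scale.
    """
    logs = np.log(np.asarray(data, float))
    h = 2.125 * logs.std(ddof=1) * logs.size ** (-0.2)   # Eq. (2.4)
    grid = np.asarray(grid, float)
    return kde(np.log(grid), logs, h, kernel) / grid     # Eq. (2.3)
```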

Kernel estimation of discrete univariate PDF

The discrete, univariate probability mass functions (pmf's) of interest are f(d) and f(w) for each season. In a traditional alternating renewal model, the wet spell length and dry spell length are assumed to be continuous and independent variables, often assumed to be exponentially distributed. Roldan and Woolhiser (1982) consider wet and dry spells to be discrete variables and assume a geometric distribution. Indeed, to account for clustering, one could sample the dry spells from two geometric distributions with switching based on a Markov chain, as done by Foufoula-Georgiou and Lettenmaier (1986). However, the kernel method allows the representation of an arbitrary structure or appropriate degree of mixing of distributions, that is "honest" to the data, and provides a good, alternate building block. Even under the assumption of independence of w and d, the kernel estimator will tend to reproduce wet spell lengths and dry spell lengths with


relative frequencies that match those in the historical data set. One nonparametric estimator of the discrete probability distribution of w or d is the maximum likelihood estimator that yields directly the relative frequencies (e.g., the number of occurrences of the ith wet spell length w_i in a sample of size n, divided by n). The kernel method is better, because (a) it allows interpolation and extrapolation of probabilities to spell lengths that were unobserved in the sample, and (b) it has higher MSE efficiency. Wang and Van Ryzin (1981) developed geometric kernels for discrete variables. Simonoff (1983) has developed a maximum penalized likelihood (MPLE) estimator for discrete data. Hall and Titterington (1987) show how discrete kernel estimators can be formed from continuous kernels evaluated at discrete points and renormalized. These three methods were compared and we found that the Hall and Titterington (1987) approach worked best. The geometric tended to undersmooth, while the MPLE oversmoothed. The Hall and Titterington (1987) estimator is similar to the estimator in Equation (2.1), but with an adjustment for use with discrete data. It is given as:

$$ f(w) = n^{-1} h \sum_{i=1}^{n} K_h\{h(w - w_i)\} \qquad (2.6) $$

where h ∈ (0,1] is the bandwidth, and K_h{h(w - w_i)}, evaluated only for discrete (integer) values of w - w_i, is defined as

$$ K_h\{h(w - w_i)\} = s(h)\, K\{h(w - w_i)\} \qquad (2.7) $$

Here K(.) is the kernel and s(h) is the scale factor that rescales the discrete kernel to sum to unity:

$$ s(h) = \Big\{ h \sum_j K(jh) \Big\}^{-1} \qquad (2.8) $$

This is effectively a sum over all integers j in (-h^{-1}, h^{-1}). For h ≥ 1 the kernel estimator equals the cell proportion estimator. In Equation (2.6) the pmf f(w) is conditional on the bandwidth h, i.e., f(w|h). The bandwidth h is selected as the minimizer of a cross validation function, suggested as

$$ CV(h) = \sum_w f^2(w|h) - \frac{2}{n} \sum_{i=1}^{n} f_{-i}(w_i|h) \qquad (2.9) $$

where f(w|h) is estimated using Equation (2.6), while f_{-i}(w_i|h) is also estimated using Equation (2.6) but dropping the point w_i. They proved that the above cross-validation automatically adapts the estimator to an extreme range of sparseness types. If the data is only slightly sparse, cross-validation will produce an estimator which is virtually the same as the cell-proportion estimator. As sparseness increases, cross-validation will automatically supply more and more smoothing, to a degree which is asymptotically optimal. To alleviate the boundary problem, Dong and Simonoff (1991) have developed boundary kernels for most of the commonly used kernels. Dong and Simonoff (1992) have successfully tested the boundary kernels on data sets that are similar to the ones we have. We have used the bisquare kernel (as defined in Equation 2.2c) and the corresponding boundary kernels for our analysis. Figures 4(c) and 4(d) illustrate this approach applied to wet and dry spells.
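A sketch of Equations (2.6) to (2.9) for integer-valued spell lengths is given below. It omits the boundary-kernel refinement discussed above, reuses the bisquare helper from the earlier sketch, and the function names are illustrative.

```python
import numpy as np

def discrete_pmf(support, data, h, kernel=bisquare):
    """Hall-Titterington discrete kernel estimate, Eqs. (2.6)-(2.8); h in (0,1]."""
    j = np.arange(-int(np.ceil(1.0 / h)), int(np.ceil(1.0 / h)) + 1)
    s = 1.0 / (h * kernel(j * h).sum())                  # scale factor, Eq. (2.8)
    w = np.asarray(support)[:, None]
    wi = np.asarray(data)[None, :]
    return (h * s * kernel(h * (w - wi))).mean(axis=1)   # Eq. (2.6)

def cv_score(support, data, h, kernel=bisquare):
    """Least-squares cross-validation score, Eq. (2.9); minimize over h."""
    data = np.asarray(data)
    f = discrete_pmf(support, data, h, kernel)
    loo = sum(discrete_pmf([wi], np.delete(data, i), h, kernel)[0]
              for i, wi in enumerate(data))               # leave-one-out terms
    return (f ** 2).sum() - 2.0 * loo / data.size
```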

Figure 4(a). PDF of daily precipitation for Season 1 (histogram of historical data, kernel estimated PDF and fitted exponential distribution; 718 data points).

Figure 4(b). CDF of daily precipitation for Season 1 (kernel and exponential CDFs compared with the empirical CDF and its 95% confidence limits).

Figure 4(c). Pmf of wet spell length for Season 1 (observed proportions, kernel estimate and geometric distribution).

Figure 4(d). Pmf of dry spell length for Season 3 (observed proportions, kernel estimate and geometric distribution).

In Figure 4(c) there is no appreciable difference between the kernel estimate and the fitted geometric distribution. In Figure 4(d) the kernel estimate is seen to be a better smoother of the observed proportions than a fitted geometric distribution. The above results indicate that the kernel estimators provide a flexible or adaptive representation of the underlying structure, under weaker assumptions (e.g., continuity, smoothness) of the density than classical parametric methods.

Simulation Results

The above pdf and pmf estimates were used in the simulation model described earlier, applied to the Woodruff, Utah data. In order to test the synthetic generation of the model, the following statistics were computed for comparison with the historical record:
1. Probability distribution function, mean, standard deviation and probability mass function of dry and wet spells per season.
2. Length of longest wet and dry spell per season.
3. Mean and standard deviation of daily precipitation per season.
4. Probability density function of daily precipitation per season.
5. Maximum daily precipitation per season.
6. Percentage of yearly precipitation per season.

Twenty-five simulations were made and the above statistics were calculated for the three variables. They are plotted along with the historical statistics as boxplots. The box in the boxplots indicates the interquartile range of the statistic computed from the twenty-five simulations, while the lines extending outward from the boxes go up to the 95% range of the statistic. The dots are the values of the statistic that fall outside the 95% range. The black dot joined by solid lines is the statistic of the historical record. The boxplots show the range of variation in the statistics from the simulations and also show the capability of the simulations to reproduce the historical statistics. Figures 5, 6 and 7 show the boxplots of the various statistics, for each season, for the three variables: daily precipitation, wet spell length and dry spell length, respectively. It can be seen from these figures that the simulation procedure reproduces the characteristics well. More wet spells and dry spells are simulated than appear in the historic record. The reason is that in the historic data there are many missing values, which results in fewer wet and dry spells, while simulations are made for the entire length of the record. This introduces a small bias, as a result of which the historical statistics tend to fall outside the boxes in the boxplots; this can be observed in Figures 6(c) and 7(a). Thus, the model provides a promising alternative to the parametric approach. The assumption-free, data-adaptive nature of the nonparametric estimators makes the model more robust to distributional assumptions. Further work is required by way of analysing more data sets and comparing with traditional stochastic models such as the Markov chain, the Markov renewal model, etc. The model is being generalized by Lall and Rajagopalan (in preparation) to handle situations where the correlations between the variables are significant.


Figure 5. Boxplots, by season, of daily precipitation statistics: (a) mean, (b) standard deviation, (c) percentage of yearly precipitation and (d) maximum.

Figure 6. Boxplots, by season, of wet spell length statistics: (a) mean, (b) standard deviation and (c) maximum.

Figure 7. Boxplots, by season, of the corresponding dry spell length statistics.


Acknowledgements

Partial support of this work by the U.S. Forest Service under contract notes INT-915550-RJVA and INT-92660-RJVA, Amend #1, is acknowledged. The principal investigator of the project is D.S. Bowles. We are grateful to J. Simonoff, J. Dong, H.G. Muller, M.C. Jones, M. Wand and S.J. Sheather for stimulating discussions, provision of computer programs and relevant manuscripts. The work reported here was supported in part by the USGS through their funding of the second author's 1992-93 sabbatical leave, when he worked with BSA, WRD, USGS, National Center, Reston, VA.

REFERENCES

Devroye, L. (1986) Non-uniform Random Variate Generation, Springer-Verlag, New York.

Dong, J. and Simonoff, J.S. (1992) "On improving convergence rates for ordinal contingency table cell probability estimation", unpublished report, June 10.

Dong, J. and Simonoff, J.S. (1991) "The construction and properties of boundary kernels for sparse multinomials", unpublished report.

Foufoula-Georgiou, E. and Lettenmaier, D.P. (1986) "Continuous-time versus discrete-time point process models for rainfall occurrence series", Water Resources Research 22(4), 531-542.

Hall, P. and Titterington, D.M. (1987) "On smoothing sparse multinomial data", Australian Journal of Statistics 29(1), 19-37.

Kendall, Sir M. and Stuart, A. (1979) Advanced Theory of Statistics, Vol. 2, Macmillan Publishing Co., New York.

Lall, U. and Rajagopalan, B. "A nonparametric wet/dry spell model for daily precipitation", in preparation for submission to Water Resources Research.

Muller, H.G. (1991) "Smooth optimum kernel estimators near endpoints", Biometrika 78(3), 521-530.

Roldan, J. and Woolhiser, D.A. (1982) "Stochastic daily precipitation models, 1. A comparison of occurrence processes", Water Resources Research 18(5), 1451-1459.

Rosenblatt, M. (1956) "Remarks on some nonparametric estimates of a density function", Annals of Mathematical Statistics 27, 832-837.

Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York.

Simonoff, J.S. (1983) "A penalty function approach to smoothing large sparse contingency tables", The Annals of Statistics 11(1), 208-218.


Wang, M.C. and Van Ryzin, J. (1981) "A class of smooth estimators for discrete distributions", Biometrika 68(1), 301-309.

Wand, M.P., Marron, J.S. and Ruppert, D. (1991) "Transformations in density estimation", Journal of the American Statistical Association 86(414), 343-353.

Webb, R.H. and Betancourt, J.L. (1992) "Climatic variability and flood frequency of the Santa Cruz River, Pima County, Arizona", USGS Water-Supply Paper 2379.

PART II

FORECASTING

FORECASTING B.C. HYDRO'S OPERATION OF WILLISTON LAKE - HOW MUCH UNCERTAINTY IS ENOUGH

D.J. DRUCE
Resource Planning, B.C. Hydro
Burnaby Mountain System Control Centre
C/O Podium B, 6911 Southpoint Drive
Burnaby, B.C., Canada V3N 4X8

For the past several years, the British Columbia Hydro and Power Authority has made available to the forest industry and others probabilistic forecasts of month-end elevations for its largest storage reservoir, Williston Lake. These forecasts consist of the median, lower decile and upper decile values for each month over a 24 month period and are updated on a monthly basis. They are generated by a stochastic dynamic programming (SDP) model, in combination with a simulation model. The SDP model derives a monthly operating policy for Williston Lake, the adjacent 2416 MW G.M. Shrum generating station and the 700 MW Peace Canyon run-of-river hydroelectric project located downstream on the Peace River. The operating policy provides releases for each month that are conditional on the reservoir storage state and on a randomized historical weather state. The sample of month-end reservoir levels calculated by the simulation model is easily manipulated to directly elicit the percentiles for the forecast. Analyses of the forecasts issued since April 1989 indicate that while the median forecasts have been relatively accurate, the decile forecasts with lead times of less than a year have underrepresented the uncertainty in the reservoir levels. Furthermore, preliminary results suggest that an upgraded version of the SDP model, which includes a stochastic export/import market, will not add sufficient uncertainty for the shorter lead times. A source of forecast error that deserves more attention has, however, been identified.

INTRODUCTION

Williston Lake, with a live storage capacity of 40,000 x 10^6 m^3, is the largest reservoir operated by the British Columbia Hydro and Power Authority (B.C. Hydro). Williston Lake was created when the W.A.C. Bennett Dam was constructed on the Peace River in the 1960's. It filled for the first time in 1972. Adjacent to the W.A.C. Bennett Dam is the 2416 megawatt (MW) G.M. Shrum (GMS) hydroelectric power plant and just downstream is the 700 MW Peace Canyon (PCN) run-of-river project. This hydroelectric complex is located in northeastern British Columbia, as shown in Figure 1. Traditionally, forecasts of Williston Lake levels were produced only for in-house use, i.e., for operations planning. Then, in the late 1980's, it became apparent to B.C. Hydro that other users of Williston Lake, primarily the forest industry, were keenly interested in future water levels. B.C. Hydro responded by distributing the forecasts to the forest companies and to area managers who deal more directly with the local interests. At the same time, B.C. Hydro began relying more on economic criteria and less on water level forecasts for the operations planning of Williston Lake.


Figure 1. Location map.

The more recent versions of the water level forecasts consist of the median, lower decile and upper decile values for each month over a 24 month period and are updated monthly. They are obtained by simulating the operation of Williston Lake and the GMS/PCN power plants under a sample of historical weather years. The simulation operating policy is developed by a stochastic dynamic programming (SDP) model in which uncertainty is driven by randomized monthly weather sequences. The SDP model also establishes the marginal value of water stored in Williston Lake, as a function of time and storage level. Due to their size, Williston Lake and the GMS/PCN plants are often used to balance the system load and resources over time periods ranging from hours to years. The marginal value of water in Williston Lake is therefore a good proxy for the B.C. Hydro system marginal cost. A forecast of the system marginal cost is produced by combining the Williston Lake water level forecast with the marginal water value information. For this paper, the 24 month water level forecasts issued since April 1989 have been analyzed to obtain performance statistics for the median forecasts and to determine whether the methodology produces upper and lower decile forecasts that are credible.


The evaluation of decile forecasts is particularly relevant because it provides some indication of where the forecasting methodology may be deficient in accounting for uncertainty. The following section provides a brief description of the B.C. Hydro system and explains some of the operating considerations that add uncertainty to the Williston Lake water level forecasts.

OVERVIEW OF THE B.C. HYDRO SYSTEM

B.C. Hydro supplies electricity to most of the province from an integrated system of power plants that are predominantly hydroelectric. The nameplate generating capacity of the system is 10,390 MW, with the 30 hydroelectric plants contributing 9332 MW. Based on their operating regime, each of the hydroelectric plants can be placed into one of three groups, namely, the Peace River plants, the Columbia River plants and the rest of the system. The Peace River plants, GMS and PCN, are unique in that they provide most of the monthly and annual operating flexibility in the system.

The Columbia River projects are characterized by large power plants with some storage reservoirs, but relatively little control over monthly or annual generation levels. This lack of control is partly the result of the Columbia River Treaty signed with the United States in 1961. Mica Dam, on the main stem of the Columbia River, is one of three dams built in Canada under terms of the Columbia River Treaty. The reservoir created by Mica Dam, Kinbasket Lake, has a live storage capacity of 14,800 x 10^6 m^3, of which approximately 8630 x 10^6 m^3 are operated in accordance with Treaty requirements. Monthly releases from Treaty storage, or month-end storage targets, for the upcoming year are predetermined from studies jointly prepared by the United States and Canadian Entities and, for the most part, are independent of runoff conditions. Storage targets rather than releases are specified for the summer months to facilitate refill of Treaty storage over a wide range of monthly inflow volumes. As a result, there is greater uncertainty over the monthly releases and generation from the 1736 MW Mica power plant at that time of the year. The 1843 MW Revelstoke project, located downstream of Mica Dam, has considerable local inflow and 1560 x 10^6 m^3 of live storage capacity, but is generally operated as a run-of-river plant. Consequently, the Columbia River Treaty obligations play a dominant role in the monthly operation of both projects. There is, however, some short-term operating flexibility available through the use of non-Treaty storage. B.C. Hydro has an additional 1137 MW of generating capacity at two power plants, Kootenay Canal and Seven Mile, located on tributaries of the Columbia River, but has little or no control over the operation of the respective upstream storage reservoirs. The variation in the summer generation from the Columbia River plants is usually accommodated by adjusting the generation from the Peace River plants and is reflected in the Williston Lake levels.

The rest of the hydroelectric system comprises many small to moderate-sized projects where the operation is dictated more by the hydrologic regime and storage limitations than by system requirements. The percentage of the total system generation provided by each of the three groups is shown in Figure 2. Thermal projects have not contributed much generation in recent years, but the 912 MW Burrard thermal plant, located near Vancouver, is available for system support as required. By design, B.C. Hydro generally has more energy than it needs to meet the domestic demand, plus any other firm obligations, except under prolonged low water conditions. Rather than have this surplus energy accumulate (as water) in the system reservoirs until it eventually spills, B.C. Hydro attempts to market the energy to utilities in the Pacific Northwest and Southwest United States.


15" OTHER

---~

2" THERMAL 31" PEACE

4bX COLUMBIA _ _- o J

Figure 2. B.C. Hydro system generation by source 1984 to 1992. Northwest and Southwest United States. However, depending on the load/resource balance in the Pacific Northwest, transmission access to these markets may be quite restricted. Utilities in the Pacific Northwest have priority rights to the transmission grid and, under many water conditions, crowd out energy from B.C. Hydro. In the past, the quantity of surplus energy available for the current year was estimated from deterministic energy studies that assumed average water conditions over a four year planning period. Energy beyond what was required to refill the reservoirs in the first year could be exported without concern about the effect on the longer term reliability of the system. However, for these studies, only the energy that could be generated by the hydroelectric projects was considered. In the event that the reservoir inflows were less than expected, thermal power plants could then be operated to refill the system. As a policy, thermal power plants were not operated concurrently with exports. Since B.C. Hydro is only a marginal supplier to the large Pacific Northwest/Southwest market, export prices were typically set just below the decremental cost of the importing utility, after allowing for transmission costs. The incremental cost of actually supplying energy from the hydroelectric system was generally unknown, but was thought to be small, and had no influence on either the price or quantity of the exports. By the late 1980's, B.C. Hydro had started to move away from the notion that hydroelectric energy had only a nominal production cost and toward marginal pricing, based on opportunity costs. In the short-run, B.C. Hydro's opportunity costs are directly affected by the forecasted markets for exports and for alternate resources and by the risk of a system spill. For estimates of its system marginal cost, B.C. Hydro relies on the SDP model developed for the operations planning of Williston Lake. An outline of how that model is used for forecasting both the Williston Lake levels and the system marginal marginal cost is included as the next section.

FORECASTING METHODOLOGY

Most stochastic dynamic programming models that are used for reservoir management


treat the inflows as observations of a stochastic sequence (Yakowitz, 1982). However, for the SDP model of Williston Lake operation, it was the weather that was assumed to be the stochastic process (Druce, 1990). The weather not only causes the inflows to Williston Lake, but also affects the domestic supply of and demand for energy. This SDP formulation was chosen so as to increase the level of uncertainty considered by the model and thereby add realism. Yeh et al. (1992) have also acknowledged reservoir inflows and load demands to be seasonally and cyclically varying stochastic quantities and caution that they can present severe complications if accommodated explicitly in the optimization. Under the assumption that Williston Lake is operated to balance system loads and resources, it is necessary to establish how much of the load can be served by the other resources. The GMS/PCN plants can then be operated to supply the residual firm load and to accommodate any interruptible sales or purchases. The schematic, presented as Figure 3, shows the components included in the derivation of the GMS/PCN load, as well as other input and output for what has come to be known internally as the "marginal cost" model.

Figure 3. Schematic for B.C. Hydro's marginal cost model. Inputs: the domestic load forecast, the Columbia River inflow forecast, historical generation from other hydroelectric plants, committed exports/imports/Burrard operation, generation from the Columbia River plants, the Williston Lake inflow forecast, the residual firm load for GMS/PCN, and the interruptible export and alternate resource market forecasts; outputs: the marginal cost forecast and the Williston Lake level forecast.

Uncertainty for both loads and resources is based on the effects of randomized monthly weather sequences from the years 1973 to 1992, i.e., up to the year before the current year. The monthly domestic load forecast is adjusted by 20 gigawatt-hours (GW.h) per degree Celsius whenever the mean temperature for an historical month at a weather station near the load centre deviates from the long term mean temperature for the corresponding month. However, this empirical relationship typically alters the monthly load by less than three per cent. Uncertainty is added to the resource side by linking the weather months and the energy historically generated by the rest of the hydroelectric system, excluding the four large plants in the Columbia River group. As mentioned earlier, the majority of these smaller projects have limited storage capacity and their operation is related to the local hydrology rather than the system load. The year-to-year variability in the small hydro generation for a given month is usually less than about five percent of the system load.
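The derivation of the GMS/PCN load reduces to simple accounting once the components have been forecast. Below is a minimal sketch with illustrative names and sign conventions (both are assumptions, not taken from the paper), including the 20 GW.h per degree Celsius load adjustment described above.

```python
def gms_pcn_load(domestic_load, t_hist, t_normal,
                 columbia_gen, small_hydro_gen, committed_net):
    """Residual firm load (GW.h) to be met by the GMS/PCN plants.

    domestic_load : forecast monthly domestic load (GW.h)
    t_hist, t_normal : historical-month and long-term mean temperature (C)
    columbia_gen, small_hydro_gen : forecast generation of the large
        Columbia River plants and of the rest of the hydro system (GW.h)
    committed_net : committed exports less imports and Burrard output
        (GW.h); the sign convention here is illustrative only
    """
    load = domestic_load + 20.0 * (t_hist - t_normal)   # temperature effect
    return load + committed_net - columbia_gen - small_hydro_gen
```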


Examination of Figure 4 will help put this source of uncertainty into perspective.

Figure 4. Variability in monthly load and generation, 1973 to 1992 (total system, GMS/PCN and small hydro).

For the Mica and Revelstoke plants on the Columbia River, seasonal inflow forecasts are generated for the current year using a conceptual hydrologic model (Druce, 1984). The model is initialized with the best estimate of the hydrologic state of the basin at the time of the forecast and then the runoff response is calculated for a sample of historical weather years. These forecasts are first issued in January, then updated each month through to August. It is therefore feasible to forecast the generation available from the Mica and Revelstoke plants as a function of the weather sequence. Unfortunately, it is difficult to generalize the Treaty operation of Mica Dam to cover a wide range of runoff conditions and, to date, B.C. Hydro has only been able to forecast the generation corresponding to the expected inflows. For the Kootenay Canal and Seven Mile plants, it is not possible to forecast the generation associated with various weather sequences because the prerequisite type of inflow forecast is simply not available. The net effect is that a deterministic forecast is made of the energy supplied by the four large Columbia River plants. The committed sales, purchases and Burrard operation make up the last component in the derivation of the firm load for the GMS/PCN plants. This information is assumed to be accurately known, but in fact may be quite uncertain due to forced outages and gas supply disruptions. The largest source of uncertainty considered by the SDP model is the water supply forecast for Williston Lake. For the current year, the forecast is generated by a conceptual hydrologic model, in the same manner as for the Mica and Revelstoke plants on the Columbia River. For subsequent years, the inflow forecast is based on the


historical inflows. In each case, the inflow sequences corresponding to the weather years of 1973 to 1992 are divided into monthly values to produce monthly inflow and weather pairings. Monthly weather sequences are explicitly assumed to be independent in the SDP formulation, which implies that the monthly inflow values should not be highly autocorrelated. The statistical summary of historical inflows to Williston Lake, included as Table 1, shows that monthly autocorrelation is low to moderate. The range in the annual water supply for Williston Lake, when converted to energy, amounts to about 18 per cent of the annual system load, although, for the current year, that level of uncertainty can be reduced by the hydrologic forecasting, as illustrated by Figure 5.

TABLE 1. Summary of inflow volumes to Williston Lake for 1973 to 1992.

                 INFLOW VOLUMES (m^3 x 10^6)        R^2 WITH
TIME PERIOD    MEAN    MINIMUM    MAXIMUM      PREVIOUS MONTH
JANUARY         834       547       1323            0.31
FEBRUARY        625       429        851            0.39
MARCH           705       448       1302*           0.17
APRIL          1449       671       3542*           0.58
MAY            6593      3900      10359            0.14
JUNE           9573      5762      13121*           0.03
JULY           5444      2532       8992            0.10
AUGUST         2584      1327*      5008            0.61
SEPTEMBER      2043      1107*      3103            0.09
OCTOBER        2179      1056*      3694*           0.39
NOVEMBER       1367       722       2007            0.29
DECEMBER        955       512       1348*           0.55
ANNUAL        34351     25396      42383

* Records established during the 1989 to 1992 forecasting period.

For the past four years, the observed annual inflows have generally fallen within the forecasted ranges, but individual monthly values were often outside the forecasted monthly range, and that has resulted in substantial short-term errors in the water level forecasts. The occurrence of such errors will tend to reduce over time as more weather years are added to the data base. The interruptible export and alternate resource markets are also major sources of uncertainty for B.C. Hydro. In an upgraded version of the SDP model, these markets are treated as stochastic variables. However, for the model that has been used for forecasting, deterministic forecasts of the monthly quantities and prices for both markets are input.


Figure 5. Reduction in the uncertainty of the annual inflow to Williston Lake (naive versus hydrologic forecasting models; forecast dates from 1 January to 1 August).

Given all the information described above, the SDP model selects, for each reservoir state and weather state, the monthly release from Williston Lake that will maximize the net revenue to B.C. Hydro. That operating policy is then passed on to a companion model that simulates the operation of the hydroelectric complex for each of the historical weather years, from a known initial reservoir level. The sample of possible month-end reservoir levels calculated by the simulation model is manipulated to obtain the percentiles that are presented graphically, as shown in Figure 6. Based on the water level forecast and the marginal value of water stored in Williston Lake, established by the SDP model for each storage level and month over the planning horizon, a marginal cost forecast is also routinely produced for internal use. It provides decision support for interruptible sale, purchase and storage transactions. Other applications for the marginal cost model have been previously described by Druce (1989). Upper and lower decile values are plotted along with the median values to provide some indication of the uncertainty associated with the forecasts. This is common practice with the seasonal water supply forecasts available in British Columbia, Alberta and the western United States. Moreover, an economic benefit can be expected whenever a probabilistic forecast is used instead of a categorical forecast for decision making, and the gain tends to increase as the predictability decreases (Alexandridis and Krzysztofowicz, 1985). In the following section, the median and the decile water level forecasts for Williston Lake are analyzed in an attempt to establish forecast credibility and to learn where the forecasting methodology can be improved.

FORECAST EVALUATION

The water level forecasts were evaluated for accuracy by comparing the median forecasted and the observed month-end values over each rolling 24 month period since April 1989. Forecast statistics are displayed, as a function of lead time, in Figure 7.
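Extracting the percentiles plotted in Figure 6 from the simulation sample is direct. Below is a sketch assuming a matrix of simulated month-end levels with one row per historical weather year; the names are illustrative.

```python
import numpy as np

def forecast_percentiles(levels):
    """Median and decile forecasts from simulated month-end levels.

    levels : array of shape (n_weather_years, n_months) of simulated
             month-end reservoir elevations (m).  Returns the levels
             with exceedence probabilities 0.90, 0.50 and 0.10.
    """
    lo, med, hi = np.percentile(levels, [10, 50, 90], axis=0)
    return {"p_exceed_0.90": lo, "median": med, "p_exceed_0.10": hi}
```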


Figure 6. Water level forecast for Williston Lake (month-end elevation in metres; curves for probability of exceedence 0.10, 0.50 and 0.90; April 1993 to March 1995; the full supply level is indicated).

Figure 7. Forecast statistics for the SDP and simulation models (bias, mean |error| and RMSE versus lead time in months).

From these results, it appears that the forecasting methodology is very slightly positively biased, but reasonably accurate. It was surprising to find, however, that the greatest errors, on average, have occurred in the first 12 months of the forecast period.


The same forecast statistics were calculated for a naive model, i.e., using the average observed month-end water levels available prior to each forecast year. Those results, plotted in Figure 8, also reveal greater errors in the first 12 months of the forecast period. It was therefore concluded that the perverse error pattern was not due to the simulation modelling, but due to the weather and market conditions that prevailed in the four year forecasting period. The modelling actually compensated quite well.
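The statistics plotted in Figures 7 and 8 can be computed per lead time from the rolling forecasts. A minimal sketch with illustrative names, assuming arrays aligned by forecast issue date and lead time:

```python
import numpy as np

def error_stats(forecast, observed):
    """Bias, mean absolute error and RMSE by lead time.

    forecast, observed : arrays of shape (n_forecasts, n_leads) holding
    the median forecast and the observed month-end level (m) for each
    forecast issue date and lead time (NaN where not yet verified).
    """
    e = np.asarray(forecast, float) - np.asarray(observed, float)
    return (np.nanmean(e, axis=0),              # bias
            np.nanmean(np.abs(e), axis=0),      # mean |error|
            np.sqrt(np.nanmean(e**2, axis=0)))  # RMSE
```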

Figure 8. Forecast statistics for the naive model (bias, mean |error| and RMSE versus lead time in months).

The reliability of the upper and lower decile forecasts was tested by calculating how frequently they contained the observed values. If the forecasting methodology accurately accounted for uncertainty, then the frequency should be 80 per cent of the time. Once again, as shown in Figure 9, results are poorer for the forecasts with the shorter lead times. However, this was no surprise. In fact, work on upgrading the SDP model to include more uncertainty has been underway for some time. B.C. Hydro has not had much success in subjectively forecasting export markets and it was decided that the uncertainty in the export market should be acknowledged explicitly in the modelling. Also, by adding a stochastic export market to the SDP model it was anticipated that the reliability of the decile forecasts of Williston Lake levels would improve for the shorter lead times. The upgraded SDP model has two additional state variables, to separately account for the monthly export quantity and export price. The size of the export market available to B.C. Hydro is determined, for the most part, by water conditions in the United States Pacific Northwest. When the forecasted or observed water supply for the Columbia River at The Dalles is 111,000 x 10^6 m^3 or less, B.C. Hydro has, on average, had access to a larger export market. The revised SDP model therefore has a state variable for water conditions in the Pacific Northwest that can take on one of two values.


Figure 9. Frequency that the decile forecasts contained the observed values, versus lead time in months (the expected frequency is 80 per cent).

The export market quantity varies monthly for each water supply state, but over a year amounts to 7900 GW.h versus 4700 GW.h. For simulation purposes, the future water supply states are modelled as a lag-one Markov process with the monthly state transition matrices based on the actual forecasts issued over the period 1970 to 1992. The expected monthly price for energy in the interruptible export market is forecasted using an empirical relationship with the spot price of natural gas in the United States, at Henry Hub, Louisiana. The New York Mercantile Exchange futures market provides the forecast for the spot price of natural gas at Henry Hub for the first 18 months; then various other sources of information are used to extend the forecast over the six year planning horizon. However, the empirical relationship is not particularly strong, with an R^2 value just over 0.50. The SDP model has therefore been reformulated to consider deviations from the expected price as a state variable. Three possible deviations are included, roughly -5, 0 and +5 mills per kilowatt-hour from the expected price. The exact values are calculated from the residuals of the regression equation and are updated as new data on prices becomes available. They are the mean deviations for three class intervals chosen to have equal frequencies over the period of record. Again, the export price deviation states are modelled as a lag-one Markov process with the monthly state transition matrices calculated from the residual pattern. The alternate resource or import market has some links with the export market through share-the-profit pricing formulas. Other sources of uncertainty for the alternate resources have yet to be considered. For February, March and April of 1993, the original and the upgraded versions of the SDP and simulation models were operated in parallel. From a comparison of their respective water level forecasts, it appears that the addition of the stochastic export/import market will increase the reliability of the decile forecasts, but only for lead times of four months or more. However, this observation is based on a very small sample of forecasts and is subject to change as more forecasts are produced.
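The water supply state evolution used in simulation is a two-state, lag-one Markov chain with month-dependent transition matrices. A sketch follows, with the transition matrices, the state labels and the calendar alignment treated as assumed inputs.

```python
import numpy as np

def simulate_states(p_trans, state0, n_months, rng=None):
    """Simulate a lag-one Markov chain of water supply states.

    p_trans : array (12, 2, 2); p_trans[m, i, j] is the probability of
              moving from state i to state j in calendar month m.
    state0  : initial state (e.g., 0 = large export market, 1 = small).
    Assumes the simulation starts in the first calendar month.
    """
    rng = rng or np.random.default_rng()
    states, s = [], state0
    for t in range(n_months):
        s = rng.choice(2, p=p_trans[t % 12, s])   # month-dependent transition
        states.append(int(s))
    return states
```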

74

D.J.DRUCE

Another area of concern has been the deterministic forecast of the generation from the large Columbia River power plants, since that group supplies such a large proportion of the system generation. Perhaps through more reliable modelling of the operation of these plants, the accuracy of the Williston Lake level forecasts could improve sufficiently, over the shorter lead times, that it would not be necessary to add even more complexity to the SDP model. This hypothesis was investigated by calculating error statistics for the Columbia River generation forecasts for durations of one to eight months. The error statistics were computed for three combinations of plants - the main stem plants Mica and Revelstoke, the tributary plants Kootenay Canal and Seven Mile and all four plants together. The results for the Mica and Revelstoke plants are shown in Figure 10, in terms of Williston Lake storage. The monthly generation patterns for the main stem plants and the tributary plants are negatively correlated. Consequently, the errors for the Mica and Revelstoke generation forecasts are, in most cases, worse than those for all four plants combined. It is apparent from a comparison of the error statistics presented in Figures 7 and 10 that much of the error in the Williston Lake level forecasts, for the shorter lead times, can be attributed to the forecasts of the generation supplied by the Mica and Revelstoke plants. These results are quite encouraging because they point to greater effort in modelling those Columbia River plants that B.C. Hydro has more control over and for which probabilistic seasonal inflow forecasts already exist.

I-+- BIAS 1-1- MEAN IERI I-e- RMSE I

2.51

2."

IX

fi11 ...

IX

UJ

~"

1.51

,,

,

, ,,

It ....

, ..

8. . . -" ,,"

--

-n-----..IJ----

_-11"---

_-"0,,_----.£1

I.II~----~--_,r_--~----_r----~----._--~----_,

I

2

4

DURATION (months)

b

8

Figure 10. Forecast statistics for Mica and Revelstoke generation.

CONCLUSIONS The SDP and simulation models used for the operations planning of Williston Lake produce forecasts of month-end water levels that are relatively accurate, over a planning horizon of 24 months. However, upper and lower decile forecasts, based on the uncertainty of weather effects, are not credible for lead times of less than one year. Preliminary results from upgraded versions of the SDP and simulation models, that

FORECASTING B.C. HYDRO'S OPERATION OF WILLISTON LAKE

75

include the extra uncertainty of stochastic export/import markets, indicate that the reliability of the decile forecasts will likely increase only for lead times of four months or more. The credibility of decile forecasts, with shorter lead times, may improve with better modelling of the effects of the operation of the Mica and Revelstoke power plants on the Columbia River.

REFERENCES Alexandridis, M.J. and Krzysztofowicz, R. (1985) "Decision models for categorical and probabilistic weather forecasts", Applied Mathematics and Computation 17,241-266. Druce, D.J. (1984) "Seasonal inflow forecasts by a conceptual hydrologic model for Mica Dam, British Columbia", in J.J. Cassidy and D.P. Lettenmaier (eds.), A Critical Assessment of Forecasting in Western Water Resources Management, American Water Resources Association, Bethesda, Md., pp. 85-91. Druce, D.J. (1989) "Decision support for short term export sales from a hydroelectric system", in J.W. Labadie, L.E. Brazil, I. Corbu and L.E. Johnson (eds.), Computerized Decision Support Systems for Water Managers, American Society of Civil Engineers, New York, N.Y., pp 490-497. Druce, D.J. (1990) "Incorporating daily flood control objectives into a monthly stochastic dynamic programming model for a hydroelectric complex", Water Resources Research 26(1), 5-11. Yakowitz, S. (1982) "Dynamic programming applications in water resources", Water Resources Research 19 (4), 673-696. Yeh, W. W-G., Becker, L., Hua, S-Q., Wen, D-P., and Liu, D-P. (1992) "Optimization of real-time hydrothermal system operation", Water Resources Planning and Management Division, ASCE 118(6),636-653.

EVALUATION OF STREAMFLOW FORECASTING MODELS

Tao Tao t and William C. Lennox2 tWater Resources Department, Ontario Hydro 700 University Ave. H9C22, Toronto, Ontario, Canada MSG lX6 2Department of Civil Engineering, University of Waterloo Waterloo, Ontario, Canada N2L 3Gl For application purposes, it is no longer a sound investment to develop a streamflow forecasting model from basics. Currently, streamflow forecasting models are available for nearly every scenario one can imagine. A model could be stochastic or conceptual; lumped parameter or distributed parameter. The task of developing a model has been transferred to one of evaluation and selection since no single model can be applied universally without sacrificing some element of its performance. Therefore, it is necessary to have some kind of consensus as to how forecasting models are evaluated and selected for each individual application. In the past, the evaluations were often conducted by comparing the forecasted and the observed streamflows with numeric andlor graphic criteria with little consideration given to the specific application. However, in this study, forecasting models are evaluated through simulated real-time applications to investigate which one maximizes the system performance. INTRODUCTION Being able to forecast future streamflows is of major interest to operators of water resource systems. Many different approaches have been tried: repeating historic inflow series, using mean values of historic inflow series, constructing stochastic models based on the statistical analysis of historic inflow series and developing physically based conceptual models. Unfortunately, a streamflow forecasting model which is a success in one application could be a failure in another. Different forecasting models may have to be selected for different applications. Historically, the evaluation and selection are often conducted by comparing the forecasted and the observed streamflows through numeric andlor graphic criteria (WMO 1986). Little consideration is given to the particular application where the forecast is required. However, an operator could be more concerned with which model can be used to achieve a better management of the water resource system rather than how good the forecasted values are in comparison with the actual future streamflows. In this study, different forecasting models are evaluated through simulated real-time applications, in addition to using numeric criteria (WMO 1986), to see which one can best improve the performance of a reservoir system. In this study, eleven models are created for one-month ahead inflow forecasting. The forecasted inflow for the coming month and the mean inflows for future months (leading 77

K. W. Hipel et al. (eds.). Stochastic and Statistical Methods in Hydrology and Environmental Engineering. Vol. 3. 77-85. © 1994 Kluwer Academic Publishers.

T. TAO AND W. C. LENNOX

78

to infinity) constitute a future inflow sequence. The inflow sequence is then used in deciding the optimal release policy for the coming month in the simulated real-time monthly operation of a two-reservoir system. In the following sections, eleven forecasting models and five numeric evaluation criteria (WMO 1986) are first detailed. The real-time operation of a reservoir system and its simulation are then described. Finally, a case study involving the simulated real-time operation of a system of two reservoirs in parallel under different scenarios is given. STREAMFLOW FORECASTING

The eleven forecasting models created are listed as follows, 1.

Using monthly mean values of historic series.

For a given inflow series Q,(m, yr), the monthly mean values are calculated as follows: I Y Q,(m)=-L Q/..m.yr)

(1)

Y"..I

where i (i=I.2 •.. n). m (m=1,2, ... ,12) and yr (yr=I,2, ...• l') represent generating station, month and year, respectively. n is the number of generating stations and Yis the length of the series in terms of year. The forecast based on the monthly mean values is. Q/..m.yr)=Q/..m)

2.

yr=Y+ I,Y+2,...

(2)

Periodic, first order Markov process based on historic series, spatial correlation assumed.

Streamflows usually repeat the same pattern annually. The periodic, first order Markov process is constructed by finding a set of parameters for every two consecutive months instead of estimating one set of parameters for the whole series. The forecast is given by:

"

Qj(m,yr) =a/..m) + L b,(m) • Qj(m-l,yr)

yr=Y+I,Y+2, ...

(3)

j-I

The parameters at) and bl) are estimated using least squares estimation technique based on Y years of data. 3.

Periodic, first order Markov process based on historic series. independence assumed.

The forecast is given by Eq.(3) except that b;fJ equals to zero for i¢j. 4.

Second order Markov process based on deseasonalized historic series, spatial correlation assumed.

The deseasonization is in fact a process of removing the trend from a stochastic series which is equivalent to deducting the monthly mean values from the historical series. Let the deseasonalized series be q,[(yr-l) ·12+m], then

79

EV ALVATION OF STREAMFLOW FORECASTING MODELS

qJ..(yr-l) 012 +m] =Q~m,yr) -Q/(m)

The forecast is given by:

2

yr=1,2, ... ,Y

(4)

12+m-t]

(5)

"

L L biP) oqj[(yr-1)

Q/(m,yr)=(2t(m)+Qj+

/-1 J-l

o

yr=Y+1,Y+2, ...

Theoretically aj should be zero because q/[(yr-1) o 12+m] is a series with zero mean. However, since there exist rounding errors in the computation process, a/ will hardly be zero even though it might be close to zero. S.

Second order Markov process based on deseasonalized historic series, independence assumed.

The forecast is given by Eq.(S) except that bi) equals to zero for i¢j. 6.

Second order Markov process based on logarithm-taken and deseasonalized historic series, spatial correlation assumed.

The streamflow series usually follows log-normal distribution. Let the new series, after taking the natural logarithm of the original historical series, be W,{m, yr). Then ~(m,yr)=ln[Qj(m,yr)]

yr=1,2, ... ,Y

(6)

The model building and parameter estimating are the same as Model #4 except that Q,{m, yr) is now replaced by W,(m, yr), However, the noise term is no longer additive as far as the forecasts are concerned. The final forecast is obtained as: Q/(m,yr) =exp[l¥,(m,yr)]

7.

(7)

Second order Markov process based on logarithm-taken and deseasonalized historic series, independence assumed.

Model #7 is the counterpart of Model #5 as Model #6 to Model #4. The next four forecasting models are based on first order Markov process. They are the same as Models #4 to #7 except that b;J..t) equals to zero for t=2. 8.

First order Markov process based on deseasonalized historic series, spatial correlation assumed.

9.

First order Markov process based on deseasonalized historic series, independence assumed.

10.

First order Markov process based on logarithm-taken and deseasonalized historic series, spatial correlation assumed.

11.

First order Markov process based on logarithm-taken and deseasonalized historic series, independence assumed.

T. TAO AND W. C. LENNOX

80

Models #2 and #3 have twelve sets of parameters for twelve months. Models #4 to #11 have only one set of parameters for all twelve months. The models are used only for one-month ahead inflow forecasting. The future inflow sequence used in finding the optimal release policy are generated through combining the forecasts for the coming month and the mean values for all other months leading to infinity. The numeric evaluation criteria (WMO 1986) used here are given below: a.

Ratio of standard deviations of forecasted to observed streamflows:

(8) b.

Ratio of the sum of squares of the monthly residuals to the centred sum of squares of the monthly observed streamflows:

(9)

c.

Ratio of the standard deviations of the residuals to the mean observed streamflows: (10)

d.

Ratio of the mean error to the mean observed streamflows: (11)

e.

Ratio of the absolute error to the mean observed streamflows: (12)

where Yo (mean values with bar) and yjrepresent observed and forecasted streamflows, and N is total number of months involved. REAL-TIME RESERVOIR OPERATIONS Real-time operation is an on-going decision-making process, run daily, weekly or monthly over the life expectancy of a reservoir system. The operation is best described as an attempt to achieve the optimal compromise between reservoir storages and reservoir releases to meet the multiple objectives of the system. The multiple objectives of a reservoir system may be represented by an appropriate performance index. This performance index serves two purposes: to evaluate the afterthe-fact system performance during a specific operating span, and to decide the optimal release policy for each decision period. The performance index for evaluation is in a form as:

-

~

L g[S(t),R(t),t]

J=

(13)

where S(t) is a storage vector, representing reservoir storages at the beginning of decision period t; R(t) is a release vector, representing reservoir releases during decision

EVALUATION OF STREAMFLOW FORECASTING MODELS

81

period t; to and ft represent the beginning and the end of a specific operating horizon and

g indicates some physical or non-physical quantity for evaluation.

The performance index for decision-making or real-time operation is derived from the performance index for evaluation and is expressed as j

...

=L

g[S(k),R(k),k]

(14)

1-/

where t is for the coming decision period. It should be noted that in this study the optimization horizon extends to infinity. A n-reservoir system can be described in matrix form as: S(t+ 1) =S(t)-FR(t) -L(t) +Q(t)

(15)

where Q(t) is an nx1 uncontrollable inflow vector; L(t) is an nx1 loss vector and Pis an nxn system matrix. The diagonal elementsfu's of P are always equal to 1. The offdiagonal elementfv is -1 if the reservoir i receives release from the reservoir j, and 0 otherwise. The uncontrollable inflow is defined as the part of total inflow to a reservoir after subtraction of releases from upstream reservoirs. The formulated problem requires the use of nonlinear optimization. Most optimization techniques search for the optimum of a non-linear system through time consuming iterative processes. The selected technique may be feasible in real-time operations. However, the time required for simulating this process many times over may not be acceptable. The eleven forecasting models are tested indirectly by finding an approximately equivalent quadratic performance index and using linear quadratic control, where the analytic solution is available. If such a quadratic performance index exists, the problem can now be rewritten as, j

=.!.2 f

([S(k) -S,.(k)YA[S(k)-Sr
(16)

1 _/

subject to the following system equation, S(k)=S(k-l)-FR(k-1)-L(k-1)+Q(k-1)

(17)

The optimal controller is given as (Pindyck 1972), with

R(k) =Rr
(18)

P=A +[P-l+FB -lp1)-l

(19)

p(k)=(P-A)[Q(k)-L(k)-PRr
(20)

where S~k) is an nx1 target storage vector; R~k) is an nx1 target release vector; A and Bare nxn weighing matrices; P is the solution to the Riccati equation Eq.(19) and is an nxn matrix with all elements constant; and p(k) is the solution to the tracking equation Eq.(20) and is an nx1 vector. A is symmetric, positive semi-definite and B and P are symmetric, positive definite.

T. TAO AND W. C. LENNOX

82

The solution process starts with finding the target storages and releases to formulate an approximately equivalent performance index. The targets are obtained by optimizing the original performance index with respect to mean monthly values of historic inflows for several years. The optimization is subject to all system constraints. Since the future inflows are made of monthly mean values from the second month on and the inflows are periodic, the targets are periodic as well. No matter what the initial storages are, the future storages and releases will be the same as the target storages and releases after some months. It can be observed from Eq. (18) that p(k) can be predetermined as, p(k)=-PS';"k)

(21)

The optimal release policy for the coming month can then be obtained as follows: R(t) =R';"t) +C{[S';"t+ 1) +FR';"t)] -[S(t)+Q(t) -i{t)]}

(22)

C=-B-IPT(P-A)=B-1PT[(P-A)FB-lpTp_Pj

(23)

where

If the reservoir releases required in the release policy of Eq.(22) are not within the release constraints, they are first set to meet the release constraints. In the process of implementing the release policy, the releases are further adjusted to ensure that the storage constraints are met even it means the violations of release constraints. That is consistent with the reality. CASE STUDY: THE EAST RIVER WATERSHED The East River watershed is located in the Province of Guangdong, Southern China. There are two generating stations on the East River Watershed: the Harvest Dam and the Maple Dam (Figure 1). The Maple Dam is upstream in the East River, and the Harvest Dam is downstream of the tributary, the Harvest River, which joins the East River below the Maple Dam. The system parameters are listed in Table 1. Table 1 Basic parameters of generating stations

Drainage Area

Unit km2

Streamflow Record

year

Average Inflow Normal Pool Level Useful Storage Minimum Pool Level Dead Storage Average Tail Water Elevation

cms m l
Minimum Release Requirement

cms

Installed Capacity

MW

Harvest Dam

Maple Dam

5734 34 196 116 6480 94 4310 35 100 302.5

5150 34 134 166 1249 128 286 90.5 90

160

EVALVATION OF STREAMFLOW FORECASTING MODELS

83

Southern China

Harvest River

arvest Dam Canton

East River

o

South China Sea Figure 1 Location of the East River Watershed Thirty-four years of monthly historic inflow series are available. The data serves two purposes. One is for parameter estimation of forecasting models. Another is for simulating the real-time operation where they are used as future "actual n inflows. The system performance index used to decide the optimal release policy for the coming month is '"

2

2

J=:L [:L C -:L Pi(k)]2 t-t

1=1,2, ... ,408

i

i-l

(24)

i-l

where Ci is the installed capacity at station i and

and

if T/gHi(k)Ri(k) ~ Ci if T/gHi(k)Ri(k) < Ci

(25)

(26)

where ." is the efficiency factor and g is the acceleration of gravity. The constraints on the system are given as follows: (27)

T. TAO AND W. C. LENNOX

84

The system is optimized over seven years or eighty-four months based on monthly mean values of inflows and subject to constraints given in Eqs.(15) and (27) to obtain the target storages and releases. February is assumed having 28.25 days. The results from the numerical evaluation of eleven forecasting models are presented in Table 2. The values in the brackets in the first row indicate what should be expected from a perfect forecast. It can be observed that the second model is generally the most favoured. Table 2 Numerical evaluation of forecasting models M#1 M#2 M#3 M#4 M#5 M#6 M#7 M#8

M#9

M#10 M#11

CO(1.0) 0.6939 0.8080 0.7138 0.7686 0.7646 0.7547 0.7517 0.7890 0.7495 0.7380 0.7339

NTM(-1.0) 0.0369 -0.3017 -0.1271 -0.1699 -0.1576 -0.1313 -0.1262 -0.1829 -0.1226 -0.0965 -0.0915

S(O.O) 0.7124 0.5873 0.6221 0.6428 0.6476 0.6639 0.6644 0.6573 0.6568 0.6716 0.6711

R(O.O) -0.0007 0.0008 -0.1074 -0.0005 -0.0004 -0.1013 -0.1020 -0.0023 -0.0004 -0.1048 -0.1056

A(O.O) 0.4394 0.3583 0.3595 0.3843 0.3858 0.3579 0.3597 0.3978 0.3928 0.3645 0.3648

Table 3 shows the results of simulated system operations. The fIrst eleven represent the simulated real-time operations using linear quadratic control with forecasted inflows from eleven different forecasting models for the coming month. The twelfth is the same as the first eleven except that the 'actual' inflows are substituted for the forecasted inflows for the coming month. There is an expected improvement over the first eleven. This indicates the merits of improved inflow forecasting. Table 3 System performance of simulated real-time operation

LQC M#1 LQCM#2 LQC M#3 LQCM#4 LQC M#5 LQC M#6 LQC M#7 LQC M#8

LQC M#9 LQC M#1O LQC M#11 LQC actual Ideal case

Avg. Power Generation (MW) Total Harvest Maple 187.68 120.33 67.35 120.21 67.20 187.41 67.87 187.85 119.98 120.74 187.97 67.23 187.96 120.75 67.21 120.56 67.57 188.13 188.09 120.51 67.58 120.25 66.94 187.19 120.31 67.17 187.48 120.14 67.54 187.68 120.12 67.59 187.71 121.03 69.28 190.31 121.78 73.05 194.82

J(10-6)

33.31 33.51 33.37 33.46 33.45 33.49 33.49 33.68 33.51 33.53 33.52 32.76 30.30

Reliability: Harvest 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%

min reI. Maple 94.9% 95.1% 95.6% 95.4% 95.1% 95.4% 95.4% 95.4% 95.1% 95.4% 95.4% 97.3% 97.8%

EVALUATION OF STREAMFLOW FORECASTING MODELS

85

Since there is no iteration involved in the linear quadratic control, it takes less than one minute to complete the computation for the first twelve scenarios on a Compaq 386/33L. However, it takes much more time to fmd targets. The last scenario, called the 'Ideal case', serves as a reference and can never be reached. It is assumed that the future inflows are perfectly known beforehand. The system goes through a one-shot optimization over all 408 months. It should be noted that the simulated operation using the forecasting model #6 produces only 3.4% less total power than the ideal case. The interesting finding is that the forecasting model #2 which is favoured by the numeric criteria does not outperform other forecasting models in terms of maximizing the power generation or minimizing the performance index. SUMMARY

The results shown above do not warrant any specific conclusion as to which method should be used in the evaluation and selection of a streamflow forecasting model. However, the study demonstrates that a streamflow forecasting model can be evaluated by a method other than the usual numeric and/or graphic criteria. The method is to recognize the application for which the model is intended and to assess its merits based on the final results, such as increase of power generation, reduction of flood damage, etc. ACKNOWLEDGEMENTS The writers would like to thank Professor Xi-can Shi of Tsinghua University, Beijing, China for providing the data used in this study. REFERENCES

Pindyck, R.S. (1972) HAn application of the linear quadratic tracking problem to economic stabilization policy. H IEEE Trans. A.uto. Control AC -17(3),287-300. World Meteorological Organization (1986): Intercomparison of Models of Snowmelt Runoff, No.646, Geneva, Switzerland.

APPLICA TION OF A TRANSFER FUNCTION MODEL TO A STORAGE-RUNOFF PROCESS

P.-S. YO, C.-L. LIU and T.-Y. LEE Department of Hydraulics and Ocean Engineering, National Cheng Kung University Tainan, Taiwan 70101, R.O.c.

In the storage approach to conceptual rainfall-runoff models, runoff is commonly simulated as a function of storage. Based on this hydrologic phenomenon, a storage-runoff forecasting model is developed to compare with the rainfall-runoff forecasting model. The model order and parameters are first calibrated by using Schwartz's Bayesian criterion. Eight storm events were then used for verification. One to six hours ahead forecast hydrograph according to both rainfall-runoff and storage-runoff models have a time problem. After both models are corrected by a backward shift operator, the time problem is relieved. Based on comparison between the forecasted and observed hydrographs and eight kind criteria, it seems that the storage-runoff forecasting model has performance superior to that of the rainfall-runoff forecasting model.

INTRODUCTION Floods are one of the most destructive acts of nature. Real time flood forecasting has been developed for flood protection and warning system recently. Depending upon the response time of a basin, a mathematical model to be used for real-time may consist of some parts of the following three basic elements: (1) rainfall forecasting model, (2) rainfall-runoff forecasting model, and (3) flood routing model (Reed, 1984). Because lots of catchments have quick response to rainfall input, a rainfall forecasting model is desirable which will act in unison with a rainfall-runoff model to extend the forecast lead time. Normally, the rainfall forecasting models are subject to significant forecasting error even if the forecast lead time is short (Einfalt, 1991). An alternative method used to extend the forecast lead time is discussed in this paper.

MATHEMATICAL MODEL A wide range of rainfall-runoff forecasting models have been developed recently including: (1) unit hydrograph and other methods using the S-curve, the discrete linear cascade 87 K. W. Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 87-97. © 1994 Kluwer Academic Publishers.

88

P.-s. YU ET AL.

reservoir model (Chander and Shanker, 1984; Bolinski and Mierkiewing, 1986; Corrodini and Melone, 1987; Corrodini et al., 1986), (2) conceptual models, (3) non-linear storage models, and (4) transfer function models. O'Connel and Clark (1979) and Reed (1984) have reviewed some of these models. In this paper, the transfer function model is used to simulate the process of rainfall-runoff and storage-runoff for flood forecasting separately. Rainfall-runoff forecasting model

Powell (1985), Owens (1986), and Cluckie and Owens (1987) have demonstrated that rainfall-runoff process can be satisfactorily simulated by the transfer function model, Q(t)

= ajQ(t -1) +a2Q(t -2) +"+apQ(t -p) +bjR(t -1) +b2R(t -2) + ... +bqR(t -q) +e(t),

(1)

where

Q(t), Q(t -1),··· = discharge at times t, t -1,···, R(t)" R(t -1) ... = rainfall at times t , t -1 "... e(t)= noise,

p and q = model order, ~,

... ,ap and hI' · .. ,hq

=

model parameters.

When (1) is applied for runoff forecasting, a rainfall forecasting model is required to forecast the rainfall in the future. The model to forecast rainfall in this paper is:

(2) Storage-runoff forecasting model

In the conceptual rainfall-runoff models according the the storage approach, the runoff at the outlet of catchment is commonly simulated as a function of storage, S, (for example: Q = KS). Based on this hydrologic phenomena, a storage-runoff forecasting model is developed, in which the runoff at the present time is assumed to be a function of previous runoff and catchment storage. The major difference from the rainfall-runoff forecasting model as described in the previous section is that the storage replaced the rainfall as model input. The forecasting models include storage-runoff forecasting model,

89

TRANSFER FUNCTION MODEL AND STORAGE-RUNOFF PROCESS

Q(t) =

~Q(t

+

-1) +~Q(t -2) +"+apQ(t -p) +bIS(t -1) +b2 S(t -2)

... +bqS(t -q) +e(t)

(3)

and storage forecasting model,

(4) As the storage over the catchment area cannot be directly measured, the storage at present time is computed from the mass balance between the rainfall input and the discharge output, S(t) = [R(t) - Q(t)]

+ S(t -1).

(5)

PARAMETER ESTIMATION

Either rainfall-runoff forecasting model or storage-runoff forecasting model can be written in the general form,

O(t) = ap(t -1)

+~O(t

-2)

+ .. +apO(t -p) +b/(t -1) +bi(t -2) +

... +b/(t"-q) +e(t),

(6)

in which the O(t),O(t -1), ... ,OCt -p) are system outputs (i.e. discharge). 1(t), 1(t 1), ... , I(t - q) are system inputs, which is the rainfall in the rainfall-runoff forecasting model or the storage in the storage-runoff forecasting model; p and q are model orders and ai' ... ,ap , bl' ···,bq are model parameters, which are calibrated by using historical input and output data. If the historical data have N observations and m is the larger of p and q, (6) can be written as:

[om+' ~m+2 ] = [am ~m+l ON

i.e.,

0N-l

Om_I

°m-p+1

1m

1m- I

Om

°m-p+2

1m +1

1m

°N_2

°N_P

IN-

1

I N _2

m,+,] ..

nr']

I~m-q+2 X I N _q

~

~m~

bq

eN

" ' .

(7)

90

P.-s. YU ET AL.

O(N.m)Xl

=

Z(N-m)X(p+q) • C(p+q) xl +E(N-m)Xl·

(8)

Based on the minimum square error between the observed and simulated output data, the optimum parameters are determined according to the following equation:

(9) The model order can be decided based on the minimum value of SBC (Schwartz's Bayesian Criterion) (Schwartz, 1978),

SBC(p,q) = NIna! +(p +q)lnN.

(10)

a! is the variance of the residuals (the difference between observed discharge, OCt), and simulated discharge,

OCt), ~2

(f

e

= (0 -0)' (0 -0) N _(p + q)

-'--_---C---'-_-'-

(11)

CASE STUDY Sixteen storm events over the Fei-Tsui reservoir catchment in northern Taiwan were collected for a case study and eight of the storm events are used for calibration to determine the model orders and parameters. The other eight storm events are used to verify the model performance based on criteria of eight kinds (Habaieb et aI., 1991; Wang et aI., 1991; Abraham and Lendalter, 1983), which is divided into two groups, statistical and hydrologic indexes. Statistical index (a) Mean Absolute Deviation (MAD) 1

MAD = N where

N = No. of observation, Q(t) = observed flow,

L N

t=)

IQ(t) -Q(t)l

(12)

TRANSFER FUNCTION MODEL AND STORAGE-RUNOFF PROCESS

91

Q(t) = forecasted flow.

(b) Mean Square Error (MSE) (13)

(c) Revised Theil Inequality Coefficient (RTIC)

RTIC =

N[Q(t) [I~

Q(tW

INI~ [Q(tW ]112

(14)

(d) Correlation Coefficient (CC)

where Q = the mean value of observed flow,

Q= the mean value offorecasted flow. Hydrologic index

(e) Coefficient of Efficiency (CE) CE =1 -

[It

[Q(t) -Q(tW

IIt

[Q(t) -Q

r]

(16)

(t) Error of Peak Discharge (EQp) (17)

where

92

P.-s. YU ET AL.

Qp = the peak value offorecasted flow, Qp

=

the peak value of observed flow.

(g) Error of Time to Peak (ETp)

(18) where Tp

=

the time to forecasted peak flow,

Tp

=

the time to observed peak flow.

(h) Error of Total Volume (EV)

EV = t~ [Q(t) -Q(t)] N

/

N t~ Q(t)

(19)

RESULTS

Based on the criteria of SBC in (10), the optimal models calibrated by using eight storm events are

Rainfall-Runoff Forecasting Model: Q(t) = 1.1 194Q(t-l) - 0.4162Q(t-2) + 0.1551Q(t-3) + 0.0736R(t-l) + O.0491R(t-2),

(20)

Rainfall Forecasting Model: R(t) = O.7891R(t-l) - O.0363R(t-2) + 0.0015R(t-3) + O.0448R(t-4) + O.0646R(t-5),

(21)

Storage-Runoff Forecasting Model: Q(t) = 1.2285Q(t-l) - 0.4146Q(t-2) + O.1259Q(t-3) + O.0516S(t-l) O.0129S(t-2) - 0.0356S(t-3),

(22)

TRANSFER FUNCTION MODEL AND STORAGE-RUNOFF PROCESS

93

Storage forecasting Model: Set) = 1.4148S(t-l) - 0.3717S(t-2) - 0.0485S(t-3).

(23)

Both rainfall-runoff forecasting model and storage-runoff forecasting model are applied to eight storm events to verifY the model performance. One to six hours ahead forecast hydrographs are compared with the observed hydrographs as shown in Figure 1 and Figure 2, in which only two of eight storm events are shown. It is found that the forecast hydrographs have a time problem in both models. The r-hour ahead forecast hydrograph is shifted by nearly r-hours to compare with the observed hydrograph, therefore a backward shift operator is applied to correct the time problem, (24) where r is the lead time. Figure 3 and Figure 4 are the forecast hydrographs corrected by Eq(24). It seems that the time problem in forecast hydrographs is significantly reduced. Figure 5 presents the comparison of model performance based on eight kinds of criteria. From the analytical results of eight storm events one may conclude that the storage-runoff forecasting model has better performance than rainfall-runoffforecasting model. CONCLUSIONS

Rainfall-runoff forecasting model and storage-runoff forecasting model are developed and compared by using 16 storm events over Fei-Tsui reservoir catchment in northern Taiwan .. It is found that the forecast hydrographs have a time problem in both models. After both models are corrected by a backward shift operator, based on the comparison between the forecast hydrographs (one to six hours ahead) and eight kinds of criteria, the storagerunoff forecasting model has better model performance than the rainfall-runoff forecasting model. More researches are required to confirm this conclusion . REFERENCES

Abraham, B. and Ledolter, J. (1983) Statistical Methods for Forecasting, John Wiley & Sons, Inc., New York. Bobinski, E. and Mierkiewicz, M. (1986) ''Recent developments in simple adaptive flow forecasting models in Poland", Hyd. Sci. J. 31,263-270. Chander, S. and Shanker, H. (1984) "Unit hydrograph based forccast model", Hyd. Sci. J. 31, 287-320. Cluckie, I. D. and Owens, M. D. (1987) ''Real-time Rainfall Runoff Models and Use of Weather Radar Information", Chap. 12, Weather Radar and Flood Forecasting, ed. by V. K. Collinge and C. Kirby, Wiley.

94

P.-s. YU ET AL.

Corradini, C., Melone, F. and Uvertini (1986) "A semi-distributed model for real-time flood forecasting", Water Resource Bulletin 22,6, 1031-1038. Corradini, C. and Melone, F. (1987) "On the structure of a semi-distributed adaptive model for flood forecasting", Hyd. Sci. J. 32,2,227-242. Einfalt, T. (1991) "Inaccurate rainfall forecasts: hydrologically valuable or useless?", New Technologies in Urban Drainage UDT '91, ed. by C. Maksimovic, Elsevier Applied Science, London. Habai'eb, H., Troch, P. A. and De Troch, F. P. (1991) "A coupled rainfall-runoff and runoff-routing model for adaptive real-time flood forecasting", Water Resources Management 5, 47-6l. O'Connel, P. E. and Clark, R. T. (1981) "Adaptive hydrological forecasting - a review", Hyd. Sci. Bulletin 26,2, 179-205. Owens, M. (1986) Real-Time Flood Forccasting Using Weather Radar Data, Ph. D. Thesis, University of Birmingham, Department of Civil Engineering, u.K. Powell, S. M. (1985) River Basin Models for Operation Forecasting of Flow in RealTime, Ph. D. Thesis, University of Birmingham, Department of Civil Engineering, UK Reed, D. W. (1984) A Review of British Flood Forecasting Practice, Institute of Hydrology, Report No 90. Schwartz, G. (1978) "Estimating the dimension of a model", Annuals of Statistics 6,461464.

Wang, R. Y. and Tan, C. H. (1991) "Study on the modified tank model and its application to the runoff prediction of a river basin", Taiwan Water Conservancy 39, 3, 1-23. (in Chinese) Wei, W. W. S. (1990) Time Series Analysis: Univariate and Multivariate Methods, Addison-Wesley Publishing Company, Inc., New York.

95

TRANSFER FUNCTION MODEL AND STORAGE-RUNOFF PROCESS

Storm Event

Forecasting Model

7021

.... ....

1000

--Reo!

..... """F.

8DII

Rainfall

...

RainfallRunoff

... ... ...

'100

- - - - - bF.

.. __

--Ro" 1t1rF.

. _.0.

---··2ttrF• •• _.' 3hrF . · - - · · ..tlrF.

.• -., StirF. •

_0 • •

IhrF .

. .... hF.

"'"

•••

.... ,.

••• - - 2hrF.

8DII

7916

. n ..

--Reo! - - - - - fhrF. b r. __

0

1. . .

••

.11

• •••• 3hrF.

-

- _. _. 4hrF.

•••

o····lIIrF. •.... "'F,

2"

••

Figure I. Using rainfall-runoff forecasting model to estimate one to forecasted hrdrograph.

SIX

hours ahead

Storm Event

7916

. '200

--0-,

'000 ODD

ODD

.

. .0

Figure 2. Using Storage-runoff forecasting model to estimate one to forecasted hrdrograph.

SIX

hours ahead

96

P.-s. YU ET AL.

Storm Event

Forecasting Model

7021

.... ....

,000 Real • - - - - 1hrF. - - - - - 2hrF. - •. - - 3hrF.

800

Rainfall

100

....

- - - - - .thrF.

.000

400

••••• ttlrF •

... _- "'rF. • •• _. atlrF. - . ' . - 'hrF•

••• -. ItlrF, • _ ••. 'flrF.

... "'" ... ... ...

•1

,

- - R...

. _. -''''F. ~

3GO

RainfallRunoff

--....

...

zoo

...

7916

..

200

lOG

,

Figure 3. Using corrected rainfall-runoff forecasting model to estimate one to six hours ahead forecasted hrdrograph.

Forecasting Model

Storage

----....

7021

7916

-

,,.-

-.. -StorageRunoff

Storm Event

.

,... ...

,ao•

.. '"

2ao



lit, •.

·····."r •.

--....

.

,

Figure 4. Using corrected storage-runoff forecasting model to estimate one to six hours ahead forecasted hrdrograph.

97

TRANSFER FUNCTION MODEL AND STORAGE-RUNOFF PROCESS

Storm Event

Criteria

7021

MAD

7916

"

"

"

MSE

RTIC

CC

CE

EV

r . .

:~~.n-"'Ii'l "j -0.1 -0.115

:~I1JjIiIiIJ'

.,

:: .Jl.Pi'fii

''

.0.15

.0.26

Figure 5. Comparsion of model performance based on eight kind of criteria with corrected rainfall-runoff and storage-runoff forecasting model.

SEEKING USER INPUT IN INFLOW FORECASTING

T. Tao, I. Corbu, R. Penn, F. Benzaquen and L. Lai Water Resources Department, Ontario Hydro 700 University Avenue, H9C22, Toronto, Ontario M5G lX6 CANADA

Traditionally, inflow forecasting models were used like black boxes. Users prepared the inputs at one end and received the outputs at the other end. In such an environment, the inflow forecasting models were aimed at generating inflow forecasts without any user intervention. This paper describes anew, user friendly, approach to inflow forecasting. It allows users to input their preferences, interactively, during the execution of the model and to generate inflow forecasts with which they feel comfortable. INTRODUCTION

One of the requirements for integrating the water management and operation of Ontario Hydro's hydroelectric facilities into its Grid is to generate daily inflow forecasts for up to 732 days for those facilities and their control reservoirs. It is recognized that inflow forecasts with longer term could hardly be indicative of what will occur due to the random process upon which they are based. In order to assess the risk associated with the use of inflow forecasts, it is necessary to have a variety of equally likely inflow forecasts. The model described in this paper provides such a capability, allowing users to generate inflow forecasts with any desired exceedance probabilities of volume. Each generated inflow series has three parts. The first part comprises inflows for the first four days. They are provided interactively by a user-selected inflow forecasting model. The second part includes inflows from day 5 to day N-l. These are heuristically modified historic inflows which precede those in the third part. They ensure a smooth transition from the first part to the third part. The third part contains inflows from day N, which ranges from 15 to 60, to day 732. They are part of historic inflow records. The process of generating such local inflow series is accomplished through three modules. They are called expected forecast, heuristic forecast and probabilistic forecast respectively. Users play an active role in deciding what will be produced in each module (Tao 1993). The paper describes how such user interfaces are achieved. EXPECTED FORECAST

The expected forecast generates the inflows for the first four days of the two-year time 99 K. W. Ripel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 99-103. © 1994 Kluwer Academic Publishers.

100

T. TAO ET AL.

horizon. Preliminary comparative studies have been conducted to retain the best of several conceptual and stochastic models (Tao and Lai, 1991) for the hydrologic conditions in Ontario, Canada. Three stochastic models have been selected as a result of these studies. They are ARMAX(3,O,2), ARIMA(I,I,I) and ARIMA(I,2,2). Each of these models generates a different four-day forecast. The exogenous input in ARMAX(3,O,2) reflects the contribution of the precipitation of the previous and current days to the forecasted inflows. Natural logarithms of the inflows are used in formulating these three models. This ensures that the forecasted inflows will always take nonnegative values. The forecasts are performed one watershed at a time. For each run of the model, users are required to provide the past three days of observed local inflows, the temperature and precipitation for the previous day, and forecasted temperature and precipitation for the next four days at the indicated site. The temperature and precipitation data represent maximum daily temperature and total daily precipitation respectively (see the upper half of Figure 1). The maximum daily temperature is used to decide whether or not the precipitation should be considered in ARMAX(3,O,2). If the maximum temperature is equal to or lower than zero degree celsius, the precipitation is assumed not to contribute to the runoff. Once all required inputs have been provided, three different four-day forecasts are instantly displayed on the screen (see the lower half of Figure 1). Users can select one of them or issue their own forecast based on experience with the river system, knowledge of the meteorological forecast and the output of the three stochastic forecasting models.

IIISSISSAGI ROCXY ISLAND

06/18/93 06/19/93 06/20/93 06/21/93 06/22/93 06/23/93 06/24/93

Max Teq:»er.(OC) 1m:~1 m:u=5ei$ Ttl Precip. (nm) :mmiH!5!!Hi Inflow

Forecasted

Observed !il!!l!il!l!!!!l!i

mm:::::E$=

====:: :::=--=umm::

(ems)

23.00

25.00

"i!ll,

:::::m::=== !o:::s..::a:=

IialiHII!mI!!!

rue-::mmm:i

,UMA(1.1.1 )

i5k=a5i;~

ms:r:.==:m

> 0

0

>

0 0

>

0 0

>

0

>

0

0 0

IIII!IiIIl!I!lI

:::..15=:::::-.:

~1

:==:::::::: iEiUa55ii5

Im~~

="==:::..::::::

u:mmame

24.88

22.77

20.67

18.91

I~£i

26.51

26.77

26.91

26.98

26.00

27.05 28.14 30.45 29.27 JiUMA(1.2.2) =~ ~ §::sm:ii I!ll5SI!!lI5! Note: Inflow data refer to the station: Mississagi River at Rocky Island Lake Precip. and teq:»er. data refer to the station: Mississagi OH

IE· INPUT

ISER'S FORECASTS

Figure 1 EXPECTED FORECAST: input and output screen HEURISTIC FORECAST The heuristic forecast provides a smooth transition between the single four-day forecast retained by the user, which ends on day 4, and the multiple series of daily historic

101

SEEKING USER INPUT IN INFLOW FORECASTING

inflows, which start from day N. Basically, it eliminates the discontinuity that occurs on day 4 between the expected forecast and the historical series of inflows (Figure 2). The module extends the inflow forecast from the fourth day to the Mh day. From the Mh day on, the forecast is represented by series of historic inflows. The value of N is selected by the user and must be between 15 and 60. When selecting the value of N, the user places judgement on the current state of the watershed and on how far apart this is from its historical average at the time of the forecast. The rule for a smooth transition from day 4 to day N is to reduce the differences identified on day 4 between the expected forecast and each historic series at a constant rate continuously until such difference on day N-l is only 1 % of the difference on the fourth day. The reduction rate "r" can be determined by solving equation (1):

rN-s=.OI

(1)

The heuristically calculated inflow forecasts QH,I are obtained as follows: QH" =QA" +(QE,4 -QA,J • rt-4

5 ~t
(2)

where QE 4 and QA 4 are the expected inflow and the actual historic inflow on the fourth day respeCtively. Figure 2 shows an example where N=60. 90~---------------------------------------------.

LEGEND

80 A

70 Ci)

E

EXPECTED FORECAST

-- HISTORIC RECORDS

60

........ HEURISTIC FORECAST

~ 50

~

u::

40

30

Discontinuity

201-'k------"'"'"

""

10

...., ...........,..

OLL--------------L-~------------L-------------~

04

50

60 Time (day)

100

150

Figure 2 HEURISTIC FORECAST: modification of inflows (N=6O) The heuristic forecast also takes care of the occurrence of peak inflow during the spring freshet. It is assumed that the peak inflow caused by snowmelt during the spring freshet can only happen once a year. Users are asked to indicate if the peak inflow for the

102

T. TAO ET AL.

current spring freshet has passed. If it is the case, the heuristic forecast will phase out those peak inflows of historic series which lag behind. The process is demonstrated in Figure 3 for one of the historic series. It first shifts the peak inflow of the historic series to day one. The daily inflows after the peak are continuously moved to day 3, day 5, and so on. This process is repeated nd times, where ~ represents the number of days between day one and the day when the peak of historical inflow series occurred (see Figure 3). The inflows on even days are set to be the averages of two neighbouring inflows on odd days. The shifted inflows are then modified using equations (1) and (2) to achieve a smooth transition from the fourth day to the Mh day. ~.--------------.----------------------------~

Actual inflows Forecasted inflows nd

I

LEGEND

200

HISTORIC RECORDS -+- SHIFTED RECORDS "ii)

.[

150 A

HEURISTIC FORECAST EXPECTED FORECAST

~

u:: 100

50

OL--------------LL-----------~~~--------

-50

04

50 60

__~

100

Time (day)

Figure 3 HEURISTIC FORECAST: phase-out of peak inflows PROBABll.JSTIC FORECAST The probabilistic forecast sorts out representative series based on user-specified exceedance probabilities of volume. The user has a choice of defining two such probabilities. The first one is based on the biannual volume. The second one is based on the volume corresponding to a user-specified time period, which ranges from 45 days to 366 days. The process starts with finding two sets of volumes: one set covers the inflow volumes of each series from day one until the end of a user-specified time period. Another set covers the total volume of all inflows of each series. The volumes in each set are then ranked in descending order and exceedance probabilities are calculated. Finally, each inflow series is associated with its exceedance probability. Figure 4 presents four forecasted local inflow series with representative exceedance probabilities based on annual volumes, corresponding to a user-specified time period of 366 days, at

SEEKING USER INPUT IN INFLOW FORECASTING

103

Rocky Island Lake of the Mississagi River. Forty years of inflow data were used. 260 240

........... 5% ------ 50% -

220 200 180

75% - - 95%

60 First 60 days 50 40

_1S0 If)

,[ 140 ~ 120

u:: 100 80

30 20 10

SO 40 20 100

Time (day)

200

300

Figure 4 Inflow series with representative exceedance probabilities

SUMMARIES The inflow forecasting approach introduced in this paper provides users with more than one optional scenario at each step, allowing them to make decisions interactively during the forecasting process. Every decision made by users has an effect on the final forecast. The major assumption of the new approach is that users are experienced practitioners. The new approach is designed to enhance their capability of making an acceptable forecast, and not to relieve them of doing the forecast. Figure 4 can, in fact, be viewed on screen at the end of the forecast. Users can go back and try with different inputs until they are comfortable with the inflow forecast they generate. REFERENCES Tao, T. and L. Lai (1991) Design Specifications of Short Term Inflow Forecaster, Technical Report, Ontario Hydro. Tao, T. (1993) Short Term Inflow Forecaster: User's Manual, Technical Report, Ontario Hydro.

LINEAR PROCEDURES FOR TIME SERIES ANALYSIS IN HYDROLOGY

P. R. H. SALES!, B. de B. PEREIRN and A. M. VIEIRA! ! Centrais Eletricas Brasileiras S. A. - ELETROBRAS Av. Pres. Vargas, 642, 8Q andar, Rio de Janeiro, PO Box 1639 20079-900, Brazil 2COPPEIUFRJ, Rio de Janeiro, PO Box 68507 21945-970, Brazil Linear Procedures for Time Series Modeling, recently introduced in the Brazilian Electrical Sector by ELETROBRAS through the Coordinating Group for the Interconnected Operation of the Power System - GeOI, are presented for the following model sub-classes: Univariate Autoregressive Moving Average (ARMA), ARMA Exogenous or Transfer Function (ARMAX or TF), Seemingly Unrelated or Contemporaneous ARMA (SURARMA or CARMA), Multivariate or Vectorial ARMA (MARMA or VARMA) and MARMA Exogenous (MARMAX). The methodology and the algorithms here proposed had, as a cornerstone, the works of Professor Hannan, developed alone or with other researchers after 1980 and is a real application of inflows forecasting, which takes a very important place in the Brazilian Electrical Operation Planning.

INTRODUCTION The Brazilian interconnected hydrothermal electrical generating system has 55900 MW of installed capacity in which hydroelectric plants account for 93 percent. Since 1973, GeOI, the Coordinating Group for the Interconnected Operation of the Power System, which has representatives from the 18 majors Brazilian utilities and the holding, ELETROBRAS, has been responsible to achieve the most efficient utilization of the available hydro and thermal resources in the system. GeOI activities range from operations planning for the next five years to real-time control of the power system. In its operations planning, the inflows forecasting is one of the most important point, as shown by Terry et al. (1986). 105 K. W. Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 105-117. © 1994 Kluwer Academic Publishers.

106

P. R. H. SALES ET AL.

Up to now, GeOI inflows forecasting is based on YevjevichIBox - Jenkins methodologies, see Sales et al. (1986) or Salas et al. (1980). In a selection of a methodology to a particular data set we should clearly answer the following questions: (a) why has a technique been employed relative to other techniques? (b) what has guided the choice of technique in the applications case? (c) what utility have the resultant models? The nature of GeOI coordinating activities in the operation of large scale power system does not allow the use of much time in the analysis of a large number of extensive hydrological time series, each one with more than 700 observations. We believe that in this case linear automatic methods would result in a much faster action and a more efficient performance of GeOI specialists group on operational hydrology. The other available methods, such as non linear and non automatic techniques, would be less adequate for being more dependent on the analyst functions. Furthermore, automatic methods are also recommended in the case of training of new specialists who replace utilities representatives. The methodology and algorithms utilized in this paper were first proposed by Hannan and Rissanen (1982) and further developments are presented in Hannan and Deistler (1988) and were applied to several hydrological time series of monthly average natural inflows of reservoirs in Brazil. Therefore, four linear algorithms were used .for identification and estimation of the parameters of the ARMA, ARMAX, SURARMA and MARMA models, as presented in Sales (1989) and Sales et al. (1986, 1987, and 1989, a, b) and forecasts for one year ahead with standard error intervals were obtained for each one of the selected model. Theoretical properties, simulation results and applications of the identification and estimation procedures given in this paper are presented in Hannan and Deistler (1988), Poskitt (1989), Koreisha and Pukkila (1989, 1990, a, b) and Pukkila et al. (1990) and references therein. The purpose of this paper is to present a first known real application of some of these methodologies.

TIME SERIES MODELS. LINEAR APPROACHES A great number of univariate and multivariate time series models have been recently proposed in hydrology, and they can be classified according to the dependency and relationship previously mentioned.

LINEAR PROCEDURES FOR TIME SERIES ANALYSIS IN HYDROLOGY

Consider m time series represented by W

-t

= (WIt'

107

W 2t' ... , W m ) t

The MARMA (p,q) model is written as cl>p(B)Z

-t

= 9 q (B)a

(1)

-t

where i)

cl>p(B) and 9 q (B) are respectively the autorregressive and moving average polynomial matrices in the backward shift operator B. It is assumed that all the roots of the determinantal equations IcI>p (B)I= 0 and 19 q (B)I= 0 are outside the unit circle.

ii) Zt ia a suitable transformation of the time series Wt. In our applications we first use the Box-Cox transformation and them standartized by the monthly means and standartized by the monthly means and standard deviations, since both were periodic. As concerns the complete muItivariated model expressed by (4), the following remarks can be made:

(i)

during the model building process the degrees of the operators cl>ij(B) and 9ij(B) can be adjusted so that the ARMA models for the individual time series accurately describe the behavior of each series;

(ii)

the SURARMA model is the result of cl>ij(B) and 9ij(B) coefficients being null for i :# j, that is, the parameters matrices are diagonal;

(iii)

the ARMAX model is obtained when the coefficients cl>ij(B) and 9ij(B) are null for i < j or, in other words, when the parameters matrices are triangular;

(iv) the MARMAX is obtained when is deleted one or more rows from the autoregressive matrix, cl>p(B), and one or more columns of the moving average parameters matrix, 9q(B). Basically, the procedures for the linear algorithms consist in linearizing the estimates of the innovations in (i = 1, ... , k), where is an initial estimate of the

a , Po.

-t

1

Po.

1

(i = 1, ... , k), obtained previously in the Initial Identification and Estimation Stage. Thus, the ~i estimator expression can be written as parameter vector,

~i

P. R. H. SALES ET AL.

108

(2)

a

where i is the partial derivative of -t

a

-t

with respect of ~. . 1

The asymptotical variance estimator of ~i is given by

(3)

and the covariance estimator between ~i and ~ j , by

(4)

If the model does not have moving average terms, a is linear in J3 and the solution is simply given by

-t

-

(5)

Otherwise, a new parameters vector given by (5) is used in place of J3 and the linearization process is repeated until final convergence.

-0

LINEAR PROCEDURES FOR TIME SERIES ANALYSIS IN HYDROLOGY

109

APPLICATIONS

In order to get the forecasts, the proposed algorithms were used in eight series of natural monthly average flows rates of the reservoirs of Furnas, on the Grande River, ltumbiara on the Paranaiba River, Dba Solteira on the Parana River, Barra Bonita on the Tiete River, Jurumirim on the Paranapanema River, Tres Marias and Sobradinho on the Sao Francisco River as well as the incremental series of Sobradinho. Each one of the hydrological time series analyzed has 648 observations. The data cover the period from January 1931 to December 1984 and were obtained from Centrais Eletricas Brasileiras SA ELETROBRAS, Brazil- see Figure 1.

Flow Rates (~/s) Max Min Average 1. FURNAS

3650

196

837

2. ITUMBIARA

5320

254

1379

3. I.SOLTEIRA

15293

1280

4583

4. B.BONITA

2104

42

335

5. JURUMIRIM

1539

51

203

6. T.MARIAS

3859

92

632

7. SOBRADINHO 15364

640

2569

8. Incr.SOBRAD. 12514

504

1926

Figure 1. Location of the time series used in the linear algorithms.

P. R. H. SALES ET AL.

110

Application of the Box-Cox transfonnation, Table 1 shown that all of them were of a natural logarithmic type. TABLE 1. Box & Cox transformation selected to the monthly natural average flow rates of site developments of Furnas, Itumbiara, Dba Solteira, Tres Marias, Sobradinho and the Intermediate Basin SERIES

Al

A2

FURNAS ITUMBIARA I.SOLTEIRA B.BONITA JURUMIRIM T.MARIAS SOBRADINHO INTERMEDIATE BASIN

0 0 0 0 0 0 0 0

-179 -179 -1238 -35 -45 -90 -529 -468

Univariate ARMA (p,q) model to Furnas series

From the initial estimates of the parameters of the ARMA (1,1) model identified in the = 0.4351 we moved to final previous stages. ~1 = 0.8426, (h = -0.2278 and stage of the proposed algorithm. The final estimates shown on Table 2 were obtained after ten iterations with accuracy of I x 10-4.

&;

0';

TABLE 2. Final estimates of the parameters cl>1' 01 and of the ARMA(1,I) model fitted to the transformed series of Fumas site development PARAMETER

ESTIMATE 0.8421 -0.2398 0.4343

STANDARD ERROR 0.0237 0.0426

111

LINEAR PROCEDURES FOR TIME SERIES ANALYSIS IN HYDROLOGY

Forecasts for one year ahead with two standard error intervals are shown in Figure 2. 4 3,5 3 2,5 -!!

-

"'s

2

"b 1,5 0,5 0

• +

-

J

•+

1985

+



+

~

• • • •

FMAMJJA

+

---

• •



SON

+ forecast • observed

D

Figure 2. Forecasts for 1985 with two standard error. ARMA model- Furnas.

ARMAX (p,r,s,q) model to Tres Marias, Sobradinho and incremental to Sobradinho series The identified ARMAX model was: p = 1, r = 2, s = 0 and q = 1. The initial estimate of the parameters were Cl =0.8075, 1 =0.6227, 2 = -0.5202, £'1 =-0.4130 and

a

&; =0.2946.

a

After the initial estimates of the previous stages of the algorithm, we moved to the final stage. The final estimates were obtained after eight iterations with accuracy of 1 x 10-4 and are summarized in Table 3 with the corresponding standard errors.

1,

0;

TABLE 3. Final estimates of the c1, d1, d2, f1 and σₐ² parameters of the ARMAX model fitted to the Tres Marias, Sobradinho and Intermediate Basin series

SERIES        VARIABLE    PARAMETER    ESTIMATE    STD ERROR
SOBRADINHO    z(t-1)         c1         0.8469      0.0866
TRES MARIAS   x(t-1)         d1         0.5996      0.0358
              x(t-2)         d2        -0.4626      0.0742
RESIDUAL      a(t-1)         f1        -0.3536      0.0899
              σₐ²                       0.2939        -


Ex-ante forecasts for one year ahead with two standard error intervals are shown in Figure 3.

Figure 3. Ex-ante forecasts for 1985 with two standard error intervals. ARMAX model - Sobradinho, input Tres Marias. (Legend: + forecast, • observed.)
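An ex-ante forecast uses forecasts of the input series in place of its future observations. The sketch below combines the Table 3 estimates with a deliberately simple AR(1) forecast of the Tres Marias input; the sign convention z_t = c1 z_{t-1} + d1 x_{t-1} + d2 x_{t-2} + a_t + f1 a_{t-1} and the input coefficient phi_x are illustrative assumptions, not the authors' specification:

    import numpy as np

    def armax_ex_ante(z_last, a_last, x_hist, phi_x, horizon=12,
                      c1=0.8469, d1=0.5996, d2=-0.4626, f1=-0.3536):
        # Ex-ante: forecast the input series first, then the output series.
        x = list(x_hist)                    # last observed input values
        for _ in range(horizon):
            x.append(phi_x * x[-1])         # illustrative AR(1) input forecast
        z = np.empty(horizon)
        z[0] = (c1 * z_last + d1 * x[len(x_hist) - 1]
                + d2 * x[len(x_hist) - 2] + f1 * a_last)
        for k in range(1, horizon):         # future innovations have mean zero
            i = len(x_hist) + k - 1         # index of the input forecast x_{t+k-1}
            z[k] = c1 * z[k - 1] + d1 * x[i] + d2 * x[i - 1]
        return z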

SURARMA (p,q) model for the Ilha Solteira, Barra Bonita and Jurumirim series

The algorithm considered, in an iterative way, the estimates of the Ω matrix in order to obtain the vector β estimates and the corresponding standard errors. Table 4 summarizes the results of the convergence of the algorithm after four iterations with an accuracy of 1 × 10⁻⁴.

TABLE 4. Final estimates of the SURARMA(1,1) model fitted to the Ilha Solteira, Barra Bonita and Jurumirim series (standard errors in parentheses)

SERIES         FINAL ESTIMATES
I. SOLTEIRA    φ1 = 0.7954 (0.0234)    θ1 = -0.1610 (0.0404)    σₐ² = 0.4099
B. BONITA      φ1 = 0.7616 (0.0188)    θ1 = -0.3824 (0.0245)    σₐ² = 0.5151
JURUMIRIM      φ1 = 0.6588 (0.0231)                             σₐ² = 0.5421
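The iterative Ω step can be sketched as one feasible generalized least-squares (SUR) update: estimate the contemporaneous residual covariance matrix from the current residuals, then re-weight the stacked regression. The plain-regression setup and variable names are illustrative assumptions:

    import numpy as np

    def sur_gls_step(X_list, y_list, E):
        """One feasible-GLS (SUR) update.
        X_list, y_list : per-series design matrices and responses (length n each)
        E              : (n, m) matrix of current residuals for the m series
        """
        n, m = E.shape
        omega = (E.T @ E) / n                    # estimated residual covariance
        W = np.kron(np.linalg.inv(omega), np.eye(n))
        X = np.zeros((n * m, sum(Xi.shape[1] for Xi in X_list)))
        col = 0
        for i, Xi in enumerate(X_list):          # block-diagonal stacked design
            X[i * n:(i + 1) * n, col:col + Xi.shape[1]] = Xi
            col += Xi.shape[1]
        y = np.concatenate(y_list)
        XtWX = X.T @ W @ X
        beta = np.linalg.solve(XtWX, X.T @ W @ y)
        cov = np.linalg.inv(XtWX)                # standard errors: sqrt of diagonal
        return beta, np.sqrt(np.diag(cov))

For long series the explicit Kronecker product is wasteful; it is written this way only for clarity.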


Forecasts of one year ahead for the three series with two standard error intervals are shown in Figure 4.

Figure 4a. Forecasts for 1985 with two standard error intervals. SURARMA model - Ilha Solteira.
Figure 4b. Forecasts for 1985 with two standard error intervals. SURARMA model - Barra Bonita.
Figure 4c. Forecasts for 1985 with two standard error intervals. SURARMA model - Jurumirim.
(Legend: + forecast, • observed.)


MARMA (p,q) model for the Furnas, Itumbiara and Tres Marias series

Using the residuals of the three series obtained previously, several multivariate MARMA(p,q) models were estimated. With the BIC(p,q) criterion, the MARMA(1,1) model was identified. A careful analysis of the results permitted the maximal use of the proposed algorithm on the Furnas, Itumbiara and Tres Marias series. First, the iterative process of the final stage considered the complete multivariate model, that is, with no restriction imposed on its parameters. The final estimates were obtained after five iterations with an accuracy of 1 × 10⁻⁴. After this, restrictions were imposed on the parameters of the MARMA(1,1) model. In other words, the hypothesis that not all parameters of the model differed significantly from zero was considered consistent. In fact, the SURARMA model seems suitable here, but for illustration of the MARMA algorithm we deleted only the parameters within one standard error of zero. The final estimates of the parameters of the restricted MARMA(1,1) model were obtained after four iterations in the final stage of the proposed algorithm with an accuracy of 1 × 10⁻⁴. Table 5 summarizes the principal results of the final convergence process; standard errors of the estimates are shown in parentheses.

TABLE 5. Final estimates of the restricted MARMA(1,1) model parameters fitted to the Furnas, Itumbiara and Tres Marias series (dashes denote parameters restricted to zero)

              Φ MATRIX
SERIES        FURNAS             ITUMBIARA           T. MARIAS
FURNAS        0.8685 (0.0206)    -0.0551 (0.0354)        -
ITUMBIARA         -              -0.7380 (0.0333)        -
T. MARIAS     0.0459 (0.0215)    -0.0665 (0.0369)    0.8202 (0.0243)

              Θ MATRIX                                                  RESIDUAL VARIANCE
FURNAS        0.0971 (0.0445)        -                   -                  0.4299
ITUMBIARA     0.1093 (0.0562)    -0.2219 (0.0431)        -                  0.4478
T. MARIAS    -0.3397 (0.0363)     0.0827 (0.0363)    0.2239 (0.0565)        0.4497
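The BIC(p,q) identification step mentioned above can be sketched as a grid search over a penalized log-determinant of the residual covariance matrix. This assumes the common vector form BIC(p,q) = ln det(Σ) + (p+q)k² ln N / N for k series of length N, which is not necessarily the authors' exact criterion; residual_cov stands for the output of a fitting routine and is a hypothetical name:

    import numpy as np

    def bic(sigma_hat, p, q, n_obs):
        # sigma_hat: (k, k) residual covariance of a fitted MARMA(p,q) model
        k = sigma_hat.shape[0]
        penalty = (p + q) * k * k * np.log(n_obs) / n_obs
        return np.log(np.linalg.det(sigma_hat)) + penalty

    def identify_order(residual_cov, n_obs, max_p=3, max_q=3):
        # residual_cov[(p, q)] -> fitted residual covariance for each order,
        # e.g. produced by the linear estimation stage for every (p, q) pair.
        grid = ((p, q) for p in range(max_p + 1) for q in range(max_q + 1))
        return min(grid, key=lambda pq: bic(residual_cov[pq], *pq, n_obs))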


Forecasts for the three series for one year ahead with two standard error intervals are shown in Figure 5.

Figure 5a. Forecasts for 1985 with two standard error intervals. MARMA model - Furnas.
Figure 5b. Forecasts for 1985 with two standard error intervals. MARMA model - Itumbiara.
Figure 5c. Forecasts for 1985 with two standard error intervals. MARMA model - Tres Marias.
(Legend: + forecast, • observed.)


FINAL COMMENTS

Some general comments can now be made.

i) The variances of dry periods are smaller than those of wet periods in all graphs of 12-steps-ahead forecasts; this becomes apparent when we transform back to the original variables.

ii) Since the SURARMA model is, from the physical point of view, the most sensible in most applications, we obtained smaller standard errors for the parameters and residual variances than with ARMA models (cf. Tables 2 and 5 for Furnas). Whenever ARMAX models were convenient, the residual variances were smaller than for ARMA models.

iii) Currently, at ELETROBRAS, forecast comparisons are being made between the automatic methodology of this paper and the Box-Jenkins methodology. The results seem promising for the automatic methodology.

iv) The computer times on an IBM 4381 R14 were, respectively, 3.06 s for the ARMA, 10.52 s for the ARMAX, (4.67 + 3 × 3.06) s for the SURARMA and 19.78 s for the MARMA application. Currently we are working on a microcomputer version with more efficient numerical algorithms.

v) Theoretical properties of the identification and estimation procedures given in this paper are presented in Hannan and Deistler (1988) and references therein. Simulation results and applications of this and related work are given in Newbold and Hotopp (1986), Hannan and McDougall (1988), Poskitt (1989), Koreisha and Pukkila (1989, 1990a, b) and Pukkila et al. (1990).

ACKNOWLEDGMENT

The authors are grateful to the late Professor E. J. Hannan for his encouragement and for making available many of his, at the time, unpublished papers, and to an anonymous referee for his useful suggestions.

REFERENCES

Hannan, E.J. and McDougall, A.J. (1988) "Regression procedures for ARMA estimation", Journal of the American Statistical Association, Theory and Methods, 83, 490-498.
Hannan, E.J. and Rissanen, J. (1983) "Recursive estimation of mixed autoregressive-moving average order", Biometrika, 69, 81-94. Correction, Biometrika, 70, 303.
Hannan, E.J. and Deistler, M. (1988) The Statistical Theory of Linear Systems, John Wiley & Sons, New York.


Koreisha, S. and Pukkila, T. (1989) "Fast linear estimation methods for vector autoregressive moving-average models", Journal of Time Series Analysis, 10, 325-339.
Koreisha, S. and Pukkila, T. (1990a) "Linear methods for estimating ARMA and regression models with serial correlation", Commun. Statist. - Simula., 19, 71-102.
Koreisha, S. and Pukkila, T. (1990b) "A generalized least-squares approach for estimation of autoregressive moving-average models", Journal of Time Series Analysis, 11, 139-151.
Newbold, P. and Hotopp, S.M. (1986) "Testing causality using efficiently parametrized vector ARMA models", Applied Mathematics and Computation, 20, 329-348.
Poskitt, D.S. (1989) "A method for the estimation and identification of transfer function models", J. Royal Statist. Soc. B, 51, 29-46.
Pukkila, T., Koreisha, S. and Kallinen, A. (1990) "The identification of ARMA models", Biometrika, 77, 537-548.
Salas, J.D., Delleur, J.W., Yevjevich, V. and Lane, W.L. (1980) Applied Modeling of Hydrologic Time Series, Water Resources Publications.
Sales, P.R.H. (1989) "Linear procedures for identification and parameters estimation of models for uni- and multivariate time series", D.Sc. Thesis, COPPE/UFRJ (in Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1986) "Inflows forecasting in the operation planning of the Brazilian hydroelectric system", Annals of the II Lusitanian-Brazilian Symposium of Hydraulics and Water Resources, Lisbon, Portugal, 217-226 (in Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1987) "Linear procedures for identification and estimation of ARMA models for hydrological time series", Annals of the VII Brazilian Symposium of Hydrology and Water Resources, Salvador, Bahia, 605-615 (in Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1989a) "A linear procedure for identification of transfer function models for hydrological time series", Annals of the IV Lusitanian-Brazilian Symposium of Hydraulics and Water Resources, Lisbon, Portugal, 321-336 (in Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1989b) "A linear procedure for identification and estimation of SURARMA models applied to multivariate hydrological time series", Annals of the IV Lusitanian-Brazilian Symposium of Hydraulics and Water Resources, Lisbon, Portugal, 283-248 (in Portuguese).
Terry, L.A., Pereira, M.V.F., Araripe Neto, T.A., Silva, L.F.C.A. and Sales, P.R.H. (1986) "Coordinating the energy generation of the Brazilian national hydrothermal electrical generating system", Interfaces, 16, 16-38.

PART III ENTROPY

APPLICATION OF PROBABILITY AND ENTROPY CONCEPTS IN HYDRAULICS

CHAO-LIN CHIU
Department of Civil Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA

This paper describes the present status of efforts to develop an alternative approach to hydraulics, in which probability and entropy concepts are combined with deterministic, fluid-mechanics principles. Some results of applying the approach in analysis and modeling of flows in pipes and open channels are also presented.

INTRODUCTION

Uncertainties always exist in the parameters and variables involved in hydraulic studies of flows in pipes and open channels, such as velocity distribution, discharge, shear stress, friction factor, diffusion, and transport of mass, momentum and energy. The uncertainties are due both to the inherent randomness of these parameters and variables and to man's ignorance or inability to fully understand them. Hydraulic studies under such uncertainties require an approach that has a probability element. A possible approach being developed is based on probability and entropy concepts combined with deterministic, fluid-mechanics principles. Some of the research results have been published in a series of papers (Chiu, 1987, 1988, 1989, 1991; Chiu and Murray, 1992; Chiu et al., 1993). This paper summarizes the approach applied in analysis and modeling of flows in pipes and open channels.

MODELING OF VELOCITY DISTRIBUTION

The spatial distribution of mean-flow velocity in the longitudinal direction affects the discharge, shear stress distribution, friction factor, energy gradient, diffusion, and concentration of sediment or pollutant, etc. Therefore, to study various transport processes in pipes and open channels, a reliable mathematical model of velocity distribution is needed. A system of velocity distribution equations derived by Chiu (1987, 1988, 1989) can be represented by

∫₀ᵘ p(u) du = (ξ − ξ₀)/(ξ_max − ξ₀)        (1)

in which u = velocity at ξ; ξ = an independent variable with which u develops, such that each value of ξ corresponds to a value of u; ξ_max = maximum value of ξ, where the maximum velocity u_max occurs; ξ₀ = minimum value of ξ, which occurs at the channel bed where u is zero; and

p(u) = exp( Σᵢ₌₀ᴺ aᵢ₊₁ uⁱ )        (2)

which is the probability density function of u, derived by maximizing the entropy function (Shannon, 1948)

H = −∫₀^{u_max} p(u) ln p(u) du        (3)

subject to the following constraints:

∫₀^{u_max} p(u) du = 1        (4)

∫₀^{u_max} u p(u) du = ū = Q/A        (5)

∫₀^{u_max} u² p(u) du = β ū²        (6)

∫₀^{u_max} u³ p(u) du = α ū³        (7)

Equation (1) means that if ξ is randomly sampled a large number of times within the range (ξ₀, ξ_max) and the corresponding velocity samples are obtained, the probability of velocity falling between u and u + du is p(u) du. Equation (4) is based on the definition (condition) of a probability density function. Equation (5) is based on the condition that the mean or average velocity in a cross section must be equal to Q/A, where Q is the discharge and A is the cross-sectional area of the channel. Equation (6) is based on the condition


that the rate of momentum transport through a cross section is ρAū² or ρAβū², where β is the momentum coefficient. Equation (7) represents the condition that the rate of kinetic-energy transport through a section is ρAū³/2 or ρAαū³/2, where α is the energy coefficient.

A system of three different velocity distribution models, Models I, II and III, can be obtained by using three different sets of constraints (Chiu, 1989). If the first two constraints, (4) and (5), are used in entropy maximization, p(u) in (1) is given by (2) with N = 1, and becomes

p(u) = exp(a₁ + a₂u)        (8)

Equation (1) can then be integrated analytically to yield Model I,

u = (u_max/M) ln[1 + (e^M − 1)(ξ − ξ₀)/(ξ_max − ξ₀)]        (9)

in which M = a₂u_max, a parameter called the "entropy parameter" since, among other reasons, the entropy of the distribution p(u/u_max), obtained by (3) with u replaced by u/u_max and the upper limit of integration replaced by unity, is a function of only M (Chiu, 1989, 1991). Smaller values of M correspond to a more uniform pattern of the probability distribution p(u/u_max), a greater value of entropy, and a less uniform velocity distribution. By substituting (2) with N = 1 into (4) and (5), the following equations can be obtained:

exp(a₁) = a₂/(e^M − 1)        (10)

ū/u_max = e^M/(e^M − 1) − 1/M        (11)
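The step from (8) and (10) to (11) is a direct integration; written out in LaTeX for reference:

    \bar{u} = \int_0^{u_{\max}} u\, e^{a_1 + a_2 u}\, du
            = e^{a_1}\left[ \frac{u_{\max}}{a_2}\, e^{a_2 u_{\max}}
              - \frac{1}{a_2^2}\left( e^{a_2 u_{\max}} - 1 \right) \right]

    % substituting e^{a_1} = a_2/(e^M - 1) from (10), with M = a_2 u_{\max}:
    \frac{\bar{u}}{u_{\max}} = \frac{e^M}{e^M - 1} - \frac{1}{M}

which is (11).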

Equation (10) was used in deriving (9); and equation (11) is a very useful equation that can be employed in parameter estimation and many other applications. For instance, an obvious application is to determine the entropy parameter M from the ratio of the mean velocity to the maximum velocity. It appears that an erodible channel tends to shape the channel and velocity distribution pattern so that ū/u_max falls in a range between 0.85 and 0.9, which corresponds to a value of the entropy parameter M between 6 and 10 (Chiu, 1988), as shown by the data obtained by Blaney (1937) from canals in the Imperial Valley. Very few laboratory and field data available


include u_max, probably because, without the probability concept, there has not been any basis or motivation to measure it. According to the probability concept, u_max contains important information about the velocity. It is an important statistical parameter that defines the range of velocity, as it is known that the minimum velocity is zero. u_max, along with the mean value ū and the probability density function p(u), will fully describe the probability law governing the velocity distribution in a channel cross section. The importance of u_max as a parameter or variable for characterizing a streamflow should, therefore, be emphasized in future studies.

If (4)-(6) are used as constraints, p(u) in (1) is given by (2) with N = 2, and the velocity distribution equation given by (1) is Model II. Similarly, if all four constraints, (4)-(7), are used, N = 3 in (2) and (1) yields Model III. To determine u for a given value of ξ by Model II or III, (1) can be integrated numerically to select u, the upper limit of integration, that will balance the two sides of (1). Chiu (1989) presented a discrete parameter estimation technique for these models.

With the probability density function p(u), the cross-sectional mean values of u, u² and u³ can be obtained by taking their mathematical expectations (expected values), without integrating over the physical plane. This is an attractive feature of the analytical treatment in the probability domain, especially when the channel cross section has an irregular and complex geometrical shape. For instance, if (8) is used to represent p(u), the cross-sectional mean of u, ū, can be obtained as the mathematical expectation of u (Chiu, 1988), as expressed by (11) in terms of the ratio of ū to u_max as a function of M. The expected values of u² and u³ give the momentum and energy coefficients, also as functions of only M (Chiu, 1991). Equation (1) indicates that (ξ − ξ₀)/(ξ_max − ξ₀) is equal to the probability of the velocity being less than or equal to u. This provides guidance in selecting a suitable form of equation for ξ. Flows through pipes and various open channels can be studied by selecting a suitable equation for ξ in (1).
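In practice, (11) can be inverted numerically to obtain M from an observed ratio ū/u_max. A minimal bisection sketch in Python; the bracketing interval and tolerance are illustrative choices:

    import numpy as np

    def mean_to_max_ratio(M):
        # Right-hand side of (11); increases from 0.5 (M -> 0) towards 1.
        return np.exp(M) / (np.exp(M) - 1.0) - 1.0 / M

    def solve_M(ratio, lo=1e-6, hi=50.0, tol=1e-10):
        # Bisection on (11); requires 0.5 < ratio < 1.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if mean_to_max_ratio(mid) < ratio:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    print(solve_M(0.85))  # ratios of 0.85-0.9 give M of roughly 6-10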

FLOW IN CIRCULAR PIPE

An axially symmetric flow in a circular pipe can be studied by defining ξ in (9) to be

ξ = 1 − (r/R)²        (12)

in which r = radial distance from the pipe center; and R = radius of the pipe (Chiu et al., 1993). ξ as expressed by (12) is the ratio of the area in which the velocity is less than or equal to u, to the total cross-sectional area of the pipe. With ξ defined by (12), ξ₀ = 0; ξ_max = 1; and (9) (Model I) becomes

u = (u_max/M) ln{1 + (e^M − 1)[1 − (r/R)²]}        (13)

This is the new velocity distribution equation proposed by Chiu et al. (1993) for a pipe flow. In contrast, a widely used form of the Prandtl-von Karman universal velocity distribution law for a pipe flow is

(u_max − u)/u* = 2.5 ln[R/(R − r)]        (14)

Equation (13) satisfies the boundary conditions that u = 0 at r = R and du/dr = 0 at r = 0, but (14) (the "universal law") does not. Furthermore, unlike (14), (13) does not give a velocity gradient that approaches infinity at the pipe wall. Therefore, (13) is applicable in the entire flow field. Figure 1(a) exhibits a system of velocity distributions, with u/u_max given by (13) plotted against 1 − r/R in the physical plane, for a range of values of M. It correctly shows the velocity gradient of each of the velocity distributions to be zero at the pipe center (where 1 − r/R = 1). Figure 1(b) shows the same velocity distributions, but has u/u_max plotted against ξ or 1 − (r/R)².

By equating the sum of the pressure and gravity forces with the frictional resistance, the wall shear can be written as

τ₀ = ρ g R_h S_f = ρ u*²        (15)

in which ρ = fluid density; R_h = hydraulic radius, equal to D/4; u* = shear velocity, equal to (gR_hS_f)^{1/2}; and S_f = energy gradient, which can be expressed as h_f/L. Based on a balance between the shear stress and the diffusion of momentum at the pipe wall,

τ₀ = ρ ε₀ (−du/dr) at r = R        (16)

In (16), ε₀ is the momentum-transfer coefficient at the wall, which is equal to the kinematic viscosity ν of the fluid if the flow is laminar, or if the flow is turbulent with a viscous sub-layer (i.e., the pipe is hydraulically "smooth").
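Equation (13) is straightforward to evaluate; the following sketch generates the family of profiles shown in Figure 1(a) for a few illustrative values of M:

    import numpy as np

    def pipe_velocity(r_over_R, M, u_max=1.0):
        # Equation (13): u = (u_max/M) ln{1 + (e^M - 1)[1 - (r/R)^2]}
        xi = 1.0 - np.asarray(r_over_R) ** 2
        return (u_max / M) * np.log1p(np.expm1(M) * xi)

    r = np.linspace(0.0, 1.0, 11)
    for M in (1.0, 4.0, 8.0):
        u = pipe_velocity(r, M)
        print(M, np.round(u, 3))   # u = 0 at r = R; du/dr = 0 at r = 0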


Figure 1. Velocity distribution given by (13): (a) u/u_max plotted against 1 − r/R; (b) u/u_max plotted against ξ = 1 − (r/R)².

For turbulent flows in "rough" pipes, ε₀ is different from ν and varies with pipe roughness and fluid turbulence. With the velocity distribution represented by (13), the velocity gradient can be written as

du/dr = −(u_max/M) (e^M − 1)(2r/R²) / {1 + (e^M − 1)[1 − (r/R)²]}        (17)

which is zero at r = 0, as it should be. At the wall, (17) becomes

(du/dr) at r = R = −2u_max(e^M − 1)/(MR)        (18)

which, unlike the velocity gradient given by (14), remains finite. Equations (15), (16), and (18) give the head loss due to friction over the pipe length L as

h_f = [32(e^M − 1)/M] (ū/u_max)⁻¹ (Dū/ν)⁻¹ (ε₀/ν) (L/D) (ū²/2g)        (19)

By comparing (19) with the Darcy-Weisbach equation, and using (11) for ū/u_max, the friction factor can be obtained as

f = 32 F(M) (ε₀/ν) / N_R        (20)

in which

F(M) = [(e^M − 1)/M] (u_max/ū)        (21)

Equation (20) gives the friction factor f as a function of the three dimensionless parameters M, N_R = Dū/ν, and ε₀/ν. The entropy parameter M represents the velocity distribution pattern and, hence, affects the transport of mass, momentum and energy. In "smooth" pipes, a viscous sub-layer exists at the wall and, hence, ε₀ = ν. If the flow is laminar, ε₀ = ν and f = 64/N_R, and (20) yields F(M) = 2 or, from (21), M = 0. As M approaches zero, (11) gives u_max = 2ū according to l'Hopital's rule; and (13) becomes

u = 2ū[1 − (r/R)²]        (22)

which is identical to the parabolic velocity distribution obtained by applying the momentum equation to a viscous, Newtonian fluid. The results presented so far are strictly analytical. By combining these results with experimental data, Chiu et al. (1993) derived an equation that relates the entropy parameter M to the friction factor, as shown in Figure 2. Figure 3 gives a comparison of (13) and (14), based on velocity data from a rough pipe (Nikuradse, 1932). The two equations differ primarily near the center and the wall. The region near the wall is enlarged in Figure 3 to help depict the difference. Figure 4 compares the velocity gradients given by the two equations. As expected, the main differences also occur near the center and the wall. The region near the center is enlarged in Figure 4 to give a better contrast.


Figure 2. Friction factor as a function of M.

Figure 3. Comparison between (13) and the universal law, based on data from Nikuradse (1932), N_R = 105,000; equation (13) plotted with M = 6.55.

Figure 4. Velocity gradients given by (13) and (14).

OPEN-CHANNEL FLOW

To study a flow in a wide open channel of width B, ξ in (9) can be defined as

ξ = y/D        (23)

in which y = vertical distance from the channel bed; and D = water depth. For a two-dimensional velocity distribution in a channel which is not "wide," so that the velocity distribution is affected by the two sides of the channel cross section, a suitable equation for ξ derived by Chiu and Chiou (1986) is

ξ = Y(1 − Z)^{βᵢ} exp(βᵢZ − Y + 1)        (24)

in which

Y = (y + δ_y)/(D − h + δ_y)        (25)

Z = |z|/(Bᵢ + δᵢ)        (26)

In (25)-(27) the y-axis is selected such that it passes through the point of maximum velocity; D = water depth at the y-axis; Bᵢ, for i equal to either 1 or 2, = the transverse distance on the water surface between the y-axis and either the left or the right side of the channel cross section; z = coordinate in the transverse direction; y = coordinate in the vertical direction; and h = a parameter that mainly controls the slope and shape of the velocity distribution curve near the water surface. If h ≤ 0, ξ increases monotonically from the channel bed to the water surface. However, if h > 0, the magnitude of h is the depth of the point of maximum velocity below the water surface; hence, ξ increases with y only from the channel bed to the point of maximum velocity, where ξ = ξ_max = 1, and then decreases towards the water surface. Figure 5 shows the coordinates chosen, along with other variables and parameters which appear in (25)-(27). δ_y, δᵢ and βᵢ are parameters which vary with the shape of the zero-velocity isovel (i.e., the channel cross section) and the isovels near the boundary (bed and sides). Both δ_y and δᵢ are approximately zero if the channel cross section is fairly rectangular, and increase as the cross-sectional shape deviates from the rectangular, as indicated by Figure 5. For a method to determine these parameters, see Chiu and Chiou (1986). The η curves shown in Figure 5 are orthogonal trajectories of the ξ curves. The idea of using the ξ-η coordinates in modeling the two-dimensional velocity distribution in a channel cross section is similar to that of using the cylindrical coordinate system in modeling the velocity distribution and other processes in circular pipes.

The velocity distribution along the y-axis that passes through the point of maximum velocity, or along a vertical at the middle of a symmetrical cross section in a straight reach, can be represented by (9) with ξ represented by

ξ = [y/(D − h)] exp[1 − y/(D − h)]        (27)

which can be obtained by setting Z = 0 and δ_y = 0 in (24) and (25). Figure 6 shows the performance of (9) (labeled as Model I) with ξ defined by (27), as compared to a set of measured data (Hou & Kuo, 1987). Also shown in Figure 6 for comparison are the results of using Models II and III, obtained by making N = 2 and 3, respectively, in (2) and using it in (1). Model III seems to be the most accurate, but requires a relatively complex method to estimate its parameters. In contrast, its simplicity makes Model I, or (9), attractive. However, Models II and III should also be examined for possible applications in certain situations, such as studies concerned with bed scour and erosion, for which an accurate estimation of the velocity gradient at the channel bed may be needed.

Figure 5. Velocity distribution and ξ-η coordinates.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _~~--,

.'0·

b

131

0.388 C!)

CD

""eo

Kcasurecl Dau.

c-

....d

t

""

....d

It

Cirau ,

\:aG. 1987)

!bld Itt

!blels t

It

-z:

~: -6.97 -2 -5.89 -l. 6.32xlO -S.06xlO &l: - 7.lwn-l. &,: - -

lIt

-S.4l· -2 -7.18x10_ l 2.93x10_ s -l ••SuO

~ <-

.", "" , .... N

0.00

15

:10

U

"5

(em/s)

60

75.

Figure 6. Comparison of velocity distribution models.

C.-L.CHIV

132

If a set of velocity data is available, the velocity distribution parameters, such as M and u_max of Model I, can be estimated easily by the method of least squares. In practical applications, a simple technique using a graph such as Figure 7, obtained analytically from (9) and (11), could also be used to simplify the parameter estimation. If the mean velocity ū is known through the discharge and the cross-sectional area, the parameter M can be determined quickly by simply plotting data points on the graph. The known value of ū and the estimated value of M can then be used in (11) to determine the other parameter, u_max, of (9). Also applicable to pipe flows, such a graphical method will be very useful when the number of velocity samples is small. Figure 7 also shows that, for M greater than about 5 or 6, the (cross-sectional) mean velocity occurs at ξ = e⁻¹ ≈ 0.368. The actual location (as measured by y, the vertical distance above the bed) of the mean velocity depends on the relation between ξ and y. For a wide channel, ξ can be approximated by y/D; therefore, the mean velocity occurs at y/D = 0.368. However, for a channel which is not "wide" according to the width-to-depth ratio, such that the velocity distribution is affected by the side walls, the maximum velocity tends to occur below the water surface, and ξ must be represented by a non-linear function of y such as (24) or (27). Then the mean velocity occurs at y/D less than 0.368 (Chiu, 1988).
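The least-squares estimation mentioned above has a convenient structure: for a fixed trial value of M, (9) is linear in u_max, so a one-dimensional search over M suffices. A minimal sketch, assuming pairs (ξ, u) prepared from the measured data with ξ₀ = 0 and ξ_max = 1; the search grid and the synthetic test values are illustrative:

    import numpy as np

    def fit_model1(xi, u, M_grid=np.linspace(0.5, 15.0, 300)):
        # For each trial M, u = u_max * g(xi, M) with g = ln[1+(e^M-1) xi]/M,
        # so the least-squares u_max has the closed form (g.u)/(g.g).
        xi, u = np.asarray(xi, float), np.asarray(u, float)
        best = (np.inf, None, None)
        for M in M_grid:
            g = np.log1p(np.expm1(M) * xi) / M
            u_max = (g @ u) / (g @ g)
            sse = ((u - u_max * g) ** 2).sum()
            if sse < best[0]:
                best = (sse, M, u_max)
        return best[1], best[2]   # (M, u_max)

    # illustrative use with synthetic samples on a vertical (xi = y/D):
    xi = np.linspace(0.05, 1.0, 12)
    u_obs = (75.0 / 6.0) * np.log1p(np.expm1(6.0) * xi)  # M = 6, u_max = 75
    M_hat, u_max_hat = fit_model1(xi, u_obs)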

Figure 7. Entropy parameter M and velocity distribution.


Under the analytical framework described above, the shear stress distribution and the secondary-flow velocities can be expressed as functions of M and u_max (Chiu and Chiou, 1986). A new equation for the vertical distribution of sediment concentration in open channels can also be derived under the same analytical framework (Chiu and Rich, 1992). The new equation is similar to the well-known "Rouse equation." However, the Rouse equation is based on the Prandtl-von Karman velocity distribution model, which gives an infinite velocity gradient at the channel bed. It also gives an infinite sediment concentration at y = 0 and is, therefore, not applicable at and near the channel bed. The Rouse equation is applied only down to a certain finite distance above the channel bed. However, without including the high sediment concentration at and near the bed, the Rouse equation tends to underestimate the mean sediment concentration on a vertical. In contrast, the new equation does not have that problem, since it is based on (9) as the velocity distribution, which gives a velocity gradient of finite value at the channel bed, such that the boundary conditions of both the shear stress and the sediment concentration can be satisfied. The sediment characteristics (size and concentration) should have effects on the entropy parameter M and, hence, on the velocity distribution and other related variables.

SUMMARY AND CONCLUSION

This paper presents an alternative approach to the study of flows in pipes and open channels, which consists of the following elements:

- The probabilistic formulation of velocity distribution in fluid flows, used to lay the foundation of the analytical framework.
- The variational principle of maximizing the entropy, employed to identify the probability law governing the velocity distribution.
- Deterministic, fluid-mechanics principles, used to provide the physical basis for establishing the linkages among hydraulic variables and the entropy parameter.
- A geometrical technique using a special coordinate ξ, to build an effective analytical framework for both the deterministic and probabilistic components of the approach.

The system of velocity distribution models derived by


using the approach is capable of describing the one- or multi-dimensional velocity distribution in the entire cross section of a pipe or an open channel, regardless of whether the pipe or the open channel is smooth or rough, and regardless of whether the flow is laminar or turbulent. Under the probability and entropy concepts, u_max and M have emerged as two new hydraulic parameters that are useful in studying the flow and various transport processes in pipes and open channels. M is called the "entropy parameter," a measure of entropy. u_max gives the range of flow velocity in a channel cross section and, therefore, contains important statistical information. The relation among M, ū and u_max, as represented by (11), is very useful in any hydraulic study. Under the analytical framework developed, it has become possible to establish linkages among various hydraulic variables that can be used to gain insight into the unobservable interactions among the variables.

REFERENCES

Blaney, H. F. (1937). "Discussion of 'Stable Channels in Erodible Material' by E. W. Lane." Trans., ASCE, Vol. 102.
Chiu, C.-L., and Chiou, J.-D. (1986). "Structure of 3-D Flow in Rectangular Open Channels." J. Hydr. Engrg., ASCE, 112(11).
Chiu, C.-L. (1987). "Entropy and Probability Concepts in Hydraulics." J. Hydr. Engrg., ASCE, 113(5), 583-600.
Chiu, C.-L. (1988). "Entropy and 2-D Velocity Distribution in Open Channels." J. Hydr. Engrg., ASCE, 114(7), 738-756.
Chiu, C.-L. (1989). "Velocity Distribution in Open-Channel Flow." J. Hydr. Engrg., ASCE, 115(5), 576-594.
Chiu, C.-L. (1991). "Application of Entropy Concept in Open-Channel Flow Study." J. Hydr. Engrg., ASCE, 117(5), 615-628.
Chiu, C.-L., and Murray, D. W. (1992). "Variation of Velocity Distribution along Non-Uniform Open Channel Flow." J. Hydr. Engrg., ASCE, 118(7), 989-1001.
Chiu, C.-L., and Rich, C. A. (1992). "Entropy-Based Velocity Distribution Model in Study of Distribution of Suspended-Sediment Concentration." Proc., ASCE National Conf. on Hyd. Engrg., Baltimore, Aug. 1992.
Chiu, C.-L., Lin, G.-F., and Lu, J.-M. (1993). "Application of Probability and Entropy Concepts in Pipe-Flow Study." J. Hydr. Engrg., ASCE, 119(6).
Nikuradse, J. (1932). "Gesetzmässigkeiten der turbulenten Strömung in glatten Rohren." Forschungsheft 356.
Schlichting, H. (1979). Boundary-Layer Theory, McGraw-Hill Book Co., New York, 596-621.
Shannon, C. E. (1948). "A Mathematical Theory of Communication." The Bell System Technical Journal, Vol. 27, October 1948, pp. 623-656.

ASSESSMENT OF THE ENTROPY PRINCIPLE AS APPLIED TO WATER QUALITY MONITORING NETWORK DESIGN

N.B. HARMANCIOGLU¹, N. ALPASLAN¹ and V.P. SINGH²
¹Dokuz Eylul University, Faculty of Engineering, Bornova 35100 Izmir, Turkey
²Louisiana State University, Department of Civil Engineering, Baton Rouge, LA 70803, U.S.A.

With respect to the design of water quality monitoring networks, the entropy principle can be effectively used to develop design criteria on the basis of quantitatively expressed information expectations and information availability. Investigations on the application of the entropy method in monitoring network design have revealed promising results, particularly in the selection of technical design features such as monitoring sites, time frequencies, variables to be sampled, and sampling duration. Yet, there are still certain problems that need to be overcome so that the method can gain wide acceptance among practitioners. The presented study discusses the advantages as well as the limitations of the entropy method as applied to the design of water quality monitoring networks.

INTRODUCTION

Despite all the efforts and investment made in monitoring of water quality, the current status of existing networks shows that the accruing benefits are low (Sanders et al., 1983). That is, most monitoring practices do not fulfill what is expected of monitoring. Thus, the issue still remains controversial among practitioners and researchers for a number of reasons. First, there are difficulties in the selection of temporal and spatial sampling frequencies, the variables to be monitored, and the sampling duration. Second, the benefits of monitoring cannot be defined in quantitative terms for reliable benefit/cost analyses. There are no definite criteria yet established to solve these two problems. The entropy principle can be effectively used to develop such criteria on the basis of quantitatively expressed information expectations and information availability. This approach is justified in the sense that a monitoring network is basically an information system. In fact, investigations on


application of the entropy principle in water quality monitoring network design have revealed promising results, particularly in the selection of technical design features such as monitoring sites, time frequencies, variables to be sampled, and sampling duration. There are still certain difficulties that need to be overcome so that the method can gain wide acceptance among practitioners. Some of these difficulties stem from the mathematical structure of the concept. For example, entropy, as a measure of the uncertainty of random processes, has not yet been precisely defined for continuous variables. The derivation of mathematical expressions for multivariate distributions other than the normal and lognormal is highly complicated. Other difficulties encountered in the application of the method are those that are valid for any other statistical procedure. As such, the entropy principle requires sufficient data on the processes monitored to produce sound results. However, it is difficult, particularly in the case of messy water quality data, to determine when a data record can be considered sufficient. Another difficulty occurs in assessing monitoring frequencies higher than the already selected ones. The presented study addresses the above difficulties as well as the merits of using the entropy principle in the design of water quality monitoring networks. The discussions are supported by case studies relating to specific design problems such as the selection of monitoring sites, sampling frequencies, and variables.

ASSESSMENT OF CURRENT DESIGN METHODOLOGIES

In recent years, the adequacy of collected water quality data and the performance of existing monitoring networks have been seriously evaluated for two basic reasons. First, an efficient information system is required to satisfy the needs of water quality management plans. Second, this system has to be realized under the constraints of limited financial resources, sampling and analysis facilities, and manpower. Problems observed in available data and shortcomings of current networks have led researchers to focus more critically on the design procedures used. The early and even some current water quality monitoring practices were often restricted to "problem areas" or "potential sites for pollution", covering limited periods of time and limited numbers of variables to be observed. Recently, water quality related problems and the need for monitoring have intensified, so that the information expectations to assess water quality have also increased. This pressure has resulted in an expansion of monitoring activities to include more observational sites and larger numbers of variables to be sampled at smaller time intervals. While these efforts have produced plenty of data, they have also raised the question whether one "really" needs "all" these data to meet information requirements. Therefore, a more systematic approach to monitoring is required. As a result, various network design procedures have been proposed and used to either set up


a network or evaluate and revise an existing one.

Current methods of water quality monitoring network design basically cover two steps: (a) description of design considerations, and (b) the actual design process. Although often overlooked, proper delineation of design considerations is an essential step before attempting the technical design of the network. In other words, objectives of monitoring and information expectations for each objective must be specified first (Ward and Loftis, 1986; Sanders et al., 1983; Whitfield, 1988; Tirsch and Male, 1984). Such design considerations are often presented as general guidelines, rather than as fixed rules to be pursued in the second step, the actual design process (Sanders et al., 1983). The technical design of monitoring networks relates to the determination of sampling sites, sampling frequencies, variables to be sampled, and the duration of sampling. It is only for this actual design stage that fixed rules or methods are proposed.

A considerable amount of research has been carried out on the above-mentioned four aspects of the design problem. Sanders et al. (1983), Tirsch and Male (1984), and Whitfield (1988) provide comprehensive surveys of research results and practices on the establishment of sampling strategies. It is also well recognized that water quality monitoring is a statistical procedure, and the design problem must therefore be addressed by means of statistical methods. Accordingly, information expectations from a monitoring system must be defined in statistical terms so that the selection of sampling strategies (i.e., sampling sites, variables, frequencies, and duration) can be accomplished and justified by a statistical approach. The statistical methods employed in the selection of spatial and temporal frequencies basically cover regression techniques, analysis-of-variance methods, standard error criteria, decision theory, and optimization techniques. Selection of the variables to be sampled is, however, a more complicated issue, as there are no definite and readily acceptable criteria to guide the eventual decisions. Objectives and economics of monitoring provide the basic guidelines for an overall selection of variables, and regression and multivariate statistical analysis techniques may be used to reduce the number of required variables. Determination of the duration of monitoring is also often treated together with the problem of temporal design. However, the amount of research carried out on the analysis of this problem has been quite insufficient to bring even a reasonable resolution to the issue.

Deficiencies related to current design procedures are primarily associated with an imprecise definition of information and of the value of data, with the transfer of information in space and time, and with cost-effectiveness. The major difficulty associated with current design methods is the lack of a precise definition of "information". They either do not give a precise definition of how information is measured, or they try to express it indirectly in terms of other statistical parameters like the standard error or variance. One important consequence of the failure to define information can possibly be the interchangeable use of the terms "data" and "information". Although current methods stress the distinction between the two, a direct link between them


has not yet been established (Harmancioglu et al., 1992b).

Another difficulty with current design methods is how to define the value of data. In every design procedure, the ultimate goal is an "optimal" network. "Optimality" means that the network must meet the objectives of the data gathering at minimum cost. While costs are relatively easy to assess, the major difficulty arises in the evaluation of benefits, because such benefits are essentially a function of the value of the data collected. The value of data lies in their ability to fulfill information expectations. However, how this fulfillment might be assessed in quantifiable terms still remains unsolved. As in the case of information, the value of data has been described indirectly (Dawdy, 1979; Moss, 1976), often by Bayesian decision theory (Tirsch and Male, 1984).

Another criticism of the current design methods relates to how the techniques are used in spatial and temporal design. The majority of current techniques are based on classical correlation and regression theory, which basically constitutes a means of transferring information in space and time. The use of regression theory in the transfer of information has some justification. However, regression approaches transfer information on the basis of certain assumptions regarding the distributions of variables and the form of the transfer function, such as linearity or nonlinearity. Thus, how much information is transferred by regression under specified assumptions has to be evaluated with respect to the amount of information that is actually transferable. One may refer to Harmancioglu et al. (1986) for the definition and comparison of the terms "transferred information" and "transferable information".

To summarize the above discussions, one may state that the existing methods of water quality network design are deficient because of the following specific difficulties: (a) a precise definition of the "information" contained in the data, and of how it is measured, is not given; (b) the value of data is not precisely defined, and consequently, existing networks are not "optimal" either in terms of the information contained in these data or in terms of the cost of getting the data; (c) the method of information transfer in space and time is restrictive; (d) cost-effectiveness is not emphasized in certain aspects of monitoring; (e) the flexibility of the network in responding to new monitoring objectives and conditions is not measured and is not generally considered in the evaluation of existing or proposed networks. Within this context, a methodology based on the entropy theory can be used for the design of efficient, cost-effective, and flexible water quality monitoring networks to alleviate many of the above shortcomings of the existing network design methods (Harmancioglu et al., 1992b; Alpaslan et al., 1992).

ENTROPY THEORY AS APPLIED TO MONITORING NETWORK DESIGN

Entropy is a measure of the degree of uncertainty of random hydrologic processes. Since the reduction of uncertainty by means of making observations is equal to the


amount of information gained, the entropy criterion indirectly measures the information content of a given series of data (Harmancioglu, 1981). According to the entropy concept as defined in communication (or information) theory, the term "information content" refers to the capability of signals to create communication. The basic problem is the generation of correct communication by sending a sufficient amount of signals, leading neither to any loss nor to repetition of information (Shannon and Weaver, 1949). Each sample collected actually represents a signal from the natural system which has to be deciphered so that the uncertainty about the real system is reduced. Application of engineering principles to this problem calls for a minimum number of signals to be received to obtain the maximum amount of information. Redundant information does not help reduce the uncertainty further; it only increases the costs of obtaining the data. These considerations represent the essence of the field of communications and hold equally true for hydrologic data sampling, which is essentially communicating with the natural system. On the basis of this analogy, a methodology based on the entropy concept of information theory has been proposed for the design of hydrologic data networks. The basic characteristic of entropy as used in this context is that it is able to represent quantitative measures of "information". As a data collection network is basically an information system, this characteristic is the essential feature required in a monitoring network (Alpaslan et al., 1992; Harmancioglu et al., 1992b).

The definitions of entropy given in information theory (Shannon and Weaver, 1949) to describe the uncertainty of a single variable can be extended to the case of multiple variables. In this case, the stochastic dependence between two processes causes their total entropy and the marginal entropy of each process to be decreased. The same is true for dependent multi-variables (Harmancioglu, 1981). This feature of the entropy concept can be used in the spatial design of monitoring stations to select appropriate numbers and locations so as to avoid redundant information. On the other hand, the marginal entropy of a single process that is serially correlated is less than the uncertainty it would contain if it were independent. In this case, serial dependence acts to reduce the marginal entropy and causes a gain in information (Harmancioglu, 1981). This feature of the entropy concept is suitable for use in the temporal design of sampling stations.

The entropy measures of information were applied by Krstanovic and Singh (1993a and 1993b) to rainfall network design, by Husain (1989) to the design of hydrologic networks, by Harmancioglu and Alpaslan (1992) to the design of water quality monitoring networks, and by Goulter and Kusmulyono (1993) to the prediction of water quality at discontinued water quality monitoring stations in Australia. Similar considerations were used for the design of data collection systems (Singh and Krstanovic, 1986; Harmancioglu, 1984).
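The quantities involved here (marginal entropy, joint entropy, and the transinformation that measures the redundant, common information between two stations) can be computed from discretized records. A minimal sketch in natural-log units (napiers); the number of class intervals (bins) and the synthetic station records are illustrative choices:

    import numpy as np

    def entropy(counts):
        # Discrete Shannon entropy, in napiers (natural-log units).
        p = counts[counts > 0] / counts.sum()
        return float(-(p * np.log(p)).sum())

    def transinformation(x, y, bins=10):
        # T(X;Y) = H(X) + H(Y) - H(X,Y): information common to two stations.
        cxy, _, _ = np.histogram2d(x, y, bins=bins)
        hx = entropy(cxy.sum(axis=1))
        hy = entropy(cxy.sum(axis=0))
        hxy = entropy(cxy.ravel())
        return hx + hy - hxy

    rng = np.random.default_rng(1)
    x = rng.normal(size=240)                    # e.g. 20 years of monthly data
    y = 0.8 * x + 0.6 * rng.normal(size=240)    # a correlated second station
    print(transinformation(x, y))               # > 0: redundant information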


In these studies, the entropy concept has been shown to hold significant potential as an objective criterion which can be used in both spatial and temporal design of networks. With respect to water quality in particular, the entropy principle can be used to evaluate five basic features of a monitoring network: temporal frequency, spatial orientation, combined temporal/spatial frequencies, variables sampled, and sampling duration. The third feature represents an optimum solution with respect to both the time and the space dimensions, considering that an increase in efforts in one dimension may lead to a decrease in those in the other dimension (Harmancioglu and Alpaslan, 1992). To determine the variables to be sampled, the method can be employed, not to select from a large list of variables, but to reduce their number by investigating information transfer between the variables (Harmancioglu et al., 1986, 1992a and b). Assessment of sampling duration may be approached in a number of ways. If station discontinuance is the matter of concern, decisions may be made in an approach similar to that applied in spatial orientation. The problem is much simpler when a sampling site is evaluated for the redundancy of information it produces in the time domain. If no new information is obtained by continuous measurements, sampling may be stopped permanently or temporarily.

ADVANTAGES OF THE ENTROPY METHOD IN DESIGN OF MONITORING NETWORKS

The basic role of the entropy method

The studies carried out so far show that the method works quite well for the assessment of an existing network. It appears as a potential technique when applied to cases where a decision must be made to remove existing observation sites, and/or reduce the frequency of observations, and/or terminate a sampling program. The method may also be used to select the numbers and locations of new sampling stations, as well as to reduce the number of variables to be sampled (Harmancioglu and Alpaslan, 1992). On the other hand, the entropy method cannot be employed to initiate a network; that is, it cannot be used for design purposes unless a priori collected data are available. This is true for any other statistical technique that is used to design and evaluate a monitoring network. In fact, the design process is an iterative procedure initiated by the selection of preliminary sampling sites and frequencies. This selection has to be made essentially by nonstatistical approaches. After a certain amount of data is collected, initial decisions are evaluated and revised by statistical methods. It is throughout this iterative process of modifying decisions that the entropy principle works well. Its major advantage is that such iterations are realized by quantifying the network efficiency and cost-effectiveness parameters for each decision made.


Measure of information and usefulness of data

One of the valuable aspects of the entropy concept as used in network design is its ability to provide a precise definition of "information" in tangible terms. This definition expresses information in specific units (i.e., napiers, decibels, or bits) so that it constitutes a completely quantitative measure. At this point, it is important to note again the distinction between the two terms "data" and "information". The term "data" represents a series of numerical figures which constitute a means of communication with nature; what these data actually communicate to us is "information". This distinction means that the availability of data is not a sufficient condition for the availability of information unless those data have utility, and the term "information" describes this utility or usefulness of data. Among the various definitions of information proposed to date, the entropy measure appears to be the only one that gives credence to the relevance or utility of data. The value of data can also be expressed in quantitative terms, since it is measured by the amount of information the data convey. This observation implies that monitoring benefits may eventually be assessed on the basis of quantitative measures rather than indirect descriptions of information.

In comparison with the current methods, the entropy method develops a clearer and more meaningful picture of data utility versus cost (or information versus cost) tradeoffs. This advantage occurs because both the information and the costs can be measured in quantitative units. For example, if cost considerations require data to be collected less frequently, the entropy measure describes quantitatively how much information would be risked by increasing the sampling intervals (Harmancioglu, 1984; Harmancioglu and Alpaslan, 1992). By such an approach, it is possible to express how many bits of information would be lost against a certain decrease in costs (or in monetary measures). Similarly, it is possible to define unit costs of monitoring in terms such as the amount of dollars per bit of information.

Network efficiency and flexibility

Efficiency is related to the objectives of monitoring in that the latter delineate the "information expected" from monitoring and the former describes the "information produced" by a network. The "information produced" is a function of the technical features of a network, related to the variables sampled, the spatial and temporal sampling frequencies, and the duration of sampling. It is plausible then to define efficiency as the "informativeness" of a network. If the design of the network is such that this information is maximized, then the requirement of efficiency is satisfied. The entropy theory can be used to test whether the supplied information is optimal or not, thereby ensuring system efficiency (Harmancioglu and Alpaslan, 1992). A network, once designed and in operation, has to be evaluated for efficiency,


particularly if the monitoring objectives have been changed or revised. The entropy method may again be used to assess the data collected to determine how much information is conveyed by the network under present conditions. If revisions and modifications are made, their contribution to an increase in information can be measured by the same method. In this respect, the entropy theory also serves to maintain flexibility in the network, since each decision regarding the technical features can be assessed on objective grounds.

Information-based design strategy

The entropy theory may be used to set up an information-based design strategy. As noted earlier, the approach for developing the design strategy for efficient and cost-effective networks encompasses two steps: (a) delineation of design considerations, and (b) technical design of the network. The first step is to define the objectives of monitoring and the information needs associated with each objective. The entropy method can be employed basically for the second step, to be combined with cost considerations to realize both informativeness and cost-effectiveness (Harmancioglu et al., 1992b). This two-stage process can permit the design procedures to be developed to match the information expected from monitoring. Such an approach covers both the "demand" (objectives of monitoring) and the "reaction" (monitoring practices) parts of the problem in an integrated fashion. Both parts can then be defined in terms of "information", as "information needed" and "information supplied". Efficiency and effectiveness of the network can be realized by matching these two aspects. The "demand" part of the design problem can be addressed by specifying the information expected of each objective of monitoring. The "reaction" portion of the problem covers the more specific questions of any design procedure, such as the selection of variables to be sampled and the selection of temporal and spatial frequencies. Solution of the problems associated with this "reaction" step requires: (a) an extraction of information from available data, and (b) a transfer of information among water quality variables with respect to time and space. These two steps are shown to be effectively accomplished by entropy-based measures (Harmancioglu et al., 1986).

Cost-effectiveness

A major difficulty underlying both the design and the evaluation of monitoring systems is the lack of an objective criterion to assess the cost-effectiveness of the network. In this assessment, costs are relatively easy to estimate, but benefits are often described indirectly in terms of other parameters, using optimization techniques, Bayesian decision theory or regression methods (Schilperoort et al., 1982;


Tirsch and Male, 1984). Thus, a realistic evaluation of benefit/cost considerations cannot be achieved, since benefits are not directly quantified. Actually, the benefits of monitoring can only be measured by means of the information conveyed by the collected data; that is, they are a function of the value or worth of the data. The concept of entropy can also be used to quantify the benefits of monitoring, since it describes the utility of data. Here, the benefits of monitoring are expressed as the information supplied, which is quantified in tangible units by entropy measures. Cost-effectiveness can be evaluated by comparing the costs of monitoring versus the information gained via monitoring. The issue is then an optimization problem to maximize the amount of information (benefits of monitoring) while minimizing the accruing costs. The technical features of the design can then be evaluated with respect to cost-effectiveness (Harmancioglu and Alpaslan, 1992).

Space-time sampling frequencies and selection of variables

The entropy method measures the information content of available data (extraction of information) and assesses the goodness of information transfer between temporal or spatial data points (transfer of information). These two functions constitute the basis of the solution to the design problems of what, where, when, and how long to observe. Such a solution is based on the maximization of information transfer between variables, space points, and time points, respectively. The amount of information transfer used in such an analysis can be measured by entropy in specific units. The selection of each technical design factor can be evaluated again by means of entropy, to define the amount of information conveyed by the data collected under each of the selected monitoring procedures. These evaluations may eventually provide the ability to make quantitatively based, rational decisions on how long a gauging station should be operated (Alpaslan et al., 1992).

Harmancioglu and Alpaslan (1992) demonstrated the applicability of the entropy method in assessing the efficiency and the benefits of an existing water quality monitoring network with respect to temporal, spatial and combined temporal/spatial design features. They described the effect of each feature upon network efficiency and cost-effectiveness by entropy-based measures. For example, the effect of extending the sampling interval from monthly to bimonthly measurements for the three variables investigated leads to a loss of information of the order of 20.4% for DO, 32.8% for Cl, and 68.7% for EC, as shown in Fig. 1. Here, the selection of an appropriate sampling interval is made by assessing how much information the decision-maker would risk versus given costs of monitoring. A similar evaluation can be made with respect to the number and locations of required sampling sites, as in Fig. 2, where changes in rates of information gain are investigated with respect to the number of stations in the network.

N. B. HARMANCIOGLU ET AL.

144

(0J0)

1.0

0.80

--

00 CI

,;t060

:z:::

.

.... I

><,"0.40

><

EC

I-

0.20

o~~----~----~----~---

234 ot(months)

Figure 1.

Effects of sampling frequency upon information gain about three water quality variables (Harmancioglu and Alpaslan, 1992).

of information with respect to both space and time dimensions. The results of these analyses have shown the applicability of the entropy concept in network assessment. LIMITATIONS OF THE ENTROPY METHOD IN NE1WORK DESIGN

The above-mentioned advantages of the entropy principle indicate that it is a promising method in water quality monitoring network design problems because it permits quantitative assessment of efficiency and benefit/cost parameters. However, some limitations of the method must also be noted for further investigations on entropy theory. As the situation holds true for the majority of statistical techniques, a sound evaluation of network features by the entropy method requires the availability of sufficient and reliable data. Applications with inadequate data often cause numerical difficulties and hence unreliable results. For example, when assessing spatial and temporal frequencies in the multivariate case, the major numerical difficulty is related to the properties of the covariance matrix (Harmancioglu and Alpaslan,

ASSESSMENT OF THE ENTROPY PRINCIPLE

145

DO

EC

O~~2----~3----~4----5~--~6-­

no. of stations

Figure 2.

Changes in rates of information gain (cumulative transinformation/joint entropy) with respect to number of stations (Harmancioglu and Alpaslan, 1992).

.......

...III

.a." 4.700

"

c

~

4.600

~

-2

2

4

3

At(months) Figure 3.

~I

Variation of information with respect to both alternative sampling sites (numbered 2 to 6) and sampling frequencies (Harmancioglu and Alpaslan, 1992).

1992). When the determinant of the matrix is too small, entropy measures cannot be determined reliably since the matrix becomes ill-conditioned. This often occurs when the available sample sizes are very small. On the other hand, the question with respect to data availability is "how many

146

N. B. HARMANCIOGLU ET AL.

data would be considered sufficient". For example, Goulter and Kusmulyono (1993) claim that the entropy principle can be used to make "sensible inferences about water quality conditions" but that sufficient data are not available for a reliable assessment. The major difficulty here arises from the nature of water quality data, which are often sporadically observed for short periods of time. With such "messy" data, application of the entropy method poses problems both in numerical computations and in evaluation of the results. Particularly, it is difficult to determine when a data record can be considered sufficient. With respect to the temporal design problem, all evaluations are based on the temporal frequencies of available data so that, again, the method inevitably appears to be data dependent. At present, it appears to be difficult to assess smaller time intervals than what is available. However, the problem of decreasing the sampling intervals may also be investigated by the entropy concept provided that the available monthly data are reliably disaggregated into short interval series. This aspect of entropy applications has to be investigated in future research. Another important point in entropy applications is that the method requires the assumption of a valid distribution-type. The major difficulty occurs here when different values of the entropy function are obtained for different probability distribution functions assumed for the same variable. On the other hand, the entropy method works quite well with multivariate normal and lognormal distributions. The mathematical definition of entropy is easily developed for other skewed distributions in bivariate cases. However, the computational procedure becomes much more difficult when their multivariate distributions are considered. When such distributions are transformed to normal, then uncertainties in parameters need to be assessed. Another problem that has to be considered in future research is the mathematical definition of entropy concepts for continuous variables. Shannon's basic definition of entropy is developed for a discrete random variable, and the extension of this definition to the continuous case entails the problem of selecting the discretizing class intervals 4x to approximate probabilities with class frequencies. Different measures of entropy vary with Ax such that each selected 4x constitutes a different base level or scale for measuring uncertainty. Consequently, the same variable investigated assumes different values of entropy for each selected Ax. It may even take on negative values which contradict the positivity property of the entropy function in theory. One last problem that needs to be investigated in future research is the development of a quantifiable relationship between monitoring objectives and technical design features in terms of the entropy function. As stated earlier, an information-based design strategy requires the delineation of data needs or information expectations. To ensure network efficiency, "information supplied" and "information expected" must be expressed in quantifiable terms by the entropy

ASSESSMENT OF THE ENTROPY PRINCIPLE

147

concept. At the current level of research, if one considers that the most significant objective of monitoring is the determination of changes in water quality, then the entropy principle does show such changes with respect to time and space. However, future research has to focus on the quantification of information needs for specific objectives (e.g., trend detection, compliance, etc.) by means of entropy measures. CONCLUSION Fundamental to accomplishment of an efficient and cost-effective design of a monitoring network is the development of a quantitative definition of "information" and of the "value of data". Within this context, application of the concept of information in entropy theory has produced promising results in water quality monitoring network design problems because it permits quantitative assessment of efficiency and benefit/cost parameters. However, there are still certain difficulties associated with the entropy theory that need to be overcome so that the method can gain wide acceptance among practitioners. The majority these difficulties stem from the mathematical structure of the concept. Other difficulties encountered in application of the method are those that are valid for any other statistical procedure. These problems need to be investigated further as part of future research on design of networks so that the validity and the reliability of entropy theory can be accepted without doubt. REFERENCES Alpaslan, N.; Harmancioglu, N.B.; Singh, V.P. (1992) "The role of the entropy concept in design and evaluation of water quality monitoring networks", in: V.P. Singh & M. Fiorentino (eds.), Entropy and Energy Dissipation in Water Resources, Dordecht, Kluwer Academic Publishers, Water Science and Technology Library, pp.261-282. Dawdy, D.R. (1979) "The worth of hydrologic data", Water Resources Research, 15(6), 1726-1732. Goulter, I. and Kusmulyono, A. (1993) "Entropy theory to identify water quality violators in environmental management", in: R.Chowdhury and M. Sivakumar (eds.), Geo-Water and Engineering Aspects, Balkema Press, Rotterdam, pp.149-154. Harmancioglu, N. (1981) "Measuring the information content of hydrological processes by the entropy concept", Centennial of Ataturk's Birth, Journal of Civil Engineering, Ege University, Faculty of Engineering, pp.13-38. Harmancioglu, N. (1984) "Entropy concept as used in determination of optimum sampling intervals", Proceedings of Hydrosoft '84, International Conference on Hydraulic Engineering Software, Portoroz, Yugoslavia, pp.6-99 and 6-110. Harmancioglu, N.B., Yevjevich, V., Obeysekera, J.T.B. (1986) "Measures of

148

N. B. HARMANCIOGLU ET AL.

information transfer between variables", in: H.W.Shen et al.(eds), Proc. of Fourth Int. Hydrol. Symp. on Multivariate Analysis of Hydrologic Processes, pp.481-499. Harmancioglu, N.B., Alpaslan, N. (1992) "Water quality monitoring network design: a problem of multi-objective decision making", AWRA, Water Resources Bulletin, Special Issue on "Multiple-Objective Decision Making in Water Resources", vol.28, no.1, pp.1-14. Harmancioglu, N.B.; Singh, V.P.; Alpaslan, N. (1992a) "Versatile uses of the entropy concept in water resources", in: V.P. Singh & M. Fiorentino (eds.), Entropy and Energy Dissipation in Water Resources, Dordecht, Kluwer Academic Publishers, Water Science and Technology Library, pp.91-117. Harmancioglu, N.B.; Singh, V.P.; Alpaslan, N. (1992b) "Design of Water Quality Monitoring Networks", in: RN. Chowdhury (ed.), Geomechanics and Water Engineering in Environmental Management, Rotterdam, Balkema Publishers, ch.8. Husain, T. (1989) "Hydrologic uncertainty measure and network design", Water Resources Bulletin, 25(3), 527-534. Krstanovic, P.F. and Singh, V.P. (1993a) "Evaluation of rainfall networks using entropy:I.Theoretical development", Water Resources Management, v.6,pp.279-293. Krstanovic, P.F. and Singh, V.P. (1993a) "Evaluation of rainfall networks using entropy: II.Application", Water Resources Management, v.6,pp.295-314. Moss, M.E. (1976) "Decision theory and its application to network design", Hydrological Network Design and Information Transfer, World Meteorological Organization WMO, no.433, Geneva, Switzerland. Tirsch, F.S., Male, J.W. (1984) "River basin water quality monitoring network design", in: T.M. Schad (ed.), Options for Reaching Water Quality Goals, Proceedings of 20th Annual Conference of A WRA, AWRA Publ., pp.149-156. Sanders, T.G., Ward, Re., Loftis, J.e., Steele, T.D., Adrian, D.D., Yevjevich, V. (1983) Design of networks for monitoring water quality, Water Resources Publications, Littleton, CO, 328p. Schilperoot, T., Groot, S., Wetering, B.G.M., Dijkman, F. (1982) Optimization of the sampling frequency of water quality monitoring networks, Waterloopkundig Laboratium Delft, Hydraulics Lab, Delft, the Netherlands. Shannon, e.E. and Weaver, W. (1949) The Mathematical Theory of Communication, The University of Illinois Press, Urbana, Illinois. Singh, V.P. and Krstanovic, P.F. (1986) "Space design of rainfall networks using entropy", Proc., International Conference on Water Resources Needs and Planning in Drought Prone Areas, pp.173-188, Khartoum, Sudan. Ward, Re., Loftis, J.e., (1986) "Establishing statistical design criteria for water quality monitoring systems: Review and Synthesis", Water Resources Bulletin, AWRA, 22(5), 759-767. Whitfield, P.H. (1988) "Goals and data collection designs for water quality monitoring", Water Resources Bulletin, A WRA, 24(4), 775-780.

COMPARISONS BETWEEN BAYESIAN AND ENTROPIC METHODS FOR STATISTICAL INFERENCE J. N. KAPUR!, H. K. KESAVAN 2, and G. BACIU3 1 Jawaharlal Nehru University New Delhi, INDIA 2Department of Systems Design Engineering, University of Waterloo Waterloo, Ontario, CANADA N2L 3Gl 3Department of Computer Science, HKUST Clear Water Bay, Kowloon, Hong Kong Four methods of statistical inference are discussed. These include the two well known non-entropy methods due to Fisher and Bayes and two entropic methods based on the principles of maximulI} entropy and minimum cross--entropy. The spheres of application of these methods are elucidated in order to give a comparative understanding. The discussion is interspersed with illustrative examples.

INTRODUCTION Maximum entropy and minimum cross-entropy principles provide methods distinct from the classical methods of statistical inference. In this context the following questions naturally arise: • What is statistical inference? • What are the classical methods of statistical inference? • How do these methods compare with entropic methods of statistical inference? • When should one use entropic rather than non-entropic methods? The answers to these questions are related to the age old controversy arising from the two methods of non-entropic statistical inference: (1) Bayesian and (2) non-Bayesian. There are strong arguments for and against both of these methods of inference. The object of the present paper is to shed some light on these fundamental questions, from the vantage point of the entropic methods of inference.

What is statistical inference? The scope of this vast subject is summarized in the following categories for purposes of highlighting the entropic methods of inference: • it is concerned with drawing conclusions on the basis of noisy data, i.e., data which is influenced by random errors. • since it is probabilistic, its nature depends upon our concept of probability itself. • procedures are inductive, and as such, they depend on the axioms that are introduced to make deductive reasoning possible. • it deals with methods of inference when only partial information is available about a system. Since data can be made available in many different forms, since we have the subjective and objective concepts of probability, since statistical inference is inductive, and since we can assume many different axioms of induction, it is not surprising that we have many different methods of statistical inference. Each method tries to capture some aspect of statistical truth contained in a given set of data. The 149 K. W. Hipel etal. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 149-162. © 1994 Kluwer Academic Publishers.

150

J. N. KAPUR ET AL.

different approaches can. however. supplement one another to enlarge the scope of investigation. It is th<'fefore essential that users of statistical inference, irrespective of the discipline they belong to, understand the differences and similarities between the various types of statistical inference, without being overly conccrned about the doctrinaire cont roversies that beset the different groups. Over the course of its evolution the discipline of statistics has becB divided into two views: flayt:;wn. and lIoII-Baytsian or the frequcnllsl vicw. 1\lore reccntly, however. st.atistical inference has been enriched within the framework of the principles of maximum elltropy or mmlmum crossentropy. These principles provide a more unified foundation for the problem specification and the criteria for obtaining a meaningful solution. Since the Bayesian and the frequentist schools are well established, the newer methodologies tend to be classified with respect to these two, either as Bayesian or as non-Bayesian. However, entropic methods represent a distinct class of statistical reasoning. Nevertheless, the followers of the entropic methods feel closer to the Bayesians despite t.heir claim for a separate identity. In the next sections, the similarities and differences of the following four types of statist.ical inference are discussed: (i) classical, traditional, or orthodox stat.istical inference; (ii) Bayesian inference; (iii) inference based on maximum entropy principle (MaxEnt); and (iv) inference based on minimum cross-entropy principle (MinxEnt). Furthermore, the conditions under which each is most appropriate, and the circumstances under which these can supplement one another are considered.

Classical statistical inference This approach is based on the frequentist view of probability, and on the use of sampling distribution theory. Every observed value is regarded as a random sample from some population, and the object is to draw conclusions about the population from the observations given in the data. For instance, one may specify that the population is normal with two parameters, namely the mean and variance, and may construct functions of observations Xl, X2, ... , Xn to estimate these parameters. Since different functions can be used to estimate the same parameter, some well-established criteria are needed for good estimators, such as consistency, efficiency, and sufficiency. Furthermore, one needs a deductive procedure such as the method of maximum likelihood to provide good estimators. The parameters are regarded as fixed numbers and confidence intervals are constructed based on observations, in which the parameters are expected to lie with different degrees of certainty. We even form null hypotheses about the parameters which are then accepted or rejected. This decision is taken on the basis of observations. In this process, either a correct hypothesis may be rejected or a wrong hypothesis may be accepted resulting in an error of first or second type, respectively. We design estimation procedures which minimize both types of errors, or minimize the second type of error when the first type of error is kept at a fixed level. We also wish to design experiments to give us data on which statistical analysis can be performed effectively and efficiently. Sometimes, the population is not specified with a parametric density function and non-parametric methods of statistical inference are developed in order to estimate the density function directly from the data.

Bayesian inference This approach differs from the classical approach in the sense that, in addition to specifying a population density f(x, 0), where X and () may be scalars or vectors, and having observations Xl, X2, ... , X n , it also assumes some a priori distribution for the parameter O. It assumes that there is a specific a priori distribution which the parameter follows. This method then proceeds to use the observations Xl, X2, ... , Xn to update the knowledge of the a priori distribution resulting in an a posteriori distribution which incorporates the knowledge of the data. If more independent observations are obtained, the a posterIOri distribution, obtained as a result of the first set of observations, may be treated as an a priOri distribution for the next set of observations resulting in a second a posteriori distribution.

BAYESIAN AND ENTROPIC METHODS FOR STATISTICAL INFERENCE

151

The assumption of an a priori distribution of (J and the continuous updating of this distribution in the light of independent observations are essential features of Bayesian inference. The traditionalists assert that (J is a fixed number and consequently, its probability distribution IS an inadmissible consideration. The Bayesians counter the objection by saying that the probability distribution of (J is not to be understood in the relative frequency sense; in fact, it is not the true value of (J that is being discussed, but rather our perception of this value. This perception changes as our knowledge based on observations increases. The probability distribution of (J depends on the observations and it is our objective to find this distribution. In fact, not assuming any probability distribution for (J is very often equivalent to assuming that all values of (J are equally likely, or that (J has a uniform, regular or degenerate, distribution. By not making a statement about the a priori distribution, we may in fact be making a statment about it. In this process, we may restrict our choice drastically. If our knowledge of the situation, prior to our getting the data, warrants an a priori distribution for e, it should be used. In some sense, we may say that, as the amount of Bayesian a priori information decreases, the Bayesian inference methodology approaches the classical inference methodology. From this point of view, the classical inference may be regarded as a limiting form of Bayesian inference. Bayesian and non-Bayesian methods of statistical inference are conceptually quite different, although in many situations, they may give the same results. There are many situations in which Bayesian methods are better geared to provide answers than the classical methods. Bayesian inference uses informative priors, while classical inference uses noninformative priors. The uniform distribution gives a non-informative prior, but other non-informative priors do exist also.

Maximum entropy statistical inference In this approach, there is no assumption of any a priori knowledge about the distribution. Inferences about probability density functions are made on the basis of knowledge of moments of the distribution. There may be many probability distributions consistent with the information given in the moments. The distribution sought is that which has the maximum entropy, or uncertainty. This approach differs from the traditional approach in the sense that instead of presuming the knowledge of Xl> X2, ••• , X n , it presumes the knowledge of expected values of some functions like n

Ep;gr(X;),

;=1

r

=1,2, .. . ,n.

(1)

It can be shown however, that if we assume a density function !(X, e) and further assume only the knowledge of the random sample Xl> X2, .•. , Xn , then the principle of maximum entropy leads to

the principle of maximum likelihood for estimating e. In this special case, it should lead to the same results as in the traditional theory. On the other hand, the traditional theory is of no help if knowledge is available in the form of moments (or expected values of some functions) only. This differs from the Bayesian approach in the sense that it assumes no prior distribution either for the parameter or for the random variable. In fact, one of its goals is to generate a priori distributions or density functions which can later be used in tandem with Bayesian inference.

Minimum cross-entropy statistical inference This approach differs from the classical approach in that knowledge of moments is used, rather than knowledge of sample values Xl, X2,"" Xn · It also presumes knowledge of an a priori distribution. It is different from the Bayesian approach due to the use of knowledge of moments. Although it presumes an a priori distribution, this distribution is not the parameter distribution, but rather of the density function itself. However, like the Bayesian method, it continuously updates the knowledge of the density function. It is unlike the maximum entropy approach in the sense that an a priori knowledge of the density function is presumed. Recall that in the maximum entropy approach, either no a priori density function is assumed to be known, or equivalently, the prior distribution is taken to be the uniform distribution. Thus, it can be observed that, as the a priori distribution approaches the uniform distribution, the minimum cross-entropy distribution approaches that of the maximum entropy distribution.

J. N. KAPUR ET AL.

152

SOME EXAMPLES OF BAYESIAN INFERENCE Some examples of the use of Bayesian inference are given below, and then these examples are used to illustrate the similarities and differences between various methods of statistical inference. In Bayesian inference, the following are given: (1) the a priori density function for a population parameter, and (2) a random sample from the population. Bayes's theorem is used to find the a posteriori probability distribution of the parameter, on the basis of the information provided by the random sample. In the following examples notes are provided for each example to bring out the,finer points of the method. These notes are relevant to our discussions of the comparative roles of the various methods of statistical inference. Problem 1: Given that the mean, m, of a distribution is an N(mo, (15) variate, and given a random sample Xl, X2,"" Xn from the population N(m, (12), where mo, (15, and (12 are known, find the a posteriori distribution for the mean of the distribution. Solution 1: Using Bayes's theorem, viz, the a posteriori density is proportional to the product of the a priori density and likelihood function, the a posteriori probability density function is obtained n l(m-mo)2 - -lL: (Xi-m)2] Cexp [ -2 (12 2 (12

°

;=1

n) + m (mo =C exp [12(1 --m -(12 + -(12 -er + -niX)] , 2 er ° ° I

2

2

(2)

so that the a posteriori probability distribution is N (ml' ern, where

mo

ml=

iX

-+-er5 er 2 /n 1 1 -+-er5 er 2 1n

and

lin

-. err -- -er5+(12

(3)

Remarks: • As era -- 00, i.e., as the a priori distribution tends to the degenerate uniform distribution, the influence of the a priori distribution tends to disappear and ml -- iX and er5 __ er2 In, which are the classical results. This result illustrates equation the limiting case of the minimum cross-entropy when the a priori distribution is the uniform distribution. • On the other hand, if ero is small, then the a priori distribution dominates and the data make only relatively small contributions to ml and err. In fact, it is this fear of 'dominance of the a priori' which deters many people from using Bayesian methods. • If a further random sample, a posteriori distribution has

Yl,

Y2, ... , Yr, is obtained from the distribution, then the final

(4)

and

lin

r

--+er~ - (15 er 2+er-2 .

(5)

Thus, the final distribution is the same as that obtained from N(mo, (15) with a random sample Xl, X2, ... , Xn; Yl, Y2,' .. , Yr· This is an important feature of Bayesian inference, viz, that the final result is independent of the number of stages in which updating is done, so long as the total information in all stages combined is the same. Problem 2: Given that the parameter p of a binomial distribution is a B( a, b) variate and n independent Bernoullian trials have resulted in r successes, find the a posteriori probability distribution distribution for p.

BAYESIAN AND ENTROPIC METHODS FOR STATISTICAL INFERENCE

153

Solution 2: Using Bayes's theorem, an a posteriori distribution is obtained n

Gp o-l(l_ p)b-l IIp'''(I- p)l-.. " s=1

where

Xi

(6)

= 1 or 0, according to whether there is a success or a failure in the ith trial, so that Xl

+ X2 + ... + Xn

=r,

(7)

and the a posteriori density is

1 po+r-l(1 _ p)b+n-r-l. B(a+r,b+n-r)

(8)

If m additional independent Bernoulli trials are performed and s successes are obtained, then the a posteriori density would be

1 po+r+'-l(1 _ p)b+n+m-r-.-l B(a+r+s,b+n+m-r-s) ,

(9)

which again verifies that the final result is the same as if m + n trials had been done at the same time. Again, if a 1 and b 1, i.e., if the a priori distribution is the regular uniform distribution, equation (8) gives 1 r(1 )n-r r(1 )n-r (10) B(r+l,n-r+l)P -p =ncrP -P ,

=

=

which is the classical result. Remarks:

• The elegance of the results in these two examples depends upon our ability to make an a priori choice of "conjugate a priOM" distribution. However, the following results are independent of this choice: 1. The Bayesian result would approach the classical result as the a priori distribution ap-

proaches the uniform distribution. 2. The final Bayesian a posteriori probability distribution is the same whether observations are done in stages or are all done simultaneously. • However, if a distribution like the Poisson distribution is considered as the a priori distribution, it will not be easy to get the classical results in the limit, since the Poisson distribution does not approach the uniform distribution for any value of its parameter. Problem 3: Let the observation equation be

y=Ax+e,

(11)

where y is the m x 1 observed vector, A is a known m x n matrix, X is an n x 1 vector to be estimated, and e is an m x 1 error vector which follows an N(O, R) distribution, where R is a non-singular m x m matrix. Let X have an a priori N(mo, Eo) distribution. Find its a posteriori distribution in the light of the observation vector y. This is the well-known state estimation problem which has many engineering applications.

Solution 3: The a posteriori density function is proportional to exp [-~(X - mo?Eol(x - roo) -

= G' exp{ -~ [XT(Eol

~(AX - y)T R-l(AX - y)j

+ AT R- l A)x - xT(Eolmo + AT R-ly)

- (m;rEol + yT R- l A)x+ rooEolmo + yT R-lyj},

(12)

J. N. KAPUR ET AL.

154

so that the new distribution is N(mb Ed, where

Ell and

= EOl + AT R- l A

(13)

=

(14)

z = Bx+f,

(15)

Ellml Eolmo + AT R-ly. If a further observation equation is available where f is an error vector with a N(O, S) distribution, the following is obtained

E2l and

E2lm2

= Ell +BTS-1B = EOl +ATR-1A+BTS-1B

(16)

=Ellml + BTS-1B =Eolmo + ATR-ly+ BTS-1Z,

(17)

which again illustrates the updating property. Remarks:

• As R- l goes to the zero matrix, El - Eo and ml - mo. As EOl goes to the zero matrix, Ell _ AT R- l A and Ellml _ AT R-ly, which are the classical results. • If m #; n, then AT R-l A is a singular matrix, and El does not exist in the strict sense, although it may exist in the generalized inverse matrix sense. However, EOl +AT R-l A is not a singular matrix. In this case, the Bayesian solution exists, but the non-Bayesian solution does not even exist except in a special sense. This example shows the need for the Bayesian approach to the solution of this important problem.

=

• This example gives us the Bayesian solution to the problem of state-space estimation. If m n, the solution is obtained without assuming any a priori distribution. Of course, the solution is also obtained by assuming an !l priori distribution. The solution is well-known. • However, the present approach also gives the solution when m < nor m > n. Of course, the solution will depend upon the a priori distribution assumed. Its influence can be reduced by taking the a priori distribution to be nearly uniform, but not exactly uniform. This example also gives a method of updating the estimates as more observations become available. • As a special case when m = 1 and A = (l/n,l/n, ... ,I/n)T, it gives a method of estimation when only the mean is prescribed. • The model, equation (11), is called the general linear model, and includes models associated with polynomial regression, multiple regression, analysis of variance, randomized block designs, incomplete block designs, and factorial designs. • Equation (11) can be written as i= 1,2, ... ,m. :1:1, :l:2,···,:l:n are regression coefficients, ail, ai2, ••. , ain is the ith input, and output. If rank A n and R (T2 I, then the a posteriori density is

=

=

(~) n (T-n exp [- 2!2 (y -

and It then follows that

Yi

is the ith

AX)T(y - AX)]

= (~r (T-n exp [- 2!2 [(X where

(18)

X)T AT A(x - x) + IIS21] ,

(19)

lI=m-n

(20)

y=Ax.

(21)

BAYESIAN AND ENTROPIC METHODS FOR STATISTICAL INFERENCE

155

1. ic is sufficient for x, if 0"2 is known. 2. ic and S2 are jointly sufficient for (x, 0"2).

3. ic is a N [x, 0"2 (AT A)-I] variate. 4. I/s 2 /0"2 is distributed independently ofic as X2 •

We have discussed the more general case when R is any m x m non-singular matrix and the rank of A is not necessarily equal to n.

EXAMPLES OF MINIMUM CROSS-ENTROPY STATISTICAL INFERENCE Kullback's principle of minimum cross-entropy is advantageous when there is not enough information to determine an a posteriori probability distribution uniquely from the moments alone. In Bayesian inference, Bayes's theorem yields a unique a posteriori distribution in the first instance. However, both the a priori and a posteriori probability distributions in the minimum cross-entropy method are distributions of the variates themselves, rather than of the parameters as in Bayesian inference. We now illustrate the similarities and the difference with the following examples. Problem 4: Given that a random variate has an a priori distribution N(mo, 0"2) and that its mean is m, find its a posteriori distribution. Now, given in addition that its variance is also known and is 0"2, find its new a posteriori distribution. Given further that E[(x - m)4] b4, find the third a posteriori distribution and calculate entropies and cross-entropies.

=

Solution 4: Here go(x) =

=1

y27r0"0

1

Minimizing

00

subject to

i:

-00

f(x)dx

results in

f(x)

[ 1 (x - mo)2] exp --2 2 0"0

(22)



f(x)

f(x) In -(-) dx,

i:

(23)

go x

=1

and

= gl(X) = ~

y27r0"0

exp

xf(x)dx

[_~ (x 2

= m,

;n)2] .

0"0

(24)

(25)

Thus our first a posteriori distribution is N(m,0"5). Again, minimizing

1

00

-00

I(x) I(x) In - (-) dx, gl

(26)

X

subject to and results in

I(x)

(27)

=g2(X) = _1_ exp [_~.!.-(x_--::m,-')~2] v"f; 2 0"

0- 2

(28)

so that the second a posteriori distribution is N (m, 0"2). It can be observed that this distribution is independent of both mo and 0"5. Again, minimizing

1

00

-00

I(x) I(x) In - (-) dx, g2 X

(29)

J. N. KAPUR ET AL.

156

subject to and we obtain

f(x)

(30)

= Aexp [-i (x ~;n)2 -

A(X - m)4] ,

(31)

where A and A are determined by equations (30). This distribution will be different from the normal distribution unless b4 = 3114 • Remarks: • In example 1, the a priori distribution of m was given; here, the a priori distribution of x is given. • In example 1, Bayes's theorem was used; here, the principle of minimum cross-entropy is used. • In example 1, information was supplied in terms of random values Xl, X2,"" Xn ; here information is supplied in terms of knowledge of the mean, variance, or other moments. • Continuous updating is done in both cases. • Example 4 shows that, as more and more observations are obtained, the uncertainty about. the mean of the population decreases. In fact, even with a single set of observations, as n -> 00, 111 -> 0 and entropy goes to its minimum value of -00. It also demonstrates that if information is given in the form of different values for the moments already given in the a priori distribution, then the uncertainty may decrease, or may in fact increase. However, if the information is about different moments, then those involved in the a priori distribution, the uncertainty is likely to decrease and will certainly not increase. • Some cross-entropies of interest are

1 1

00

-00

00

-00

gl(X) In gl«X» dx go x

=

g2(x)ln g2«X» dx go x

=

~(m-mo?

2

1 (m - mo)2

-2

(32)

115

2

110

112 -

115

+--2-' 110

(33)

In spite of the additional information, the second cross-entropy may be either smaller or larger than the first. Some entropies of interest are

-1: -1:

-1:

go(x)lngo(x)dx

1 = 21n (21TeI10)

(34)

gl(X) Ing1(X) dx

1 = 2 1n(21TeI10)

(35)

g2(X) Ing2(x) dx

1 = 2 In (21Tel1).

(36)

Thus the entropy does not change when the mean alone is changed, but it does change when the variance is changed, although it may either increase or decrease depending on whether 11 is greater than or less than 110. On the other hand, entropies of the distributions obtained in example 1 are and and

(37)

(38)

so that, in this case, the entropy goes on decreasing as more and more information becomes available.

157

BAYESIAN AND ENTROPIC METHODS FOR STATISTICAL INFERENCE

=

Problem 5: Let X be an N(rno, Eo) variate and let it be given that E[Ax] y and that the covariance matrix of Ax is R. Find the minimum cross-entropy distribution for x. Solution 5: We must minimize the cross-entropy,

J J J

subject to

I(x) I(x) In g(x) dx,

(39)

I(x)dx= 1,

(40)

(Ax - Y)/(x) dx

J

and

= 0,

(Ax - y)(Ax - y)T I(x) dx

(41)

= R.

(42)

This gives I(x)

=

exp [-~(X

- molEol(x -

rno) - Ao

1

T - AT (Ax - y)I - 2(AX - y) D- I (Ax - y) ,

(43)

where Ao is a scalar, A is an m x 1 vector, and D is an m x m matrix. These must be determined using equations (40), (41), and (42). These equations are sufficient to determine all the Lagrange multipliers. This distribution is still multivariate normal, but it is different from the one obtained by Bayesian inference in example 3 because the distribution there does not satisfy the constraints used here. The problems solved are different, the methods of attack are different, and as such the solutions are bound to be different in spite of the fact that the a priori probability distributions are the same.

Remarks: • In examples 3 and 5, though the problems look similar, these are not exactly the same. The methods of attack are different and the solutions are different. • In example 3, it is required that the standardized likelihood satisfies the given constraints. • In example 5, on the other hand, we want the a posteriori distribution to satisfy these constraints. • The standardized likelihood is a conditional probability distribution while the a posteriori distribution is not. • The constraints give a unique likelihood and a unique Bayes's a posteriori distribution. However, the constraints alone do not give a unique a posteriori distribution and the minimum cross-entropy principle must be used in order to obtain a unique probability distribution.

Problem 6: A box contains n + 1 balls which can be either white or black, but there is at least one ball of each color. As such, there can be n possible hypotheses, HI, H2, ... , Hn where Hj is the hypotheses that there are i white balls and N + 1 - i black balls. The a priori probabilities of these (qlo Q2,"" qn). Now, let a ball hypotheses being true are given by the probability distribution q be drawn from the box and let it be white. Find the a posteriori probabilities of the n hypotheses, {P(Hj IE), i 1,2, ... , n}, where E is the event that the ball drawn is white.

=

=

J. N. KAPUR ET AL.

158

Solution 6: Using Bayes's theorem,

P(H;)P(E I H;)

= L:i=l P(Hj )P(E IHj)'

P(H; I E) Now,

P(H;)

= q;

i=1,2, ... ,n

i P(EIH')=• n+l'

and

(44)

(45)

so that

(46) Remarks: • We may consider a parameter, (J, which takes values 1,2, ... , n according to which of the hypotheses 1,2, ... , n are true. Given the a priori probability distribution of (J, its a posteriori probability distribution is found. • Here, the information given was sufficient to determine the conditional probabilities and a posteriori probabilities. Problem 1: In the previous example, it is given that the a priori probabilities are ql, q2, ... , qn, and that the mean number of white balls observed is m. Find the a posteriori probability distribution. Solution 1: If p

= (pi, P2, ... , Pn) is the a posteriori probability distribution, then it is given that n

Lip; =m.

and

(47)

;=1

Unlike the situation in example 7, this information is not sufficient to determine p uniquely. We therefore appeal to the principle of minimum cross-entropy and minimize a suitable measure of entropy subject to equations (47). Minimizing the Kullback-Leibler measure, we get i

= 1,2, ... ,n,

and

a Liq;b;

(48)

where a and b are determined by the equations n

a Lq;b;

n

=1

;=1

= m.

(49)

i=l

This result gives an a posteriori distribution different from equation (46). However, minimizing the Havrada-Charvat measure of cross-entropy of second order,

(50) subject to equation (47), we get

Pi

= q;(c + di),

(51)

where c and d are determined by n

c+dLiq; i=l

=1

and

C

L iq; + d L i q; = m. n

n

i=l

;=1

2

(52)

If

(53)

159

BAYESIAN AND ENTROPIC METHODS FOR STATISTICAL INFERENCE

then

c=O

and

d=

(

n

~iq; .=1

so that P;

=iq; / t

;=1

)-1 ,

(54)

(55)

iq;,

which is the same as equation (46).

Remarks: • In example 6, there is enough information to find p uniquely. • In example 7, the knowledge of the mean alone does not give an unique a posteriori distribution. However, using the minimum cross-entropy principle, the same a posteriori distribution as that given by Bayes's theorem can be obtained, provided we 1. take the prescribed value of the mean to be the value given by Bayes's theorem, and 2. use the Havrada-Charvat measure of cross-entropy of second order.

• In general, using the generalized minimum cross-entropy principle, we can say that we can always find an appropriate measure of cross-entropy and an appropriate set of constraints such that when this measure is minimized subject to these constraints, the same a posteriori distribution is obtained as is given by Bayes's theorem. • If the Kullback-Leibler measure is required, the geometric mean can be prescribed so that the given information is n

L:p; ;=1

=1

n

and

L:p;lni = lnq.

(56)

;=1

Minimizing the Kullback-Leibler measure subject to constraints (56), we obtain

p; where e and

f

=q;eiJ ,

(57)

are determined by n

e L:q;iJ ;=1

=1

n

and

e L:q;iJ Ini ;=1

=lng,

(58)

so that (59) If the distribution given by equation (46) is required, then

~q;ilni/ ~iqi =lng, which is the mean of the distribution obtained using Bayes's theorem.

(60)

J. N. KAPUR ET AL.

160

AN OUTLINE OF MAXIMUM ENTROPY STATISTICAL INFERENCE Here it is not assumed that any a priori knowledge exists and we must make inferences about the probability density function on the basis of knowledge of some moments of the distribution. There may be many probability distributions consistent with the information of the moments. We seek that distribution which has the maximum uncertainty, or entropy. Maximum-entropy statistical inference is a special case of the minimum cross-entropy principle, when the a priori distribution is uniform, regular, or degenerate. However, conceptually, this is a different principle, since it is based on the concept of uncertainty, rather than on the concept of "distance" or "divergence" from an a priori distribution. The entropy of the a posteriori distribution is always less than the entropy of the a priori (i.e., uniform) distribution. This inference has an updating property. i.e., if we are given some moments, (61) n

=

r=I,2, ... ,m LPi9r(Zi) ar , i=1 then by maximizing the entropy subject to these constraints, we get

(62)

(63)

Now, suppose that we are given the additional moment constraints, n

LPi9r(Zi) i=1

=ar ,

r= m+ l,m+2, ... ,m+ 11:,

(64)

then the maximum-entropy distribution Pi

=

exp[-/Jo -/J191(Zi) -/J292(Zi) - ... -/Jm9m(Zi) -/Jm+l9m+l (Zi) - ... -/JmH9mH(Xi)],

(65)

is obtained, where the /J's are determined using by the constraints (61), (62), and (64). Next, we shall start with equation (65) as the a priori distribution and use constraints (61) and (64) to get the a posteriori distribution

PI =

exp[-Ao - A191(Zi) - A292(Zi) - ... - Am9m(Zi)] x exp [-11m - l Im+19m+l(Zi) - l Im+29m+2(Zi) - ... -

IImH9mH(Zi)],

(66)

where the A's and the II'S are obtained using constraints (61), (62), and (64). Since equations (65) and(66) are of the same form, and since the multipliers are determined by the same constraints, the final probability distribution is the same so that Pi PI for all i. Also, the entropy of the final distribution, (Pt.P2, ... ,Pn) is less than or equal to the entropy of the intermediate distribution, (P1, P2, ... , Pn), since (P1, P2, ... , Pn) has greater entropy than any other distribution which satisfies these constraints, and since (Pt. P2,"" Pn ) is one such distribution. As such, the principle of gain in information continues to hold for both maximum entropy and minimum cross-entropy principles. However, there will be information gain when old constraints continue to hold and additional constraints, linearly independent of the earlier ones, are imposed. Under these conditions, there will be a positive information gain and the uncertainty will increase. If additional independent constraints are not imposed, and we only give information changing the values of the moments given earlier, the entropy can, in fact, increase.

=

BAYESIAN AND ENTROPIC METHODS FOR STATISTICAL INFERENCE

161

CONCLUSIONS A Bayesian approach to statistical inference implies an initial "opinion" in the form of an a priori probability distribution. Then, it uses the available "evidence" in the form of knowledge of a random sample or of some moments to obtain a final "opinion" in the form of a posterior probability distribution. In this sense, all our first three methods follow the Bayesian approach. In Bayesian inference we are given a density function f(z, 9). We start with an a priori distribution for 9, use the values of the random sample to construct a likelihood function, and then, use Bayes' theorem to obtain the a posteriori probability distribution for 9. In the minimum cross-entropy inference, we are given the a priori distribution for the random variates. We use the evidence in the form of values of some moments to get the a posteriori probability distribution via Kullback's minimum cross-entropy principle [Kullback and Leibler, 1951, Kullback, 1959]. In the maximum entropy inference, we are not given any initial opinion about the probability distribution. We use Laplace's principle of insufficient reason and assume that the a priori probability distribution is uniform. Then, we proceed as in the minimum cross-entropy approach. In the Bayesian approach, only the density function f( z, 9) is considered and no other prior opinion. The evidence is in the form of a random sample for the population and the final opinion is in the form of a Dirac delta function for 9. Thus, there is a great deal of commonality between the four methods. The principles of maximum entropy and minimum cross-entropy have been explored in detail in [Kapur, 1989, Kapur and Kesavan, 1989] [J.N.Kapur and H.K.Kesavan, 1990, Kapur and Kesavan, 1992], [H.K.Kesavan and J.N.Kapur, 1989], and [H.K.Kesavan and J.N.Kapur, 1990b, H.K.Kesavan and J.N.Kapur, 1990a] where some other aspects of statistical estimation are also given. Methods of estimating non-parametric density function by using maximum entropy principle have been discussed by [Theil and Fiebig, 1984] for both the univariate and multivariate cases. Earlier, the discussion of [Campbell, 1970] on the equivalence of Gauss' principle and Minimum Discrimination for estimation of probabilities illustrates the interaction between entropic and nonentropic methods of inference. The principle of maximum entropy can also be used to derive maximum entropy priors for use in Bayesian estimation. In a given a density function f(z,9), maximum entropy prior is that density function f(x, 9) for which the entropy

-JJ

P(9)f(x, 9) In(P(9)f(x, 9»d9dz

(67)

is maximum. The principle is closely related to maximum data information priors discussed by [Zellner, 1977].

ACKNOWLEDGEMENTS This work was possible due to the financial support in the form of grants from the Natural Sciences and Engineering Research Council of Canada and the Province of Ontario's Centres of Excellence Programme.

REFERENCES Burg, J. (1972). "The Relationship between Maximum Entropy Spectra and Maximum likelihood Spectra". In Childrers, D., editor, Modern Spectral Analysis, pages 130-131. M.S.A. Campbell, L. L. (1970). "Equivalence of Gauss's Principle and Minimum Discrimination Estimation of Probabilities". Ann. Math. Stat., 41, 1011-1013. Cramer, H. (1957). "Mathematical Methods of Statistics". Princeton University Press. Fisher, R. (1921). "On the Mathematical Foundations of Theoretical Statistics". Phil. Trans. Roy. Soc., 222(A), 309-368. Fougere, P., editor (1990). "Maximum Entropy And Bayesian Methods Proceedings of the 9th MaxEnt Workshop". Kluwer Academic Publishers, New York.

162

1. N. KAPUR ET AL.

Goel, P. K. and Zellner, A., editors (1986). "Bayesian Inference and Decision Techniques". NorthHolland, Amsterdam. Grandy, W. T. J. and Schick, L. H., editors (1991). "Maximum Entropy and Bayesian Methods". Kluwer Academic Press, Dordrecht. Havrda, J. H. and Charvat, F. (1967). "Quantification Methods of Classication Processes: Concept of Structural a Entropy". Kybematika, 3, 30-35. H.K.Kesavan and J.N.Kapur (1989). "The Generalized Maximum Entropy Principle". IEEE Trans. Syst. Man. Cyb. 19, pages lO42-lO52. H.K.Kesavan and J .N.Kapur (1990a). Maximum Entropy and Minimum Cross Entropy Principles: Need for a Broader Perspective. In Fougere, P. F., editor, Maximum Entropy and Bayesian Methods, pages 419-432. Kluwer Academic Publishers. H.K.Kesavan and J.N .Kapur (1990b). "On the Family of Solutions of Generalized Maximum and Minimum Cross-Entropy Models". Int. Jour. Gen. Systems vol. 16, pages 199-219. Jaynes, E. (1957). "Information Theory and Statistical Mechanics". Physical Reviews, 106,620630. J.G.Erickson and C.R.Smith, editors (1988). "Maximum Entropy and Bayesian Methods in Science and Engineering, vol 1 (Foundations), vol 2 (Applications)". Kluwer Academic Publishers, New York. J .H.Justice, editor (1986). "Maximum Entropy and Bayesian Methods in Applied Statistics". Cambridge University Press, Boston. J.N.Kapur and H.K.Kesavan (1990). Inverse MaxEnt and MinxEnt Principles and their Applications. In Fougere, P. F., editor, Maximum Entropy and Bayesian Methods, pages 433-450. Kluwer" Academic Publishers. J.N.Kapur and Seth, A. (1990). A Comparative Assessment of Entropic and Non-Entropica Methods of Estimation. In Fougere, P. F., editor, Maximum Entropy and Bayesian Methods, pages 451-462. Kluwer Academic Publishers. J .Skilling, editor (1989). "Maximum Entropy and Baysean Methods". Kluwer Academic Publishers, Doedrecht. . Kapur, J. (1989). "Maximum Entropy Models in Science and Engineering". Wiley Eastern, New DeIhL Kapur, J. and Kumar, V. (1987). "A measure of mutual divergence among a number of probability distributions". Int. Jour. of Maths. and Math. ScL, 10,3, 597-608. Kapur, J. N. and Kesavan, H. K. (1989). "Generalized Maximum Entropy Principle (with Applications)". Sandford Educational Press, University of Waterloo. Kapur, J. N. and Kesavan, H. K. (1992). "Entropy Optimization Principles and their Applications". Academic Press, San Diego. Kullback, S. (1959). "Information Theory and Statistics". John Wiley, New York. Kullback, S. and Leibler, R. (1951). "On Information and Sufficiency". Ann. Math. Stat., 22, 79-86. Rae, C. R. (1989). "Statistical Data Analysis and Inference". North-Holland, Amsterdam. Renyi, A. (1961). "On Measures of Entropy and Information". Proc. 4th Berkeley Symp. Maths. Stat. Prob., I, 547-561. Seth, A. K. (1989). "Prof. J. N. Kapur's Views on Entropy Optimization Principles". Bull. Math Ass. Ind., 21,22, 1-38,1-42. Shannon, C. E. (1948). "Mathematical Theory of Communication". Bell System Tech. Journal, 27,1-4,379-423,623-656. Smith, C. and W.T. Grandy, J., editors (1985). "Maximum-Entropy and Bayesian Methods on Inverse Problems". D. Reidel, Doedrecht, Holland. Theil, H. and Fiebig, D. (1984). "Exploiting Continuity: Maximum Entropy Estimation of Continuous Distributions". Ballinger, Cambridge. Tribus, M. (1966). "Rational Descriptions, Decisions and Designs". Pergamon Press, Oxford. Wilks, S. S. (1963). "Mathematical Statistics". John Wiley, New York. Zellner, A. (1977). "Maximal data information prior distributions". In Aykac, A. 
and Brumat, C., editors, New Developments in the Applications of Bayesian Methods. North Holland, Amsterdam.

AN ENTROPY-BASED APPROACH TO STATION DISCONTINUANCE

N B. HARMANCIOGLU Dokuz Eylul University Faculty of Engineering Bornova 35100 Izmir, Turkey In the design and operation of hydrologic data collection networks, the question of how long data gathering should be continued is a problem that is not often addressed. The current situation is that no definite criteria have been established to decide upon when and where to terminate data collection. Entropy-based measures of information, as used in this study, present convenient and objective means of assessing the status of an existing station with respect to information gathering. Such an assessment can be realized by evaluating the redundancy of information in both the time and the space domains. Accordingly, a particular site that either repeats the information provided by neighboring stations or produces the same information by successive measurements can be discontinued. The presented . study shows that the entropy concept can be effectively employed to evaluate the spatial and temporal redundancy of information produced by a hydrologic network. The application of the method is demonstrated on case studies comprising water quality and quantity monitoring networks in selected Turkish river basins. INTRODUCTION

There are four basic issues to be considered in the design and operation of hydrologic data collection networks: what to measure, where, when, and for how long. Among these, the last issue of how long data collection should be continued is not often addressed even though monitoring agencies, often due to budgetary constraints, would like to know whether continuous data collection practices are required or not. The current situation is that no definite criteria have been established to decide upon when and where to terminate data collection. Maddock (1974) was among the first to address the problem; he considered station discontinuance based on correlation links with other stations in the network. Wood (1979) used the sequential probability ratio test, where the decision whether to discontinue a station is dependent on statistical considerations that include the 163 K. W Hipel etal. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 163-176. © 1994 Kluwer Academic Publishers.

164

N. B. HARMANCIOGLU

error probabilities of accepting a model when it is incorrect as well as rejecting it when it is correct. Wood's approach is directed at stations whose primary purpose is to provide hydrologic information for design; in particular, he considered flood protection as the design purpose and demonstrated the method in case of flood frequency curve determination. Lane et a1. (1979) considered termination of hydrologic data collection at experimental watersheds in New Mexico and Arizona. Their decision-making procedure is modelled in terms of the theory of "bounded rationality", where a regional analysis and the Bayesian decision theory indicated the closure of the watersheds with respect to research objectives. With respect to the station discontinuance problem, two controversial approaches exist among hydrologists. Some claim that "the more data, the better", considering that long records of data increase our chances of understanding the natural phenomena. Others differentiate between "data" and "information" and consider that more data does not necessarily imply more information. This latter approach is adopted as the basic idea underlying the presented study. That is, it is claimed here that a station which does not convey any new information about the process observed should be discontinued. In this case, the problem is to determine the amount of information provided by a station with respect to both space and time. Entropy-based measures of information, as used in this study, present convenient and objective means of assessing the status of an existing station with respect to information gathering. To solve the problem in the space domain, spatial orientation of stations within a network has to be evaluated for redundancy of information so that a particular site that repeats the information provided by other stations can be discontinued. New observational sites may be considered where additional information is needed. Entropy-based measures can be effectively employed here to assess the spatial redundancy of information produced by the network. The problem is similar in the time domain. A monitoring site is again evaluated for redundancy of information, this time with respect to the temporal frequency and the duration of observations. If a station produces the same information via continuous monitoring, data collection may be terminated permanently or temporarily. Two approaches may be adopted here. The first one involves the investigation of temporal frequencies by the entropy method to assess whether the station, with its selected monitoring frequencies, repeats the "same information" by successive measurements. In this case, the results may indicate either a decrease in time frequencies or a complete termination of data collection at the site. The second approach focuses on the change of information conveyed by a station with respect to record length. Here, one may decide to discontinue the observations if, after a certain duration of monitoring, no new information is obtained by additional measurements. Again, the entropy method can be easily used to delineate the change of information conveyed with respect to length of data records.

165

AN ENTROPY-BASED APPROACH TO STATION DISCONTINUANCE

The presented study shows that entropy measures of information constitute effective tools in evaluating station discontinuance in both space and time dimensions. The application of the method to the problem of station discontinuance is demonstrated on case studies comprising water quality and quantity monitoring networks in selected Turkish river basins. STATION DISCONTINUANCE IN mE SPACE DOMAIN

Entropy principle applied to multi-variables in the space domain In the multivariate case, the total entropy of M stochastically independent variables ~ (m=l, ... ,M) is (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992): M

H(Xl'X2, ... ,XM) =

(1)

~ H(~)

m=l

where

H(~)

represents the marginal entropy of each variable

~

in the form of:

N H(~)

(2)

= K ~ p(xn) log [1/p(~)] n=l

with K=l if H(~) is expressed in napiers for logarithms to the base e. Eq.(2) defines the entropy of a discrete random variable with N elementary events of probability P n = p(~) (n= 1, ... ,N) (Shannon and Weaver, 1949). For continuous density functions, p(~) is approximated as [f(xn).llx] for small Ax, where f(xn) is the relative class frequency and ~x, the length of class intervals (Amorocho and Espildora, 1973). Then the marginal entropy for an assumed density function f(~) is:

H(X_m; \Delta x) = \int_{-\infty}^{+\infty} f(x) \log [1/f(x)] \, dx + \log(1/\Delta x)    (3)

In the above, the selection of Δx becomes a crucial decision, as it affects the values of entropy (Harmancioglu and Alpaslan, 1992; Harmancioglu et al., 1986). If significant stochastic dependence occurs between the M variables, the total entropy has to be expressed in terms of conditional entropies H(X_m|X_1,...,X_{m-1}) added to the marginal entropy of one of the variables (Harmancioglu, 1981; Topsoe, 1974):


H(X_1, X_2, \ldots, X_M) = H(X_1) + \sum_{m=2}^{M} H(X_m | X_1, \ldots, X_{m-1})    (4)

Since entropy is a function of the probability distribution of a process, the multivariate joint and conditional probability distribution functions of the M variables need to be determined to compute the above entropies (Harmancioglu, 1981):

H(X_1, X_2, \ldots, X_M) = -\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(x_1, \ldots, x_M) \log f(x_1, \ldots, x_M) \, dx_1 \ldots dx_M    (5)

H(X_M | X_1, \ldots, X_{M-1}) = -\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(x_1, \ldots, x_M) \log f(x_M | x_1, \ldots, x_{M-1}) \, dx_1 \ldots dx_M    (6)

The common information between M variables, or the so-called transinformation T(X_1,...,X_M), can be computed as the difference between the total entropy of Eq.(1) and the joint entropy of Eq.(5). It may also be expressed as the difference between the marginal entropy H(X_m) and the conditional entropy of Eq.(6). It follows from the above that stochastic dependence between multi-variables causes their marginal entropies and the joint entropy to be decreased. This feature of the entropy concept can be used in the spatial design of monitoring stations to select appropriate numbers and locations so as to avoid redundant information (Harmancioglu and Alpaslan, 1992).

An important step in the computation of any kind of entropy is to determine the type of probability distribution function which best fits the analyzed processes. If a multivariate normal distribution is assumed, the joint entropy of X (the vector of M variables) is obtained as (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992):

H(X) = (M/2) \ln 2\pi + (1/2) \ln |C| + M/2    (7)

where M is the number of variables and |C| is the determinant of the covariance matrix C. Equation (7) gives a single value for the entropy of M variables. If logarithms of observed values are evaluated by the above formula, the same equation can be used for lognormally distributed variables. The calculation of conditional entropies in the multivariate case can also be realized by Eq.(7) as the


difference between two joint entropies. For example, the conditional entropy of variable X with respect to two other variables Y and Z can be determined as:

H(X | Y, Z) = H(X, Y, Z) - H(Y, Z)    (8)
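As a concrete illustration of Eqs.(7) and (8), the following minimal Python sketch computes the joint entropy of jointly normal variables from a covariance matrix and a conditional entropy as the difference of two joint entropies. The covariance matrix is hypothetical, invented for illustration; it is not taken from the data analyzed in this paper.

```python
# Minimal sketch of Eqs. (7) and (8); numpy only. The covariance matrix
# below is hypothetical, not the Porsuk or Aras data.
import numpy as np

def joint_entropy_normal(C):
    """Joint entropy (napiers) of M jointly normal variables, Eq. (7)."""
    C = np.atleast_2d(C)
    M = C.shape[0]
    return 0.5 * M * np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(C)) + 0.5 * M

def conditional_entropy_normal(C, target, given):
    """H(X_target | X_given) as a difference of joint entropies, Eq. (8)."""
    both = list(target) + list(given)
    return (joint_entropy_normal(C[np.ix_(both, both)])
            - joint_entropy_normal(C[np.ix_(given, given)]))

C = np.array([[1.0, 0.6, 0.4],     # illustrative 3-station covariance matrix
              [0.6, 1.0, 0.5],
              [0.4, 0.5, 1.0]])
print(joint_entropy_normal(C))                      # H(X1, X2, X3)
print(conditional_entropy_normal(C, [0], [1, 2]))   # H(X1 | X2, X3)
```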

Application to Assessment of Station Discontinuance

An investigation of station discontinuance in the space domain requires the assessment of the reduction in the joint entropy of two or more variables due to the presence of stochastic dependence between them. In this case, the reduction is equivalent to the redundant information in the series of the same hydrologic variable observed at different sites. Application of the entropy principle within this context is demonstrated here on water quality and runoff data collected in two different basins in Turkey.

Available monthly dissolved oxygen (DO) and electrical conductivity (EC) data from six sampling stations in the Porsuk river basin (numbered 010, 011, 013, 015, 016, and 019, consecutively in the downstream order) are used to investigate information transfer in space. Here, the common period of observation at all sites covers 38 months, as all stations except 010 have sporadic observations. Series of DO and EC data collected at the six stations are assumed to be normally distributed. Joint entropies are computed by Eq.(7) for M=2,...,6, which can be used to determine the conditional entropies by Eq.(8). Next, transinformations are computed for M=2,...,6. For each variable, the joint entropy of simultaneous observations at all 6 stations represents the total amount of uncertainty that may be reduced by observations at each station. Increasing the number of stations contributes to this reduction so that the total uncertainty is decreased. Results of such computations are shown in Table 1, where the joint entropy of 6 stations represents the total uncertainty about the variable considered. The number of stations is increased by starting at the downstream station and successively adding to the list the next station in the upstream direction (Harmancioglu and Alpaslan, 1992).

For DO, the first four stations (019, 016, 015, and 013) reduce 95% of the total uncertainty of 14.733 napiers, so that the last two stations produce redundant information in this combination. Thus, it appears that if stations 010 and 011 were discontinued, one would still recover 95% of the information available about DO. In the case of EC, however, all 6 stations disclose only 35% of the total uncertainty. This implies that 65% of the information has to be obtained by the addition of new stations. This result appears to be physically justified for EC, since it is a significant indicator of nonpoint source pollution, which dominates water quality in the basin (Harmancioglu and Alpaslan, 1992). Next, different combinations of sampling sites are considered for DO to


TABLE 1. Reduction in total uncertainty about water quality by increasing the number of stations (Harmancioglu and Alpaslan, 1992).

Variable   Joint entropy (napiers)   No. of stations   Transinformations (napiers)
DO         14.733                    2                 4.658
                                     3                 9.326
                                     4                 14.025
                                     5                 18.724
                                     6                 23.424
EC         36.543                    2                 2.510
                                     3                 5.050
                                     4                 7.626
                                     5                 10.207
                                     6                 12.812

investigate additional reductions in total uncertainties. Results of three alternatives and the best combinations are shown in Table 2. The last combination of sampling sites provides the optimum solution in obtaining almost 100% of the information. Accordingly, it appears that the two stations 013 and 019 do not contribute significantly to the reduction of uncertainty about DO levels in the Porsuk river. Thus, one may infer here that these two stations may discontinue DO observations.

The assessment of station discontinuance in the above example is based on transinformations of various combinations of stations. An increase in these values is accomplished by either adding new stations or excluding some of the existing ones. In this case, the costs of adding new sites, or the decreases in sampling costs by discontinuing some stations, are to be evaluated in comparison with the rate of information gain, which is expressed as the ratio of the transinformation of a particular combination of stations to the total uncertainty described by all stations in the network. When entropy measures indicate a need for new stations, as in the case of EC, the same principle can be used to determine the numbers and locations of these sites. Such an investigation is not presented here, since the major purpose of this study is to assess station discontinuance. Previously, Husain (1989) addressed this problem by proposing the information decay concept for network expansion purposes; he determines the location of new sites by analyzing the variation of entropy with distance.

Similar analyses are carried out for 5 years of daily runoff data of three streamgaging stations, called Kagizman (X_1), Cobandede (X_2) and Mescitli (X_3) in the down- to upstream order along the Aras river basin in eastern Turkey. Assuming that the three variables are normally distributed, joint entropies and


TABLE 2. Transinformations and rates of uncertainty reduction for alternative combinations of stations in case of DO (Harmancioglu and Alpaslan, 1992).

Stations               Transinformations (napiers)   Rate of reduction (percent)
019, 016, 015, 013     14.025                        95
019, 015, 013, 011     14.009                        95
016, 015, 011, 010     14.344                        97

transinformations are computed as in Table 3. It may be observed here that neither a combination with two stations nor one with three stations is sufficient to reduce the total uncertainty of 14.162 napiers. Furthermore, the two upstream stations X_2 and X_3 produce a significant amount of redundant information, so that both combinations practically result in the same amount of uncertainty reduction. That is, when two stations reduce the uncertainty by 8%, addition of the third station increases this rate only to 10%. This result is also confirmed by the conditional entropies H(X_1|X_2) and H(X_1|X_2,X_3) of Table 3, which indicate that the presence of the third station does not help to further reduce the uncertainty of 5.536 napiers about X_1. In this case, one may consider the termination of Mescitli (X_3), as it contributes a smaller amount of information compared to Cobandede (X_2) when the two conditional entropies H(X_1|X_2) and H(X_1|X_3) are considered. It may be preferred to investigate new sites that will be more informative than Mescitli (X_3). The final decision whether Mescitli should be terminated or continued has to be based on the costs of sampling versus the information provided by this station.
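The station-selection logic behind Tables 1-3 can be scripted along the following lines: under the same normality assumption, the transinformation of a subset of stations is the sum of their marginal entropies minus their joint entropy, and each subset is rated by its share of the total network uncertainty. This is only a hedged sketch with an invented covariance matrix, not a reproduction of the Porsuk or Aras computations.

```python
# Hedged sketch of the Table 1-3 workflow: rate station subsets by the ratio
# of their transinformation to the joint entropy of the full network
# (normality assumed; the covariance matrix is invented for illustration).
import numpy as np
from itertools import combinations

def H_normal(C):
    C = np.atleast_2d(C)
    M = C.shape[0]
    return 0.5 * (M * np.log(2 * np.pi) + np.log(np.linalg.det(C)) + M)

def transinformation(C, subset):
    marginals = sum(H_normal(C[i, i]) for i in subset)   # sum of marginal entropies
    return marginals - H_normal(C[np.ix_(subset, subset)])

C = np.array([[1.0, 0.8, 0.6, 0.3],
              [0.8, 1.0, 0.7, 0.4],
              [0.6, 0.7, 1.0, 0.5],
              [0.3, 0.4, 0.5, 1.0]])
total = H_normal(C)                          # total uncertainty of the whole network
for r in (2, 3):
    for s in combinations(range(4), r):
        share = 100 * transinformation(C, list(s)) / total
        print(s, round(share, 1), "%")
```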


STATION DISCONTINUANCE IN THE TIME DOMAIN

Assessment of station discontinuance with respect to record length

The analysis of station discontinuance in the time domain is based on the assessment of temporal information transfer between successive measurements. By using the entropy principle, a monitoring site is evaluated for redundancy of information with respect to two factors: the length of data records and the sampling frequency. Investigation on the basis of the first factor focuses on the change of information conveyed by a station with respect to record length N. Here, one may decide to terminate a station if, after a certain duration of monitoring, no new information is obtained by additional measurements. The entropy method can be used to delineate the change of information conveyed by data on the basis of the record length N.


TABLE 3. Assessment of runoff data for station discontinuance.

Joint entropy: 14.162 napiers        Marginal entropy of X_1: 5.536 napiers

Stations    Transinformation (napiers)   Conditional entropy of X_1 (napiers)   Rate of reduction (percent)
X_2         1.258                        4.278                                  8.9
X_2, X_3    1.495                        4.263                                  10.0

The definition given in Eq.(3) for the marginal entropy of X involves the term "log(1/Δx)", which essentially describes a reference level of uncertainty according to which the entropy (uncertainty or information conveyed) of the process is evaluated. In this case, the entropy function assumes values relative to the reference level described by "log(1/Δx)" (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992; Harmancioglu et al., 1992). On the other hand, some researchers propose the use of a function m(x) such that the marginal entropy of a continuous variable is expressed as:

H(X) = -\int_{-\infty}^{+\infty} f(x) \ln [f(x)/m(x)] \, dx    (9)

where m(x) is often considered to be an a priori probability distribution function (Jaynes, 1983; Harmancioglu et al., 1992). If a uniform distribution is assumed to be the prior distribution used to describe maximum uncertainty about the process prior to making any observations, then the marginal entropy of Eq.(3) becomes:

H(X) = \int_{-\infty}^{+\infty} f(x) \log [1/f(x)] \, dx + \log(1/N)    (10)

or:

H(X) = \int_{-\infty}^{+\infty} f(x) \log [1/f(x)] \, dx - \log N    (11)


where "log N" becomes the reference level of uncertainty. Another property of this term is that, for a record of N observations, "log N" represents the upper limit of the entropy function (Harmancioglu, 1981; Topsoe, 1974). Using this property and rearranging Eq.(11), one may write:

H'(X) = \log N - \int_{-\infty}^{+\infty} f(x) \log [1/f(x)] \, dx    (12)

where H'(X) now describes the change in information conveyed by data compared to the reference level of "log N". Here, the absolute value of the H'(X) function has to be considered, as it defines the difference between the upper limit "log N" and the information provided by data of N records. Since "log N" represents the maximum amount of uncertainty, H'(X) measures the amount of information gained by making observations. This amount, or the rate of change of information conveyed, changes as N is increased. The particular point where H'(X) stabilizes or remains constant with respect to N indicates that no new information is gained by additional observations. If this point is reached within the available record of observations, then one may decide to discontinue sampling, as further observations will not bring further information.

The approach described above is applied to nine years of daily runoff data of the three stations in the Aras basin, which were analyzed in the previous section for station discontinuance in the space domain. Figure 1 shows the H'(X) values for each station in terms of the rates or changes of information gain with respect to record length N. These values are computed with the daily data of each station for N years, starting with one year (365 observations) and consecutively increasing the length of record to 9 years by 365 x N. The curves obtained show that the rate of information gain is high for the first three years. The rate of change decreases after the fourth year, indicating a relatively smaller amount of information conveyed by data beyond this period. However, none of the stations has yet reached the point where this change is negligible; that is, the H'(X) values have not yet become constant with respect to N at any one of the stations. Thus, one may infer that none of the stations has reached the point of termination.

Figure 1. The change of information gain with respect to length of data records: (a) Kagizman, (b) Cobandede, (c) Mescitli.
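The record-length criterion just described is easy to script. The sketch below uses synthetic flows (not the Aras records) and a normal fit, so that the integral term of Eq.(12) reduces to (1/2) ln(2πe·σ²); it is an illustration under those stated assumptions, not the authors' computation.

```python
# |H'(X)| of Eq. (12) versus record length N, with synthetic data and a
# normal-density fit for the integral term (a hedged illustration only).
import numpy as np

rng = np.random.default_rng(0)
flows = rng.lognormal(mean=3.0, sigma=0.5, size=9 * 365)   # nine "years" of daily flows

def h_prime(x):
    n = len(x)
    integral = 0.5 * np.log(2 * np.pi * np.e * np.var(x))  # normal-fit entropy integral
    return abs(np.log(n) - integral)

for years in range(1, 10):
    print(years, round(h_prime(flows[: years * 365]), 3))
# If the printed values flatten out with increasing N, additional observations
# add little new information -- the paper's criterion for discontinuance.
```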

Assessment of temporal sampling frequencies

If a monitoring station is found not to have reached the point of discontinuance, one may investigate further whether temporal sampling frequencies may be decreased or not. Again, entropy measures can be used to analyze this problem by evaluating


redundancy of information with respect to sampling frequencies. In this case, the temporal frequencies have to be investigated to assess whether the station, with its already selected monitoring frequencies, repeats the "same information" by successive measurements. The results may then indicate either a decrease in time frequencies or a complete termination of data collection at the site.

It was stated earlier that the stochastic dependence between two processes causes their marginal and joint entropies to be decreased. The same is true for a dependent process when it is considered as a single variable. The marginal entropy of a single process that is serially correlated is less than the uncertainty it would contain if it were independent. If the values that a variable assumes at a certain time t can be estimated by those at times t-1, t-2, ..., the process is not completely uncertain, because some information can be gained due to the serial dependence present in the series. In this case, stochastic dependence again acts to reduce entropy and causes a gain in information (Harmancioglu, 1981). This feature is suitable for use in the temporal design of sampling stations. Sampling intervals can be selected so as to reduce the redundant information between successive measurements (Harmancioglu and Alpaslan, 1992).

For a single process, the marginal entropy as defined in Eq.(3) represents the total uncertainty of the variable without having removed the effect of any serial dependence. However, if the ith value of variable X, or x_i, is significantly correlated with the values x_{i-k}, k being the time lag, knowledge of these previous values x_{i-k} will make it possible to predict the value of x_i, thereby reducing the marginal entropy of X.
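A hedged sketch of this idea, ahead of its formal statement in Eqs.(13)-(15) below: under a stationary Gaussian assumption, H(x_i | x_{i-1}, ..., x_{i-k}) can be evaluated as the difference of two joint entropies built from the sample autocovariances. Synthetic AR(1) data stand in for the runoff series here.

```python
# Conditional entropy at successive lags for a synthetic AR(1) series, under
# a stationary Gaussian assumption (illustrative only; not the Aras data).
import numpy as np

rng = np.random.default_rng(1)
x = np.zeros(3000)
for t in range(1, 3000):
    x[t] = 0.7 * x[t - 1] + rng.normal()        # lag-one dependence only

def autocov(x, maxlag):
    xc = x - x.mean()
    return np.array([xc[: len(xc) - j] @ xc[j:] / len(xc) for j in range(maxlag + 1)])

def H_normal(C):
    C = np.atleast_2d(C)
    M = C.shape[0]
    return 0.5 * (M * np.log(2 * np.pi) + np.log(np.linalg.det(C)) + M)

ac = autocov(x, 6)
for k in range(6):
    C = np.array([[ac[abs(i - j)] for j in range(k + 1)] for i in range(k + 1)])
    h = H_normal(C) if k == 0 else H_normal(C) - H_normal(C[1:, 1:])
    print(k, round(h, 3))                       # k = 0 row is the marginal entropy
# For AR(1) data the drop occurs almost entirely at k = 1, mirroring Figure 2.
```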


To analyze the effect of serial correlation upon marginal entropy, the variable X can be considered to be made up of the series x_i, x_{i-1}, ..., x_{i-k}, each of which represents the sample series for time lags k=0,1,...,K and which obey the same probability distribution function. Then conditional entropies such as H(x_i | x_{i-1}) and H(x_i | x_{i-1}, x_{i-2}, ..., x_{i-k}) can be calculated. If the x_{i-k} (k=1,...,K) are considered as different variables, the problem turns out to be one of the analysis of K+1 dependent multi-variables; thus, formulas similar to Eq.(6) can be used to compute the necessary conditional entropies (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992):

H(x_i | x_{i-1}, \ldots, x_{i-k}) = -\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(x_i, \ldots, x_{i-k}) \log f(x_i | x_{i-1}, \ldots, x_{i-k}) \, dx_i \ldots dx_{i-k}    (13)

For a serially correlated variable, the relation:

H(x_i) \geq H(x_i | x_{i-1}) \geq H(x_i | x_{i-1}, x_{i-2}) \geq \ldots \geq H(x_i | x_{i-1}, \ldots, x_{i-K})    (14)

exists between the variables x_{i-k} (k=0,...,K). Thus, as the degree of serial dependence increases, the marginal entropy of the process will decrease until the condition:

H(x_i | x_{i-1}, \ldots, x_{i-(k-1)}) - H(x_i | x_{i-1}, \ldots, x_{i-k}) \leq \epsilon    (15)

is met for an infinitesimally small value of ε. It is expected that the lag k at which the above condition occurs indicates the degree of serial dependence within the analyzed process (Schultze, 1969; Harmancioglu, 1981).

The above approach is applied to the same three series of nine years of daily runoff data analyzed in the previous section. Figure 2 shows the change of their marginal entropies with respect to time lag k. It is observed here that, for all stations, only the first time lag, or the first-order serial dependence, is effective in reducing the uncertainty of each station. The following lags do not contribute significantly to this reduction, so that nonnegligible uncertainty still remains in the processes at time lags beyond k=1. This result indicates that successive measurements do not repeat the same information in the time domain. Thus, the stations should not be discontinued.

Figure 2. Reduction in marginal entropies of runoff data in the form of conditional entropies at successive time lags: (a) Kagizman, (b) Cobandede, (c) Mescitli.

The fact that the first lag in each station produces the highest reduction of uncertainty raises the question whether the temporal frequencies may be extended from Δt=1 day to Δt=2 days or to larger time intervals. This problem is investigated by evaluating the transinformations (common information) between successive measurements for different Δt sampling



frequencies. Transinformations T(x_i, x_{i-1}) are described here as percentages of the marginal (or total) information H(x_i). Figure 3 shows the changes in these ratios with respect to the Δt sampling frequencies. It is observed here that the extension of the frequency to 2 days retains only about 25% of the information, thereby leading to an information loss of about 75% for each station. Information is almost completely lost when the frequency is extended to three months, as the ratio of 25% drops down to as low as 0.2-0.4%. Even for a Δt of 2 days, 75% is a significant percentage of information loss. Thus, one may decide here to continue with daily observations at each station.

Figure 3. Effects of sampling frequencies upon information gain about runoff processes at the three stations investigated.

CONCLUSION

The study presented addresses the problem of station discontinuance on the basis of the information provided by a gage in both the space and the time domains. The entropy measures of information are used to quantitatively describe the contribution of a sampling site to the reduction of uncertainty about the processes observed. The approach used considers a particular gage as part of a gaging network, the purpose of which is not to serve particular design objectives but to gather information about a hydrologic process. Within this context, the application of the entropy principle to cases of observed water quality and quantity data shows that the method can be effectively used to


assess station discontinuance, as it defines the information provided by a gaging network in quantitative terms. Thus, it is possible to measure such concepts as "information gain", "information loss", "redundant information", or "change of information" in specific units. Accordingly, assessment of station discontinuance can now be based on concrete grounds. One problem that has to be investigated further is the assessment of a sampling site in a combined spatial/temporal framework, to merge the two separate approaches (spatial and temporal) applied in this study.

The entropy method still entails some limitations, most of which are of a mathematical nature. To name a few: entropy, as a measure of the uncertainty of random processes, has not yet been precisely defined for continuous variables, and the derivation of mathematical expressions for multivariate distributions other than the normal and lognormal is highly complicated. Other difficulties encountered in application of the method are those that are valid for any other statistical procedure. These and other similar problems associated with the method (Harmancioglu et al.,


1993) will have to be solved as part of future research so that the entropy principle can be used more effectively in assessment of station discontinuance.

REFERENCES

Amorocho, J.; Espildora, B. (1973) "Entropy in the assessment of uncertainty of hydrologic systems and models", Water Resources Research, 9(6), pp.1511-1522.
Harmancioglu, N. (1981) "Measuring the information content of hydrological processes by the entropy concept", Centennial of Ataturk's Birth, Journal of Civil Engineering, Ege University, Faculty of Engineering, pp.13-38.
Harmancioglu, N.B.; Yevjevich, V.; Obeysekera, J.T.B. (1986) "Measures of information transfer between variables", in: H.W. Shen et al. (eds), Proc. of Fourth Int. Hydrol. Symp. on Multivariate Analysis of Hydrologic Processes, pp.481-499.
Harmancioglu, N.B.; Alpaslan, N. (1992) "Water quality monitoring network design: a problem of multi-objective decision making", AWRA, Water Resources Bulletin, Special Issue on "Multiple-Objective Decision Making in Water Resources", vol.28, no.1, pp.1-14.
Harmancioglu, N.B.; Alpaslan, N.; Singh, V.P. (1993) "Assessment of the entropy principle as applied to water quality monitoring network design", International Conference on Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Waterloo, Canada, June 21-23, 1993.
Harmancioglu, N.B.; Singh, V.P.; Alpaslan, N. (1992) "Versatile uses of the entropy concept in water resources", in: V.P. Singh & M. Fiorentino (eds.), Entropy and Energy Dissipation in Water Resources, Dordrecht, Kluwer Academic Publishers, Water Science and Technology Library, pp.91-117.
Husain, T. (1989) "Hydrologic uncertainty measure and network design", Water Resources Bulletin, 25(3), pp.527-534.
Jaynes, E.T. (1983) Papers on Probability, Statistics and Statistical Physics (ed. by R.D. Rosenkrantz), Dordrecht, D. Reidel, vol.158.
Lane, L.J.; Davis, D.R.; Nnaji, S. (1979) "Termination of hydrologic data collection (a case study)", Water Resources Research, vol.15, no.6, pp.1851-1858.
Maddock, T., III (1974) "An optimum reduction of gauges to meet data program constraints", Hydrologic Sciences Bulletin, 19, pp.337-345.
Shannon, C.E.; Weaver, W. (1949) The Mathematical Theory of Communication, The University of Illinois Press, Urbana, Illinois.
Schultze, E. (1969) Einfuhrung in die Mathematischen Grundlagen der Informationstheorie, Berlin, Springer-Verlag, Lecture Notes in Operations Research and Mathematical Economics, 116 pp.
Topsoe, F. (1974) Informationstheorie, Stuttgart, B.G. Teubner, 88 pp.
Wood, E.F. (1979) "A statistical approach to station discontinuance", Water Resources Research, vol.15, no.6, pp.1859-1866.

ASSESSMENT OF TREATMENT PLANT EFFICIENCIES BY THE ENTROPY PRINCIPLE

N. ALPASLAN
Dokuz Eylul University, Faculty of Engineering, Bornova 35100, Izmir, Turkey

The inputs and outputs of water and wastewater treatment plants (TP) are significantly variable, so that the design and operation of TP require an assessment of such random fluctuations. In practice, however, this variability is often inadequately accounted for, and mean values (or maximum values) of the input and output processes are used in either designing the TP or evaluating their operational performance. The study presented introduces the use of the entropy concept in assessing the uncertainty of input and output processes of TP within an informational context. In particular, the entropy measures of information are employed to define a "dynamic efficiency index" (DEI) as the rate of reduction in input uncertainty or entropy to arrive at a minimum amount of uncertainty in the outputs. Besides describing the performance of TP, the definition of such an index has further advantages, the most significant one being in sensitivity analyses of process parameters. The approach described is demonstrated in the case of an existing TP of a paper factory, for which input and output data on BOD, COD, and TSS concentrations are available.

INTRODUCTION

The inputs and outputs of water and wastewater treatment systems fluctuate significantly with time, so that the proper design and operation of treatment plants (TP) require an assessment of such variability. Inputs to a water treatment plant are provided by sources such as surface waters, groundwater, or reservoirs, whose output processes are basically random variables. The outputs from the TP also have a variable character depending upon the operation of the plant, which essentially produces such variability (Weber and Juanico, 1990). Inputs to wastewater TP show more fluctuations, as they are constituted by domestic and industrial wastewaters. Again, the operation of the TP inevitably produces variable outputs.


It is not easy to accurately quantify these random fluctuations in both the inputs and the outputs of TP. This is often due to poor knowledge of the various sources of variability that can affect such processes. Consequently, in practice, the random nature of inputs and outputs is inadequately accounted for, and their mean or maximum values are used in either designing the TP or in evaluating its operational performance.

With respect to the design of TP, the most important factor that affects the selection of design parameters is the variability and thereby the uncertainty of the input processes. Use of mean or maximum values, which is often the case in practice, to describe the inputs may lead to overdesign or underdesign. At this point, one needs to properly assess the variability of the inputs so that the design parameters can be selected accordingly.

With respect to the operation of TP, the performance of the treatment system has to be evaluated. This is often realized by defining the "operational efficiency" of the TP in terms of both the inputs and the outputs. Efficiency is simply described as:

E = (S_i - S_e) / S_i    (1)

where E is the efficiency, and S_i and S_e are the time-variant random input and output (effluent) concentrations, respectively. In practice, the TP system is assumed to reach a level of steady-state conditions. That is, no matter how variable the inputs are, the TP is expected to produce an effluent of uniform and stable characteristics. This implies a reduction in input variability to obtain smooth effluent characteristics. The assumption of steady-state conditions is made for the sake of simplicity, both for design purposes and for assessment of operation. In that case, the above definition of efficiency is given either for a single time point or by using the means of S_i and S_e data within the total period of observation. In reality, the operation of a TP reflects a dynamic character; yet, due to the assumptions of steady-state conditions, there exists no definition of efficiency that accounts for the dynamic or random character of the inputs and outputs of the TP.

There are only a few studies that focus on the variability of input/output processes of TP in evaluating its performance. Weber and Juanico (1990) describe in statistical terms that the effluents from a TP may be as variable as the input raw water or sewage. They claim that the coefficient of variation is a better indicator of input/output variability than the standard deviation as a representative statistic. Ersoy (1992) and Tas (1993) arrived at similar results in comparing various statistical parameters to describe random fluctuations of these processes. They also investigated the relationships between input and output variables to express operational efficiency so as to account for the dynamic character of TP. However, none of the above studies has arrived at a precise relationship between the


variability of input/output processes and the efficiency of the TP. Tai and Goda (1980 and 1985) define "thermodynamic efficiency" on the basis of thermodynamic entropy and show that the efficiency of a water treatment system can be described by the rate of decrease in the entropy of polluted water. They also relate the reduction in thermodynamic entropy to the entropy of discrete information conveyed by the output process of a TP.

The study presented considers the use of the informational entropy concept as a measure of uncertainty in evaluating the variability of both the input and the output processes of a TP. Furthermore, the entropy concept is also proposed here to define a "dynamic efficiency index" (DEI) as the rate of reduction in input uncertainty or entropy to arrive at a minimum amount of uncertainty in the outputs. In the ideal case, the TP is expected to produce outputs that comply with an effluent standard, the value of which is generally constant. In terms of entropy, this constant indicates a value of zero entropy. In practice, though, the outputs will fluctuate around the standard, and the TP is expected to reduce the variability of the output (effluent) so that such fluctuations are kept below the standard value. In entropy terms again, this indicates that the uncertainty of the effluents should be minimal, or that it should approach zero. Then the performance of the TP can be evaluated by means of the DEI, which measures the rate of reduction in uncertainty (entropy) of the inputs so that it approaches zero or, in practical terms, a minimum value for the entropy of the outputs. The term "dynamic" is used here to indicate that efficiency is expressed on the basis of the variability of input/output processes by using entropy measures of uncertainty.

The approach described above is demonstrated in the case of an existing TP of a paper factory, for which input and output data on BOD, COD, and TSS concentrations are available. The results of the application indicate the need for further investigations on the subject, particularly in relating the informational entropy concept to the dynamic processes of treatment.

APPLIED METHODOLOGY

The informational entropy concept

Entropy is a measure of the degree of uncertainty of random hydrological processes. Since the reduction of uncertainty on the observer's side by means of collecting data is equal to the amount of information gained, the entropy criterion indirectly measures the information content of a given series of data (Shannon and Weaver, 1949; Harmancioglu, 1981). The entropy of a discrete random variable X with N elementary events of probability p_n = p(x_n) (n=1,...,N) is defined in information theory as (Shannon and Weaver, 1949):


H(X) = K \sum_{n=1}^{N} p(x_n) \log [1/p(x_n)]    (2)

with K=1 if H(X) is expressed in napiers for logarithms to the base e. H(X) gives a single value for the information content of X and is called the "marginal entropy" of X, which always assumes positive values within the limits 0 and log N. When two random processes X and Y occur at the same time, stochastically independent of each other, the total amount of uncertainty they impart, or the total amount of information they may convey, is the sum of their marginal entropies (Harmancioglu, 1981):

H(X,Y) = H(X) + H(Y)    (3)

When significant dependence exists between variables X and Y, the concept of "conditional entropy" has to be introduced as a function of the conditional probabilities of X and Y with respect to each other:

H(X|Y) = -K \sum_{n=1}^{N} \sum_{n=1}^{N} p(x_n, y_n) \log p(x_n | y_n)    (4)

H(Y|X) = -K \sum_{n=1}^{N} \sum_{n=1}^{N} p(x_n, y_n) \log p(y_n | x_n)    (5)

where p(x_n, y_n) (n=1,...,N) define the joint probabilities and p(x_n|y_n) or p(y_n|x_n) the conditional probabilities of the values x_n and y_n. The conditional entropy H(X|Y) defines the amount of uncertainty that still remains in X even if Y is known, and the same amount of information can be gained by observing X. If the variables X and Y are stochastically dependent, the total entropy is expressed as (Schultze, 1969; Harmancioglu, 1981):

H(X,Y) = H(X) + H(Y|X)    (6)

H(X,Y) = H(Y) + H(X|Y)    (7)

The total entropy H(X,Y) of dependent X and Y will be less than the total entropy if the processes were independent:


H(X,Y) < H(X) + H(Y)    (8)

In this case, H(X, Y) represents the joint entropy of X and Y and is a function of their joint probabilities:

H(X,Y) = K \sum_{n=1}^{N} \sum_{n=1}^{N} p(x_n, y_n) \log [1/p(x_n, y_n)]    (9)

The difference between the total and the joint entropy is equal to another concept of entropy called "transinformation":

T(X,Y) = H(X) + H(Y) - H(X,Y)    (10)

When stochastic dependence exists between X and Y, the total uncertainty is reduced in the amount of T(X,Y), which is common to both processes, since transinformation represents the amount of information that is repeated in X and Y (Amorocho and Espildora, 1973; Harmancioglu, 1981). By replacing the term H(X,Y) in Eq.(10) with its definition given in Eqs.(6) or (7), transinformation can be formulated as:

T(X,Y) = H(Y) - H(Y|X)    (11)

T(X,Y) = H(X) - H(X|Y)    (12)

Transinformation, like the other concepts of entropy, always assumes positive values and equals 0 when two processes are independent of each other. For continuous density functions, p(x_n) is approximated as [f(x_n)·Δx] for small Δx, where f(x_n) is the relative class frequency and Δx the length of class intervals (Amorocho and Espildora, 1973). Then the marginal entropy for an assumed density function f(x) is:

H(X; \Delta x) = \int_{-\infty}^{+\infty} f(x) \log [1/f(x)] \, dx + \log(1/\Delta x)    (13)

and the joint entropy for a given bivariate density function f(x,y) is:


H(X,Y; \Delta x, \Delta y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x,y) \log [1/f(x,y)] \, dx \, dy + \log [1/(\Delta x \Delta y)]    (14)

Similarly, the conditional entropy of X with respect to Y is expressed as a function of f(x,y) and the conditional probability distribution f(x|y):

H(X|Y; \Delta x) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x,y) \log [1/f(x|y)] \, dx \, dy + \log(1/\Delta x)    (15)

The transinformation T(X,Y) is then computed as the difference between H(X;Δx) and H(X|Y;Δx). In the above, the selection of Δx becomes a crucial decision, as it affects the values of entropy (Harmancioglu et al., 1986; Harmancioglu and Alpaslan, 1992).

Application of the concept to assessment of TP input/output variability

The variability and, therefore, the uncertainty of input and output processes of a TP can be measured by the entropy method in quantitative terms. This can be realized by computing the marginal entropy of each process by Eq.(13) under the assumption of a particular probability density function. Representing the input process by X_i and the output process by X_e, the marginal entropies H(X_i) and H(X_e) can be expressed in specific units to describe the uncertainty prevailing in such processes. It must be noted here that the level of uncertainty obtained is relative to the selected discretizing interval Δx of Eq.(13). Such relativity in entropy measures may appear to be inconvenient in assessment of uncertainty (Harmancioglu et al., 1993). However, since the objective here is to compare the uncertainty of the inputs with that of the outputs, the problem is sufficiently handled when the same Δx is used for both processes. In this case, the uncertainty of both processes is rated with respect to the same reference level of uncertainty defined by log(1/Δx). Likewise, marginal entropies of different water quality variables can be compared with each other by keeping the reference uncertainty level constant for all variables.
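A minimal sketch of this comparison, assuming normal densities so that the integral term of Eq.(13) reduces to (1/2) ln(2πe·σ²); the concentration series below are synthetic, with moments that loosely follow the BOD data examined later in the Application section.

```python
# Marginal entropies of an input and an output series relative to a common
# discretizing interval dx (normal-density assumption; synthetic data).
import numpy as np

def marginal_entropy(x, dx=1.0):
    """H(X; dx) of Eq. (13) in napiers, with a normal fit for the integral."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x)) + np.log(1.0 / dx)

rng = np.random.default_rng(2)
influent = rng.normal(171.0, 54.0, size=800)   # illustrative input concentrations (mg/lt)
effluent = rng.normal(15.8, 9.4, size=800)     # illustrative output concentrations (mg/lt)

dx = 1.0                                       # the same dx must be used for both series
print("H(Xi):", round(marginal_entropy(influent, dx), 3))
print("H(Xe):", round(marginal_entropy(effluent, dx), 3))
```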

As stated earlier, a treatment system is expected to reduce the variability of the inputs to produce an effluent with stable characteristics. This implies that the most efficient operation of a TP is realized when maximum reduction in the uncertainty or variability of the inputs is achieved, to arrive at a minimum amount of uncertainty

in the outputs. In entropy terms, the uncertainty of the effluents X_e, or H(X_e), should be minimized. Accordingly, the DEI can be defined as the rate of reduction in the uncertainty (entropy) of the inputs, H(X_i), so that it approaches a minimum value for the entropy of the outputs, H(X_e). Such a measure can be expressed in percent as:

DEI = [H(X_i) - H(X_e)] / H(X_i) \times 100    (16)

In essence, the above definition involves the first requirement for efficient treatment, namely that the difference between input/output variability must be maximized. This difference is actually an indicator of treatment capacity, such that if it reflects low efficiency, then the process or design parameters of the TP may have to be changed to increase the DEI. Furthermore, calculation of the DEI for different values of different process parameters can help to identify those parameters which significantly affect the treatment system. The TP may be considered sensitive to those values of particular parameters which lead to maximum reductions in H(X_i). Then, such parameters will need to be more strictly observed throughout monitoring procedures for both the design and the operation of the TP.

The above approach to assessment of efficiency appears to comply with the thermodynamic efficiency definition given by Tai and Goda (1980 and 1985). As mentioned earlier, their description of efficiency refers to the decrease in the thermodynamic entropy of polluted water, where the treated media moves from a state of disorder to order. The terms "order" and "disorder" are analogical in both the thermodynamic and the informational system considered. In the former, "disorder" refers to thermodynamic disorder or pollution, which can be measured by the thermodynamic entropy of the system. In the latter, "disorder" indicates high variability in the system, again quantified by entropy measures, albeit in an informational context. Accordingly, the two efficiency definitions, one given by Tai and Goda (1980 and 1985) and the other presented here, are similar in concept; the major difference between them is that the former is given in a thermodynamic framework, whereas the latter is presented on a probabilistic basis.

The second requirement for effective treatment is recognized as the insensitivity of the effluents with respect to the inputs. That is, the correlation between the inputs and the outputs is required to be a minimum for a reliable treatment system. Such a requirement may also be considered as an indicator of TP efficiency. Entropy measures can again be employed to investigate the relationship between input/output processes. In this case, conditional entropies in the form of H(X_e|X_i) have to be computed as in Eq.(15). The condition H(X_e|X_i) = H(X_e) indicates that the outputs are independent of the inputs and, consequently, that the TP is effective in processing the inputs. Otherwise, if the inputs and outputs are found to be correlated, with H(X_e|X_i) < H(X_e), this implies that the effluents are sensitive to the inputs and that the treatment system fails to effectively transform the inputs.
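Eq.(16) then takes one line of code; the sketch below, again with synthetic series whose moments loosely follow the BOD data of the Application section, is a hedged illustration rather than the author's computation.

```python
# Dynamic efficiency index of Eq. (16) from two marginal entropies
# (normal-density assumption; synthetic data for illustration).
import numpy as np

def marginal_entropy(x, dx=1.0):
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x)) + np.log(1.0 / dx)

def dei(influent, effluent, dx=1.0):
    h_in = marginal_entropy(influent, dx)
    h_out = marginal_entropy(effluent, dx)
    return 100.0 * (h_in - h_out) / h_in

rng = np.random.default_rng(3)
bod_in = rng.normal(171.0, 54.0, 800)      # moments resembling the BOD input data
bod_out = rng.normal(15.8, 9.4, 800)       # moments resembling the BOD output data
print(round(dei(bod_in, bod_out), 1), "%") # of the same order as the reported BOD DEI
```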

184

N.ALPASLAN

Another entropy measure of correlation between the input and the output processes is the transinformation T(X_i,X_e). If the transinformation between the two processes is zero, this indicates that they are independent of each other. In the case of complete dependence, T(X_i,X_e) will be as high as the marginal entropy of one of the processes.
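As a quick cross-check, note that under a bivariate normal assumption the transinformation collapses to T = -(1/2) ln(1 - ρ²), a standard identity not spelled out in the paper, so a near-zero correlation coefficient implies near-zero shared information:

```python
# Gaussian transinformation as a function of the input-output correlation rho;
# the rho values are those reported for TSS, BOD, and COD in the Application.
import numpy as np

def transinformation_normal(rho):
    return -0.5 * np.log(1.0 - rho ** 2)

for rho in (0.009, 0.051, 0.26):
    print(rho, round(transinformation_normal(rho), 4))
# rho = 0.009 gives essentially 0 napiers and rho = 0.26 about 0.035, i.e.
# only the COD input-output pair carries any appreciable common information.
```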

APPLICATION

The methodology described above is applied to the case of the Seka Dalaman Paper Factory treatment plant in Turkey, for which input and output data on BOD (biochemical oxygen demand), COD (chemical oxygen demand), and TSS (total suspended solids) concentrations are available. The Seka treatment plant, with its physical and biological treatment units, is designed to process wastewaters from the factory at a capacity of 4500 m3/day. Data on daily BOD, COD, and TSS concentrations were obtained for the period between January 1989 and October 1991. The input variables were monitored at the main entrance canal to the treatment plant, and the output concentrations were observed at the outlet of the biological treatment lagoon. The available data sets have a few missing values, as monitoring was not performed on days of plant repair and maintenance.

First, the input/output processes of the three variables are analyzed for their statistical properties. Table 1 shows these characteristics together with the classical efficiency parameter computed by Eq.(1) using the mean values of the inputs and the outputs. According to these computations, the TP appears to process TSS more efficiently than the other two variables; in fact, the efficiency drops down to 75% for COD. It is also interesting to note in Table 1 that both the input and the output processes of TSS reflect more uncertainty than BOD and COD if the coefficient of variation, Cv, is considered as an indicator of variability. Furthermore, for high levels of efficiency, the Cv of the outputs is higher than that of the inputs, as in the case of BOD and TSS.

Next, the same data are investigated by means of entropy measures to assess their variability and the efficiency of the TP. Table 2 shows the marginal entropies H(X_i) and H(X_e) of the input/output processes, the conditional entropies H(X_e|X_i) of the effluents with respect to the inputs, the transinformations T(X_i,X_e), the joint entropies H(X_i,X_e), and finally the DEI of Eq.(16) for each variable. These entropy measures are computed by Eqs.(13) through (15), assuming a normal probability density function for each variable. It may be observed from this table that the input TSS has the highest variability (uncertainty or entropy), followed by COD and BOD. With respect to the outputs, COD still shows high variability, whereas the uncertainty (entropy) of BOD and TSS is significantly reduced. Likewise, the joint entropy of inputs and outputs is the highest for COD. These results show that the treatment processes applied result in TSS having the highest reduction in input uncertainty and COD


TABLE 1. Statistical properties of input/output data of Seka treatment plant.

Variable     Sample    Mean      Standard Deviation   Coefficient    Efficiency
             size N    (mg/lt)   (mg/lt)              of Variation   (%)
BOD input    843       170.95    54.11                0.32           91
BOD output   843       15.83     9.37                 0.58
COD input    819       760.17    343.33               0.45           75
COD output   819       188.17    52.39                0.28
TSS input    880       458.58    304.09               0.66           96
TSS output   880       19.26     14.41                0.73

having the lowest reduction. This feature is also reflected by the dynamic efficiency index, which reaches the highest value for TSS and the lowest for COD. It is interesting to note that while the classical efficiency measure of Eq.(1) gives values in the order of 96%, 91%, and 75% respectively for TSS, BOD, and COD, the efficiencies defined by entropy measures on a probabilistic basis result in the respective values 50%, 36%, and 25%. Although the two types of efficiencies described do not achieve similar values, their relative values for each variable are in the same order. That is, both types of efficiencies are the highest for TSS and the lowest for COD, with BOD in between. On the other hand, the DEI values comply with the thermodynamic efficiency definition given by Tai and Goda (1980 and 1985). The DEI rates obtained here reflect the highest reductions in input uncertainty or entropy under the prevailing operational procedures. The only difference between the DEI presented here and the thermodynamic efficiency of Tai and Goda (1980 and 1985) is that the DEI is expressed on the basis of a probabilistic measure of entropy rather than on a thermodynamic basis.

The entropy measures shown in Table 2 also reveal the relationship between inputs and outputs for each variable. For TSS, the conditional entropy of the outputs H(X_e|X_i) is equal to the marginal entropy H(X_e), which indicates that the outputs are


TABLE 2. Assessment of input/output variability by entropy-based measures.

Entropy Measures (in napiers)   BOD      COD       TSS
H(X_i)                          5.4424   7.2110    7.1357
H(X_e)                          3.4837   5.4264    3.5662
H(X_e|X_i)                      3.4586   5.3996    3.5660
H(X_i, X_e)                     8.9011   12.6106   10.7017
T(X_i, X_e)                     0.0251   0.0268    0.0002
DEI (%)                         35.9     24.8      50.0

independent of the inputs. The same result is shown by the transinformation T(X_i,X_e), as its value is very close to zero. Accordingly, it is observed here that the output TSS of the treatment process is insensitive to the inputs and that the operation of the TP can be considered reliable in the case of TSS. The level of dependence increases slightly for BOD and COD, indicating some correlation between the inputs and the outputs. For these two variables, the conditional entropies are close to, but not equal to, the marginal entropies of the output processes. Thus, one may state that the reliability of the TP decreases for BOD and COD.

Classical correlation analyses are also applied here to investigate the relationship between the inputs and the outputs. The correlation coefficients obtained are 0.009, 0.051 and 0.26 respectively for TSS, BOD, and COD. Statistical significance tests, for the available sample size, show that the coefficients for TSS and BOD are not significantly different from zero, whereas for COD the correlation coefficient of 0.26 is significantly different from zero. These results confirm those obtained by the entropy measures regarding TP input/output dependence.

It must be noted in the above entropy computations that the marginal entropies, transinformations, and joint entropies shown in Table 2 are relative to the uncertainty level represented by Δx, which is selected as 1 mg/lt for all variables in this application. When Δx is changed, the above entropy measures also change;


however, the rates of reduction in input uncertainty, the transinformations, and the relationship between the conditional and the marginal entropies remain the same. Furthermore, the results are also sensitive to the assumption of a particular probability distribution for both the inputs and the outputs. Distribution functions of best fit must be investigated and used to obtain reliable entropy values (Harmancioglu and Alpaslan, 1992; Harmancioglu et al., 1993).

CONCLUSION

The input and output processes of a TP fluctuate significantly with time. This variability is often insufficiently recognized in both the design and operation of treatment systems. The study presented proposes the use of entropy measures of information in the assessment of input/output uncertainty of TP. Such measures help to identify how variable or how steady the inputs or the outputs are, so that both the design parameters and the operational efficiency of a TP can be evaluated.

According to the approach applied, the operational efficiency of TP is assessed by means of the "dynamic efficiency index" (DEI), which represents the rate of reduction in the uncertainty of the inputs to produce a minimum amount of entropy or uncertainty in the outputs. In essence, two requirements are foreseen for an effective and reliable treatment system: (a) the highest reduction in input uncertainty must be obtained; (b) the outputs must be insensitive to the inputs; that is, the condition of independence must be satisfied. The entropy measures, as proposed here, can be effectively used to assess whether these two requirements are met by the operation of the TP.

The definition of TP efficiency on the basis of entropy measures has further advantages, the most significant one being in sensitivity analyses of process parameters. Such parameters in biological treatment, for instance, may be the maximum specific growth rate, decay rate, and yield coefficient, the values of which must be selected for the design of TP. The TP system is either sensitive or insensitive to these design parameters, so that its efficiency is eventually affected by them. The effects of these parameters are already recognized; however, the degree of uncertainty they convey to the system is not well quantified. Values of these parameters for design purposes are either taken from the literature or determined by laboratory analyses. Outputs from a simulation model of a TP may be observed with respect to different parameter values, calculating the DEI for the system in each case. The TP system may be considered sensitive to those values of parameters which lead to maximum reductions in H(X_i). Then those parameters will need to be more strictly observed throughout the data collection procedures for the design as well as for the operation of TP.


efficiency in terms of the rate of reduction in thermodynamic entropy within the process of treatment. Tai and Goda (1985) have also attempted to relate thermodynamic entropy and the entropy of discrete information. It is claimed here that further investigations are needed on the subject, particularly in disclosing the relationship between the informational entropy measures and the dynamic (physical, thermodynamic, etc.) processes of treatment. Another feature of the entropy principle which has to be further investigated is the relativity of entropy measures with respect to the selected discretizing intervals Δx when continuous density functions are used to represent the random variable. This and other mathematical difficulties associated with the method are discussed in detail by Harmancioglu et al. (1993). Once such problems are solved, the entropy measures can be effectively used to assess TP efficiencies with respect to both design parameters and operational procedures.

REFERENCES

Amorocho, J.; Espildora, B. (1973) "Entropy in the assessment of uncertainty of hydrologic systems and models", Water Resources Research, 9(6), pp.1511-1522.
Ersoy, M. (1992) Various Approaches to Efficiency Description for Wastewater Treatment Systems (in Turkish), Dokuz Eylul University, Izmir, Graduation Project in Environmental Engineering (dir. by N. Alpaslan).
Harmancioglu, N. (1981) "Measuring the information content of hydrological processes by the entropy concept", Centennial of Ataturk's Birth, Journal of Civil Engineering, Ege University, Faculty of Engineering, pp.13-38.
Harmancioglu, N.B.; Yevjevich, V.; Obeysekera, J.T.B. (1986) "Measures of information transfer between variables", in: H.W. Shen et al. (eds), Proc. of Fourth Int. Hydrol. Symp. on Multivariate Analysis of Hydrologic Processes, pp.481-499.
Harmancioglu, N.B.; Alpaslan, N. (1992) "Water quality monitoring network design: a problem of multi-objective decision making", AWRA, Water Resources Bulletin, Special Issue on "Multiple-Objective Decision Making in Water Resources", vol.28, no.1, pp.1-14.
Harmancioglu, N.B.; Alpaslan, N.; Singh, V.P. (1993) "Assessment of the entropy principle as applied to water quality monitoring network design", International Conference on Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Waterloo, Canada, June 21-23, 1993.
Shannon, C.E.; Weaver, W. (1949) The Mathematical Theory of Communication, The University of Illinois Press, Urbana, Illinois.
Schultze, E. (1969) Einfuhrung in die Mathematischen Grundlagen der Informationstheorie, Berlin, Springer-Verlag, Lecture Notes in Operations Research and Mathematical Economics, 116 pp.


Tai, S.; Goda, T. (1980) "Water quality assessment using the theory of entropy", in: M.J. Stiff (ed.), River Pollution Control, Ellis Horwood Publishers, ch.21, pp.319-330.
Tai, S.; Goda, T. (1985) "Entropy analysis of water and wastewater treatment processes", International Journal of Environmental Studies, Gordon and Breach Science Publishers, vol.25, pp.13-21.
Tas, F. (1993) Definition of Dynamic Efficiency in Wastewater Treatment Plants (in Turkish), Dokuz Eylul University, Izmir, Graduation Project in Environmental Engineering (dir. by N. Alpaslan).
Weber, B.; Juanico, M. (1990) "Variability of effluent quality in a multi-step complex for wastewater treatment and storage", Water Research, vol.24, no.6, pp.765-771.

INFILLING MISSING MONTHLY STREAMFLOW DATA USING A MULTIVARIATE APPROACH

C. GOODIER AND U. PANU
Department of Civil Engineering, Lakehead University, Thunder Bay, Ontario, Canada, P7B 5E1.

Water resources planners and managers use historic monthly streamflow data for a variety of purposes. Often, the data set is not complete and gaps may exist due to various reasons. This paper develops and tests two computer models for infilling the missing values of a segment. The first model utilizes data only from the series with a segment of missing values, whereas the second model utilizes data from the series with a segment of missing values as well as from other concurrent series without a segment of missing values. These models are respectively referred to as the Auto-Series (AS) model and the Cross-Series (CS) model. Both models utilize the concepts of seasonal segmentation and cluster analysis in the estimation of the missing values of a segment in a set of monthly streamflows. The models are evaluated based on comparison of percent differences between the estimated and the observed values as well as on entropic measures. Results indicate that the AS model provides adequate predictions for missing values in the normal range of flows, but is less reliable during the extreme (high) range of flows. The results from the CS model, on the other hand, indicate that the use of another concurrent set of streamflow data enhances predictions for all ranges of flows.

INTRODUCTION

In the past, various approaches have been used for infilling missing values in monthly streamflow data [Panu (1992)], and among them the most commonly used are the regression approach and the multivariate approach. One multivariate approach, incorporating the concept of segmentation of data series into seasonal segments (or groups), was suggested by Panu and Unny (1980) and later developed by Afza and Panu (1991). This approach, utilizing the characteristics of segmented data for infilling missing values of a segment, has a distinct advantage over the regression approach. The


latter approach treats each data point as an individual value, while the former approach utilizes the group characteristics of similar data values. Based on such consideration of values in groups, the missing values can be infilled as a whole group rather than as individual values. The multivariate approach and the regression approach are conceptually presented in Figure 1.

[Figure 1 comprises two monthly flow hydrographs with missing data: (a) Multivariate Approach, where the entire group of missing values is filled in one step; (b) Regression Approach, where each missing data point is filled individually.]

Figure 1. Data Infilling Approaches: (a) Multivariate Approach and (b) Regression Approach.

One problem with the regression approach is the diverging confidence limits for subsequent estimates, as more reliance is given to the most recent estimate of a previously


unknown value. On the other hand, the multivariate approach has constant confidence limits for the segment. The model development based on the multivariate approach follows.

MODEL DEVELOPMENT
The development of the AS and the CS models is summarized in the form of a flow chart in Figure 2.


Figure 2. Flow Chart for Model Development.
The first step in developing the models is the determination of seasonal segments in the data series. These seasonal segments, as described by Panu (1992), form the pattern vectors. The correlogram and the periodogram are used to infer seasonality for both models. The AS model requires segmentation of the data series with missing values, whereas the CS model requires segmentation of both the concurrent data and the data series with missing values. The next step involves testing the pattern vectors for multivariate normality. To test for multivariate normality, the ranked Mahalanobis distances are plotted against the theoretical χ² values corresponding to different probabilities. If the pattern vectors exhibit signs of non-normality, transformations are applied to the vectors until


multivariate normality is achieved. Once the set of pattern vectors is determined to be multivariate normal, a K-means algorithm [Hartigan and Wong (1979)] is applied to group similar pattern vectors into clusters. The purpose of clustering is to recognize the occurrence of pattern vectors in the data set. In turn, this information is used to develop an inter-pattern structure for the model, based on the assumption that the dependence among patterns can be described by a lag-one Markovian structure. For the two models, the final step in the estimation of missing values is slightly different, as explained below.
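The normality screening and clustering step described above lends itself to a brief computational illustration. The following Python sketch is not the authors' code: the data are synthetic, the χ² quantile comparison stands in for the plotted Mahalanobis-distance check, and standard K-means (scikit-learn) stands in for the Hartigan and Wong (1979) algorithm cited here.

```python
# A minimal sketch (not the paper's code) of the normality check and
# clustering step, assuming monthly flows are already cut into d-month
# seasonal pattern vectors (rows = years).
import numpy as np
from scipy.stats import chi2
from sklearn.cluster import KMeans

def mahalanobis_qq(patterns):
    """Return sorted squared Mahalanobis distances and chi-square quantiles.

    A roughly straight QQ plot of these two arrays supports the
    multivariate-normality assumption for the pattern vectors.
    """
    n, d = patterns.shape
    mu = patterns.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(patterns, rowvar=False))
    dev = patterns - mu
    d2 = np.sort(np.einsum("ij,jk,ik->i", dev, cov_inv, dev))
    probs = (np.arange(1, n + 1) - 0.5) / n      # plotting positions
    return d2, chi2.ppf(probs, df=d)

# Synthetic 6-month "seasonal segments" for 18 years (illustrative only).
rng = np.random.default_rng(0)
patterns = rng.lognormal(mean=3.0, sigma=0.5, size=(18, 6))
log_patterns = np.log(patterns)                  # natural-log transform

d2, q = mahalanobis_qq(log_patterns)
print("max |d2 - chi2 quantile|:", np.abs(d2 - q).max())

# Two clusters per season, as in the application below.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(log_patterns)
print("cluster sizes:", np.bincount(labels))
```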

AS Model: This model assumes a lag-one Markovian structure for the inter-pattern relationships and, in turn, estimates the missing values based on the pattern vector occurring immediately prior to the gap [Figure 3].

[Figure 3 (schematic): segment sequence S_1, S_2, S_3, ..., S_{k-1}, S_k = gap, S_{k+1}, S_{k+2}, ..., S_{n-1}, S_n.]

Figure 3. Conceptual Seasonal Segmentation for AS Model.
The missing segment in the sequence of streamflows is designated as S_k. The pattern vector in segment S_{k-1} is used to estimate the missing values in segment S_k. The inter-pattern structure takes into account the Markovian transitions from other segments similar to S_{k-1} and S_k, which were identified by the clustering technique. Johnson and Wichern (1988) suggest that the conditional mean and covariance of the segment S_k can be determined given that the segment S_{k-1} has occurred. In this paper, the conditional mean and covariance are considered sufficient statistics to describe the missing pattern vector for S_k, and their formulation is explained as follows. Let S = [S_k | S_{k-1}]ᵀ be distributed as multivariate normal, denoted N_d(μ, Σ), for d ≥ 2, where

μ = [μ_k | μ_{k-1}]ᵀ

and

Σ = [ Σ_{k,k}    Σ_{k,k-1}
      Σ_{k-1,k}  Σ_{k-1,k-1} ]

The determinant of the partitioned section Σ_{k-1,k-1} must be greater than zero, i.e. |Σ_{k-1,k-1}| > 0. Given that the segment S_{k-1} has occurred, the conditional mean and covariance of the missing segment S_k are given by:


Mean of S_k = μ_k + Σ_{k,k-1} Σ_{k-1,k-1}⁻¹ (S_{k-1} - μ_{k-1})

and

Covariance of S_k = Σ_{k,k} - Σ_{k,k-1} Σ_{k-1,k-1}⁻¹ Σ_{k-1,k}
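These two formulas are the standard conditional moments of a partitioned multivariate normal and can be sketched directly in code. The function and array names below are illustrative assumptions, not from the paper.

```python
# A minimal sketch of the conditional mean/covariance formulas above,
# with mu and Sigma partitioned as [S_k | S_{k-1}].
import numpy as np

def conditional_segment(mu_k, mu_km1, S_kk, S_kkm1, S_km1km1, s_km1):
    """Conditional mean and covariance of segment S_k given S_{k-1}."""
    if np.linalg.det(S_km1km1) <= 0.0:
        raise ValueError("Sigma_{k-1,k-1} must be positive definite")
    gain = S_kkm1 @ np.linalg.inv(S_km1km1)
    mean = mu_k + gain @ (s_km1 - mu_km1)
    cov = S_kk - gain @ S_kkm1.T      # Sigma_{k-1,k} = Sigma_{k,k-1}^T
    return mean, cov

# Illustrative 2-dimensional segments (values are not from the study).
mu_k, mu_km1 = np.array([5.0, 6.0]), np.array([4.0, 4.5])
S_kk = np.array([[1.0, 0.3], [0.3, 1.0]])
S_kkm1 = np.array([[0.5, 0.1], [0.2, 0.4]])
S_km1km1 = np.array([[1.2, 0.2], [0.2, 0.9]])
mean, cov = conditional_segment(mu_k, mu_km1, S_kk, S_kkm1, S_km1km1,
                                s_km1=np.array([4.3, 5.0]))
print(mean, np.diag(cov))
```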

The mean of S_k, as given above, is considered an adequate vector to represent the missing segment S_k. The utility of this model is limited [Panu (1992)] owing to its dependence solely on the information contained in the data set with the missing values. The development of the CS model overcomes this difficulty, as presented below, by using the additional information contained in a concurrent data set.
CS Model: This model assumes cross-correlation between the data set with missing values and the concurrent data set. These data sets are respectively referred to as the subject river and the base river. It is noted that the base river could be any other data set (precipitation, streamflow, etc.), but it is simply referred to as the base river. The CS model is conceptually presented in Figure 4.

[Figure 4 (schematic): subject-river segments S_{k-1}, S_k (gap), S_{k+1}, S_{k+2}, ..., S_{n-1}, S_n, with arrows from the corresponding concurrent base-river segments.]

Figure 4. Conceptual Seasonal Segmentation for CS Model.
The missing values in segment S_k of the subject river are infilled based on the observed pattern vector in segment Sb_k of the base river. The cross-pattern relationships in the CS model take into account the transitions from segments with similar characteristics (identified by the clustering technique) from Sb_k to segment S_k. The conditional mean and covariance of S_k can be determined based on the considerations outlined above and the observed pattern vector in the segment Sb_k [Johnson and Wichern (1988)]. The formulation of the conditional mean and covariance for the CS model is similar to that of the AS model, with the following exceptions: the pattern vector for the segment Sb_k is substituted for S_{k-1}, and all subsequent occurrences of terms with subscript (k-1) are substituted with the similar terms from the base river, i.e. substitute


μb_k for μ_{k-1}; Σb_{k,k} for Σ_{k-1,k-1}; and so on. The computer models are developed with the capability of assuming each successive segment to be missing. In turn, successive computer runs are conducted to infill each such missing segment. An application of both models to streamflow data is presented below.

APPLICATION OF AS AND CS MODELS
The streamflow gauging station (05QA001) of the English River at Sioux Lookout, Ontario, was considered to be the site with missing values. An uninterrupted set of unregulated streamflows is available for this site from 1922 to 1981. Another streamflow gauging station (05QA002) of the English River at Umfreville is located upstream of the Sioux Lookout station. Flow values for this station are available from 1922 to 1990, and this station is used as the base river in the CS model. Precipitation data at Sioux Lookout Airport are available for the period 1963 to 1990 and are used as another base river in the CS model. Since concurrent data from 1963 to 1981 are available for all three data sources, these 18 years of data are used in the application. For the application of the AS and the CS models to the monthly streamflow data of the English River at Sioux Lookout, the seasonality was inferred, from the correlogram and the periodogram, to be two six-month seasons or one twelve-month season. The starting and ending months were determined to be November and April for the six-month dry season, and May and October for the wet season. A single twelve-month season was inferred for the precipitation data at Sioux Lookout Airport, whereas two six-month seasons or one twelve-month season were assessed for the streamflows of the English River at Umfreville. Experimental runs were conducted for both models using two six-month seasons or one twelve-month season. Multivariate normality of the pattern vectors was best achieved by using the natural-log transformation. Clustering of the segmented data was performed based on the assumption that there were two clusters in each season. Results incorporating the clustering technique showed only minor deviations from those obtained without using sub-clusters. For brevity, only the results without sub-clusters are presented in this paper. Both models are applied to infill a missing segment based on the assumption that such a segment occurs sequentially over the entire length of the data series.

RESULTS AND DISCUSSION
Three methods of analysis were used to examine the results: graphical, statistical, and entropic. As well, a comparison was made with the results obtained by infilling the missing values using the mean, minimum or maximum value for each month. Plots of the results for both models are presented in Figure 5.

[Figure 5. Results from AS and CS models: (a) Auto-Series model, (b) Cross-Series model using precipitation, (c) Cross-Series model using streamflow; observed and infilled flows, Nov-63 through Oct-79.]

Graphical Analysis: An examination of the results from the AS model indicates that the estimated values closely follow the observed values for most years, but entail larger error in the case of extreme flows. This would be expected, since the estimated values are based only on the flow values that have occurred during other years in the data series. To overcome this difficulty, one could use a multivariate random number generator to estimate the missing values such that the conditional mean and covariance of the estimated values are the same as those of the data series. The estimated values from the CS model, using precipitation as the concurrent data, show deviations from the observed flow in many cases. This could be due to the effects of snowfall and spring runoff: during winter months, precipitation falls as snow and does not affect streamflows until the spring thaw. This lag period is variable, and has an influence on the structure of the covariance matrix in the CS model. Future development of the model could include a procedure to account for the variability of the snowmelt phenomenon. Further examination of the results from the CS model, using another streamflow data set as the concurrent data, indicates that the estimated values follow the observed values very closely over the entire range of flows. This would be expected, since both sets of data are in the same watershed, and the outcome of hydrologic events could be similar at both gauging stations.

Table 1. Summary of Percent Error of Various Infilling Methods

  Infilling Method                  Overestimate Range   Underestimate Range
  Auto-Series                       9.4% to 84.0%        1.4% to 52.5%
  Cross-Series with Precipitation   28.0% to 198%        16.2% to 58.9%
  Cross-Series with Streamflow      7.5% to 28.7%        6.3% to 28.8%
  Infilled by Mean                  24.6% to 152%        16.0% to 37.0%
  Infilled by Minimum               7.3% to 211%         54.3% to 81.5%
  Infilled by Maximum               53.4% to 467%        2.3% to 38.2%

Statistical Analysis: The percent differences (positive or negative) between the estimated and the observed values are given in Table 1. These results were obtained separately for two cases: with and without the use of sub-clusters in each season. The results with the use of sub-clusters exhibit only minor deviations from the results obtained without using sub-clusters. It is in this vein that only the results without using sub-clusters are


presented in this paper. Similar statistics are also included in the table for infilling missing values by using the mean, minimum or maximum value for each month. In Table 1, the least error occurs when a concurrent set of streamflow data is used in the CS model. However, the error is much larger for this model when other concurrent data, such as precipitation, are used; in such cases, use of the AS model entails smaller error and would be the obvious choice for data infilling.

Table 2. Summary of Entropic Measures

  Entropic Measure              Entropy   Reduction from H_c   % Reduction
  H_max                         5.375     n/a                  n/a
  H_(n=2)                       0.693     n/a                  n/a
  H_clustered                   0.427     n/a                  n/a
  H_markov (AS)                 0.414     0.013                3%
  H(X|Y) (CS - precipitation)   0.392     0.035                8%
  H(X|Y) (CS - streamflow)      0.181     0.246                58%

Other currently popular methods of replacing the missing values by such ad hoc values as the mean, minimum, or maximum value entail too large an error [Table 1]. The results of our analyses indicate that one should avoid [Panu and McLarty (1991)] the use of such ad hoc methods of data infilling. Both models are further assessed for their effectiveness in reducing the uncertainty associated with infilled data values; entropic measures, as explained in the Appendix, are used in this assessment.
Entropic Analysis: The results obtained for various entropic measures related to both models are summarized in Table 2. From these results, it is apparent that the maximum reduction in entropy (58%) occurs for the CS model when using the streamflows of the English River at Sioux Lookout and at Umfreville. A large reduction in entropy also results when seasonal segments are grouped into sub-clusters. This reduction in uncertainty is gained through the exclusion of certain clusters once the occurrence of a particular season is known.


CONCLUSIONS
The AS model is found satisfactory for estimating the missing values in the normal range of streamflows but performs inadequately in the case of extreme (high) flows. Statistical results show that the error in the estimated values can range from -53% to +84%. Entropic analysis indicates only a small reduction (three percent) in entropy when the system is considered Markovian as opposed to random. In other words, the assumption that the inter-pattern structure is of the Markovian type does not appear valid for infilling purposes. The CS model using precipitation as the concurrent data provides widely varying estimates of the missing values. In terms of percent error, the variability has been found to range from -59% to +198%. Entropic analysis for such data sets indicates that only a small reduction in entropy (eight percent) is achieved; such a small reduction indicates a poor cross-correlation between the streamflow data and the precipitation data. The CS model is found to perform adequately in estimating the missing values when concurrent streamflow data from a nearby station are used. The estimates of the missing values have been found satisfactory in the average flow range as well as in the extreme (high) flow range: the percent error in the estimated values ranges from -29% to +29%, and entropic analysis indicates that a reduction of 58% in entropy is achieved. In other words, the use of concurrent data exhibiting high cross-correlation with the streamflow data having missing values provides satisfactory estimates of the missing values.

ACKNOWLEDGEMENT
The financial support provided by the Natural Sciences and Engineering Research Council of Canada in conducting various aspects of this investigation is gratefully acknowledged. The computational efforts by C. Goodier are especially appreciated.

REFERENCES
Afza, N. and U.S. Panu (1991) Infilling of Missing Data Values in Monthly Streamflows, Unpublished Technical Report, Dept. of Civil Engineering, Lakehead University, Thunder Bay, Ontario.
Domenico, P. (1972) Concepts and Models in Groundwater Hydrology, McGraw-Hill, San Francisco, U.S.A.
Hartigan, J. and M. Wong (1979) "Algorithm AS 136: A K-Means Clustering Algorithm", Applied Statistics, 28, 100-108.
Johnson, R.A. and D.W. Wichern (1988) Applied Multivariate Statistical Analysis, Prentice Hall, New Jersey.
Khinchin, A.I. (1957) Mathematical Foundations of Information Theory, Dover Publications Inc., New York.


Panu, U.S. and T.E. Unny (1980) "Stochastic Synthesis of Hydrologic Data Based on Concepts of Pattern Recognition", Journal of Hydrology, 46, 5-34, 197-217, 219-237.
Panu, U.S. and B. McLarty (1991) Evaluations of Quick Data Infilling Methods in Streamflows, Unpublished Technical Report, Dept. of Civil Engineering, Lakehead University, Thunder Bay, Ontario.
Panu, U.S. (1992) "Application of Some Entropic Measures in Hydrologic Data Filling Procedures", Entropy and Energy Dissipation in Water Resources, Kluwer Academic Publishers, Netherlands, 175-192.
Shannon, C.E. (1948) "The Mathematical Theory of Communication", Bell System Technical Journal, 27, 379-428; 623-656.

APPENDIX: ENTROPIC MEASURES OF UNCERTAINTY IN HYDROLOGIC DATA
The entropy of a system is a measure of its degree of disorder. Shannon (1948) first applied the concept of entropy to measure the information content of a system. Khinchin (1957) reports on Shannon's entropy in dealing with a finite scheme, which is applicable to a hydrologic data series. Entropy (H), as a measure of the uncertainty of a system, is defined as follows:

H(p_1, p_2, ..., p_n) = - Σ_{k=1}^{n} p_k ln(p_k)

Where n is the number of states and p_k is the probability of the k-th state in a finite scheme. The maximum entropy occurs when all outcomes are equally likely. For a series of n equally likely events, the probability of each event is 1/n, and the maximum entropy of the system is

H_max = ln(n)

While grouping the segmented data into clusters, an adjustment must be made to the definition of entropy (H) to account for the clusters. The entropy for clustering, H_c, can be computed as given below.

H_c = - Σ_{k=1}^{w} p(s_k) Σ_{c=1}^{n_k} p(c_k) ln[p(c_k)]

Where w is the total number of seasons per year, n_k is the total number of clusters in any season k, p(c_k) is the probability of cluster c in season k, and p(s_k) is the probability of season k. The value of entropy for clustering does not take into account the ordering of


clusters. That is, the clusters could have occurred in any order, and the value of entropy would still be the same. To look further into the effect of ordering (i.e., the dependence among clusters), the entropy of a Markov chain, as applicable to the AS model, is examined.
Entropy of a Markov Chain (AS Model): Domenico (1972) describes the entropy of a Markov chain (H_m) as the average of all the individual entropies of the transitions (H_i), weighted in accordance with the probability of occurrence of the individual states. It is noted that there are as many states as there are clusters. The Markovian entropy can be expressed as follows.

H_m = Σ_{i=1}^{n} p_i H_i

Where n is the number of states and p_i is the probability of occurrence of state i. A measure of the reduction in uncertainty can be obtained by taking the difference between H_c and H_m; in other words, the clustered data (i.e., the dependence among clusters) are treated as Markovian rather than random. For the CS model, the entropy of a combined system consisting of two related systems must be used rather than the Markovian entropy.
Entropy of a Combined System (CS Model): Domenico (1972) suggests that entropic measures can be applied to situations where two related systems are observed. The applicability of such an entropic measure to the CS model is apparent: two sets of observed data are available for the CS model, namely, streamflow data with a missing segment and concurrent data with no missing segment. The entropy of one system (say, system X, the streamflow data with the missing segment), given the knowledge of the observations in the other system (say, system Y, the concurrent data without the missing segment), is obtained as follows.

H(X|Y) = - Σ_{i=1}^{n} Σ_{j=1}^{n} P(x_i, y_j) ln[P(x_i | y_j)]

Where P(x_i | y_j) is the conditional probability of system X being in state x_i given that system Y is observed to be in state y_j, and P(x_i, y_j) is the joint probability of x_i and y_j. The original measure of entropy of system X, H(X), can be computed from H_c for a clustered system. Thus, the measure of uncertainty reduced in X after observing Y is obtained as

H(X → Y) = H_c - H(X|Y)

where H(X → Y) represents the decrease in uncertainty in X after observing Y.
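As an illustration of how these entropic measures can be evaluated, the following Python sketch computes H, H_max, H_c and the conditional-entropy reduction for small hypothetical probability tables; the numbers are illustrative and are not those of Table 2.

```python
# A minimal sketch of the appendix's entropic measures (illustrative
# probabilities only).
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum p ln p (zero-probability terms dropped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def clustering_entropy(season_probs, cluster_probs):
    """H_c = -sum_k p(s_k) sum_c p(c_k) ln p(c_k)."""
    return sum(ps * entropy(pc) for ps, pc in zip(season_probs, cluster_probs))

def conditional_entropy(joint):
    """H(X|Y) = -sum_ij P(x_i, y_j) ln P(x_i | y_j)."""
    joint = np.asarray(joint, dtype=float)
    p_y = joint.sum(axis=0)                      # marginal of Y
    h = 0.0
    for i in range(joint.shape[0]):
        for j in range(joint.shape[1]):
            if joint[i, j] > 0:
                h -= joint[i, j] * np.log(joint[i, j] / p_y[j])
    return h

print("H_max for n = 2:", np.log(2))             # 0.693, as in Table 2
hc = clustering_entropy([0.5, 0.5], [[0.8, 0.2], [0.7, 0.3]])
joint = np.array([[0.40, 0.05],                  # hypothetical 2-state system
                  [0.10, 0.45]])
print("H_c:", hc)
print("reduction H_c - H(X|Y):", hc - conditional_entropy(joint))
```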

PART IV NEURAL NETWORKS

APPLICATION OF NEURAL NETWORKS TO RUNOFF PREDICTION

MU-LAN ZHU¹, M. FUJITA¹ and N. HASHIMOTO²
¹Department of Civil Engineering, Hokkaido University, Sapporo, 060, Japan
²Hokkaido Development Bureau, Sapporo, 060, Japan

In this paper, a new method to forecast runoff using neural networks (NNs) is proposed and compared with the fuzzy inference method suggested previously by the authors (Fujita and Zhu, 1992). We first develop a NN for off-line runoff prediction; the results predicted by the NN depend on the characteristics of the training sets. Next, we develop a NN for on-line runoff prediction. The applicability of the NN to runoff prediction is assessed by making 1-hr, 2-hr and 3-hr lead-time forecasts of runoff in Butternut Creek, NY. The results indicate that using neural networks to forecast runoff is rather promising. Finally, we employ an interval runoff prediction model in which the upper and lower bounds are determined using two neural networks. The observed hydrograph lies well between the NN forecasts of the upper and lower bounds.

INTRODUCTION
Neural network architectures are models of neurons; they are not deterministic programs, but learn from examples. Through the learning of training sets, which consist of pairs of inputs and target outputs, neural networks iteratively adjust internal parameters to the point where the networks can produce a meaningful answer in response to each input. After the learning procedure is complete, information about the relationship between inputs and outputs, which may be non-linear and extremely complicated, is encoded in the network. As we know, the relationship between rainfall and runoff is non-linear and quite complex owing to many related factors, such as field moisture capacity, evaporation rate, etc. A mathematical definition of this kind of relationship is difficult; thus it is attractive to try the neural network approach, which accommodates this kind of problem. Runoff prediction can be classified into several cases based on the accessibility of hydrological data. Table 1 shows the two cases considered in this paper, where the question mark "?" denotes the unknown future runoff that we are about to forecast. If runoff information at every moment up to the current time for the present flood is


available, the authors call it the on-line case. On the contrary, if this information is not available for the present flood, the authors call it the off-line case. To forecast runoff for the present flood, neural networks first need to learn about previous flood events. The learning procedure is conducted using the back-propagation

TABLE 1. Classification of runoff prediction

  cases                       previous      present flood
                              flood data    past           present        future
  off-line   rainfall data    accessible    accessible     accessible     inaccessible
             runoff data      accessible    inaccessible   inaccessible   ?
  on-line    rainfall data    accessible    accessible     accessible     inaccessible
             runoff data      accessible    accessible     accessible     ?

learning algorithm, involving a forward-propagation step followed by a backward-propagation step (Rumelhart et al., 1986). Figure 1 illustrates a fully interconnected three-layer network. The details of the forward- and backward-propagation steps are described as follows.

[Figure 1. A fully interconnected, three-layer neural network: input layer (x_1, ..., x_m), hidden layer, output layer (o2_1, ..., o2_p).]

Forward-propagation step: This step calculates the output from each processing unit of the neural network, starting from the input layer and propagating forward through the hidden layer to the output layer. Each processing unit, except the units in the input layer, takes a weighted sum of its inputs and applies a sigmoid function to compute its output. Specifically, given an input vector [x_1, x_2, ..., x_m], the outputs from each layer are calculated in this way. First, the input layer is a special case: each unit in this layer simply sends the input values, as they are, along all the output interconnections to the units in the next layer.

input layer:  o_i = x_i,  i = 1, 2, ..., m    (1)

where o_i denotes the output of unit i in the input layer. For the hidden and output layers, the output calculation of each processing unit is identical: taking a weighted sum of the inputs and applying the sigmoid function f to the sum to compute the output.

hidden layer:  s1_j = Σ_{i=1}^{m} ω1_{ji} o_i + θ1_j,  j = 1, 2, ..., n    (2)

o1_j = f(s1_j)    (3)

output layer:  s2_k = Σ_{j=1}^{n} ω2_{kj} o1_j + θ2_k,  k = 1, 2, ..., p    (4)

o2_k = f(s2_k)    (5)

where o1_j and o2_k are the outputs from unit j in the hidden layer and from unit k in the output layer respectively; ω1_{ji} is the interconnection weight between the i-th unit in the input layer and the j-th unit in the hidden layer, and ω2_{kj} is the interconnection weight between the j-th unit in the hidden layer and unit k in the output layer; θ1_j and θ2_k are the biases of unit j and unit k respectively. The function f, a sigmoid curve, is expressed by (6), where x is defined over (-∞, +∞) so that the function values fall in the interval (0, +1):

f(x) = 1 / (1 + e^{-x})    (6)
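A compact computational sketch of the forward-propagation step of equations (1)-(6) is given below; the layer sizes and random weights are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the forward pass, equations (1)-(6).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))               # equation (6)

def forward(x, w1, th1, w2, th2):
    """Forward pass of the three-layer network.

    x  : inputs, shape (m,)                        -- equation (1)
    w1 : hidden weights (n, m); th1: hidden biases (n,)
    w2 : output weights (p, n); th2: output biases (p,)
    """
    s1 = w1 @ x + th1                             # equation (2)
    o1 = sigmoid(s1)                              # equation (3)
    s2 = w2 @ o1 + th2                            # equation (4)
    o2 = sigmoid(s2)                              # equation (5)
    return o1, o2

rng = np.random.default_rng(1)
m, n, p = 4, 3, 1                                 # illustrative layer sizes
w1 = rng.uniform(-1, 1, (n, m)); th1 = rng.uniform(-1, 1, n)
w2 = rng.uniform(-1, 1, (p, n)); th2 = rng.uniform(-1, 1, p)
print(forward(np.array([0.2, 0.5, 0.1, 0.9]), w1, th1, w2, th2)[1])
```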

Backward-propagation step: The back-propagation step is an error-correction step which takes place after the forward-propagation step is completed. The calculation begins at the output layer and progresses backward through the hidden layer to the input layer. Specifically, the output value from each processing unit in the output layer is compared to the target output specified in the training set. Based on the difference between output and target output, an error value is computed for each unit in the output layer, and the weights are adjusted for all of the interconnections that go into the output layer. Next, an error value is calculated for all of the units in the hidden layer, and the weights are adjusted for all of the interconnections that go into the hidden layer. The following equations express this error-correction step explicitly. For unit k in the output layer, the error value is computed as

δ_k = (t_k - o2_k) · f′(s2_k)    (7)

where t_k is the target value of unit k in the output layer and f′(s2_k) is the derivative of the sigmoid function at s2_k. The interconnection weights going into unit k in the output layer are then adjusted based on the δ_k value as follows:

ω2_{kj}(new) = ω2_{kj}(old) + η δ_k o1_j    (8)

where η is the learning rate. The bias θ2_k for unit k in the output layer is corrected as

θ2_k(new) = θ2_k(old) + η δ_k    (9)

For unit j in the hidden layer, the error value is computed as

δ_j = [Σ_{k=1}^{p} δ_k ω2_{kj}] · f′(s1_j)    (10)

The interconnection weight ω1_{ji}, which goes to unit j in the hidden layer from unit i in the input layer, is then corrected as

ω1_{ji}(new) = ω1_{ji}(old) + η δ_j o_i    (11)

and the bias for unit j in the hidden layer is corrected as

θ1_j(new) = θ1_j(old) + η δ_j    (12)

During the learning procedure, the forward-propagation and back-propagation steps are executed iteratively over the training set. The adjustments of the internal parameters of the NN, including interconnection weights and biases, are continued until the produced outputs can hardly be improved further. In the following calculations, the NN was initialized by assigning random numbers from the interval (-1, +1) to all parameters.
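The complete learning loop of equations (7)-(12) can be sketched as follows. This is a minimal illustration with a single training pair and an assumed learning rate; the paper's networks are trained on full flood records with the learning rates stated in the applications below.

```python
# A minimal sketch of iterative back-propagation, equations (7)-(12),
# for the three-layer network of the previous sketch.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, t, w1, th1, w2, th2, eta=0.5):
    # forward pass, equations (1)-(5)
    s1 = w1 @ x + th1; o1 = sigmoid(s1)
    s2 = w2 @ o1 + th2; o2 = sigmoid(s2)
    # output-layer error, equation (7); f'(s2) = o2 * (1 - o2)
    d2 = (t - o2) * o2 * (1.0 - o2)
    # hidden-layer error, equation (10)
    d1 = (w2.T @ d2) * o1 * (1.0 - o1)
    # corrections, equations (8), (9), (11), (12) (in-place updates)
    w2 += eta * np.outer(d2, o1); th2 += eta * d2
    w1 += eta * np.outer(d1, x);  th1 += eta * d1
    return 0.5 * float(np.sum((t - o2) ** 2))

rng = np.random.default_rng(2)
w1 = rng.uniform(-1, 1, (3, 2)); th1 = rng.uniform(-1, 1, 3)   # random init
w2 = rng.uniform(-1, 1, (1, 3)); th2 = rng.uniform(-1, 1, 1)   # in (-1, +1)
x, t = np.array([0.3, 0.7]), np.array([0.6])    # one illustrative pair
for epoch in range(2000):                       # iterate until outputs settle
    err = train_step(x, t, w1, th1, w2, th2)
print("final squared error:", err)
```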

APPLICATIONS OF NEURAL NETWORKS

Off-line runoff prediction
In the case of off-line runoff prediction, the runoff data for the present flood are inaccessible. This limitation means that the NN must be developed to infer the future runoff based only on rainfall data. The runoff system equation may be expressed as:

Q(t) = f{R(t-l), R(t-l-1), ..., R(t-d)}    (13)

where R, Q and t denote rainfall, runoff and time respectively, and l and d are two parameters reflecting the hydrological characteristics of the basin. As mentioned previously, when a NN is used to address a complex non-linear problem, such as the relationship between rainfall and runoff, it must first learn from a training set so that the desired output will be produced. The trained NN is then generalized to deal with inputs which may not be included in the training set. The validity of the


trained NN naturally depends on the characteristics of the selected training set. In order to make accurate runoff forecasts, it is important to understand the dependence of the NN on the training set. Since it is not easy to obtain various kinds of flood data from observations to examine this dependence, we employed the storage function method to simulate flood data arbitrarily. Equation (14) is the basic equation of the storage function method, where R, Q and L denote rainfall, runoff and lag-time respectively, and K and P denote the storage coefficient and the storage exponent respectively.

K d(Q(t)^P)/dt = R(t-L) - Q(t)    (14)

In the simulation, we set the parameters K = 60, P = 0.6 and L = 2, and took the rainfall inputs to be the several patterns of triangles shown in Figures 3-11. The simulated floods were then forecast by applying a neural network developed corresponding to (13). It is easy to see that l in (13) is equivalent to L in (14), and that d in (13) is related to the values of K and P. In this paper, the value of d was selected as 41 hr in accordance with the values of K and P stated previously. The neural network developed is shown in Figure 2; it should be pointed out that the value d = 41 hr is not critical, since slight changes in d hardly affect the NN's performance.

[Figure 2. The NN developed corresponding to (13): inputs R(t-2), R(t-3), R(t-4), ..., R(t-41); output Q(t).]

First we trained the NN using the two simulated floods shown in Figure 3; the learning rate was η = 0.5. The solid lines in Figure 3 indicate the two simulated floods used as training data, and the black circles indicate the computed output from the trained NN for these two floods. The results show that the NN learned the training data extremely well. In order to examine the degree to which the NN can generalize its training to forecast floods not included in the training data, we used the trained NN to forecast various types of floods. Figures 4 and 5 show two of them, where the solid lines denote Q(t) obtained from (14) and the black squares denote the computed outputs from the NN. The results shown in Figure 4 indicate that the trained NN may yield good results for validation data whose time to peak rainfall intensity lies between those of the training data and whose duration of rainfall is equal to that of the training data. However, as shown in Figure 5, the trained NN works poorly for the other validation data, whose duration of rainfall is larger than that of the training data. Furthermore, we trained the above NN by presenting another training set, which consisted of the four typical simulated floods shown in Figure 6. Five flood events, shown in Figures 7-11, were provided as validation data, where the two floods shown in Figures 7 and 8 are just the same as the previous validation data shown in Figures 4 and 5.
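For readers wishing to reproduce such simulated training floods, the following sketch integrates the storage function equation (14) with a simple explicit Euler step under the parameter values stated above (K = 60, P = 0.6, L = 2). The triangular hyetograph generator and the integration scheme are our illustrative assumptions; the paper does not state its numerical method.

```python
# A minimal sketch of flood simulation with the storage function method (14).
import numpy as np

def triangular_rain(hours, Tr=15.0, rp=10.0, Tp=8.0):
    """Triangular rainfall: duration Tr (hr), peak rp (mm/hr) at time Tp (hr)."""
    t = np.arange(hours, dtype=float)
    up = np.where(t <= Tp, rp * t / Tp, 0.0)
    down = np.where((t > Tp) & (t <= Tr), rp * (Tr - t) / (Tr - Tp), 0.0)
    return up + down

def storage_function(rain, K=60.0, P=0.6, L=2, dt=1.0, q0=1e-3):
    """Integrate K d(Q^P)/dt = R(t-L) - Q(t) with an explicit Euler step."""
    q = np.full(rain.size, q0)
    s = K * q0 ** P                        # storage S = K * Q^P
    for t in range(1, rain.size):
        r_lag = rain[t - L] if t >= L else 0.0
        s = max(s + dt * (r_lag - q[t - 1]), 0.0)
        q[t] = (s / K) ** (1.0 / P)
    return q

rain = triangular_rain(72)
flow = storage_function(rain)
print("peak runoff %.2f at hour %d" % (flow.max(), flow.argmax()))
```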

[Figure 3. The two floods as training data (triangular hyetographs, e.g. Tr = 15 hr, rp = 10 mm/hr, Tp = 8 hr, with the resulting hydrographs).]
[Figure 4. The first flood for validation.]
[Figure 5. The second flood for validation.]
[Figure 6. The four floods as training data.]
[Figure 7. The first flood for validation.]
[Figure 8. The second flood for validation.]
[Figure 9. The third flood for validation.]
[Figure 10. The fourth flood for validation.]
[Figure 11. The fifth flood for validation.]

The forecast results shown in Figures 7-11 indicate that the NN works well for these five floods. Although the rainfall input profiles in the validation data were completely different from those used in the training data, their durations of rainfall (Tr), their peak rainfall intensities (rp) and their times to peak rainfall intensity (Tp) all fall within the ranges of the rainfall inputs used in the training data. This characteristic is here defined as interpolation, while the contrary is defined as extrapolation. By comparing Figure 5 with Figure 8, we can see that the


performance of the NN depends on the training data used. Specifically, the performance of the NN depends on whether the rainfall input of the forecasted flood event is an interpolation or an extrapolation of the training set. In conclusion, when a NN is developed to forecast runoff off-line, its performance depends closely on the training set. It is hard to generalize a trained NN to forecast floods unless the NN is trained using data representative of the expected flood hydrographs. Besides, when a NN is applied to an actual basin, introducing some suitable parameters reflecting the basin's initial condition into the NN is necessary; otherwise, the NN cannot achieve a convergent state in most cases when it tries to learn various observed floods having different basin initial conditions. This is unlike the simulation study here, where the basin initial conditions were all the same and could simply be ignored. The application of NNs to forecast runoff off-line in actual basins is being studied further. In the following section, we introduce another method to forecast runoff on-line. For this method, the use of the currently accessible runoff data corrects the runoff forecast at every moment as the new runoff data are input. The method is shown to be applicable to actual basins.

On-line runoff prediction
In the case of on-line runoff prediction, current runoff data is accessible for the

present flood. Therefore, we may express the runoff system equation as:

ΔQ(t) = f{R(t-l), ..., R(t-m), ΔQ(t-1), ..., ΔQ(t-n)}    (15)

where ΔQ(t) = Q(t) - Q(t-1), and the parameters m and n can be properly chosen by taking account of the hydrological characteristics of the basin and the forecast lead-time. In this paper, we have made 1-hr, 2-hr and 3-hr lead-time forecasts of the runoff in Butternut Creek, New York. The simplified runoff equation adopted for this basin is:

ΔQ(t) = f{R(t-3), R(t-4), ΔQ(t-1)}    (16)

The NN developed in response to the above runoff system equation is shown in Figure 12. It should be noted that the structure of this NN is only partly interconnected, since the natures of the rainfall inputs and the runoff inputs are distinct. Since the output from the network is the increment of runoff, which may take positive or negative values, the sigmoid function described in the previous section was redefined by (17), where the range of x is again (-∞, +∞) but the function values fall in the interval (-1, 1):

f(x) = 2 / (1 + e^{-x}) - 1    (17)

We have five flood events in Butternut Creek. The first two floods, shown in Figure 13, were chosen as training data and the last three floods as validation data, where the flood scales of the validation data are almost at the same level as those of the


training data. We concluded in the preceding subsection that NNs may work well for interpolation problems but poorly for extrapolation problems, and this is the key point in choosing a suitable training set.

[Figure 12. The NN developed corresponding to (16): inputs R(t-3), R(t-4) and ΔQ(t-1); output ΔQ(t).]

The internal parameters of the NN were tuned in the training procedure and remained fixed while making the 1-hr, 2-hr and 3-hr lead-time forecasts for the validation data. The learning rate η was set to 0.15 in the training procedure. The authors have also studied forecasting floods in an adaptive forecasting environment, in which the internal parameters of the NN are updated at every moment when new flood information is received (Zhu and Fujita, 1993). However, there is a practical difficulty with this method because the computation time increases substantially when forecasting runoff with the desired accuracy. The prediction algorithm can be expressed as follows:

ΔQ'(t+1) = f{R(t-2), R(t-3), ΔQ(t)}    (18)

ΔQ'(t+2) = f{R(t-1), R(t-2), ΔQ'(t+1)}    (19)

ΔQ'(t+3) = f{R(t), R(t-1), ΔQ'(t+2)}    (20)

where the primes are added to forecasted values to distinguish them from observed values.
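The recursion (18)-(20) can be written compactly as below; here nn is a placeholder assumption for the trained network of Figure 12, and the toy series are illustrative only.

```python
# A minimal sketch of the recursive multi-step forecast (18)-(20).
def forecast_3hr(nn, R, Q, t):
    """Return 1-, 2- and 3-hr lead-time forecasts of runoff Q at time t.

    R, Q are observed hourly series indexed so that R[t], Q[t] are the
    latest available values; nn maps (rain, rain, dQ) to the next
    runoff increment.
    """
    dq = Q[t] - Q[t - 1]
    dq1 = nn(R[t - 2], R[t - 3], dq)      # equation (18)
    dq2 = nn(R[t - 1], R[t - 2], dq1)     # equation (19): feeds dq1 back
    dq3 = nn(R[t],     R[t - 1], dq2)     # equation (20): feeds dq2 back
    q1 = Q[t] + dq1
    return q1, q1 + dq2, q1 + dq2 + dq3

# Toy stand-in for a trained network (not a trained model):
toy_nn = lambda r_a, r_b, dq: 0.3 * r_a + 0.2 * r_b + 0.5 * dq
R = [0, 1, 3, 6, 4, 2, 1, 0, 0]
Q = [5, 5, 6, 8, 12, 15, 16, 16, 15]
print(forecast_3hr(toy_nn, R, Q, t=8))
```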

[Figure 13. The two floods as training data: the flood in Oct. 17, 1975 and the flood in Oct. 7, 1976.]

The 1-hr, 2-hr and 3-hr lead-time forecast results for the flood in Oct. 20, 1976, as examples, are shown in Figures 14, 15 and 16 respectively. From these figures, we can see that the prediction error grows gradually from the 1-hr to the 3-hr lead-time forecasts. The forecast results for the validation data are evaluated based on the relative error of peak flow (Q'_p - Q_p)/Q_p, the time difference of peak flow (t'_p - t_p) and the variance Σ(Q' - Q)²/n (where Q'_p, t'_p and Q' are the forecasted peak flow, the time to the forecasted peak flow and the forecasted flow respectively; Q_p, t_p and Q are the observed ones; n is the number of samples in the forecasted flood). The evaluation results are shown in Table 2. Figures 14, 15, 16 and Table 2 indicate that the prediction accuracy is good even though the developed NN is extremely simple. However, overestimation around the peak flow is seen in the forecast results. This is because the NN has an effect of inertia: it is unlikely to produce a negative output of ΔQ'(t+1) with enough

magnitude under a positive input of ΔQ(t) when the flow turns from an increasing to a decreasing stage. Furthermore, a larger positive prediction error may be caused at the longer lead-times, since the predicted values are fed back, as indicated in the prediction algorithm in (19) and (20).

[Figure 14. 1-hr lead-time forecast results.]
[Figure 15. 2-hr lead-time forecast results.]
[Figure 16. 3-hr lead-time forecast results.]

TABLE 2. Evaluations for the results forecasted by the neural network method

                               forecast           relative error   time difference   variance
                                                  of peak flow     to peak flow
  the flood in Oct. 20, 1976   1-hr lead-time     0.0109           1 hr              1.025E-03
                               2-hr lead-time     0.0529           0 hr              6.177E-03
                               3-hr lead-time     0.0808           1 hr              1.765E-02
  the flood in Sept. 26, 1977  1-hr lead-time     0.0138           0 hr              4.836E-04
                               2-hr lead-time     0.0546           1 hr              2.412E-03
                               3-hr lead-time     0.0879           1 hr              6.441E-03
  the flood in Oct. 16, 1977   1-hr lead-time     0.0456           -1 hr             3.512E-03
                               2-hr lead-time     0.1012           0 hr              1.948E-02
                               3-hr lead-time     0.1514           1 hr              5.44E-02

The authors have previously studied the forecasting of runoff using the fuzzy inference method. The method first establishes the fuzzy relation between rainfall and runoff based on previous flood data; it then forecasts the future runoff for the present flood through fuzzy reasoning based on the fuzzy relation obtained above. The method was applied to forecast the same flood events in Butternut Creek. Figures 17, 18 and 19, as examples, show the 1-hr, 2-hr and 3-hr lead-time forecast results for


the flood in Oct. 20, 1976 respectively. Table 3 presents the evaluation results for these forecasts based on the same criteria as shown in Table 2.

[Figure 17. 1-hr lead-time forecast results.]
[Figure 18. 2-hr lead-time forecast results.]
[Figure 19. 3-hr lead-time forecast results.]

TABLE 3. Evaluations for the results forecasted by the fuzzy inference method

                               forecast           relative error   time difference   variance
                                                  of peak flow     to peak flow
  the flood in Oct. 20, 1976   1-hr lead-time     0.0297           1 hr              1.763E-03
                               2-hr lead-time     0.0861           0 hr              1.08E-02
                               3-hr lead-time     0.1617           0 hr              3.258E-02
  the flood in Sept. 26, 1977  1-hr lead-time     0.047            0 hr              8.2E-04
                               2-hr lead-time     0.1171           1 hr              5.757E-03
                               3-hr lead-time     0.2201           0 hr              1.904E-02
  the flood in Oct. 16, 1977   1-hr lead-time     0.0511           -1 hr             3.488E-03
                               2-hr lead-time     0.1198           0 hr              1.913E-02
                               3-hr lead-time     0.1909           1 hr              5.5E-02

Comparing Table 2 with Table 3, and Figures 14, 15, 16 with Figures 17, 18, 19, we can see that the prediction accuracy of the NN method is slightly better than that of the fuzzy inference method. Besides, the calculation time of the NN method for forecasting runoff is extremely short, since the time-consuming job of tuning the internal parameters of the NN can be executed in advance. However, when we try to make a long lead-time forecast of runoff, predicted information about rainfall is needed. At the present level of technology, this information is provided


by weather radar and is described qualitatively as weak, medium, or strong intensity. To utilize this kind of qualitative rainfall prediction information in making runoff predictions, the fuzzy inference method appears quite useful. A successful attempt to utilize qualitative rainfall prediction information to make a long lead-time forecast of runoff was made by the authors recently (Zhu and Fujita, 1994). The aim of our next study is to develop a new method that combines the advantages of the NN and fuzzy inference methods.

Interval prediction
In principle, the more training sets a NN learns from, the more accurate the output it will

yield. These training sets should consist of various types of floods. On the other hand, such various types of floods as training sets make it difficult for a NN to converge to a desired state. To avoid this difficulty, we may employ a modified learning algorithm proposed by Ishibuchi, which was originally developed for determining the upper and lower bounds of a nonlinear interval model (Ishibuchi and Tanaka, 1991). The modified learning algorithm is the same as the back-propagation learning algorithm, except that it introduces a coefficient C into (7):

δ_k = C · (t_k - o2_k) · f′(s2_k)    (21)

First, in the learning procedure, if the output value o2_k from the NN is greater than or equal to the target value t_k, let C take a very small positive value α; the internal parameters of the NN, such as the interconnection weights and biases, whose corrections are based on the magnitude of the error value δ_k, are then changed only slightly. On the other hand, if o2_k is less than t_k, let C take 1, so that the internal parameters are corrected substantially. As a result, by presenting the training set to the NN iteratively, the outputs from the NN are finally expected to be all larger than or equal to the target values. The NN trained in this way is denoted NN⁺. Second, let C take values contrary to the above case; then, after the learning is completed, the outputs from the NN are expected to be all smaller than or equal to the target values. The NN trained in this way is denoted NN⁻. We consider that this modified learning algorithm may be applied to make an interval prediction of runoff, where the lower and upper bounds of the interval are provided by NN⁻ and NN⁺ respectively. In particular, when a NN fails to achieve a convergent state while learning various types of floods simultaneously, applications of NN⁻ and NN⁺ may be meaningful, since NN⁻ and NN⁺ automatically focus on the small-runoff and large-runoff cases respectively. Again, we chose Butternut Creek for the calculation, and adopted the same runoff system equation and the same structure of neural network as stated in the preceding subsection. Using the above modified learning algorithm, we obtained two different neural networks, NN⁻ and NN⁺, by learning the same training set, which consists of the first two floods in this basin shown in Figure 13. In the calculation, α was set at 0.01.
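The modified error term (21) with the switching coefficient C can be sketched as follows; the function and argument names are illustrative, with alpha defaulting to the value 0.01 used in the calculation.

```python
# A minimal sketch of the modified output-layer error (21) used to train
# the upper-bound network NN+ (upper=True) or lower-bound network NN-.
def delta_output(t_k, o2_k, f_prime_s2k, upper=True, alpha=0.01):
    """Error value for the output unit under the modified learning rule."""
    if upper:
        c = alpha if o2_k >= t_k else 1.0   # damp corrections once above target
    else:
        c = 1.0 if o2_k >= t_k else alpha   # damp corrections once below target
    return c * (t_k - o2_k) * f_prime_s2k

# With upper=True, an output already above the target is corrected only
# slightly, so the trained network drifts toward an upper envelope.
print(delta_output(t_k=0.7, o2_k=0.8, f_prime_s2k=0.16))   # damped
print(delta_output(t_k=0.7, o2_k=0.5, f_prime_s2k=0.25))   # full strength
```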


The trained NN⁻ and NN⁺ were then used to carry out the forecast task for the last three floods. The results of the 1-hr lead-time forecast for the flood in Oct. 20, 1976 are shown in Figure 20, where the solid lines represent observations, the dashed lines represent the predictions provided by NN⁻, and the skip-dashed lines represent the results predicted by NN⁺.

[Figure 20. The result of interval prediction for the flood in Oct. 20, 1976 (lead time = 1 hr).]

Figure 20 shows that the observed hydrograph falls within the predicted upper and lower bounds, and the other two results show the same condition. It should be pointed out that the aim of the above application in Butternut Creek was to examine the validity of the modified learning algorithm for making interval predictions. However, the floods in this basin can be forecasted well employing a single NN, as described in the on-line prediction subsection.


CONCLUSIONS AND REMARKS
Several neural networks have been developed to forecast runoff in three manners: an off-line prediction, an on-line prediction and an interval prediction. The dependence of the NNs' performance on the training set for the off-line prediction was discussed, and this work helped us understand how to construct an adequate training set. However, for the off-line prediction, further study of the application to actual basins is still needed. On the other hand, the method for the on-line prediction appeared very applicable. The interval runoff prediction model, which consisted of two neural networks, provided a way to estimate the upper and lower bounds of a flood.

REFERENCES
Fujita, M. and Zhu, M.-L. (1992) "An Application of Fuzzy Theory to Runoff Prediction", Procs. of the Sixth IAHR International Symposium on Stochastic Hydraulics, 727-734.
Rumelhart, McClelland and the PDP Research Group (1986) Parallel Distributed Processing, MIT Press, Cambridge, Vol. 1, 318-362.
Zhu, M.-L. and Fujita, M. (1993) "A Comparison of Fuzzy Inference Method and Neural Network Method for Runoff Prediction", Proceedings of Hydraulic Engineering, JSCE, Vol. 37, 75-80.
Zhu, M.-L. and Fujita, M. (1994) "Long Lead Time Forecast of Runoff Using Fuzzy Reasoning Method", Journal of Japan Society of Hydrology & Water Resources, Vol. 7, No. 2.
Ishibuchi, H. and Tanaka, H. (1991) "Determination of Fuzzy Regression Model by Neural Networks", Fuzzy Engineering toward Human Friendly Systems, Procs. of the International Fuzzy Engineering Symposium '91, Vol. 1, 523-534.

PREDICTION OF DAILY WATER DEMANDS BY NEURAL NETWORKS

S.P. ZHANG, H. WATANABE and R. YAMADA
Nihon Suido Consultants, Co. Ltd., Okubo 2-2-6, Shinjuku-ku, Tokyo, 169, Japan

In this paper a new approach based on the artificial neural network model is proposed for the prediction of daily water demands. The approach is compared with the conventional ones, and the results show that the new approach is more reliable and more effective. The fluctuation analysis of daily water demands and the sensitivity analysis of exogenous variables have also been carried out by taking advantage of the ability of neural network models to handle non-linear problems.

INTRODUCTION
An accurate short-term prediction (such as a prediction with a lead-time of one day) of daily water demands is required for the optimal operation of city water supply networks. Many studies dealing with prediction methods have been reported, such as the multiple regression model (e.g., Tsunoi, 1985), the ARIMA model (Koizumi et al., 1986), and the model based on the Kalman Filtering Theory (Jinno et al., 1986). This research effort has been made because of not only the great number of factors which make daily water demand fluctuate very widely, but also the complexity of the relations between daily water demand and exogenous variables (such as temperature, weather, etc.) and the stochastic properties of the exogenous variables. In essence, the majority of the published short-term water demand prediction models have treated daily water demands as a stochastic time series and have described the relation between daily water demand and exogenous variables by a linear expression. However, many studies have shown that the relations usually are nonlinear, in which case these models are no longer adequate. In this study we predict the daily water demands using an artificial neural network model, which is expected to be able to handle the non-linear relations. To verify the efficiency of the neural network model, it is compared with the conventional models. The fluctuation analysis of daily water demands and the sensitivity analysis of exogenous variables are also carried out by taking advantage of the ability of neural network models to handle non-linear problems.


NEURAL NETWORK MODEL FOR PREDICTION OF DAILY WATER DEMANDS

Introduction to neural network model
A neural network is a network system constructed artificially by idealizing neurons (nerve cells), and it consists of a number of nodes and lines that are called units and connections (or links), respectively. By network structure, neural networks are generally classified into two types: layered networks and interconnected networks. It has been shown that a layered network is suitable for prediction problems owing to its abilities of learning (self-organization) and parallel processing. Figure 1 shows the structure of a layered neural network, which has a layer of input units at the top, a layer of output units at the bottom, and any number of hidden layers between the input layer and the output layer. Connections exist only between two adjacent layers; connections within a layer or from higher to lower layers are forbidden.

[Figure 1. Structure of a layered neural network: input layer i, hidden layer j, output layer k. Each unit takes a weighted sum of its inputs, X = w1·x1 + w2·x2 + w3·x3 + θ, and outputs Y = f(X) = 1/[1 + exp(-X)], where X is the unit input, Y the unit output, w a connection weight and θ a threshold value; the difference between the network output and the teach signal drives learning.]

A neural network can be modeled as follows. For convenience's sake, we consider a neural network consisting of three layers. Let the unit numbers of the input, hidden and output layers be N, M and 1, respectively.


When an input {I_i, i = 1, 2, ..., N} is given to the units of the input layer, the inputs and outputs of the hidden layer and output layer units are represented by

Y_j = f(X_j),  j = 1, 2, ..., M    (1)

X_j = Σ_{i=1}^{N} w_{ij} I_i + θ_j,  j = 1, 2, ..., M    (2)

O = f(Z)    (3)

Z = Σ_{j=1}^{M} W_j Y_j + Θ    (4)

where
Y_j : output from unit j of the hidden layer,
X_j : input to unit j of the hidden layer,
f(·) : unit output function,
w_{ij} : connection weight between input layer unit i and hidden layer unit j,
θ_j : threshold value of hidden layer unit j,
O : output of the output layer unit,
Z : input to the output layer unit,
W_j : connection weight between hidden layer unit j and the output layer unit,
Θ : threshold value of the output layer unit.

For the unit output function f(·), some expressions have been proposed. The sigmoid function is the one used most widely:

f(X) = 1 / (1 + e^{-X})    (5)

Theoretically, the neural network model expressed by (1)-(5) approximates any non-linear relation between inputs and outputs with any degree of accuracy by using enough hidden layer units and setting the connection weights and threshold values appropriately (Asou, 1988).

Inputs to the neural network model
The inputs to the neural network model for the daily water demand prediction problem are the exogenous variables which cause the fluctuation of daily water demands. According to Zhang et al. (1992) and Yamada et al. (1992), the main factors are the following five exogenous variables: 1) last day's delivery, 2) daily high temperature, 3) weather, 4) precipitation, and 5) day's characteristics.


The forecasts of daily high temperature, weather and precipitation are available from the weather report presented in the morning or on the previous evening by the meteorological observatory. For the model user's convenience, and considering the accuracy of the weather report, we treated weather and precipitation as discrete variables; i.e., weather is classified as sunny, cloudy and rainy, and precipitation is classified into three ranks, i.e., [0 mm/day, 1 mm/day), [1 mm/day, 5 mm/day) and [5 mm/day, ∞). For the day's characteristics, weekdays and Sundays (including National Holidays) are distinguished. The variable values used as inputs to the neural network model are defined as follows. 1). The last day's delivery is transformed into a variable which belongs to (0, 1) by

I_1 = 1 / (1 + e^{-α(Q-Q̄)})    (6)

where I_1 is the transformed last day's delivery, Q the actual last day's delivery, Q̄ the mean of the delivery records, and α a parameter chosen to guarantee that the transformed daily delivery record series {I_1^(t), t = 1, 2, ..., T} is uniformly distributed, which is very important for improving the accuracy of prediction. 2). The daily high temperature is transformed in the same way as the last day's delivery, that is,

I_2 = 1 / (1 + e^{-β(T-T̄)})    (7)

where I_2 is the transformed high temperature, T the actual daily high temperature, T̄ the mean of the daily high temperature records, and β a parameter chosen to guarantee that the transformed daily high temperature record series {I_2^(t), t = 1, 2, ..., T} is uniformly distributed. 3). Weather is quantified as

I_3 = 0.0 for sunny, 0.5 for cloudy, 1.0 for rainy    (8)

where I_3 is the quantified variable corresponding to weather. For most statistical models with a linear structure, it is very important how a non-quantitative discrete variable is quantified. However, owing to the ability of the neural network model to handle non-linear problems through its learning (self-organization) process, the quantifying procedure here becomes very simple. 4). Similarly to weather, precipitation is quantified as

I_4 = 0.0 for R ∈ [0 mm/day, 1 mm/day), 0.5 for R ∈ [1 mm/day, 5 mm/day), 1.0 for R ∈ [5 mm/day, ∞)    (9)

(9)


5). Finally, the day's characteristics are quantified as

I_5 = 0.0 for Sunday and National Holidays, 1.0 for weekdays    (10)

where I_5 is the quantified variable corresponding to the day's characteristics.
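The five input codings (6)-(10) translate directly into code. In the sketch below the α and β values are illustrative placeholders; in the study they are tuned so that the transformed record series is approximately uniform.

```python
# A minimal sketch of the input codings (6)-(10).
import math

def transform_delivery(q, q_mean, alpha):          # equation (6)
    return 1.0 / (1.0 + math.exp(-alpha * (q - q_mean)))

def transform_temperature(temp, temp_mean, beta):  # equation (7)
    return 1.0 / (1.0 + math.exp(-beta * (temp - temp_mean)))

def quantify_weather(w):                           # equation (8)
    return {"sunny": 0.0, "cloudy": 0.5, "rainy": 1.0}[w]

def quantify_precip(r_mm_per_day):                 # equation (9)
    if r_mm_per_day < 1.0:
        return 0.0
    return 0.5 if r_mm_per_day < 5.0 else 1.0

def quantify_day(is_weekday):                      # equation (10)
    return 1.0 if is_weekday else 0.0

# One day's five-element input vector for the network (values illustrative):
x = [transform_delivery(305.0, 290.0, alpha=0.05),
     transform_temperature(31.0, 24.0, beta=0.3),
     quantify_weather("sunny"), quantify_precip(0.0), quantify_day(True)]
print(x)
```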

Composition of the neural network model
Identifying the composition of the neural network model means deciding the number of layers of the network and the number of units in each layer. In this study, the input layer has five units corresponding to the five exogenous variables, and the output layer has only one unit corresponding to the prediction of daily water demand. The number of layers has been set to 3, and the unit number of the only hidden layer has been decided to be 17 by the following procedure, which is based on the philosophy that a simpler structure is a better structure (Zhang et al., 1992).
Step 1: Set the unit number j = 1.
Step 2: Train the neural network with the learning procedure (see the next section) until the difference of the outputs between successive iterations is within a specified error.
Step 3: Calculate the mean relative error of the outputs.
Step 4: Set j = j + 1 and repeat the above steps until the mean relative error is less than a specified expectation of the prediction relative error.

Learning process of the neural network model
As mentioned in the introduction to the neural network model, in order to obtain an accurate prediction of daily water demand, it is necessary to set the connection weights of the units and the threshold value of each unit appropriately. In the case of neural network models these parameters are identified through "learning". The term learning here means the self-organization process through which the neural network model automatically adjusts these parameters to the appropriate values when a series of samples of input-output data (called teacher data or teacher signals) are shown to the model. If we consider the information processing in neural network models as a transformation of input data to output data, then model learning can be considered to be the process through which the neural network model gradually becomes capable of imitating the transformation patterns represented by the teacher data. A lot of learning algorithms have been proposed. In this study we use the Error Back Propagation algorithm, the most popular learning algorithm, which can be summarized as follows (Rumelhart et al., 1986). Suppose T sets of teacher data

{I_1^(t), ..., I_5^(t); a^(t)},  t = 1, 2, ..., T


are given. Consider initial values w_{ij}^(0), W_j^(0) and θ_j^(0), Θ^(0) for the connection weights and threshold values, respectively. Then the outputs corresponding to the inputs of the teacher data {I_i^(t), t = 1, 2, ..., T} can be obtained from (1) to (5); denote the outputs by {U^(t), t = 1, 2, ..., T}. It is easy to see that {U^(t), t = 1, 2, ..., T} differ from the outputs of the teacher data {a^(t), t = 1, 2, ..., T}, and an error function can be defined as follows.

R = Σ_{t=1}^{T} (a^(t) - U^(t))²    (11)

It is clear that R is a function of the connection weights and threshold values. The Error Back Propagation algorithm seeks the connection weights and threshold values that minimize the error function R; a nonlinear programming method together with an iterative process is applied to solve the optimization problem and obtain the optimal (sometimes suboptimal) connection weights and threshold values. In this study the steepest descent method is used. The final iteration (learning) procedures are as follows.

$$w_j^{(k+1)} = w_j^{(k)} - \eta \cdot \sum_{t=1}^{T} (\delta^{(t)} \cdot \gamma_j^{(t)}) \qquad (12)$$

$$\theta^{(k+1)} = \theta^{(k)} - \eta \cdot \sum_{t=1}^{T} (\delta^{(t)}) \qquad (13)$$

$$w_{ij}^{(k+1)} = w_{ij}^{(k)} - \eta \cdot \sum_{t=1}^{T} (\delta^{(t)} \cdot w_j^{(k+1)} \cdot \tilde{\gamma}_j^{(t)} \cdot I_i^{(t)}) \qquad (14)$$

$$\theta_j^{(k+1)} = \theta_j^{(k)} - \eta \cdot \sum_{t=1}^{T} (\delta^{(t)} \cdot w_j^{(k+1)} \cdot \tilde{\gamma}_j^{(t)}) \qquad (15)$$

$$\delta^{(t)} = (a^{(t)} - U^{(t)}) \cdot U^{(t)} \cdot (1 - U^{(t)}) \qquad (16)$$

$$\tilde{\gamma}_j^{(t)} = \gamma_j^{(t)} \cdot (1 - \gamma_j^{(t)}) \qquad (17)$$

where the superscript (k) indicates the number of learning iterations, and $\eta$ is a small positive number indicating the step size of the steepest descent method. In order to avoid the overfitting problem, the criterion that the learning process ends when the mean relative error of the outputs is less than a specified expectation of the prediction relative error is applied as the stopping criterion. In this study we have set $\eta = 0.25$ and the expectation of the prediction relative error to 2.0%. For the initial values of the connection weights and threshold values, random numbers within the interval [-0.5, 0.5] generated by the Monte Carlo method are applied. The learning of the neural network model has been carried out by using the weather and daily delivery records from April, 1982 to March, 1990 as teacher data.
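To make one update cycle concrete, here is a minimal NumPy sketch (ours, not the authors' code) of a single batch iteration for the 5-17-1 network, with η = 0.25 as in the paper. It assumes the common convention that a unit's net input is the weighted sum of its inputs minus its threshold, and it moves the parameters down the gradient of R(θ), which fixes the sign ambiguities of the printed equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_backprop_step(I, a, W, theta_h, w, theta_o, eta=0.25):
    """One batch steepest-descent update on R = sum_t (a(t) - U(t))^2,
    in the spirit of (12)-(17).
    I: (T, 5) inputs; a: (T,) teacher outputs; W: (5, 17) input-to-hidden
    weights; theta_h: (17,) hidden thresholds; w: (17,) hidden-to-output
    weights; theta_o: scalar output threshold."""
    gamma = sigmoid(I @ W - theta_h)               # hidden outputs gamma_j(t)
    U = sigmoid(gamma @ w - theta_o)               # network outputs U(t)
    delta = (a - U) * U * (1.0 - U)                # output deltas, cf. (16)
    gtil = gamma * (1.0 - gamma)                   # sigmoid derivatives, cf. (17)
    w_new = w + eta * gamma.T @ delta              # cf. (12)
    theta_o_new = theta_o - eta * delta.sum()      # cf. (13)
    hid = delta[:, None] * w_new[None, :] * gtil   # back-propagated hidden signal
    W_new = W + eta * I.T @ hid                    # cf. (14)
    theta_h_new = theta_h - eta * hid.sum(axis=0)  # cf. (15)
    return W_new, theta_h_new, w_new, theta_o_new
```

Note that, as in (14) and (15), the already-updated output weights $w_j^{(k+1)}$ are used when back-propagating to the hidden layer.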


PREDICTION RESULT ANALYSIS
With the learned (identified) neural network model, the daily water demands from April, 1990 to March, 1991 have been predicted and compared with the records (Figure 2). The relative error distribution of the predictions is shown in Figure 3. It can be seen that the relative error is less than 5% for 339 days of the year. The five days when the relative error is greater than 10% are a special holiday, New Year's Day, and three typhoon-striking days. In general, the predictions by the neural network model are in excellent agreement with the records.

Figure 2. Predictions of daily water demands (record versus prediction).

Figure 3. Relative error distribution of the predictions.

The predictions by the neural network model have also been compared with those by the multiple regression model, the ARIMA model and the model based on the Kalman Filtering Theory (for details about the compared models, see Zhang et al., 1992). The results are shown in Table 1, where the following three indexes are applied to estimate


the goodness of fit of the predictions to the records.

TABLE 1. Comparison of different models

MODEL                        MRE(%)   CC      RRMSE
Multiple Regression Model    2.90     0.764   0.659
ARIMA Model                  2.80     0.794   0.623
Kalman Filtering Model       2.69     0.808   0.599
Neural Network Model         2.12     0.877   0.483

1. Mean Relative Error (MRE). The smaller the mean relative error is, the better the predictions are.
2. Correlation Coefficient between predictions and records (CC). The larger the correlation coefficient is, the better the predictions are.
3. Relative Root Mean Square Error (RRMSE). RRMSE = 0 for perfect predictions, and RRMSE = 1 if the predictions are equal to the mean of the records. (A computational sketch of the three indexes follows below.)
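A minimal sketch of how these three indexes can be computed; this is our reading of the definitions above, since the paper does not spell out the formulas.

```python
import numpy as np

def fit_indexes(pred, obs):
    """MRE (%), CC and RRMSE between predictions and records."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    mre = 100.0 * np.mean(np.abs(pred - obs) / obs)      # mean relative error
    cc = np.corrcoef(pred, obs)[0, 1]                    # correlation coefficient
    rrmse = np.sqrt(np.mean((pred - obs) ** 2)
                    / np.mean((obs - obs.mean()) ** 2))  # 0 = perfect, 1 = mean
    return mre, cc, rrmse
```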

It can be seen that all of the indexes show that the neural network model gives the best predictions of the four tested models. From the above results we can say that the neural network model is more reliable and suitable for practical purposes.

FLUCTUATION STRUCTURE ANALYSIS OF DAILY WATER DEMANDS
Although the nonlinear relations between daily water demands and exogenous variables have been recognized, most of the published methods describe the relations with a linear expression, as pointed out in Section 1. It is clear that these methods cannot give us correct knowledge about the fluctuation structure of daily water demands. In this sense the neural network model greatly differs from the conventional models; that is, the neural network model has a nonlinear structure and can simulate complex relations between input and output through learning. Figure 4 shows some simulation results by the learned (identified) neural network model. From these results we can say that the neural network model is indeed able to handle nonlinear problems and that a model with nonlinear structure is necessary to describe correctly the relations between daily water demands and exogenous variables. However, to identify the nonlinear relations between daily water demands and exogenous variables more simulations are needed, which is a meaningful problem to be studied.


Figure 4. Fluctuation of daily water demands. (Four panels of simulated demand against daily high temperature (deg. C): sunny, I_1 = 372 (10^3 m^3), weekday versus Sunday & holiday; rain (15 mm/day), I_1 = 372 (10^3 m^3), weekday versus Sunday & holiday; weekday, I_1 = 372 (10^3 m^3), sunny versus cloudy versus rainy; Sunday & holiday, I_1 = 372 (10^3 m^3), sunny versus cloudy versus rainy.)


SENSITIVITY ANALYSIS OF EXOGENOUS VARIABLES
A sensitivity analysis is necessary to estimate the influences of the exogenous variables on daily water demands. By definition, the sensitivity coefficients of the exogenous variables can be calculated with the following equations, which are derived from (1) to (5).

$$\frac{\partial O}{\partial I} = f^{(2)} \cdot [1 - f^{(2)}] \cdot A \cdot B \qquad (18)$$

where $\frac{\partial O}{\partial I} = \{\partial O/\partial I_1, \ldots, \partial O/\partial I_N\}^T$, an N x 1 vector; $A = \{w_{ij}\}$, an N x M matrix; and $B = \{w_j \cdot f(X_j) \cdot [1 - f(X_j)]\}$, an M x 1 vector.
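Under the same threshold convention as in the learning sketch above, (18) can be evaluated for one input pattern as follows; the array names are ours and the routine is only a sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sensitivity(I, W, theta_h, w, theta_o):
    """dO/dI for a trained N-M-1 sigmoid network and one input pattern I (N,).
    Follows (18): dO/dI = f2 * (1 - f2) * A @ B, with A = {w_ij} and
    B_j = w_j * f(X_j) * (1 - f(X_j))."""
    X = I @ W - theta_h              # hidden net inputs X_j, shape (M,)
    fX = sigmoid(X)                  # hidden outputs f(X_j)
    f2 = sigmoid(fX @ w - theta_o)   # network output f^(2)
    B = w * fX * (1.0 - fX)          # the M x 1 vector in (18)
    return f2 * (1.0 - f2) * (W @ B) # the N x 1 vector of sensitivities
```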

Figure 5 shows the changing sensitivity coefficients of the five exogenous variables over one year, which are the monthly average values of those calculated by (18) with the eight years' weather and daily water demand records. The results can be summarized as follows.
1. The daily high temperature is a main factor of the fluctuation of daily water demands over one year, except in the winter season.
2. The influence of the last day's delivery on daily water demand is also very great, especially in the fall and winter seasons, but its range of variation with time is smaller than that of the daily high temperature.
3. The influences of the other exogenous variables are far less than those of the daily high temperature and the last day's delivery.

Figure 5. Sensitivity coefficients of exogenous variables (monthly averages over one year).


SUMMARY
In this study we applied a neural network model to the prediction of daily water demands and verified that the model is very efficient and reliable. The neural network model differs greatly from most statistical models in its ability to handle nonlinear problems. The results of the fluctuation structure analysis of daily water demands and the sensitivity analysis of exogenous variables have suggested that a neural network model can be used to identify the demand structure, due to the self-organization ability of the model.

REFERENCES
Jinno, K., Kawamura, T., and Ueda, T. (1986) "On the dynamic properties and predictions of daily deliveries of the purification stations in Fukuoka City", Technology Reports of the Kyushu University 59(4), 495-502.
Koizumi, A., Inakazu, T., Chida, K., and Kawaguchi, S. (1988) "Forecasting of daily water consumption by multiple ARIMA model", J. Japan Water Works Association 57(12), 13-20.
Tsunoi, M. (1985) "An estimate of water supply based on weighted regression analysis using a personal computer", J. Japan Water Works Association 54(3), 2-6.
Zhang, S.P., Watanabe, H., and Yamada, R. (1992) "Comparison of daily water demand prediction models", Annual Reports of NSC, Vol. 18, No. 1.
Asou, H. (1988) The Information Processing by Neural Network Models, Sangyo Publishers, Tokyo.
Yamada, R., Zhang, S.P., and Konda, T. (1992) "An Application of Multiple ARIMA Model to Daily Water Demand Forecasting", Annual Reports of NSC, Vol. 18, No. 1.
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986) "Learning Representations by Back-Propagating Errors", Nature, 323(9), 533-536.

BACKPROPAGATION IN HYDROLOGICAL TIME SERIES FORECASTING

GERSON LACHTERMACHER1 and J. DAVID FULLER2
1Department of Industrial Engineering, Universidade Federal Fluminense, Rio de Janeiro, Brazil
2Department of Management Sciences, Faculty of Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
One of the major constraints on the use of backpropagation neural networks as a practical forecasting tool is the number of training patterns needed. We propose a methodology that reduces the data requirements. The general idea is to use the Box-Jenkins models in an exploratory phase to identify the "lag components" of the series, to determine a compact network structure with one input unit for each lag, and then apply the validation procedure. This process minimizes the size of the network and consequently the data required to train the network. The results obtained in four studies show the potential of the new methodology as an alternative to the traditional time series models.

INTRODUCTION
Most of the available techniques used in time-series analysis, such as the Box-Jenkins methods (Box & Jenkins, 1976), assume a linear relationship among variables. In practice this drawback can make it difficult to analyze and predict accurately the real processes that are represented by these time series. Tong (1983) described some drawbacks of linear modelling for time series. In the last decade several nonlinear time series models have been studied, such as the threshold autoregressive models developed by Tong & Lim (1980). These are 'model-driven approaches' (Chakraborty et al., 1992) in which we first identify the type of relation among the variables (model selection) and afterwards estimate the selected model parameters. More recently, neural networks have been studied as an alternative to these nonlinear model-driven approaches. Because of their characteristics, neural networks belong to the data-driven approaches, i.e. the analysis depends on the available data, with little a priori rationalization about relationships between variables and about the models. The process of constructing the relationships between the input and output variables is addressed by certain general purpose 'learning' algorithms (Chakraborty et al., 1992).
Some drawbacks to the practical use of neural nets are the long time consumed in the modelling process and the large amount of data required by the present neural network methodologies. Present methodologies, depending on the problem, can take several hours or even days in the neural network calibration process, and they require


hundreds of observations. However, in order to be a practical methodology these requirements should be reduced to a few hours and to a few dozen observations. One cause of both problems is the lack of a definitive generic methodology that could be used to design a small network structure. Most of the present methodologies use large networks, with a large number of parameters ('weights'). This means lengthy computations to set their values, and a requirement for many observations. Unfortunately, in practice, a model's parameters must be estimated quickly and just a small amount of data is available. Contradictory conclusions about the forecasting performance of the neural network model, compared with the traditional methods, have been reached by several authors (Tang et al., 1991). The explanation for such contradictions may be related to differences in factors such as the network structure used, the type of series (e.g. stationary, nonstationary) used in the studies and the relation of the size of the network structure to the number of entries of the time series.
The goal of this research is to devise and evaluate a practical methodology for the use of neural networks in forecasting, using noisy and small real world time series. Until the present time, no available method could satisfactorily deal with this kind of problem. Our approach is a hybrid of Box-Jenkins and neural network methods. It uses the Box-Jenkins methods to identify the 'lag components' of the data that should be used as input variables, and employs a heuristic to suggest the number of hidden units needed in the structure of the model. In this way, we can define a small but adequate network, thus reducing the time used to identify and construct an appropriate model, and reducing the data requirements. In addition, the use of 'synthetic' time series (Hipel & McLeod, 1993) as the validation set further decreases the data requirements, helping to overcome the problem of short time series.
The next section briefly describes the backpropagation learning procedure for neural networks (Rumelhart et al., 1986), and gives a brief literature review of the application of neural networks in time series analysis. The following section describes the methodology used in this paper and the performance tests used to compare the resultant models to other traditional time series models. Next we present the results obtained. Four time series have been studied and the performance of the neural network has been compared to the proper ARMA model and to other time series models. All series are stationary (annual river flow). The final section concludes that the forecasting performance of the neural network models is usually as good as or better than the alternative models. The methodology has proven to be a practical method to reduce the modelling time by identifying a small suitable structure that can handle the problem. Moreover, we suggest that this work should be extended to other types of series not studied here (e.g. nonstationary, seasonal and cyclic time series).

RELEVANT BACKGROUND Neural networks are composed of two primitive elements: units (processing elements) and connections ('weights') between units. In essence, a set of inputs are applied to a unit that, based on them, fires an output. Each input has its own influence on the total output.


In other words, each input has its own weight in the total output. The connections of several units (artificial neurons), arranged in one or more layers, make a neural network. Many network structures are arranged in layers of units that can be classified as Input, Output or Hidden layers. According to the type of connections between units, neural networks can be classified as feedforward or recurrent. In feedforward networks, the units at a layer i may only influence the activity of units at higher levels (closer to the system's output). Also, in a pure feedforward system, the set of input units to one layer cannot include units from two or more different layers. The recurrent models are characterized by feedback connections. This type of connection links cells at higher levels to cells at lower ones. In this study we used pure fully connected feedforward neural networks. The backpropagation learning procedure (Rumelhart et al., 1986) is a gradient descent method that establishes the values of the neural network parameters (weights and biases) to minimize the output errors, based on a set of examples (patterns). This learning procedure was used in this study.
The resolution of the conflicting conclusions reported in the literature may be connected to the fact that a neural network model's performance is highly dependent on its structure, the activation function of the units, the type of connections, the backpropagation stopping criteria, the data normalization factors and the overfitting problem, among other things. Furthermore, no definitive established methodology exists to deal with the neural network modelling problem and no unique comparison method is used in all studies. Lapedes & Farber (1987, 1988) applied neural networks to forecast two chaotic time series, i.e. series generated by a deterministic nonlinear process that look like "random" time series. They concluded that the backpropagation procedure may be thought of as a particular, nonlinear, least squares method. Their results indicated that neural networks allow solution of the nonlinear system modelling problem, with excellent prediction properties, on such "random" time series when compared to traditional methods. Unfortunately, as they pointed out, their study did not include the effects of noisy real time series, and the related overfitting problem. Tang et al. (1991) compared neural networks and Box-Jenkins models, using international airline passenger traffic, domestic car sales and foreign car sales in the U.S. They studied the influence on the forecasting performance of the amount of data used, the number of periods for the forecast and the number of input variables. They concluded that the Box-Jenkins models outperformed the neural net models in short term forecasting. On the other hand the neural net models outperformed the Box-Jenkins models in the long term. Unfortunately, the stopping criteria and the overfitting problems were not investigated. Therefore, wrong conclusions may have been reached since the networks studied are very large, compared with the number of training patterns, and the networks could have been overtrained. Recently, a large number of studies have tried to apply neural networks to short term electric load forecasting (El-Sharkawi et al., 1991; Srinivasan et al., 1991; Hwang & Moon, 1991, among others). Most of the studies used feedforward neural networks with all the units using nonlinear activation functions (sigmoid or hyperbolic tangent).
All these studies required a large amount of data and had large network structures. The input


data included several types of variables such as weather variables, dummy variables to represent the day of the week and variables to represent the historic pattern, among others. Almost all the studies did not discuss the overfitting problem, the relation between the size of the network and the number of patterns used in the training procedure, the stopping criteria or how the best structure was found. Most of them just presented the results without fully explaining how they were obtained. Most concluded that neural networks performed as well as or better than the traditional methods. Important work has been done by Weigend (1991) and Weigend et al. (1990, 1991). They introduced the weight-elimination backpropagation learning procedure to deal with the overfitting problem. They also presented all the relevant information that characterized the models used in the forecasting of the sunspot and exchange rate time series. They also discussed the stopping criteria in the validation procedure and compared the results with traditional time series models. They concluded that the neural network model performed as well as the TAR model (Tong, 1983, 1990 and Tong & Lim, 1980) in the case of one-step-ahead prediction. In the case of multi-step prediction the neural net models outperformed the TAR model. The drawback of the weight-elimination procedure is the increase of the training time caused by the inclusion of the penalty term and another dynamic parameter, λ. Nowlan & Hinton (1992) used an alternative approach called Soft Weight-Sharing. The results obtained are slightly better than both compared models. Besides the drawback of increasing the complexity of the modelling process, compared to the weight-elimination procedure, the authors concluded that the effectiveness of the technique is likely to be somewhat problem dependent. However, the authors claim the advantage of a more sophisticated model is its ability to better adapt to individual problems.

METHODOLOGY
In this section we describe the hybrid methodology developed for the practical application of neural networks in time series analysis, the performance analysis used to compare the neural network models to other types of time series models, and the software and hardware used in the study. In order to investigate the benefits of the methodology, four distinct stationary time series were used.

Hybrid Methodology
The neural networks' main drawbacks of large data and time requirements are related to the fact that the approach has been data-driven, because no definitive procedure is available for its use in time series modelling. The general idea of our new hybrid methodology is to use the available Box-Jenkins methodology as an exploratory procedure which identifies some important relationships in the data series. Based on the information produced in the exploratory step, we define an initial structure whose small size decreases the neural network's modelling time and reduces the number of estimated parameters and the amount of data required. This overcomes the most important practical drawbacks to the application of neural networks as a forecasting tool. Furthermore, because the Box-Jenkins models are linear, we believe that the nonlinearities included in the neural network models will help to improve the forecasting performance of the final model.


Modelling Procedure: The modelling procedure consists of basically two steps. The first is called the exploratory phase and the second is called the modelling phase. In the exploratory phase, the general idea is to observe the series in order to identify its basic characteristics. In the modelling phase, the general idea is to use the information obtained in the first step to aid the design of the neural network structure and then perform the neural network modelling. In the following sections we will describe each of the methodology's phases.
Exploratory Phase: In this research, our exploratory phase (Tukey, 1977, Hipel & McLeod, 1993) consists of two parts. In the first one, we use the plot of the time series and of the autocorrelation function to try to identify trends in the mean and in the variance, seasonalities and outliers of the data series. Based on this information, the second part consists of the Box-Jenkins modelling process. This process consists of the identification, estimation and diagnostic checking of the appropriate ARMA model (Hipel & McLeod, 1993) for each time series. The decision on the best model to represent each time series is based on the Akaike Information Criterion (AIC) developed by Akaike (1974).
Modelling Phase: An important feature gathered in the exploratory phase is what we call the lag components of the data series, i.e. the lag entries that are important to forecast a new element of a given time series in a linear context, as suggested by the autoregressive parameters of the calibrated Box-Jenkins model. A neural network structure that uses these lag entries as its input variables should easily be trained to perform the linear transformation done by the Box-Jenkins models. Furthermore, we expect that part of the "randomness" term of the linear model is in fact nonlinearity, and it can be learned and incorporated in the new model by the implicit nonlinearity of the neural network models.
Learning Procedure: In all the studies performed, we used the backpropagation learning procedure to train the networks. The additional momentum term, which on average speeds the process (Hertz et al., 1991), was also used in most of the studies. There exist several other methods that speed the training procedure. In order to make it clear that any gains in the speed of modelling are due to the hybrid methodology, we purposely used only the well known technique of backpropagation, with a momentum term, and we avoided the use of any other speeding technique.
Training/Validation Procedure: As mentioned before, an important issue in the application of neural networks is the relation of the size of the network (and so the number of free parameters) to the number of training examples. Like other methods of function approximation, such as polynomials, a large number of parameters will allow the network to fit the training data closely. However, this will not necessarily lead to optimal generalization properties (i.e. good forecasting ability, in the present context).


Weigend et al. (1990) suggested two methods to deal with the overfitting problem. The first one, the weight-elimination backpropagation procedure, was mentioned in the last section. The second one involves providing a large structure which contains a large number of modifiable weights but stopping the training before the network has made use of all of its degrees of freedom. As pointed out by Weigend et al. (1990, 1991), the problem with this procedure is to determine when the network has extracted all useful information and is starting to incorporate the noise in the model parameters. They suggested that part of the available data should be separated to serve as a validation set. The performance on this set should be monitored and the training should be stopped when the error on the validation set starts to decrease slowly (almost constant) or to increase. Three problems of the latter procedure were pointed out by Weigend et al. (1990). The first one is that part of the available time series cannot be used directly in the training process. In some cases, the available series do not present more than 40 to 50 elements, so it is impractical to separate one part of the series to be used as a validation set, because all available data are needed for training and performance evaluation. The second problem is related to the pair of training set and validation set chosen. The authors found out, when studying the sunspot series, that the results depend strongly on the specific pair chosen. The last problem is related to the stopping point. They point out that it is not always clear when the network is starting to learn the noise of the time series. They proposed the weight-elimination procedure to overcome these problems. However, the drawback of weight-elimination is the increase of the training time. Our training/validation procedure is based on Weigend's validation methodology. However, it deals with the overfitting and learning time problems in a different way. The general idea is to use the information obtained in the exploratory phase of the methodology to establish a small network (and so, few parameters) that would learn on the lag components of the time series. To avoid the need to use real data for validation, we generate a synthetic time series, which possesses the same statistical properties as the original time series, using the Waterloo simulation procedures (Hipel & McLeod, 1993), to be used as a validation set. The forecasting performance on this validation set is then monitored as the training proceeds, as suggested by Weigend et al. (1990, 1991), and the stopping decision is based on this performance, as sketched below.
Structure of Neural Network Model: In this study, we used a fully connected feedforward network which has an input layer, one hidden layer and a single unit output layer. This type of network was suggested by Lapedes & Farber (1987) and used by Weigend (1991) and Weigend et al. (1990, 1991). However, instead of using a linear activation function in the output layer, we are going to use a nonlinear (sigmoid) one which was originally used by Rumelhart et al. (1986). The number of units in the input and hidden layers varies according to the calibrated Box-Jenkins model and the amount of data available to train the network. The number of input units is initially determined by the number of autoregressive terms of the calibrated ARMA model. Furthermore, in order to test the adequacy of the model, a model with an additional input is also tested as a type of sensitivity analysis.
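The training/validation loop referred to above might be sketched as follows; train_step and rmse are hypothetical helpers, and the patience rule is our simplification of the "stop when the validation RMSE flattens or rises" criterion.

```python
import numpy as np

def train_with_synthetic_validation(train_step, rmse, net,
                                    train_set, synthetic_set,
                                    max_epochs=10000, patience=25):
    """Stop training when RMSE on the synthetic validation series stops
    improving (hypothetical train_step/rmse helpers assumed)."""
    best, best_net, wait = np.inf, net, 0
    for epoch in range(max_epochs):
        net = train_step(net, train_set)   # one backpropagation pass
        v = rmse(net, synthetic_set)       # monitor the validation RMSE
        if v < best - 1e-6:                # still improving: keep this model
            best, best_net, wait = v, net, 0
        else:                              # flat or rising: count down
            wait += 1
            if wait >= patience:
                break
    return best_net
```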
The number of units in the hidden layer is based on the number of input units and the


number of training patterns. As said before, the relation of the number of weights to the number of patterns is important to guarantee a good generalization performance of the neural network model. Therefore, given the fact that the validation procedure (Weigend et al., 1990, 1991) suggests an initial large structure, the number of hidden units is determined in order to make the number of weights follow the heuristic rule given by:

$$\frac{1.1P}{10} \le H(I+1) < \frac{3P}{10} \qquad (1)$$

where P is the total number of patterns used to train the network, H is the number of hidden units used in the structure, and I is the number of input units used in the structure.
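To make rule (1) concrete, a small sketch (ours, not part of the paper) that enumerates the admissible hidden-layer sizes:

```python
def hidden_unit_range(P, I):
    """Admissible hidden-layer sizes H under rule (1):
    1.1*P/10 <= H*(I+1) < 3*P/10, for P training patterns and I inputs."""
    lo, hi = 1.1 * P / 10.0, 3.0 * P / 10.0
    return [H for H in range(1, int(hi) + 1) if lo <= H * (I + 1) < hi]

# Example: 100 training patterns and 3 input units (an AR(3)-suggested
# structure): H*(3+1) must lie in [11, 30), so H can be 3, 4, 5, 6 or 7.
print(hidden_unit_range(100, 3))
```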

This heuristic is based on the relation of one weight to ten patterns suggested in the literature (Weigend et al., 1991) in order to obtain good generalization properties. The idea is to have a structure that is 1.1 to 3 times bigger than the one-to-ten relation suggested in the literature.
Parameters' Initialization: In this paper all the initial sets of weights and biases were randomly generated from a uniform distribution in the range from -0.25 to 0.25. The same range was used by Weigend et al. (1990, 1991) in their studies of neural network modelling in order to avoid bad solutions.
Learning and Momentum Rates: Several pairs of learning (η) and momentum (μ) rates were used in this work. The general idea was to start with the pair η = 0.5 and μ = 0.9. However, in some studies, this pair of parameters drove the training process to a local minimum with very high error values (higher than 2 on a normalized scale). In order to overcome this problem, several new pairs with smaller parameters were tried. During our experiments we observed, as expected, that the local minimum problem happens more frequently when using very small structures. This observation helped to formulate the initial structure heuristic rule (1) proposed above.
Performance Criterion: Several performance criteria have been used in neural network time series modelling. Weigend et al. (1990, 1991), Weigend (1991) and Nowlan & Hinton (1992) used the Average Relative Variance (ARV). Gorr et al. (1992) used the Mean Error (ME), the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). Coakley et al. (1992) studied several alternative performance measures such as the Mean Squared Error (MSE), the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error (MAPE). In this study we used MSE as the cost function for backpropagation, and RMSE to determine the stopping point of training and to evaluate forecasting performance.


Patterns Normalization: Because of the output range of the sigmoid unit (zero to one), the patterns used to train the neural network should be normalized. Several normalization procedures are described in the literature. The most common one is to divide all the values by one number that is larger than the biggest value present in the studied time series. In this paper, this number is referred to as the normalization factor (NF). This procedure was used by Weigend et al. (1991) in their studies of the sunspot time series. In the case of stationary series, sometimes we want to forecast values which might be outside the historical range. Therefore we choose a normalization factor that ranges from 30% to 100% larger than the maximum value in the historical training patterns. The value varies according to visual inspection of the plot of the series (variance), which was done during the exploratory phase of the methodology.
Stopping Criterion: The stopping criterion used in our methodology is similar to Weigend et al.'s (1991) validation procedure. During the training process we monitor the performance on the validation set (synthetic time series) using the RMSE criterion. Gorr et al. (1992) used the RMSE in their validation process, but the authors used part of the original series as the validation set. These measurements are made in the original range of the time series. The general idea is that when the RMSE starts to increase, or to decrease very slowly, the process should be stopped in order to avoid overlearning and deterioration of the generalization properties of the resultant model. Furthermore, we observed during our study that a better criterion is to observe the behaviour of the validation and training sets together. We noticed that when the training is starting to overlearn, the trends of the validation RMSE and training RMSE plots are in opposite directions. This was observed in most of the problems studied.
Sensitivity Analysis: In order to check whether the neural network structure is adequate, an additional input unit is included in the initial structure suggested by the calibrated Box-Jenkins models. If the results do not present significant improvement, the first model is maintained. If the model with the extra input unit is preferred, then a new unit is added to the input layer, and the process is repeated. Additional hidden units are also tested. To be consistent with our aim to minimize data requirements, we compare "forecast" performance by RMSEs on the synthetic data. This kind of procedure has been used by Hipel & McLeod (1993) in order to test the ARMA model in relation to the overfitting problem.
Performance Analysis
In this section we describe the types of time series studied, and the types of prediction performed in each case.
Description of the Time Series: Only stationary time series were used in this study. They were represented in this study by four annual river flow time series. The sizes of these series vary from 96 to 150


elements. These series were chosen because they were previously studied by Hipel & McLeod (1993) in a comparison study among the ARMA model and several other time series models. In order to obtain a fair comparison between the neural network models developed and the results obtained by Hipel & McLeod (1993), the same conditions (modelling and prediction data sets) were used to train the neural networks.

Types of Prediction: The forecasting abilities of the neural network models were tested over a period of up to 30 years, depending on the size of the studied time series. Two types of prediction were performed: one step ahead and multi-step ahead. In one step ahead prediction, only the actual observed time series is utilized as the input of the models (neural networks and Box-Jenkins) to forecast the next entry of the series. Each new forecasted entry is then independent and uncorrelated with the previous forecasts. Two types of multi-step ahead prediction are reported in the neural network literature. The first type is called iterated multi-step prediction. In this type, in order to obtain several steps ahead of prediction, the results obtained from the model are used in the subsequent forecasts (a sketch is given below). In this case, the network has just one unit in the output layer, which represents the next forecast entry of the time series. This forecasting technique is similar to the one used by the Box-Jenkins models. In the second type of multi-step ahead prediction, the neural network contains several units in the output layer of the structure, each representing one step to be forecasted. However, this type of structure requires a larger time series than the first type of multi-step prediction in order to avoid generalization problems. Since the goal of this research is to develop a methodology that uses small real world time series, this technique was not used. Instead, we used iterated multi-step predictions.
Computational Resources
In the case of the neural network models, we used the Hybrid Backpropagation (HBP) software developed by Lachtermacher (1991, 1993). The software was rewritten in TurboPascal® utilizing Turbo Vision® (OOP library) to increase the processing speed of the software. The training was done on a 486 - 50MHz PC compatible. The training time of a model ranged from 30 minutes to 3 hours depending on the number of training patterns and the structure used.
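The iterated scheme referred to above reduces to feeding each forecast back in as an input; in the following sketch, the lag tuple and predict_one are ours, standing in for a calibrated network.

```python
import numpy as np

def iterated_forecast(predict_one, history, lags, steps):
    """Iterated multi-step prediction with a one-output model: each
    forecast is appended to the history and fed back as an input.
    predict_one(x) maps a vector of lagged values to the next value."""
    z = list(history)
    out = []
    for _ in range(steps):
        x = np.array([z[-k] for k in lags])  # e.g. lags = (1, 2) for an AR(2)-like net
        z_next = predict_one(x)
        out.append(z_next)
        z.append(z_next)                     # feed the forecast back in
    return out
```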

RESULTS Hipel & McLeod (1993) compared the performance of the ARMA models against several stationary models, using the river flow time series. We expand this study with the inclusion of the results of the calibrated neural network models, using the same modelling and prediction sets. All the time series models are well described in Hipel & McLeod (1993). Table I summarizes the results obtained.


Table I. One Step Ahead Prediction Time Series Models' RMSE Comparison

Model\Series     Neumunas   Gota     Mississippi   St. Lawrence
ARMA             118.30*    87.58*   1508.03*      473.89*
FGN              115.80*    95.57*   1543.56*      630.55*
FDIFF            116.12*    97.66*   1574.85*      875.91*
Markov           114.70*    97.45*   1625.90*      450.85*
Nonparametric    115.40*    92.86*   1560.00*      426.90*
Neural Nets      115.82     85.85    1498.49       472.47

* Obtained by Hipel & McLeod (1993)

Table II. Time Series Models' Ranking Summary

Model\Rank       1   2   3   4   5   6   Rank Sum
ARMA             0   2   0   1   0   1   14
FGN              0   0   2   1   1   0   15
FDIFF            0   0   0   0   2   2   22
Markov           1   1   0   0   1   1   14
Nonparametric    1   1   1   1   0   0   10
Neural Nets      2   0   1   1   0   0   9

Table II presents a ranking summary of the results described in Table I. The rank sum is simply the sum of the product of the rank and the number of times the model received that rank. Therefore, models with low rank sums forecast better overall than models with higher rank sums (Hipel & McLeod, 1993). As can be seen from Table II, the calibrated neural network models presented the best overall performance in the one step ahead predictions, considering the four river flow


studies. Furthermore, in all studies performed using stationary series, at least one neural network model outperformed the corresponding ARMA model in the one step ahead predictions. We should note that the differences in performance between the calibrated neural network models and the corresponding ARMA models were very small and may not justify the additional work to achieve them. The hybrid methodology proved to be a good approach to forecast stationary series well, reducing the time used to calibrate the neural network model to about two hours, typically. Moreover, the use of the synthetic time series as the validation set proved to be an efficient way to decrease the neural network data requirements for this type of time series. An important observation is that the performance of the neural networks' one step ahead predictions deteriorates with the inclusion of additional hidden and/or input units. The deterioration is more influenced by the inclusion of an unnecessary input unit than by a hidden one. Therefore, more care should be taken in the determination of the relevant number of input units than in setting the number of hidden units used in the network structure. In the case of the multi-step prediction (Table III) both models have almost the same performance. Another important fact was noted in this type of prediction. We observed that the performance of the neural networks improves with the increase of the number of estimated parameters of the model (by increasing the number of hidden or input units) in all studies. However, there is not yet an exact explanation of these facts.

Table III. Multi-Step Ahead Prediction Time Series Models' RMSE Comparison

Model\Series   Neumunas   Gota     Mississippi   St. Lawrence
ARMA           101.34     101.98   1792.57       909.72
Neural Nets    102.84     101.60   1792.97       903.33

Normality of the Residuals and Forecast Bias: The residuals of the training and forecasting sets were tested for normality using normal probability plots and the skewness and kurtosis statistics. In all cases, the hypothesis of normality was not rejected at a significance level of 95%. Furthermore, there was no significant bias (95%) indicated in the forecasts made by the neural network models.
CONCLUSIONS AND FUTURE RESEARCH
In this research, we developed and tested the Hybrid methodology for the use of neural networks in time series forecasting. We tailored this methodology to be applied to stationary and noncyclical series. The methodology consists of two phases, Exploratory


and Modelling. In the exploratory phase we identify the 'lag components' of the series using the traditional Box-Jenkins methods. Based on these results a small structure is suggested and then the training is performed. In the following paragraphs we describe the conclusions reached in this study.
Performance on Stationary Time Series
In the study of the stationary series, we observed that the calibrated neural network models have a better overall performance than all the traditional time series methods used in the benchmark. However, the differences in performance were very small, compared with the corresponding ARMA models, in both one and multi-step predictions. This suggests that the additional work to create the neural network model may not be compensated by the improvement obtained in the forecasting.
Determination of the Neural Network Structure
The use of the Box-Jenkins method to identify the 'lag components' of the time series and to suggest a suitable neural network structure was demonstrated to be effective. Of the four series studied, in no case did the sensitivity analysis procedure point to a different calibrated model than that initially suggested by the Box-Jenkins method.
Synthetic Data
In most of the cases, the synthetic data seem to mimic the original data sufficiently well to suggest an appropriate stopping point, to avoid overfitting. Furthermore, the use of synthetic data in the sensitivity analysis procedure usually pointed to the model with the best overall performance in both types of prediction tested.
Data Requirements and Modelling Time
By reducing the overall modelling time and decreasing the data requirements (compared with earlier attempts to use neural networks in time series forecasting), the Hybrid methodology proved to be a practical way to use neural networks as a forecasting tool in the case of stationary time series. It should be noted that in the present study, we used small time series and achieved a better performance than the corresponding ARMA model in a very short training/validation process.
Future Research
Further research should be done to tailor the Hybrid methodology for other types of time series, such as seasonal and nonstationary time series. In this, some attention should be given to the use of traditional methods, such as moving averages and deseasonalization factors. Furthermore, the application of the Hybrid Methodology to multivariate time series should also be studied. In addition, the methodology should be tested on more time series in order to verify the initial results obtained for the stationary time series. Moreover, a special study should be done to establish a procedure to better identify the adequate normalization factor(s) to be used in the forecasting process.


REFERENCES
Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19, 6, 716-723.
Box, G. E. P., Jenkins, G. M. (1976) Time series analysis: forecasting and control, Holden-Day, Inc., Oakland, California.
Chakraborty, K., Mehrotra, K., Mohan, C. K., Ranka, S. (1992) Forecasting the behavior of multivariate time series using neural networks, Neural Networks, 5, 961-970.
Coakley, J. R., McFarlane, D. D., Perley, W. G. (1992) Alternative criteria for evaluating artificial neural network performance, presented at TIMS/ORSA Joint National Meeting, April.
El-Sharkawi, M. A., Oh, S., Marks, R. J., Damborg, M. J., Brace, C. M. (1991) Short term electric load forecasting using an adaptively trained layered perceptron, in Proceedings of the First Forum on Application of Neural Networks to Power Systems, 3-6, Seattle, Washington.
Gorr, W., Nagin, D., Szcypula, J. (1992) The relevance of artificial neural networks to managerial forecasting; an analysis and empirical study, Technical Report 93-1, Heinz School of Public Policy and Management, Carnegie Mellon University, Pittsburgh, PA, USA.
Hertz, John, Krogh, Anders, Palmer, Richard G. (1991) Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Co., Don Mills, Ontario, 1-8 and 89-156.
Hipel, K.W., McLeod, A.I. (1993) Time series modelling of water resources and environmental systems, to be published by Elsevier, Amsterdam, The Netherlands.
Lachtermacher, G. (1991) A fast heuristic for backpropagation in neural networks, Master's Thesis, Department of Management Sciences, University of Waterloo, Waterloo, Ontario, Canada.
Lachtermacher, G. (1993) Backpropagation in Time Series Analysis, Ph.D. Thesis, Department of Management Sciences, University of Waterloo, Waterloo, Ontario, Canada.
Lapedes, A., Farber, R. (1987) Nonlinear signal processing using neural networks: prediction and system modelling, Technical Report LA-UR-87-2662, Los Alamos National Laboratory.
Lapedes, A., Farber, R. (1988) How neural nets work, in Neural Information Processing Systems, ed. Dana Z. Anderson, 442-456, American Institute of Physics, New York.
Nowlan, S. J., Hinton, G. E. (1992) Simplifying neural networks by soft weight-sharing, Neural Computation, 4, 473-493.
Rumelhart, David E., McClelland, James L. and The PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, MIT Press, Cambridge, Massachusetts, USA.
Srinivasan, D., Liew, A.C., Chen, J. S. P. (1991) Short term forecasting using neural network approach, in Proceedings of the First Forum on Application of Neural Networks to Power Systems, 12-16, Seattle, Washington.


Tang, Z., Almeida, C., Fishwick, P.A. (1991) Time series forecasting using neural networks vs. Box-Jenkins methodology, in Simulations, 303-310, Simulations Councils, Inc., November.
Tong, H., Lim, K. S. (1980) Threshold autoregression, limit cycles and cyclical data, Journal of the Royal Stat. Society, series B, 42, 3, 245-292.
Tong, H. (1983) Threshold Models in non-linear time series analysis, in Lecture Notes in Statistics, ed. D. Brillinger, S. Fienberg, J. Gani, J. Hartigan and K. Krickeberg, Springer-Verlag, New York, N.Y., USA.
Tukey, J. W. (1977) Exploratory Data Analysis, Addison-Wesley, Reading, Massachusetts, USA.
Weigend, A. S. (1991) Connectionist architectures for time series prediction of dynamical systems, PhD Thesis, Department of Physics, Stanford University, University Microfilms International, Ann Arbor, Michigan.
Weigend, A. S., Rumelhart, D. E., Huberman, B.A. (1990) Predicting the future: a connectionist approach, International Journal of Neural Systems, 1, 3, 193-209.
Weigend, A. S., Rumelhart, D. E., Huberman, B.A. (1991) Back-propagation, weight-elimination and time series prediction, in Connectionist Models - Proceedings of the 1990 Summer School, edited by D.S. Touretzky, J.L. Elman, T.J. Sejnowski, G.E. Hinton, Morgan Kaufmann Publishers, Inc.

PART V TREND ASSESSMENT

TESTS FOR MONOTONIC TREND

A. I. MCLEOD 1,2 and K. W. HIPEL 2,3
1Department of Statistical and Actuarial Science, The University of Western Ontario, London, Ontario, Canada N6A 5B7
2Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
3Department of Statistics and Actuarial Science, University of Waterloo
The monotonic trend tests proposed by Mann (1945), Abelson and Tukey (1963) and Brillinger (1989) are reviewed and evaluated. Simulation experiments using a very large number of simulations are carried out for comparing the Abelson-Tukey and Mann-Kendall tests. The advantages and disadvantages of each test are discussed and the practical implementation and usefulness of these test procedures are clearly demonstrated with some applications to environmental data.

INTRODUCTION
Recently, Brillinger (1989) proposed a new test for monotonic trend. The important advantage of this test over the Mann-Kendall test (Mann, 1945) is its validity in the presence of an autocorrelated error component. Brillinger demonstrated that in this case his test would have asymptotic power equal to one, whereas other tests, which do not take into account the autocorrelation of the error component, would by comparison have an asymptotic relative efficiency of zero. For the situation where the errors from the monotonic trend can be assumed to be statistically independent, the method of Brillinger (1989) may be replaced with a test originally developed by Abelson and Tukey (1963). It is of interest to compare the power of the Mann-Kendall and Abelson-Tukey tests since there are important examples where the errors appear to be independent and identically distributed white noise. However, it should be noted that if the error component is significantly correlated, then both of these tests would be expected to perform very poorly relative to Brillinger's trend test. In the next section, the Brillinger trend test and its practical implementation details are outlined. Then, in the subsequent two sections the Abelson-Tukey and Mann-Kendall tests are briefly summarized. Power comparisons of the Abelson-Tukey and Mann-Kendall tests demonstrate the usefulness of these tests.

BRILLINGER TREND TEST
The basic underlying model considered can be written as

$$z_t = S_t + \eta_t, \quad t = 1, \ldots, n, \qquad (1)$$

where $z_t$ is the observed time series, $S_t$ represents a signal or trend component and $\eta_t$ stands for an autocorrelated error component. Under the null hypothesis, it is


Figure 1. Plot of the coefficient $c_t$ in (3) against t for the Brillinger trend test.

assumed that $S_t$ is a constant. The alternative hypothesis to be tested assumes that $S_t$ is either a nondecreasing ($S_t \le S_{t+1}$) or nonincreasing ($S_t \ge S_{t+1}$) function of time t. The test statistic Brillinger (1989) developed is given as

$$Z_B = \frac{\sum c_t z_t}{\text{est.sd.}(\sum c_t z_t)}, \qquad (2)$$

where

$$c_t = \sqrt{(t-1)\left(1 - \frac{t-1}{n}\right)} - \sqrt{t\left(1 - \frac{t}{n}\right)}, \qquad (3)$$

and n is the length of the series. Under the null hypothesis of no trend, the statistic $Z_B$ is asymptotically normally distributed with mean 0 and variance 1. Large values of $|Z_B|$ indicate the null hypothesis is untenable and hence there is the possibility of a trend in the series. The trend is increasing or decreasing according as $Z_B$ is > 0 or < 0, respectively. A plot of $c_t$ versus t for n = 100 is shown in Figure 1. Notice how the function $c_t$ contrasts the values between each end of the series, so values near the beginning are given weights close to -1 while those near the other end are given weights near +1. The function $c_t$ was originally derived by Abelson and Tukey (1963) for testing the case where the error component $\eta_t$ is assumed to be independently normally distributed with mean zero and constant variance. This will be referred to as the Gaussian white noise case. It can be shown that

$$\text{var}\left(\sum c_t z_t\right) \approx 2\pi f_\eta(0) \sum c_t^2, \qquad (4)$$

where $f_\eta(0)$ denotes the spectral density function of the error component evaluated at 0.


In order to estimate $f_\eta(0)$, it is first necessary to estimate $\eta_t$. Assuming there are no outliers, an estimate of the trend component is given by the running average of order V defined as

$$\hat{S}_t = \frac{1}{2V+1} \sum_{i=-V}^{V} z_{t+i}. \qquad (5)$$

The practitioner should normally choose a value of V to give a reasonable estimate of the trend component. To assist in verifying the choice of V, Brillinger (1989) suggests examining a plot of the trend component for the specified choice of V. It is very important that V not be too small since this can cause $f_\eta(0)$ to be drastically underestimated, which may easily result in a Type 1 error, that is, rejection of the null hypothesis when it is true. In some cases where there are outliers in the series, a suitable Box-Cox transformation may be used to make the data more normally distributed. The normal probability plot can be used to choose the transformation by examining the plot while trying different power transformations (see Hipel and McLeod (1994, Section 3.4.5)). After the trend component, $\hat{S}_t$, has been estimated, the autocorrelated error component, $\eta_t$, can be estimated by $\hat{\eta}_t = z_t - \hat{S}_t$. Then, an estimate of $f_\eta(0)$ derived by Brillinger (1989) is

$$\hat{f}_\eta(0) = \frac{1}{2\pi n} \cdot \frac{\sum_{j=1}^{L} |\hat{E}_j|^2}{\sum_{j=1}^{L} (1 - a_j)^2}, \qquad (6)$$

where

$$\hat{E}_j = \sum_{t=V+1}^{n-1-V} \hat{\eta}_t \exp\left\{-\frac{i 2\pi j t}{n}\right\}, \qquad (7)$$

where $i = \sqrt{-1}$, and

$$a_j = \frac{\sin\{\frac{\pi j}{n}(2V+1)\}}{(2V+1)\sin(\frac{\pi j}{n})}. \qquad (8)$$

The parameter L determines the degree of smoothing of the periodogram component. A plot of the periodogram of the estimated autocorrelated error component, $\hat{\eta}_t$, showing the bandwidth corresponding to L, is suggested by Brillinger (1989) to aid in choosing L. As with the choice of V, a suitable selection of L is essential to obtain a reasonably good estimate of $f_\eta(0)$. Finally

$$\text{est.sd.}\left(\sum c_t z_t\right) = \sqrt{2\pi \hat{f}_\eta(0) \sum c_t^2}. \qquad (9)$$

In practice, the Fourier transform $\hat{E}_j$ may either be computed using the Discrete Fourier Transform (DFT) or the Fast Fourier Transform (FFT). If the FFT is employed, the series is padded with zeros at both ends until it is of length $n' = 2^p$, where $p = [\log_2(n)] + 1$, where $[\cdot]$ denotes the integer part. To avoid leakage, especially when the FFT is used, Tukey (1967) and Brillinger (1981, p.54) recommend that data tapering be used. Tukey's split cosine bell data taper (Bloomfield, 1976, p.84) involves multiplying the series $\hat{\eta}_t$ by the cosine tapering function $u_t$, where

$$u_t = \frac{1}{2}\left(1 - \cos\frac{\pi(t - \frac{1}{2})}{l}\right), \quad \text{for } t = 1, \ldots, l,$$
$$\;\;\; = 1, \quad \text{for } t = l+1, \ldots, n'-l-1,$$
$$\;\;\; = \frac{1}{2}\left(1 - \cos\frac{\pi(n'-t+\frac{1}{2})}{l}\right), \quad \text{for } t = n'-l, \ldots, n' \qquad (10)$$

to form the tapered series $\tilde{\eta}_t = \hat{\eta}_t u_t$. The Fourier transform of the tapered series is then evaluated. The percentage of data tapered, say r, is then $r = 200 l/n'$. Tukey recommends choosing r = 10 or 20. Hurvich (1988) suggests a data-based method for choosing the amount of tapering to be done. The choice of the parameters V, L and r is very important in the application of Brillinger's test since a poor selection of these parameters may result in a completely meaningless test result. We have found it helpful to practice with simulated time series data in order to develop a better feel for how these parameters should be chosen.
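As a rough illustration only, steps (2) through (9) might be coded as follows. This is our sketch, not Brillinger's code: the padding to a power of two, the exact summation limits in (7) and the default taper fraction are simplified, and a careful application should follow Brillinger (1989) directly.

```python
import numpy as np

def brillinger_test(z, V, L, taper_frac=0.1):
    """Sketch of the Brillinger (1989) trend statistic Z_B, cf. (2)-(9).
    Assumes 1 <= L < n - 2V."""
    z = np.asarray(z, float)
    n = len(z)
    t = np.arange(1, n + 1)
    c = (np.sqrt((t - 1) * (1 - (t - 1) / n))
         - np.sqrt(np.clip(t * (1 - t / n), 0, None)))    # coefficients, cf. (3)
    S = np.convolve(z, np.ones(2 * V + 1) / (2 * V + 1),
                    mode="valid")                          # running average, cf. (5)
    eta = z[V:n - V] - S                                   # estimated error component
    m = len(eta)
    ell = max(1, int(taper_frac * m / 2))                  # taper length at each end
    u = np.ones(m)
    ramp = 0.5 * (1 - np.cos(np.pi * (np.arange(1, ell + 1) - 0.5) / ell))
    u[:ell], u[m - ell:] = ramp, ramp[::-1]                # split cosine bell, cf. (10)
    E = np.fft.fft(eta * u)                                # Fourier transform, cf. (7)
    j = np.arange(1, L + 1)
    a = (np.sin(np.pi * j / n * (2 * V + 1))
         / ((2 * V + 1) * np.sin(np.pi * j / n)))          # cf. (8)
    f0 = ((np.abs(E[1:L + 1]) ** 2).sum()
          / (2 * np.pi * m * ((1 - a) ** 2).sum()))        # spectrum at 0, cf. (6)
    sd = np.sqrt(2 * np.pi * f0 * (c ** 2).sum())          # est. sd, cf. (9)
    return (c * z).sum() / sd                              # Z_B, cf. (2)
```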

ABELSON-TUKEY TEST

In this case, under the null hypothesis of no trend, the test statistic may be written as

Z_A = \frac{\sum c_t z_t}{\sqrt{\left(\sum c_t^2\right)\left(\sum (z_t - \bar{z})^2\right)/n}},   (11)

where z̄ = Σ z_t / n. Under the null hypothesis of no trend, the statistic Z_A is asymptotically normally distributed with mean 0 and variance 1. Large values of |Z_A| indicate the null hypothesis is untenable and hence there is the possibility of a trend in the series. The trend is increasing or decreasing according as Z_A is > 0 or < 0, respectively.
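A minimal sketch of Z_A, reusing abelson_tukey_weights() from the earlier block. The 1/n factor inside the square root is our reading of the garbled printed formula, chosen so that Z_A is asymptotically standard normal as equation (11) requires.

```python
import numpy as np

def abelson_tukey_test(z):
    # Z_A of equation (11): contrast of z with the c_t weights, standardized.
    z = np.asarray(z, dtype=float)
    n = len(z)
    c = abelson_tukey_weights(n)
    denom = np.sqrt(np.sum(c ** 2) * np.sum((z - z.mean()) ** 2) / n)
    return np.sum(c * z) / denom
```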

THE MANN-KENDALL TREND TEST

The Mann-Kendall trend test is derived by computing the Kendall rank correlation (Kendall, 1975) between z_t and t (Mann, 1945). The Mann-Kendall trend test assumes that under the null hypothesis of no trend, the time series is independent and identically distributed. Since the rank correlation is a measure of the monotonic relationship between z_t and t, the Mann-Kendall trend test would be expected to have good power properties in many situations. Unlike the Brillinger test, one is not restricted to having consecutive equi-spaced observations. Thus, the observed series may be measured at irregularly spaced points in time. However, one can assume the previous notation, where z_t is interpreted as the t-th observation in the data series and t = 1, …, n. In the general case where there may be multiple observations at the same time point, producing ties in t, and there may also be ties in the observations z_t, the Mann-Kendall score is given by

S = \sum_{s<t} \mathrm{sign}\{(z_t - z_s)(t - s)\}.   (12)

This situation arises in water quality data due to repeated uncorrelated samples taken at the same time and the limited accuracy of the observations. Under the null hypothesis of no trend, the expected value of S is zero, while increasing or decreasing monotonic trends are indicated when S > 0 or S < 0, respectively. Valz et al. (1994a) present improved approximations to the null distribution of S in the case of ties in both rankings as well as an exact algorithm to compute its significance levels (Valz et al., 1994b). A detailed description of nonparametric trend tests used in water resources and environmental engineering is provided by Hipel and McLeod (1994, Ch. 23). If it is assumed that there are no ties in either z_t or t, then the formula for the Kendall score may be simplified (Kendall, 1973, p. 27) to yield

S = 2P - \binom{n}{2},   (13)

where P is the number of times that z_{t_2} > z_{t_1} for all t_1, t_2 = 1, …, n such that t_2 > t_1. Under the null hypothesis all pairs are equally likely, so Kendall's rank correlation coefficient, which is defined in the case of no ties as

\tau = S \Big/ \binom{n}{2},   (14)

can be written as

\tau = 2\pi_c - 1,   (15)

where π_c is the relative frequency of positive concordance (i.e., the proportion of time for which z_{t_2} > z_{t_1} when t_2 > t_1). In the case where there are no ties in either ranking, it is known (Kendall, 1975, p. 51) that under the null hypothesis the distribution of S may be well approximated by a normal distribution with mean zero and variance

\mathrm{var}(S) = \frac{1}{18}\, n(n-1)(2n+5),   (16)

provided that n ≥ 10.
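A minimal sketch of equations (12)-(16) in the no-ties case follows. It is a straightforward O(n²) translation for clarity, not the exact algorithms of Valz et al. (1994b); the function name is ours.

```python
import numpy as np
from math import erfc, sqrt

def mann_kendall_test(z):
    z = np.asarray(z, dtype=float)
    n = len(z)
    # S = sum over pairs s < t of sign(z_t - z_s): equation (12) with no ties.
    S = sum(np.sign(z[t] - z[s]) for s in range(n) for t in range(s + 1, n))
    tau = S / (n * (n - 1) / 2)                  # equation (14)
    var_S = n * (n - 1) * (2 * n + 5) / 18.0     # equation (16), for n >= 10
    Z = S / sqrt(var_S)
    p = erfc(abs(Z) / sqrt(2))                   # two-sided normal significance level
    return S, tau, Z, p
```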

POWER COMPARISONS

The power functions at the 5% significance level, denoted by β_MK and β_AT for the Mann-Kendall and Abelson-Tukey tests, respectively, are estimated for various forms of the basic trend model

z_t = f(t) + a_t,  t = 1, …, n,   (17)

where z_t is the observed time series, f(t) represents a monotonic trend component and a_t represents an error component which is independent and identically distributed. Series of length n = 10, 20, 50 and 100 are generated one million times for each of a variety of trend models and error component distributions. The proportion of times that the null hypothesis of no trend is rejected gives estimates for β_MK and β_AT. Thus, the maximum standard error in the estimated power functions, β̂_MK and β̂_AT, is 10^{-3}/2 = 0.0005. Consequently, one may expect that the power probabilities may differ by at most one digit in the third decimal from the true exact result most of the time. Three models for trend are examined. The first trend model is a linear trend, so f(t) = λt. In this case, it is known that the Mann-Kendall trend test is nearly optimal, since when a_t is Gaussian white noise the Mann-Kendall trend test has 98% asymptotic relative efficiency with respect to the optimal estimator, which is linear regression (Kendall and Stuart, 1968, §45.25).


In the second model, f(t) is taken to be the step function

f(t) = 0,  if t ≤ n/2,
     = λ,  if t > n/2.   (18)

Step functions such as this are often used in intervention analysis modelling (see Hipel and McLeod (1994, Ch. 19) for detailed descriptions and applications of various types of intervention models). A priori, it would be hoped that both the Abelson-Tukey and Mann-Kendall trend tests should perform well for step functions. In the third model, f(t) = λc_t, where c_t is defined in equation (3). For this model, the Abelson-Tukey procedure is optimal when a_t is Gaussian white noise. The value of the parameter λ in these trend models is set to λ = α√(10/n), where α = 0.01, 0.04, 0.07, 0.10. Two models for the error component distribution are used. The first is the normal distribution with mean zero and variance σ², while the second is a scaled contaminated normal distribution, c(z),

c(z) = \frac{(1-p)\,\phi(z/s) + (p/\sigma_c)\,\phi\!\left(z/(s\sigma_c)\right)}{s}, \qquad s = \frac{\sigma}{\sqrt{1 - p + p\sigma_c^2}},   (19)

where φ denotes the standard normal density, σ_c = 3 and p = 0.1. The scaling ensures that the distribution has variance equal to σ². These particular parameters are suggested by Tukey (1960) and have been previously used in many simulation studies. The reason for σ_c = 3 is that Tukey (1960) found that there are many datasets occurring in practice where this choice was suggested. We choose p = 0.1 since for this choice the variance contribution from both distributions is equal and so the contamination effect is largest. In previous simulation studies (see Tukey (1960)), it is found that this choice produces the greatest effect when a non-robust estimator is compared to a robust one. We take σ = 0.5, 1, 2, 4 in both the normal and contaminated normal cases. Data are generated by applying the Box-Muller transformation (Box and Muller, 1958) to uniform (0,1) pseudo-random variables generated by Superduper (Marsaglia, 1976). The tests are applied to the same data series, but a different set of random numbers is used for every model and parameter setting. The simulation results are presented in Tables 1.1a to 3.2b. As previously noted, due to the very large number of simulations, all results presented in these tables are essentially exact to the number of decimal places given. Tables 1.1a and 1.1b show the results for a simple linear trend with Gaussian white noise. For comparison of β_MK and β_AT, one can look at their ratios, β_MK/β_AT, as well as their absolute magnitudes. These ratios vary from 0.93 to 1.56. As might be expected, in no case is the Abelson-Tukey test substantially better than the Mann-Kendall test, whereas there are many instances, especially for series length 100, where the Mann-Kendall is much better. This conclusion also applies to Tables 1.2a and 1.2b. In Tables 1.1a through 2.2b, the only situations where the Abelson-Tukey test has larger power are when the null hypothesis is either true or very nearly true, so the power function is really just reflecting the probability of a Type 1 error. For step functions, the results are shown in Tables 2.1a through 2.2b. For longer series lengths, shown in Tables 2.1b and 2.2b, the Mann-Kendall test dominates, since the only times where the Abelson-Tukey has a larger power is when the null hypothesis is true, and even in these cases the Mann-Kendall is better since the probability of Type 1 error is closer to its nominal 5% level.
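The simulation scheme just described can be sketched compactly as below. The contaminated normal draws follow our reconstruction of equation (19); the Box-Muller/Superduper generators are replaced by numpy's default generator, the replicate count is reduced from one million for illustration, and the test functions are the sketches given earlier.

```python
import numpy as np

def contaminated_normal(rng, n, sigma, sigma_c=3.0, p=0.1):
    # Scale factor s from our reconstruction of (19): forces variance sigma^2.
    s = sigma / np.sqrt((1 - p) + p * sigma_c ** 2)
    wide = rng.random(n) < p                    # contaminated with prob. p
    return s * rng.standard_normal(n) * np.where(wide, sigma_c, 1.0)

def estimate_power(stat, trend, n, sigma, nrep=10_000, z_crit=1.96):
    # Proportion of replicates with |statistic| above the two-sided 5% point.
    rng = np.random.default_rng(0)
    t = np.arange(1, n + 1)
    hits = sum(abs(stat(trend(t) + contaminated_normal(rng, n, sigma))) > z_crit
               for _ in range(nrep))
    return hits / nrep

# e.g., beta_MK for a linear trend f(t) = alpha * sqrt(10/n) * t:
# estimate_power(lambda z: mann_kendall_test(z)[2],
#                lambda t: 0.04 * np.sqrt(10 / len(t)) * t, n=50, sigma=1.0)
```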


TABLE 1.1a. Linear trend with Gaussian white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  10  0.00  0.50  0.060  0.056  1.07
  10  0.00  1.00  0.060  0.056  1.06
  10  0.00  2.00  0.060  0.056  1.06
  10  0.00  4.00  0.060  0.056  1.06
  10  0.01  0.50  0.059  0.059  1.00
  10  0.01  1.00  0.058  0.057  1.03
  10  0.01  2.00  0.058  0.056  1.05
  10  0.01  4.00  0.059  0.056  1.06
  10  0.04  0.50  0.091  0.098  0.93
  10  0.04  1.00  0.064  0.067  0.96
  10  0.04  2.00  0.058  0.058  1.00
  10  0.04  4.00  0.058  0.056  1.03
  10  0.07  0.50  0.178  0.183  0.97
  10  0.07  1.00  0.082  0.088  0.93
  10  0.07  2.00  0.062  0.064  0.97
  10  0.07  4.00  0.059  0.058  1.01
  10  0.10  0.50  0.313  0.309  1.01
  10  0.10  1.00  0.115  0.122  0.94
  10  0.10  2.00  0.068  0.072  0.94
  10  0.10  4.00  0.059  0.060  0.99
  20  0.00  0.50  0.051  0.053  0.96
  20  0.00  1.00  0.051  0.053  0.96
  20  0.00  2.00  0.050  0.053  0.95
  20  0.00  4.00  0.051  0.053  0.96
  20  0.01  0.50  0.061  0.063  0.96
  20  0.01  1.00  0.053  0.056  0.94
  20  0.01  2.00  0.051  0.054  0.94
  20  0.01  4.00  0.050  0.053  0.95
  20  0.04  0.50  0.254  0.220  1.15
  20  0.04  1.00  0.097  0.095  1.03
  20  0.04  2.00  0.061  0.063  0.96
  20  0.04  4.00  0.053  0.055  0.95
  20  0.07  0.50  0.625  0.519  1.20
  20  0.07  1.00  0.204  0.181  1.13
  20  0.07  2.00  0.086  0.085  1.01
  20  0.07  4.00  0.058  0.061  0.95
  20  0.10  0.50  0.901  0.800  1.13
  20  0.10  1.00  0.367  0.310  1.19
  20  0.10  2.00  0.126  0.118  1.07
  20  0.10  4.00  0.067  0.069  0.97


TABLE 1.1b. Linear trend with Gaussian white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  50  0.00  0.50  0.051  0.051  0.99
  50  0.00  1.00  0.051  0.051  1.00
  50  0.00  2.00  0.051  0.051  0.99
  50  0.00  4.00  0.051  0.052  1.00
  50  0.01  0.50  0.140  0.111  1.27
  50  0.01  1.00  0.072  0.066  1.10
  50  0.01  2.00  0.056  0.055  1.03
  50  0.01  4.00  0.052  0.052  0.98
  50  0.04  0.50  0.933  0.783  1.19
  50  0.04  1.00  0.410  0.289  1.42
  50  0.04  2.00  0.140  0.110  1.27
  50  0.04  4.00  0.073  0.066  1.10
  50  0.07  0.50  1.000  0.997  1.00
  50  0.07  1.00  0.858  0.675  1.27
  50  0.07  2.00  0.329  0.234  1.41
  50  0.07  4.00  0.118  0.096  1.23
  50  0.10  0.50  1.000  1.000  1.00
  50  0.10  1.00  0.991  0.924  1.07
  50  0.10  2.00  0.583  0.414  1.41
  50  0.10  4.00  0.192  0.144  1.33
 100  0.00  0.50  0.049  0.051  0.97
 100  0.00  1.00  0.050  0.051  0.99
 100  0.00  2.00  0.050  0.051  0.98
 100  0.00  4.00  0.050  0.051  0.99
 100  0.01  0.50  0.419  0.269  1.56
 100  0.01  1.00  0.141  0.104  1.35
 100  0.01  2.00  0.072  0.064  1.12
 100  0.01  4.00  0.056  0.054  1.03
 100  0.04  0.50  1.000  0.999  1.00
 100  0.04  1.00  0.940  0.757  1.24
 100  0.04  2.00  0.418  0.269  1.55
 100  0.04  4.00  0.141  0.105  1.35
 100  0.07  0.50  1.000  1.000  1.00
 100  0.07  1.00  1.000  0.996  1.00
 100  0.07  2.00  0.868  0.645  1.34
 100  0.07  4.00  0.336  0.218  1.54
 100  0.10  0.50  1.000  1.000  1.00
 100  0.10  1.00  1.000  1.000  1.00
 100  0.10  2.00  0.992  0.910  1.09
 100  0.10  4.00  0.593  0.388  1.53


TABLE 1.2a. Linear trend with contaminated normal white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  10  0.00  0.50  0.059  0.062  0.95
  10  0.00  1.00  0.059  0.062  0.95
  10  0.00  2.00  0.060  0.063  0.95
  10  0.00  4.00  0.059  0.063  0.95
  10  0.01  0.50  0.059  0.066  0.90
  10  0.01  1.00  0.058  0.064  0.91
  10  0.01  2.00  0.059  0.063  0.94
  10  0.01  4.00  0.059  0.063  0.95
  10  0.04  0.50  0.110  0.115  0.96
  10  0.04  1.00  0.067  0.076  0.89
  10  0.04  2.00  0.059  0.066  0.89
  10  0.04  4.00  0.058  0.064  0.91
  10  0.07  0.50  0.236  0.221  1.07
  10  0.07  1.00  0.096  0.103  0.94
  10  0.07  2.00  0.064  0.072  0.89
  10  0.07  4.00  0.058  0.065  0.89
  10  0.10  0.50  0.416  0.367  1.13
  10  0.10  1.00  0.144  0.145  1.00
  10  0.10  2.00  0.075  0.083  0.90
  10  0.10  4.00  0.061  0.067  0.90
  20  0.00  0.50  0.051  0.060  0.84
  20  0.00  1.00  0.051  0.060  0.85
  20  0.00  2.00  0.051  0.060  0.84
  20  0.00  4.00  0.050  0.060  0.84
  20  0.01  0.50  0.066  0.071  0.93
  20  0.01  1.00  0.054  0.063  0.85
  20  0.01  2.00  0.051  0.061  0.84
  20  0.01  4.00  0.051  0.061  0.83
  20  0.04  0.50  0.341  0.242  1.41
  20  0.04  1.00  0.120  0.105  1.14
  20  0.04  2.00  0.066  0.071  0.93
  20  0.04  4.00  0.054  0.062  0.86
  20  0.07  0.50  0.757  0.563  1.34
  20  0.07  1.00  0.273  0.199  1.37
  20  0.07  2.00  0.103  0.094  1.09
  20  0.07  4.00  0.062  0.069  0.90
  20  0.10  0.50  0.952  0.820  1.16
  20  0.10  1.00  0.488  0.341  1.43
  20  0.10  2.00  0.162  0.130  1.25
  20  0.10  4.00  0.076  0.077  0.98


TABLE 1.2b. Linear trend with contaminated normal white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  50  0.00  0.50  0.051  0.056  0.91
  50  0.00  1.00  0.051  0.056  0.91
  50  0.00  2.00  0.051  0.056  0.91
  50  0.00  4.00  0.051  0.056  0.91
  50  0.01  0.50  0.180  0.112  1.60
  50  0.01  1.00  0.082  0.069  1.18
  50  0.01  2.00  0.059  0.059  0.99
  50  0.01  4.00  0.053  0.057  0.93
  50  0.04  0.50  0.980  0.801  1.22
  50  0.04  1.00  0.543  0.298  1.83
  50  0.04  2.00  0.181  0.112  1.62
  50  0.04  4.00  0.082  0.069  1.18
  50  0.07  0.50  1.000  0.990  1.01
  50  0.07  1.00  0.945  0.698  1.35
  50  0.07  2.00  0.443  0.240  1.84
  50  0.07  4.00  0.149  0.098  1.52
  50  0.10  0.50  1.000  0.999  1.00
  50  0.10  1.00  0.998  0.925  1.08
  50  0.10  2.00  0.730  0.430  1.70
  50  0.10  4.00  0.255  0.146  1.74
 100  0.00  0.50  0.050  0.054  0.92
 100  0.00  1.00  0.050  0.054  0.92
 100  0.00  2.00  0.050  0.053  0.93
 100  0.00  4.00  0.050  0.054  0.93
 100  0.01  0.50  0.556  0.269  2.06
 100  0.01  1.00  0.183  0.103  1.78
 100  0.01  2.00  0.082  0.066  1.25
 100  0.01  4.00  0.058  0.057  1.01
 100  0.04  0.50  1.000  0.996  1.00
 100  0.04  1.00  0.986  0.774  1.27
 100  0.04  2.00  0.555  0.269  2.06
 100  0.04  4.00  0.183  0.102  1.78
 100  0.07  0.50  1.000  1.000  1.00
 100  0.07  1.00  1.000  0.990  1.01
 100  0.07  2.00  0.954  0.662  1.44
 100  0.07  4.00  0.451  0.216  2.09
 100  0.10  0.50  1.000  1.000  1.00
 100  0.10  1.00  1.000  1.000  1.00
 100  0.10  2.00  0.999  0.915  1.09
 100  0.10  4.00  0.745  0.393  1.90


TABLE 2.1a. Step function with Gaussian white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  10  0.00  0.50  0.060  0.056  1.07
  10  0.00  1.00  0.060  0.056  1.07
  10  0.00  2.00  0.060  0.056  1.07
  10  0.00  4.00  0.060  0.055  1.08
  10  0.50  0.50  0.184  0.143  1.28
  10  0.50  1.00  0.086  0.081  1.06
  10  0.50  2.00  0.062  0.062  1.00
  10  0.50  4.00  0.058  0.058  1.01
  10  1.00  0.50  0.468  0.307  1.52
  10  1.00  1.00  0.184  0.143  1.29
  10  1.00  2.00  0.086  0.080  1.07
  10  1.00  4.00  0.062  0.062  1.00
  10  2.00  0.50  0.686  0.563  1.22
  10  2.00  1.00  0.469  0.307  1.53
  10  2.00  2.00  0.184  0.143  1.29
  10  2.00  4.00  0.086  0.080  1.07
  10  5.00  0.50  0.694  0.907  0.77
  10  5.00  1.00  0.693  0.653  1.06
  10  5.00  2.00  0.575  0.383  1.50
  10  5.00  4.00  0.253  0.182  1.39
  20  0.00  0.50  0.050  0.053  0.95
  20  0.00  1.00  0.050  0.053  0.95
  20  0.00  2.00  0.050  0.053  0.94
  20  0.00  4.00  0.050  0.053  0.95
  20  0.50  0.50  0.221  0.148  1.49
  20  0.50  1.00  0.090  0.078  1.16
  20  0.50  2.00  0.060  0.059  1.00
  20  0.50  4.00  0.052  0.054  0.96
  20  1.00  0.50  0.635  0.376  1.69
  20  1.00  1.00  0.221  0.148  1.49
  20  1.00  2.00  0.090  0.078  1.16
  20  1.00  4.00  0.059  0.059  1.00
  20  2.00  0.50  0.978  0.797  1.23
  20  2.00  1.00  0.634  0.376  1.69
  20  2.00  2.00  0.221  0.149  1.49
  20  2.00  4.00  0.090  0.078  1.16
  20  5.00  0.50  0.994  1.000  0.99
  20  5.00  1.00  0.991  0.906  1.09
  20  5.00  2.00  0.802  0.501  1.60
  20  5.00  4.00  0.315  0.197  1.60


TABLE 2.1b. Step function with Gaussian white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  50  0.00  0.50  0.051  0.051  1.00
  50  0.00  1.00  0.051  0.051  1.01
  50  0.00  2.00  0.051  0.051  1.00
  50  0.00  4.00  0.051  0.051  1.00
  50  0.50  0.50  0.252  0.142  1.77
  50  0.50  1.00  0.100  0.074  1.35
  50  0.50  2.00  0.062  0.057  1.09
  50  0.50  4.00  0.054  0.053  1.01
  50  1.00  0.50  0.723  0.394  1.83
  50  1.00  1.00  0.252  0.142  1.77
  50  1.00  2.00  0.099  0.074  1.34
  50  1.00  4.00  0.063  0.057  1.09
  50  2.00  0.50  0.999  0.881  1.13
  50  2.00  1.00  0.722  0.394  1.83
  50  2.00  2.00  0.252  0.143  1.76
  50  2.00  4.00  0.100  0.074  1.34
  50  5.00  0.50  1.000  1.000  1.00
  50  5.00  1.00  1.000  0.967  1.03
  50  5.00  2.00  0.886  0.545  1.63
  50  5.00  4.00  0.362  0.194  1.87
 100  0.00  0.50  0.050  0.051  0.98
 100  0.00  1.00  0.050  0.050  0.99
 100  0.00  2.00  0.050  0.051  0.98
 100  0.00  4.00  0.050  0.051  0.98
 100  0.50  0.50  0.258  0.135  1.90
 100  0.50  1.00  0.100  0.072  1.40
 100  0.50  2.00  0.062  0.056  1.11
 100  0.50  4.00  0.053  0.052  1.02
 100  1.00  0.50  0.743  0.384  1.93
 100  1.00  1.00  0.259  0.137  1.90
 100  1.00  2.00  0.100  0.072  1.40
 100  1.00  4.00  0.062  0.056  1.11
 100  2.00  0.50  0.999  0.891  1.12
 100  2.00  1.00  0.742  0.384  1.94
 100  2.00  2.00  0.259  0.136  1.90
 100  2.00  4.00  0.100  0.071  1.40
 100  5.00  0.50  1.000  1.000  1.00
 100  5.00  1.00  1.000  0.975  1.03
 100  5.00  2.00  0.902  0.539  1.67
 100  5.00  4.00  0.373  0.185  2.02


TABLE 2.2a. Step function with contaminated normal white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  10  0.00  0.50  0.062  0.073  0.85
  10  0.00  1.00  0.071  0.059  1.20
  10  0.00  2.00  0.056  0.068  0.82
  10  0.00  4.00  0.059  0.060  0.98
  10  0.50  0.50  0.236  0.147  1.61
  10  0.50  1.00  0.109  0.105  1.04
  10  0.50  2.00  0.083  0.084  0.99
  10  0.50  4.00  0.042  0.062  0.68
  10  1.00  0.50  0.519  0.330  1.57
  10  1.00  1.00  0.224  0.153  1.46
  10  1.00  2.00  0.103  0.098  1.05
  10  1.00  4.00  0.065  0.084  0.77
  10  2.00  0.50  0.701  0.614  1.14
  10  2.00  1.00  0.505  0.309  1.63
  10  2.00  2.00  0.219  0.167  1.31
  10  2.00  4.00  0.104  0.091  1.14
  10  5.00  0.50  0.689  0.901  0.76
  10  5.00  1.00  0.695  0.685  1.01
  10  5.00  2.00  0.588  0.395  1.49
  10  5.00  4.00  0.314  0.211  1.49
  20  0.00  0.50  0.041  0.049  0.84
  20  0.00  1.00  0.049  0.068  0.72
  20  0.00  2.00  0.054  0.061  0.89
  20  0.00  4.00  0.054  0.067  0.81
  20  0.50  0.50  0.311  0.173  1.80
  20  0.50  1.00  0.094  0.084  1.12
  20  0.50  2.00  0.066  0.074  0.89
  20  0.50  4.00  0.058  0.057  1.02
  20  1.00  0.50  0.721  0.386  1.87
  20  1.00  1.00  0.285  0.164  1.74
  20  1.00  2.00  0.111  0.087  1.28
  20  1.00  4.00  0.068  0.073  0.93
  20  2.00  0.50  0.966  0.811  1.19
  20  2.00  1.00  0.748  0.409  1.83
  20  2.00  2.00  0.287  0.158  1.82
  20  2.00  4.00  0.103  0.089  1.16
  20  5.00  0.50  0.992  0.994  1.00
  20  5.00  1.00  0.974  0.907  1.07
  20  5.00  2.00  0.865  0.533  1.62
  20  5.00  4.00  0.412  0.206  2.00


TABLE 2.2b. Step function with contaminated normal white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  50  0.00  0.50  0.044  0.053  0.83
  50  0.00  1.00  0.046  0.065  0.71
  50  0.00  2.00  0.055  0.058  0.95
  50  0.00  4.00  0.057  0.063  0.90
  50  0.50  0.50  0.350  0.161  2.17
  50  0.50  1.00  0.124  0.080  1.55
  50  0.50  2.00  0.062  0.081  0.77
  50  0.50  4.00  0.065  0.055  1.18
  50  1.00  0.50  0.852  0.418  2.04
  50  1.00  1.00  0.356  0.153  2.33
  50  1.00  2.00  0.113  0.074  1.53
  50  1.00  4.00  0.080  0.076  1.05
  50  2.00  0.50  1.000  0.897  1.11
  50  2.00  1.00  0.807  0.365  2.21
  50  2.00  2.00  0.311  0.126  2.47
  50  2.00  4.00  0.123  0.076  1.62
  50  5.00  0.50  1.000  1.000  1.00
  50  5.00  1.00  1.000  0.968  1.03
  50  5.00  2.00  0.958  0.537  1.78
  50  5.00  4.00  0.496  0.220  2.25
 100  0.00  0.50  0.049  0.050  0.98
 100  0.00  1.00  0.054  0.048  1.12
 100  0.00  2.00  0.052  0.060  0.87
 100  0.00  4.00  0.044  0.072  0.61
 100  0.50  0.50  0.365  0.147  2.48
 100  0.50  1.00  0.114  0.073  1.56
 100  0.50  2.00  0.051  0.067  0.76
 100  0.50  4.00  0.054  0.067  0.81
 100  1.00  0.50  0.880  0.387  2.27
 100  1.00  1.00  0.350  0.123  2.85
 100  1.00  2.00  0.121  0.083  1.46
 100  1.00  4.00  0.068  0.062  1.10
 100  2.00  0.50  1.000  0.899  1.11
 100  2.00  1.00  0.869  0.357  2.43
 100  2.00  2.00  0.363  0.135  2.69
 100  2.00  4.00  0.122  0.068  1.79
 100  5.00  0.50  1.000  1.000  1.00
 100  5.00  1.00  1.000  0.976  1.02
 100  5.00  2.00  0.970  0.570  1.70
 100  5.00  4.00  0.508  0.193  2.63


TABLE 3.1a. Abelson-Tukey function with Gaussian white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  10  0.00  0.50  0.059  0.056  1.07
  10  0.00  1.00  0.060  0.056  1.06
  10  0.00  2.00  0.059  0.056  1.06
  10  0.00  4.00  0.059  0.056  1.06
  10  0.50  0.50  0.178  0.266  0.67
  10  0.50  1.00  0.083  0.108  0.77
  10  0.50  2.00  0.062  0.069  0.90
  10  0.50  4.00  0.059  0.059  0.99
  10  1.00  0.50  0.494  0.740  0.67
  10  1.00  1.00  0.178  0.266  0.67
  10  1.00  2.00  0.083  0.108  0.77
  10  1.00  4.00  0.062  0.069  0.90
  10  2.00  0.50  0.904  0.999  0.90
  10  2.00  1.00  0.493  0.739  0.67
  10  2.00  2.00  0.178  0.266  0.67
  10  2.00  4.00  0.083  0.108  0.77
  10  5.00  0.50  1.000  1.000  1.00
  10  5.00  1.00  0.964  1.000  0.96
  10  5.00  2.00  0.646  0.897  0.72
  10  5.00  4.00  0.247  0.379  0.65
  20  0.00  0.50  0.051  0.054  0.95
  20  0.00  1.00  0.050  0.053  0.96
  20  0.00  2.00  0.050  0.053  0.95
  20  0.00  4.00  0.051  0.053  0.96
  20  0.50  0.50  0.128  0.191  0.67
  20  0.50  1.00  0.069  0.088  0.78
  20  0.50  2.00  0.054  0.061  0.89
  20  0.50  4.00  0.051  0.056  0.92
  20  1.00  0.50  0.357  0.569  0.63
  20  1.00  1.00  0.129  0.191  0.67
  20  1.00  2.00  0.068  0.087  0.79
  20  1.00  4.00  0.054  0.062  0.88
  20  2.00  0.50  0.827  0.988  0.84
  20  2.00  1.00  0.357  0.568  0.63
  20  2.00  2.00  0.128  0.191  0.67
  20  2.00  4.00  0.068  0.086  0.79
  20  5.00  0.50  1.000  1.000  1.00
  20  5.00  1.00  0.930  1.000  0.93
  20  5.00  2.00  0.497  0.757  0.66
  20  5.00  4.00  0.173  0.269  0.64


TABLE 3.1b. Abelson-Tukey function with Gaussian white noise

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  50  0.00  0.50  0.051  0.051  0.99
  50  0.00  1.00  0.051  0.052  0.99
  50  0.00  2.00  0.051  0.051  0.99
  50  0.00  4.00  0.051  0.051  0.99
  50  0.50  0.50  0.088  0.119  0.74
  50  0.50  1.00  0.060  0.068  0.88
  50  0.50  2.00  0.053  0.055  0.96
  50  0.50  4.00  0.052  0.053  0.98
  50  1.00  0.50  0.202  0.328  0.62
  50  1.00  1.00  0.089  0.119  0.74
  50  1.00  2.00  0.060  0.068  0.88
  50  1.00  4.00  0.053  0.055  0.96
  50  2.00  0.50  0.576  0.855  0.67
  50  2.00  1.00  0.202  0.328  0.62
  50  2.00  2.00  0.088  0.120  0.74
  50  2.00  4.00  0.060  0.068  0.88
  50  5.00  0.50  0.997  1.000  1.00
  50  5.00  1.00  0.747  0.965  0.77
  50  5.00  2.00  0.284  0.472  0.60
  50  5.00  4.00  0.110  0.158  0.69
 100  0.00  0.50  0.050  0.051  0.98
 100  0.00  1.00  0.050  0.051  0.98
 100  0.00  2.00  0.050  0.051  0.98
 100  0.00  4.00  0.050  0.051  0.98
 100  0.50  0.50  0.069  0.089  0.78
 100  0.50  1.00  0.054  0.060  0.91
 100  0.50  2.00  0.051  0.053  0.96
 100  0.50  4.00  0.050  0.051  0.98
 100  1.00  0.50  0.129  0.207  0.62
 100  1.00  1.00  0.070  0.089  0.78
 100  1.00  2.00  0.055  0.060  0.92
 100  1.00  4.00  0.051  0.053  0.96
 100  2.00  0.50  0.360  0.625  0.58
 100  2.00  1.00  0.128  0.208  0.62
 100  2.00  2.00  0.069  0.089  0.78
 100  2.00  4.00  0.055  0.060  0.91
 100  5.00  0.50  0.958  1.000  0.96
 100  5.00  1.00  0.507  0.813  0.62
 100  5.00  2.00  0.173  0.297  0.58
 100  5.00  4.00  0.080  0.111  0.73


TABLE 3.2a. Abelson-Tukey function with contaminated normal

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  10  0.00  0.50  0.060  0.063  0.95
  10  0.00  1.00  0.060  0.063  0.96
  10  0.00  2.00  0.059  0.063  0.94
  10  0.00  4.00  0.060  0.063  0.95
  10  0.50  0.50  0.230  0.323  0.71
  10  0.50  1.00  0.097  0.128  0.76
  10  0.50  2.00  0.065  0.078  0.83
  10  0.50  4.00  0.059  0.067  0.88
  10  1.00  0.50  0.593  0.780  0.76
  10  1.00  1.00  0.230  0.323  0.71
  10  1.00  2.00  0.097  0.128  0.76
  10  1.00  4.00  0.065  0.078  0.83
  10  2.00  0.50  0.928  0.987  0.94
  10  2.00  1.00  0.592  0.780  0.76
  10  2.00  2.00  0.231  0.324  0.71
  10  2.00  4.00  0.097  0.128  0.76
  10  5.00  0.50  1.000  1.000  1.00
  10  5.00  1.00  0.971  0.997  0.97
  10  5.00  2.00  0.728  0.891  0.82
  10  5.00  4.00  0.321  0.452  0.71
  20  0.00  0.50  0.051  0.060  0.84
  20  0.00  1.00  0.051  0.060  0.85
  20  0.00  2.00  0.051  0.061  0.84
  20  0.00  4.00  0.051  0.060  0.84
  20  0.50  0.50  0.163  0.214  0.76
  20  0.50  1.00  0.077  0.097  0.80
  20  0.50  2.00  0.057  0.069  0.82
  20  0.50  4.00  0.051  0.063  0.82
  20  1.00  0.50  0.458  0.615  0.75
  20  1.00  1.00  0.163  0.214  0.76
  20  1.00  2.00  0.077  0.096  0.80
  20  1.00  4.00  0.057  0.070  0.81
  20  2.00  0.50  0.891  0.972  0.92
  20  2.00  1.00  0.457  0.614  0.74
  20  2.00  2.00  0.164  0.214  0.77
  20  2.00  4.00  0.077  0.097  0.80
  20  5.00  0.50  1.000  1.000  1.00
  20  5.00  1.00  0.961  0.993  0.97
  20  5.00  2.00  0.609  0.781  0.78
  20  5.00  4.00  0.225  0.302  0.75


TABLE 3.2b. Abelson-Tukey function with contaminated normal

   n     α     σ    β_MK   β_AT  β_MK/β_AT
  50  0.00  0.50  0.051  0.056  0.91
  50  0.00  1.00  0.051  0.056  0.91
  50  0.00  2.00  0.051  0.056  0.90
  50  0.00  4.00  0.051  0.056  0.91
  50  0.50  0.50  0.105  0.122  0.86
  50  0.50  1.00  0.064  0.072  0.89
  50  0.50  2.00  0.054  0.060  0.90
  50  0.50  4.00  0.052  0.057  0.91
  50  1.00  0.50  0.265  0.342  0.77
  50  1.00  1.00  0.105  0.122  0.86
  50  1.00  2.00  0.064  0.072  0.89
  50  1.00  4.00  0.054  0.060  0.91
  50  2.00  0.50  0.702  0.863  0.81
  50  2.00  1.00  0.264  0.342  0.77
  50  2.00  2.00  0.105  0.122  0.86
  50  2.00  4.00  0.064  0.072  0.89
  50  5.00  0.50  0.999  1.000  1.00
  50  5.00  1.00  0.852  0.957  0.89
  50  5.00  2.00  0.374  0.494  0.76
  50  5.00  4.00  0.135  0.162  0.84
 100  0.00  0.50  0.050  0.054  0.92
 100  0.00  1.00  0.050  0.054  0.92
 100  0.00  2.00  0.050  0.054  0.92
 100  0.00  4.00  0.050  0.054  0.93
 100  0.50  0.50  0.078  0.089  0.88
 100  0.50  1.00  0.056  0.062  0.91
 100  0.50  2.00  0.052  0.056  0.93
 100  0.50  4.00  0.050  0.055  0.92
 100  1.00  0.50  0.163  0.206  0.79
 100  1.00  1.00  0.078  0.088  0.88
 100  1.00  2.00  0.057  0.062  0.91
 100  1.00  4.00  0.052  0.056  0.92
 100  2.00  0.50  0.471  0.642  0.73
 100  2.00  1.00  0.163  0.206  0.79
 100  2.00  2.00  0.078  0.088  0.88
 100  2.00  4.00  0.057  0.062  0.91
 100  5.00  0.50  0.988  0.999  0.99
 100  5.00  1.00  0.639  0.826  0.77
 100  5.00  2.00  0.227  0.299  0.76
 100  5.00  4.00  0.093  0.109  0.85


For smaller samples, shown in Tables 2.1a and 2.2a, it generally holds true that the Mann-Kendall test outperforms the Abelson-Tukey test, although there is one curious exception. In particular, when n = 10, α = 5 and σ = 0.5 we have β_MK = 0.694 and β_AT = 0.907 in the Gaussian white noise case and β_MK = 0.689 and β_AT = 0.901 in the contaminated Gaussian white noise case. Finally, for the trend function based on the Abelson-Tukey function, one can see that the Abelson-Tukey method is better in almost all cases, as should be expected. The difference in power is roughly comparable to the differences one can see in Tables 1.1a through 2.1b. More generally, in situations when it is not known where the monotonic trend commences, the Abelson-Tukey contrast may be expected to outperform the Mann-Kendall test. On the basis of these simulations, one can conclude that both tests seem to perform reasonably well. In actual applications, it would be reasonable to use either or both tests.

ILLUSTRATIVE APPLICATIONS

All datasets discussed in this section are available by e-mail from the statlib archive by sending the following e-mail message: send hipel-mcleod from datasets to statlib@temper.stat.cmu.edu. Additionally, the graphical output and analytical results are generated using the decision support system called the McLeod-Hipel Time Series (MHTS) package (McLeod and Hipel, 1994a, b).

Great Lakes Precipitation

A time series plot of the estimated total annual precipitation for 1900-1986 for the Great Lakes is shown in Figure 2, along with Cleveland's robust LOESS smooth curve (Cleveland, 1979). As shown by this smooth, there is an apparent upward trend. An autocorrelation plot, cumulative periodogram analysis and a normal probability plot of the residuals from the trend curve shown in Figure 2 are shown in Figures 3, 4 and 5, respectively. These figures suggest that the data could be modelled as Gaussian white noise. One can consult Hipel and McLeod (1994, Ch. 7) for details on the interpretation of these plots. In order to test the statistical significance of the apparent upward trend, one may use either the Mann-Kendall or the Abelson-Tukey methods. These methods yield test statistics τ = 0.2646 and Z_A = 3.22 with two-sided significance levels of 2.9 × 10^{-4} and 3.3 × 10^{-3}, respectively. Therefore, the null hypothesis of no trend is strongly rejected. As a matter of interest, neither the autocorrelation plot nor the cumulative periodogram for the original data detect any statistically significant non-whiteness or autocorrelation. Hipel et al. (1983) and Hipel and McLeod (1994, Section 23.4) demonstrate that for monotonic trends in Gaussian white noise, the Mann-Kendall trend test clearly outperforms autocorrelation tests. This example also nicely illustrates this fact.


Figure 2. Annual precipitation in inches for the Great Lakes (1900-1986). [Time series plot with LOESS smooth; observation number versus precipitation.]

Figure 3. Autocorrelation function (ACF) plot of the Great Lakes residual precipitation data (1900-1986). [ACF plotted against lags 0 to 10.]


Figure 4. Cumulative periodogram graph of the Great Lakes residual precipitation data (1900-1986). [Cumulative periodogram versus frequency 0.0-0.5.]

Figure 5. Normal probability plot of the residuals of the Great Lakes residual precipitation data (1900-1986). [Normal quantiles versus empirical quantiles; plot annotations report normality statistics, including W = .9894 with significance level .947.]


Average Annual Nile Riverflows

Hipel and McLeod (1994, Section 19.2.4) show how the effect of the Aswan dam on the average annual riverflows of the Nile River measured just below the dam can be modelled using intervention analysis. They also demonstrate using intervention analysis that there is in fact a significant decrease in the mean annual flow after the dam went into operation in 1903. It is of interest to find out whether the trend tests are able to confirm this trend hypothesis by rejecting the null hypothesis of no trend. Figure 6 displays a time series plot of the mean annual Nile River flow for 1870-1945 (m³/s) with a superimposed Cleveland robust LOESS trend smooth. Figures 7 and 8 show autocorrelation and cumulative periodogram plots of the error component or residuals from the trend. These plots indicate that the error component is an autocorrelated time series. Thus, Brillinger's trend test can be applied. We chose a running-average smooth with V = 8 in (5). The smoothed curve and original data are shown in Figure 9. Next, the smoothing parameter for the spectral density estimate at zero is set to L = 5 in (6) and a 10% cosine-bell taper is used (see Figure 10). The resulting test statistic is Z_B = -4.83 with a two-sided significance level of 1.4 × 10^{-6}. For comparison, the Mann-Kendall and the Abelson-Tukey methods yield test statistics τ = -0.4508 and Z_A = -4.52 with two-sided significance levels < 10^{-10} and 6 × 10^{-6}, respectively. Brillinger (1989) demonstrates the usefulness of his method on a very long time series (n > 3 × 10⁴) of daily river heights. Our example above shows the usefulness of Brillinger's test even for comparatively short series (n = 75).
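As a hedged usage sketch (not a reproduction of the MHTS run reported above), the Brillinger function sketched earlier could be applied with these same parameter choices to a simulated series of comparable length; the AR(1) error model and its coefficient are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 75                                   # series length as in the Nile example
e = np.zeros(n)
for t in range(1, n):                    # AR(1) errors with phi = 0.5 (assumed)
    e[t] = 0.5 * e[t - 1] + rng.standard_normal()
z = -0.02 * np.arange(n) + e             # weak downward trend plus noise
print(brillinger_trend_test(z, V=8, L=5, taper_frac=0.10))
```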

CONCLUSION

The problem of testing for a monotonic trend is of great importance in environmetrics. Hipel and McLeod (1994) survey the literature on this problem and present several actual case studies. In this paper, we focus on tests for trend in the case of nonseasonal series. We compare basically two different methods for testing for monotonic trend. The older methods of Mann (1945) and Abelson and Tukey (1963) can be used when one can model the time series as a monotonic trend plus an independent and identically distributed white noise component, whereas the new method of Brillinger (1989) can be used for the more general case when the time series is comprised of a monotonic trend plus a stationary autocorrelated error component. We show how a trend model can be fitted to time series data and examined to see whether it appears to be monotonic with a correlated or uncorrelated error component. The usefulness of our approach is demonstrated with two interesting illustrative environmetrics examples.






Figure 6. Average annual flows (water year) of the Nile River (m³/s) (1870-1945). [Time series plot with LOESS smooth; observation number versus flow.]

Figure 7. Autocorrelation function (ACF) of the error component from the trend of the average annual Nile River flows. [ACF plotted against lags 0 to 10.]


Figure 8. Cumulative periodogram plot of the error component from the trend of the average annual Nile River flows. [Cumulative periodogram versus frequency 0.0-0.5.]

Figure 9. The smoothed curve and original annual data of the Nile River. [Observation number versus flow; original data with the running-average smooth superimposed.]


Figure 10. Test statistic with L = 5 and a 10% cosine-bell taper. [Log periodogram versus frequency 0.0-0.5.]

REFERENCES

Abelson, R. P. and Tukey, J. W. (1963), "Efficient utilization of non-numerical information in quantitative analysis: general theory and the case of simple order", The Annals of Mathematical Statistics 34, 1347-1369.
Bloomfield, P. (1976), Fourier Analysis of Time Series, Wiley, New York.
Box, G. E. P. and Muller, M. E. (1958), "A note on the generation of random normal deviates", Annals of Mathematical Statistics 29, 610-611.
Brillinger, D. R. (1981), Time Series: Data Analysis and Theory (expanded edition), Holt, Rinehart and Winston, New York.
Brillinger, D. R. (1989), "Consistent detection of a monotonic trend superimposed on a stationary time series", Biometrika 76, 23-30.
Cleveland, W. S. (1979), "Robust locally weighted regression and smoothing scatterplots", Journal of the American Statistical Association 74, 829-836.
Hipel, K. W. and McLeod, A. I. (1994), Time Series Modelling of Environmental and Water Resources Systems, Elsevier, Amsterdam.
Hipel, K. W., McLeod, A. I. and Fosu, P. (1983), "Empirical power comparisons of some tests for trend", in Statistical Aspects of Water Quality Monitoring, Developments in Water Science, Volume 27, pp. 347-362, edited by A. H. El-Shaarawi and R. E. Kwiatkowski.
Hurvich, C. M. (1988), "A mean squared error criterion for time series data windows", Biometrika 75, 485-490.
Kendall, M. G. and Stuart, A. (1968), The Advanced Theory of Statistics, Volume 3, Hafner, New York.
Kendall, M. G. (1973), Time Series, Griffin, London.
Kendall, M. G. (1975), Rank Correlation Methods (4th ed.), Griffin, London.
Mann, H. B. (1945), "Nonparametric tests against trend", Econometrica 13, 245-259.
Marsaglia, G. (1976), "Random number generation", in Encyclopedia of Computer Science, edited by A. Ralston, pp. 1192-1197, Petrocelli and Charter, New York.
McLeod, A. I. and Hipel, K. W. (1994a), The McLeod-Hipel Time Series (MHTS) Package, McLeod-Hipel Research, 121 Longwood Drive, Waterloo, Ontario, Canada N2L 4B6, Tel: (519) 884-2089.
McLeod, A. I. and Hipel, K. W. (1994b), The McLeod-Hipel Time Series (MHTS) Package Manual, McLeod-Hipel Research, 121 Longwood Drive, Waterloo, Ontario, Canada N2L 4B6, Tel: (519) 884-2089.
Tukey, J. W. (1960), "A survey of sampling from contaminated distributions", in Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, edited by I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow and H. B. Mann, Stanford University Press, Stanford.
Tukey, J. W. (1967), "An introduction to the calculations of numerical spectrum analysis", in Advanced Seminar on Spectral Analysis of Time Series, edited by B. Harris, pp. 25-46, Wiley, New York.
Valz, P., McLeod, A. I. and Thompson, M. E. (1994a, to appear), "Cumulant generating function and tail probability approximations for Kendall's score with tied rankings", Annals of Statistics.
Valz, P., McLeod, A. I. and Thompson, M. E. (1994b, to appear), "Efficient algorithms for the exact computation of significance levels for Kendall's and Spearman's scores", Journal of Statistical Graphics and Computation.

ANALYSIS OF WATER QUALITY TIME SERIES OBTAINED FOR MASS DISCHARGE ESTIMATION

BYRON A. BODO(1,2), A. IAN McLEOD(2,3) and KEITH W. HIPEL(3,4)

(1) Byron A. Bodo & Associates, 240 Markham St., Toronto, Canada M6J 2G6
(2) Department of Statistical and Actuarial Sciences, The University of Western Ontario, London, Ontario, Canada N6A 5B7
(3) Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
(4) Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1

Methods are proposed for quantifying long term mean annual riverine load reductions of the nutrient phosphorus [P] and other agricultural pollutants anticipated in southwestern Ontario Great Lakes tributaries due to farm scale nonpoint source [NPS] remediation measures implemented in the headwater catchments. Riverine delivery of NPS pollutants is a stochastic process driven by episodic hydrometeorologic events; thus, progress towards tributary load reduction targets must be interpreted as the expected mean annual reduction achieved over a suitably long, representative hydrologic sequence. Trend assessment studies reveal that runoff event biased water quality monitoring records are conceptualized adequately by the additive model x_i = x̄ + C_i + S_i + T_i + e_i, where x_i is sample concentration, x̄ is 'global' central tendency, C_i is discharge effect, S_i is seasonality, T_i is trend (local central tendency) and e_i is residual noise. As the watershed's systematic hydrochemical response embodied in components C_i and S_i has remained stable in the presence of gradual concentration trends, the expected mean annual load reductions may be inferred by the difference between the mean annual loads estimated by adjusting the water quality series to (1) pre-remediation and (2) current mean concentration levels, where concentrations on unsampled days are simulated by Monte Carlo methods. Fitting components by robust nonparametric smoothing filters in the context of generalized additive models, and jointly fitting interactive discharge and seasonal effects as a two dimensional field C⊗S_t, are considered.

INTRODUCTION

Diffuse or nonpoint source [NPS] water pollution by agricultural runoff has long been recognized as a significant issue in southern Ontario. During the 1970s, under the aegis of the International Joint Commission [IJC], Canada and the U.S. undertook to define NPS impacts on the Great Lakes with the PLUARG [Pollution from Land Use Activities Reference Group] studies that documented the extent of water quality impairment by agriculture in southern Ontario (Coote et al., 1982;


Wall et al., 1982; Miller et al., 1982; Nielsen et al., 1982; Frank et al., 1982). A 1983 review by the IJC (1983) noted that Ontario had yet to implement any comprehensive NPS remediation policies in response to PLUARG recommendations tabled in 1978. In 1987 Canada and the U.S. finally set specific remediation targets under Annex 3 of the amendments to the 1978 Canada-U.S. Great Lakes Water Quality Agreement (IJC, 1987), which call for a 300 tonne per annum reduction of phosphorus [P] loading to Lake Erie from the Canadian side. Presumably much of the reduction was to be achieved by NPS remediation, which involves implementation of various farm scale land management practices including vegetated buffer strips along streambanks, low tillage cultivation, restricted livestock access to streams, solid and liquid waste management, and the retirement of erosion prone land from cultivation.

Inherently stochastic NPS contaminant delivery mechanisms driven by randomly episodic hydrometeorological processes militate against attempts to establish the quantitative impact of abatement practices. While short term pilot studies at the plot and small watershed scale can demonstrate the general effect of a particular management practice, they often fail to provide a reliable basis for broad scale extrapolation because the hydrologic regime of the pre-treatment monitoring phase differs appreciably from that during treatment. Linking farm scale NPS abatement measures implemented in small headwater catchments to progress towards specific Great Lakes tributary nutrient load reduction targets poses a formidable challenge, as very subtle trends must be detected against appreciable background variation imparted by stochastic NPS contaminant delivery mechanisms.

Over the past decade, federal and provincial agencies began implementing NPS remediation policies in Ontario, ostensibly to mitigate inland surface water quality problems from agricultural runoff and, hopefully, to reduce Canadian tributary nutrient loads to the lower Great Lakes. In the short term, miscellaneous farm scale abatement initiatives will not discernibly influence water quality in the main channels of larger rivers draining southwestern Ontario. However, the long term cumulative impact of many small scale improvements should ultimately manifest in downstream waters. It is presently unclear what has been achieved on a grander scale by the patchwork of Ontario agricultural pollution remediation programs. Some have been in progress since the mid 1980s and a comprehensive review of water quality trends across southwestern Ontario would be timely. This paper explores extensions of time series methods developed for assessing long term water quality concentration trends to the problem of estimating long term trends in mass delivery of nutrients and other NPS pollutants to the Great Lakes.

RIVERINE MASS-DISCHARGE ESTIMATION

In Ontario, the problem of river mass-discharge estimation came to the fore during PLUARG when tributary inputs were required for mass budgets of nutrient P for the Great Lakes. For tributaries draining sedimentary southern Ontario, instantaneous P concentrations usually correlate positively with streamflow. Thus, P mass delivery is dominated by relatively brief periods of high streamflow superimposed on seasonally maximal discharge norms that occur over late winter and early spring. Accordingly, Canadian PLUARG tributary surveys emphasized high frequency sampling of these dominant mass delivery periods. Because annual tributary mass loads estimated from flow-biased concentration data were highly sensitive to the method of calculation, as the standard reporting technique the IJC ultimately imposed a method informally known as 'stratified Beale ratio estimation'.


By this method, the annual mass load is derived as the standard ratio estimate from sampling survey theory (Cochran, 1977) adjusted by the bias correction factor of E.M.L. Beale (Tin, 1965). To improve estimates, data were retrospectively blocked into two or more strata or flow classes of relatively homogeneous concentrations. Two way blocking by flow and time categories was also applied to treat both flow and seasonal variation. Beyond estimation technique, the quality of annual mass load estimates depends on the quality of the river monitoring record. At the outset of PLUARG, vaguely understood mass delivery phenomena and the formidable logistical burdens posed by runoff event sampling inevitably led to unsampled high delivery periods and oversampled low delivery periods. Realizing that a watershed's fundamental hydrophysicochemical response had not changed appreciably from one year to the next, Ontario tributary loads were determined by pooling 1975 and 1976 survey data in order to improve the respective annual and monthly estimates (Bodo and Unny, 1983).

Following PLUARG, Environment Ontario (MOE) implemented the 'Enhanced Tributary Monitoring' [ETM] program at 17 major tributary outlets to the Great Lakes, where higher frequency sampling was to be conducted in order to more reliably estimate mass delivery for a limited suite of variables. Annual P loads were to be reported to the IJC for inclusion in the biennial reports of the Great Lakes Water Quality Board. Due to minimal budgets and staffing, the execution of the ETM program was somewhat haphazard from the outset. Local 'observers' resident in the vicinity of the ETM sites were retained to collect and ship samples. Observers were instructed to sample more frequently at high flows, with neither quantitative prescription of river stage for judging precisely what constituted high flows nor any regular supervision. Accordingly, sampling performance has varied erratically and critical high flow periods have gone unsampled. Initially from 20-120 and more recently from 20-60 samples per annum were obtained, from which annual P loads were determined by two strata Beale ratio estimation with a subjectively determined flow boundary separating low and high flow classes. Each year was treated independently and flow boundaries were manipulated subjectively to force data into two classes. Consequently, annual P mass load estimates for Ontario tributaries reflect the vagaries of sampling practice as much or more so than legitimate trends and hydroclimatic variations that determine the true mass transport.

Figure 1 shows the location of the 3 southwestern Ontario ETM sites most suitable for studying mass-discharge trends. The Grand and Saugeen Rivers, which were the Ontario PLUARG pilot watersheds, now have lengthy water quality records spanning 1975-1993. The ETM site in the Thames River basin, the most intensely agricultural watershed of southwestern Ontario, has low frequency data from 1966-1975 and higher frequency records from late 1979. Figure 1 also illustrates a problem common to ETM sites that are not co-located with flow gauges, where flows estimated by areal proration from gauges in upstream or adjacent watersheds are employed to generate annual P loads. Errors in mean daily flow estimates derived for the Grand and Thames ETM sites are small as only 15% and 10% of respective upstream areas are ungauged.
Figure 1. Map, major southwestern Ontario Great Lakes tributaries. [Map with scale bar; not reproduced.]

While studies (Dolan et al., 1981; Richards and Holloway, 1987; Young and DePinto, 1988; Preston et al., 1989) have shown stratified Beale ratio estimation to perform at least as well as other techniques for estimating annual loads in larger rivers, other approaches are better suited for evaluating long term trends in mass delivery. A simple alternative estimate of annual load L is given by the sum

L = \sum_i x_i Q_i \delta_i,   (1)

where x_i is the concentration of the ith sample collected at time t_i and Q_i is the mean discharge over δ_i = (t_{i+1} − t_{i-1})/2, the time interval represented by sample i. This method produces acceptable results when the sampling is conducted at a frequency appropriate to the characteristic hydrochemical response of the stream. The rivers we consider herein are large enough that instantaneous flow on any given day does not differ appreciably from the mean flow for that day and that water quality concentrations do not vary significantly during the day. For these systems, daily sampling would produce very good annual load estimates. The quality of estimates would decline as sampling rates fell below the characteristic duration time of a runoff event. Technique (1) was employed by Baker (1988) to estimate sediment, nutrient and pesticide loads to Lake Erie from U.S. tributaries. In contrast to Ontario monitoring activity, the programs supervised by Baker and colleagues have obtained from 400-500 samples per year at the main sites, which are co-located with flow gauges.
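A minimal sketch of the estimator in equation (1) as reconstructed above, assuming sample times, concentrations and interval-mean discharges are supplied as arrays; the one-sided end intervals are our own convention, not prescribed in the text.

```python
import numpy as np

def annual_load(t, x, q):
    # L = sum_i x_i * Q_i * delta_i with delta_i = (t_{i+1} - t_{i-1}) / 2.
    t, x, q = (np.asarray(a, dtype=float) for a in (t, x, q))
    delta = np.empty(len(t))
    delta[1:-1] = (t[2:] - t[:-2]) / 2.0
    delta[0] = (t[1] - t[0]) / 2.0        # assumed treatment of the ends
    delta[-1] = (t[-1] - t[-2]) / 2.0
    # Units follow the inputs, e.g. mg/L * m^3/s * days, to be converted
    # to tonnes per annum as required.
    return float(np.sum(x * q * delta))
```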

TIME SERIES MODELS

Motivated by the anticipated need to detect improvements from agricultural NPS pollution abatement initiatives, MOE sponsored a research project (McLeod et al., 1991) to develop statistical methods for detecting subtle time trends in flow-biased water quality time series like those generated at ETM sites.


Lengthy concentration series from outlet sites of the two main Ontario PLUARG watersheds, the Grand and Saugeen Rivers, served as the development data set. Beyond statistical developments and some indications of modest trends, the results clearly demonstrated that these larger southwestern Ontario watersheds exhibit generally stable hydrochemical response over the long term. Fundamental concentration-flow relationships and seasonal cycles remain largely unchanged in the presence of negligible to modest concentration trends. Thus, to good first approximation, the water quality concentration process may be conceptualized as the additive process

X_t = X̄ + C_t + S_t + T_t + e_t,   (2)

where X_t is constituent concentration, X̄ the central tendency of X_t, C_t is covariate effect, S_t is seasonal effect, T_t is chronological time trend and e_t is residual noise. Here C_t, S_t, and T_t represent relative departures from process mean X̄. Covariate effect C_t is defined by a functional relation C_t = f(Q_t) with stream discharge. Trend T_t is the temporally local level of the constituent in the chronological time dimension t at which the process evolves. Seasonal S_t represents stable annually recurrent variation in the circular dimension of seasonal time τ, defined here on the interval [0,1] as the fraction of time from the beginning of the year. In decimal format, chronological time t is the sum of the year plus the seasonal fraction τ.

For watersheds with stable hydrochemical response, these time series models of water quality concentrations afford a means of estimating reductions in riverine contaminant mass delivery attributable to changes in upstream land practice. Suppose that model (2) is successfully fit to the Grand River P series, which extends from 1972-1992, and that we wish to evaluate net changes in annual P delivery that may have occurred since the PLUARG reference years 1975-76. Diagnostic graphics and trend tests reveal that P concentrations have declined gradually through the 1980s. After adjustments for flow and seasonal effects, we determine that the respective 1975-76 and 1990-92 mean levels were X̄₁ and X̄₂. Next we construct the two hypothetical series

X_t^{(1)} = X̄₁ + C_t + S_t + e_t,   (3a)

X_t^{(2)} = X̄₂ + C_t + S_t + e_t,   (3b)

which amounts to centring the entire data series about the two respective reference levels. For each series we determine annual P loads by a consistent method for each year from 1975-1992 and average the results to obtain L̂₁ and L̂₂, the mean annual P loads as if mean levels X̄₁ and X̄₂ respectively had prevailed over the entire period. The difference ΔL̂ = L̂₂ − L̂₁ gives an estimate of the mean annual reduction in P mass delivery that could be expected for the hydrologic sequence observed from 1975-1992. On consideration of the intrinsic stochasticity of mass delivery, the binational P reduction targets for Lake Erie can be meaningfully interpreted only as mean annual estimates determined over a suitably long period of representative hydrologic record of 10 years or more.
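The comparison can be sketched as below, assuming the additive components have already been fitted and that a Monte Carlo simulator for the residual noise is available; all names here are illustrative, and annual loads are taken as simple sums over daily values.

```python
import numpy as np

def mean_annual_load_reduction(xbar1, xbar2, C, S, q, year_index, noise):
    # Hypothetical daily series under the two reference levels, equation (3);
    # C, S are fitted daily component values and noise(n) simulates e_t.
    x1 = xbar1 + C + S + noise(len(C))
    x2 = xbar2 + C + S + noise(len(C))
    # Annual loads for each year, then the difference of the means.
    L1 = [np.sum(x1[idx] * q[idx]) for idx in year_index]
    L2 = [np.sum(x2[idx] * q[idx]) for idx in year_index]
    return np.mean(L2) - np.mean(L1)      # estimate of Delta-L
```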


Recently, Baker (1992) proposed essentially this method for estimating the extent of P load reductions from U.S. tributaries to Lake Erie, and gave preliminary results suggesting that significant progress towards the U.S. P reduction commitment of 1700 t per annum (IJC, 1987) had been achieved by NPS remediation measures applied in the upper reaches of Ohio's Maumee River in the 1980s. Recent analyses (Richards and Baker, 1993) support the early results. Though Ohio monitoring data are superior, there is no good reason preventing the application of the same approach to Ontario's ETM records. Bodo (1991) applied a similar time series adjustment technique at the Thames ETM site to contrast differences in the expected seasonal risks of exceeding Canada's water quality guideline for the herbicide atrazine between years of high and low applications. The main requirement of the technique is a good time series model fit, demonstrated for the Grand and Saugeen River sites in the trend assessment study (McLeod et al., 1991). Due to haphazard ETM records and difficulties with stratified Beale ratio estimation, it is proposed to simulate concentrations on unsampled days according to model (3) and then determine annual loads by the technique of equation (1).

FITTING THE ADDITIVE MODEL COMPONENTS

To determine mass-discharge trends, the best fit possible must be obtained for model (2), particularly at high flows which dominate mass transport. In the earlier trend assessment work (McLeod et al., 1991) with Grand and Saugeen time series, model (2) systematic components were fit by conventional sequential reduction without iteration. First the discharge effect was estimated as a smooth function of flow, Ĉ_i = f̂(Q_i), with the LOWESS scatterplot smoother (Cleveland, 1979). Next, seasonal effects Ŝ_i were estimated as the calendar monthly means of flow adjusted concentrations v_i = X_i − X̄ − Ĉ_i. Finally, trend term T̂_i was determined as the LOWESS smooth of the flow-adjusted, de-seasonalized residuals ρ_i = v_i − Ŝ_i. Difficulties with autocorrelation effects induced by event sample clusters were circumvented by analysis of both the original concentration series and the reduced series of monthly mean concentrations. While this approach was adequate for trend assessment, extensive experience fitting simplified seasonal adjustment models to low frequency water quality concentration series (Bodo, 1991) suggests various ways that the fitting of model (2) can be improved. It is useful to consider model (2) in the contemporary context of generalized additive models [GAM] (Hastie and Tibshirani, 1990), which are a broad generalization of linear regression in which the predictors on the right side, the systematic terms C_t, S_t and T_t in model (2), are arbitrary smooth functions. Though formal parametric models are permissible in the GAM framework, we consider fitting systematic terms by nonparametric smoothers that are robust against outliers and asymmetric data distributions. Additive time series models are usually fit by iterative decomposition schemes (e.g., McLeod et al., 1983; Cleveland et al., 1979) that are particular forms of the Gauss-Seidel algorithm known as backfitting (Hastie and Tibshirani, 1990), which gives the systematic components as

Ĝ_j = h_j\left( X_t − X̄ − \sum_{k \neq j} Ĝ_k \right),   (4)

where the Ĝ_j are Ĉ_t, Ŝ_t, and T̂_t for model (2) and the h_j(·) are arbitrary smooth functions. Iteration reduces the influence of potentially significant sampling biases embedded in the initial fits of systematic terms.
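A minimal backfitting sketch for equation (4) using the LOWESS smoother from statsmodels is given below. Smoothing log flow and treating seasonal time as an ordinary (non-circular) variable are simplifying assumptions of this sketch, not the refinements discussed in the text.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def backfit(x, q, tau, t, n_iter=5, frac=0.5):
    # x: concentration; q: discharge; tau: seasonal fraction; t: decimal time.
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    C = np.zeros_like(x); S = np.zeros_like(x); T = np.zeros_like(x)
    for _ in range(n_iter):               # Gauss-Seidel (backfitting) sweeps
        C = lowess(x - xbar - S - T, np.log(q), frac=frac, return_sorted=False)
        S = lowess(x - xbar - C - T, tau, frac=frac, return_sorted=False)
        T = lowess(x - xbar - C - S, t, frac=frac, return_sorted=False)
    return xbar, C, S, T
```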


Figure 2. P concentration versus flow, Grand River. [Log-log scatterplot with LOWESS fit; flow (cubic m/s) versus P concentration.]

For the Grand and Saugeen River data these include: (1) chronological biases toward heavily sampled periods, particularly the PLUARG years 1975-1977, (2) seasonal bias towards spring runoff months, and (3) a covariate bias towards higher flows. To varying degrees, these three sampling biases overlap. The seasonal and covariate bias are largely synonymous. In Figure 2, the unadjusted P concentration-flow relationship fit with the LOWESS smoother is biased towards the PLUARG years, the spring period, and high flows. On second and higher iterations of the backfitting algorithm, chronological and seasonal bias are reduced as the relation is determined on data that have been de-trended and de-seasonalized. To optimize the model fit, it is necessary to understand clearly the respective roles of the model components. In seasonal and chronological time respectively, seasonal S_t and trend T_t are expected to represent temporally local concentration norms, where norms are interpreted as concentrations that prevail most of the time. Consequently, model estimates Ŝ_t and T̂_t should be close to the median concentrations expected at time t. In contrast, the modelled covariate effect C_t must represent as precisely as possible the contribution of high flows that are generally atypical of seasonal flow norms; hence, maintaining a bias towards high flows is desirable. Tuning the fitting procedures for Ŝ_t and T̂_t to best achieve their objectives will contribute to attaining the best fit possible of Ĉ_t. Iteration does not necessarily eliminate all the potential distortions introduced by sampling biases, but additional measures can be applied to optimize the fit of specific components. Heavy chronological and seasonal sampling biases introduce autocorrelation effects that play havoc with the LOWESS filter currently used to fit trend component T_t. LOWESS smoothing is controlled by a nearest neighbour specification. For time series smoothing, it performs best when data density is relatively uniform over the time horizon of the series. More uniform chronological data density was achieved in previous work by reducing data to a series of monthly mean concentrations before applying the LOWESS smoother.

The procedure could be improved in the following ways. Because T_t should represent temporally local concentration norms, replacing the arithmetic mean with a median or a bi-weight mean (Mosteller and Tukey, 1977) would provide resistance to abnormal concentrations. Further improvement can be obtained by introducing sample weights w_i^δ ∝ δ_i = (t_{i+1} − t_{i-1})/2 that restrict the influence of closely spaced samples. Component T_t is now estimated as the LOWESS smooth of a reduced series of monthly weighted means. Experience with a wide variety of low frequency water quality series (Bodo, 1991) suggests that T_t is most often best fit by tuning LOWESS to approximate smoothing at a fixed temporal bandwidth of about one year by specifying the number of nearest neighbours m as 1-2 times the typical annual sample density. Individual estimates at the original sample times may be derived by interpolating the smoothed output of the reduced series.
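The reduced-series trend fit just described can be sketched as below: δ-weighted monthly means followed by a LOWESS smooth tuned to roughly a one year bandwidth. The month grouping and bandwidth rule are our paraphrase of the text, not the original implementation.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def monthly_weighted_trend(t, month_id, x):
    # t: decimal sample times; month_id: integer month labels; x: concentration.
    t, x = np.asarray(t, float), np.asarray(x, float)
    month_id = np.asarray(month_id)
    delta = np.empty(len(t))                     # delta_i = (t_{i+1}-t_{i-1})/2
    delta[1:-1] = (t[2:] - t[:-2]) / 2.0
    delta[0], delta[-1] = delta[1], delta[-2]    # assumed end handling
    ids = np.unique(month_id)
    tm = np.array([t[month_id == m].mean() for m in ids])
    xm = np.array([np.average(x[month_id == m], weights=delta[month_id == m])
                   for m in ids])
    frac = min(1.0, 12.0 / len(ids))             # about one year of monthly means
    return tm, lowess(xm, tm, frac=frac, return_sorted=False)
```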

n
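A minimal sketch of this reduction-and-smoothing step is given below, assuming the statsmodels LOWESS implementation; it is not the authors' TRENDS software, and the window fraction used to mimic a one-year bandwidth is an assumption.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def monthly_weighted_means(t, z):
    """t: sorted decimal-year sample times; z: concentrations.
    Spacing weights w_i^c ~ delta_i = (t[i+1] - t[i-1]) / 2 restrict the
    influence of closely spaced samples (one-sided at the series ends)."""
    t, z = np.asarray(t, float), np.asarray(z, float)
    w = np.empty_like(t)
    w[1:-1] = (t[2:] - t[:-2]) / 2.0
    w[0], w[-1] = t[1] - t[0], t[-1] - t[-2]
    month = np.floor(t * 12.0).astype(int)   # month index on decimal-year axis
    tm, zm = [], []
    for m in np.unique(month):
        sel = month == m
        tm.append((m + 0.5) / 12.0)          # mid-month target time
        zm.append(np.average(z[sel], weights=w[sel]))
    return np.array(tm), np.array(zm)

def trend_estimate(t, z, years_span=1.0):
    """T_t as the LOWESS smooth of the reduced monthly series, interpolated
    back to the original sample times; `frac` approximates a fixed temporal
    bandwidth of about one year (an assumed tuning, adjust to the record)."""
    tm, zm = monthly_weighted_means(t, z)
    frac = min(1.0, years_span * 12.0 / len(tm))
    fit = lowess(zm, tm, frac=frac, return_sorted=True)
    return np.interp(t, fit[:, 0], fit[:, 1])
```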

Like trend term Tt, St should represent seasonal variation in concentration norms; hence, a robust central tendency statistic is again preferred over an arithmetic mean. Approximating St as a smooth function by replacing the monthly mean sequence - essentially a bin filter with a mid-month target point - with the output of a fixed bandwidth running bin filter applied on a uniform grid of target points distributed throughout the year will, by at least a small amount, better the fit of recurrent seasonal variation by eliminating end-of-month step discontinuities. Generally, a bandwidth of one month, or for simplicity 1/12 year, eliminates problems with high spring sampling densities and provides a sharp seasonal fit that is rarely improved by smaller bandwidths, even with very high frequency water quality data. A current implementation, the seasonal running median [SRM] smoother (Bodo, 1989, 1991), employs 100 target points spaced every 3.65 days in non-leap years. This may be excessive, but the added computational burden is small. For the ETM river data, the robust SRM filter may yet generate output biased to heavily sampled years and flows higher than the expected seasonal norms. If the median is replaced by a weighted mean as the measure employed to summarize the observations within bins, weights can be introduced to minimize the influence of (a) heavily sampled years, and (b) sample flows remote from the expected seasonal mean daily flow norms. The weights wᵢᶜ defined previously can be applied to treat the chronological bias. Weights wᵢᶠ to minimize flow bias for sample i at chronological time tᵢ could be proportioned as the probability of occurrence of sample flow Qᵢ conditional on the seasonally local distribution of mean daily flows encountered at the corresponding seasonal target point τᵢ within a bin of the same bandwidth employed for seasonal smoothing. Alternatively, the flow weight might be made inversely proportional to the distance of sample flow Qᵢ from the seasonal median of mean daily flows at time τᵢ. Some experimentation will be necessary to obtain acceptable weights. The influence of both chronological and flow bias on the definition of the seasonal St may ultimately prove inconsequential; nonetheless, the effects are presently unknown and some investigation is required.
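A sketch of such a fixed-bandwidth running-bin seasonal filter follows. It substitutes a weighted mean for the running median so that the chronological and flow weights can be admitted, as discussed above; the 100 target points and 1/12-year bandwidth follow the text, while the function names and the empty-bin guard are illustrative.

```python
import numpy as np

def seasonal_running_mean(t, z_detrended, weights=None,
                          n_targets=100, bandwidth=1.0 / 12.0):
    """Fixed-bandwidth running-bin seasonal filter on the circular year.
    t: decimal-year sample times; z_detrended: series with trend removed;
    weights: optional products of chronological (w^c) and flow (w^f) weights.
    Returns the seasonal target points tau and the zero-centred seasonal."""
    t = np.asarray(t, float)
    z = np.asarray(z_detrended, float)
    w = np.ones_like(z) if weights is None else np.asarray(weights, float)
    season = t % 1.0                                  # date within the year
    tau = (np.arange(n_targets) + 0.5) / n_targets    # uniform seasonal grid
    s = np.full(n_targets, np.nan)
    for k, tk in enumerate(tau):
        d = np.abs(season - tk)
        d = np.minimum(d, 1.0 - d)                    # circular distance
        sel = d <= bandwidth / 2.0
        if sel.any():                                 # empty bins stay NaN
            s[k] = np.average(z[sel], weights=w[sel])
    s -= np.nanmean(s)                                # zero-centre the seasonal
    return tau, s
```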

The covariate term Ct may also be influenced by chronological and seasonal sampling biases. Applying the chronological weights wᵢᶜ should approximately equalize the influence of all years on the fitted Ct. It is reasonable to ignore the seasonal sampling bias, which is largely equivalent to the bias towards higher flows.


JOINTLY FITTING COVARIATE AND SEASONAL COMPONENTS
A deficiency of additive model (2) is that it assumes that the shape of the covariate response function remains constant throughout the year, or equivalently, that the seasonal response remains constant at all flow levels. Effectively, the combined covariate and seasonal response Ct + St maintains a constant functional relation with discharge that is simply adjusted upward or downward as the seasons progress. With model (2), there is no obvious way of adding a covariate-seasonal interaction term to account for a covariate response that varies seasonally. However, within the GAM framework it is permissible to jointly fit the response to two or more predictors. For water quality concentrations it is most logical to consider modelling the concentration response surface to combined covariate-seasonal effects, represented here as CoSt. The additive time series model is now written

xᵢ = CoSt + Tt + εᵢ   (5)

This requires smoothing concentrations over the two-dimensional cylindrical field defined by the circular seasonal time dimension and the longitudinal covariate dimension. The resulting response surface inherently includes covariate-seasonal interactions. LOESS (Cleveland and Devlin, 1988; Cleveland and Grosse, 1991), the multivariate extension of the LOWESS filter, has been used to fit CoSt within a backfitting algorithm, but other procedures such as two-dimensional spline smoothers could be considered. Experimental results (Figure 3) confirm suspicions that the shape of the concentration response to flow can vary through the course of the year. For the Grand and Saugeen Rivers, variations are generally confined to low flows, which suggests that potential bias in the additive covariate-seasonal fit Ĉt + Ŝt of model (2) is small and may not unduly affect annual mass-discharge estimates. Though promising, model (5) currently remains best suited as a conceptualization aid. The main difficulties lie in optimizing the degree of smoothing obtained by the LOESS filter. Irregular water quality data distributions over the CoSt field may confound the usual automatic smoothing control selection procedures such as cross validation (Hastie and Tibshirani, 1990), which perform best with relatively uniformly distributed data. Currently, the best approach is by graphical comparison of the respective two-dimensional and unidimensional filter outputs on thin strips of the CoSt field. For example, for each month, concentration-discharge can be plotted with the LOWESS fit for that month's data and the LOESS fit at the mid point of that month. Similarly, for a series of discrete flow ranges, seasonal plots may be developed with overlays of the respective SRM filter output and the LOESS fit at the range mid point. With experimentation, it is expected that objective rules can be formulated for adapting LOESS and other two-dimensional smoothers to jointly fitting the covariate and seasonal effect as the two-dimensional field CoSt.
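To make the joint fit concrete, the sketch below evaluates a LOESS-style local linear fit at one target point of the cylindrical seasonal-time × log-flow field. The tricube weights and nearest-neighbour span follow standard LOESS practice (Cleveland and Devlin, 1988), but the span value, the axis scalings and the function names are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def loess2d(season, logq, z, s0, q0, span=0.3):
    """Local linear (LOESS-style) estimate of concentration z at one target
    point (s0, q0) of the cylindrical field: circular seasonal time (0-1)
    crossed with log flow. Tricube weights over the nearest `span` fraction."""
    season = np.asarray(season, float)
    logq = np.asarray(logq, float)
    z = np.asarray(z, float)
    u = ((season - s0 + 0.5) % 1.0) - 0.5     # signed circular seasonal offset
    v = logq - q0
    su = 0.25                                 # quarter-year seasonal scale (assumed)
    p75, p25 = np.percentile(logq, [75, 25])
    sv = max(p75 - p25, 1e-9)                 # flow-axis scale via the IQR
    d = np.hypot(u / su, v / sv)
    m = max(3, int(span * len(z)))            # nearest-neighbour window size
    dm = max(np.sort(d)[m - 1], 1e-9)
    w = np.clip(1.0 - (d / dm) ** 3, 0.0, None) ** 3   # tricube weights
    X = np.column_stack([np.ones_like(z), u, v])       # local plane in (u, v)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    return beta[0]                            # fitted surface value at (s0, q0)
```

Evaluating this function over a grid of (s0, q0) points traces out the response surface; comparing those traces with the unidimensional LOWESS and SRM fits is the graphical check described in the text.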

AUTOCORRELATION EFFECTS
While autocorrelation effects can be eliminated during fitting of the systematic model components, to simulate daily water quality concentrations it is necessary to develop a model for the autocorrelation structure that is evident in the residual noise εᵢ of model (2), particularly during periods of high flow. To first approximation, simple autoregressive or ARMA error structures will probably suffice. Practically,


Figure 3. Joint discharge-seasonal effect, Grand River P series. [perspective plot of the fitted concentration response surface over discharge and seasonal time]

error models can be fit only to the series of daily concentrations derived by reducing multiple sample concentrations obtained on certain runoff event days to a single value. Presuming that a suitable noise model can be fit, concentrations can then be estimated for missing days by Monte Carlo methods. As the task involves filling gaps in the concentration series, closure rules are required to assure that concentration trajectories simulated forward from the left end of a gap close within plausible proximity of the next observed concentration at the right end of the gap.
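One plausible closure rule, sketched below under the assumption of an AR(1) residual model, is to simulate the missing-day residuals as a conditional AR(1) 'bridge' pinned to the observed residuals at both ends of the gap; the conditioning trick shifts an unconditional simulation by the difference between the bridge means implied by the real and the simulated endpoints. This is an illustration of one possible rule, not the procedure the authors ultimately adopt.

```python
import numpy as np

def ar1_bridge(e_left, e_right, gap, phi, sigma, rng=None):
    """Simulate AR(1) residuals for `gap` missing days between observed
    residuals e_left and e_right. phi: lag-1 coefficient; sigma: innovation
    standard deviation. Conditioning-by-kriging: simulate an unconditional
    path, then shift it by the difference between the Markov-bridge means
    implied by the real and the simulated endpoints."""
    rng = np.random.default_rng() if rng is None else rng
    n = gap + 2                      # include both endpoints
    sim = np.empty(n)
    sim[0] = e_left
    for j in range(1, n):            # unconditional AR(1) forward simulation
        sim[j] = phi * sim[j - 1] + sigma * rng.standard_normal()

    def cond_mean(a, b):
        """E[e_j | e_0 = a, e_N = b] for a stationary AR(1) bridge."""
        j = np.arange(n)
        N = n - 1
        denom = 1.0 - phi ** (2 * N)
        return (phi ** j * (1 - phi ** (2 * (N - j))) * a
                + phi ** (N - j) * (1 - phi ** (2 * j)) * b) / denom

    out = sim + cond_mean(e_left, e_right) - cond_mean(sim[0], sim[-1])
    return out[1:-1]                 # interior (missing-day) residuals
```

By construction the returned trajectory starts and ends exactly at the observed residuals, so the simulated concentrations close on the next observation at the right end of the gap.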

HYSTERESIS EFFECTS Hysteresis is a secondary effect that may merit attention for some water quality variables. The hysteresis effect is observed over runoff events when the concentration peak precedes the discharge peak with the net result that for the same flow, higher concentrations are observed on the rising limb of the hydrograph than on the recession limb. Sediment and sediment associated variables like P have exhibited hysteresis in both the Grand and Saugeen River monitoring records. As a result, the covariate response function underestimates concentration on rising flows and overestimates on receding flows. The net effects on cumulative annual mass-discharge estimates are likely small; nonetheless, some investigation is necessary to verify this hypothesis.


TIME VARYING MODEL COMPONENTS
Covariate and seasonal effects that change over chronological time may significantly affect annual mass-discharge estimates. Because P data in any given year, except for the PLUARG years, are generally insufficient to precisely define the covariate term Ct at the high flows so crucial to accurate mass-discharge estimation, long term variations in covariate response must be investigated by segmenting the record into multi-year periods and comparing the fitted covariate and seasonal components. If necessary, long term mean annual mass delivery reductions can be evaluated via the hypothetical re-constructed water quality series

x̂ᵢ⁽¹⁾ = Ĉt⁽¹⁾ + Ŝt⁽¹⁾ + εᵢ   (6a)
x̂ᵢ⁽²⁾ = Ĉt⁽²⁾ + Ŝt⁽²⁾ + εᵢ   (6b)

for the two respective reference periods, where the systematic components Ĉt⁽ʲ⁾ and Ŝt⁽ʲ⁾ have been fit separately to two multi-year segments of the record. Preliminary analysis suggests that nutrient P appears relatively unaffected, but further confirmatory analysis is warranted.

In contrast to P, the nutrient species total nitrate (NOₓ⁻ = NO₃⁻ + NO₂⁻) is a dissolved ionic constituent with a unique biochemical response. As the Saugeen River plot (Figure 4) shows, a well defined seasonal cycle dominates the series. Discharge variation is a virtually negligible determinant of seasonal NOₓ⁻ concentration variation, which is driven by the annual growth-decay cycle of terrestrial and aquatic biomass. In agricultural watersheds, year-to-year shifts in mean level reflect mainly variations in annual fertilizer application rates. For the Saugeen River, the annual amplitude increases as the annual mean level increases. Also, the amplitude is affected asymmetrically, as the late winter maxima increase disproportionately relative to the late summer minima, which vary little from one year to the next. While the fixed amplitude seasonality models of (6a,b) are adequate to evaluate changes in mean annual NOₓ⁻ mass delivery between two reference periods with stable mean levels, introducing flexible models that permit year-to-year variation in seasonal response would be advantageous for NOₓ⁻ series. Specifically, the variable amplitude Fourier harmonic representation

Sk,τk,i = Ak cos 2πτk,i + Bk sin 2πτk,i   (7)

where k indexes the year and τk,i indexes the seasonal time of observation i in year k, is often employed for variables with well defined seasonal response (e.g., El-Shaarawi et al., 1983). Because the summer minima are relatively stable from year to year, Sk,τ should be fit to yearly periods beginning during the summer lows. To assure continuity from one year to the next, we can place 'knots' at the summer minima and formulate the seasonal model as a regression spline (Eubank, 1988). Additionally, time weights can be introduced to neutralize the usual spring sampling bias, and robustness weights can be applied to reduce the effect of outliers. For long NOₓ⁻ series like those at the Grand and Saugeen River sites, there exists a distinct possibility that the seasonal model coefficients Ak, Bk can be linked empirically to the respective mean level Xk in year k, in which case long term NOₓ⁻ mass delivery variations can readily be explored over a range of expected mean levels.
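For a given year k, the coefficients Ak and Bk of model (7) can be obtained by weighted least squares, as in the sketch below; the optional weights slot is where the time and robustness weights mentioned above could enter.

```python
import numpy as np

def fit_harmonic_year(tau, y, weights=None):
    """Least-squares fit of the first-harmonic seasonal model (7),
    S = A*cos(2*pi*tau) + B*sin(2*pi*tau), to de-trended concentrations y
    observed at seasonal times tau (fraction of the year, starting at the
    summer minimum per the text). weights: optional time/robustness weights."""
    tau, y = np.asarray(tau, float), np.asarray(y, float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, float)
    X = np.column_stack([np.cos(2 * np.pi * tau), np.sin(2 * np.pi * tau)])
    sw = np.sqrt(w)
    (A, B), *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return A, B   # seasonal amplitude = hypot(A, B), phase = arctan2(B, A)
```

Regressing the fitted (Ak, Bk) pairs on the annual mean levels Xk is then a direct way to explore the empirical linkage suggested above.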


Figure 4. Nitrate trends, Saugeen River

[time series plot of nitrate concentration versus year, 1974-1990]

SUMMARY
Nonpoint source agricultural pollution of waterways is a stochastic process driven by episodic hydrometeorological events. Thus, riverine NPS mass delivery reduction targets must be estimated as mean annual loads over a suitably long, representative hydrological sequence. It is proposed to estimate the cumulative gradual impacts of farm scale NPS remediation measures implemented in the headwater catchments of southwestern Ontario Great Lakes tributaries by modelling and simulation of water quality concentration time series. Because the crucial systematic model components characterizing the river basin's hydrochemical response, namely discharge and seasonal effects, have remained largely stable over the horizon of the available data sets, probable mean annual NPS mass delivery reductions may be estimated using water quality time series simulated to represent (1) pre-remediation and (2) post-treatment scenarios. It is expected that good results can be obtained by treating water quality series as generalized additive models and fitting the systematic model components by nonparametric smoothing filters. Preliminary trials with arbitrarily chosen smoothers have been able to capture 50-75% of the variability in the available data series for the major nutrients phosphorus and total nitrates. Detailed investigations of sampling biases, secondary systematic effects and correlated residual error structure embedded in the available data are required to obtain an optimal model fit and, ultimately, to assure that good quality estimates of NPS mass delivery reductions are available to provide policy makers with a means to assess progress towards the Lake Erie phosphorus reduction targets and, generally, the success of the agricultural NPS remedial measures being applied in southwestern Ontario.


ACKNOWLEDGEMENT
The first author's efforts were in part supported by a Natural Sciences and Engineering Research Council of Canada grant.

REFERENCES
Baker, D.B. (1988) Sediment, nutrient and pesticide transport in selected lower Great Lakes tributaries, EPA-905/4-88-001, U.S. Environmental Protection Agency, Great Lakes National Program Office, Chicago.
Baker, D.B. (1992) "Roles of long-term chemical transport in river basin management", In Abstracts, 13th Annual Meeting, November, 1992, Society of Environmental Toxicology and Chemistry.
Bodo, B.A. (1989) "Robust graphical methods for diagnosing trend in irregularly spaced water quality time series", Environ. Monitoring Assessment, 12, 407-428.
Bodo, B.A. (1991a) "Trend analysis and mass-discharge estimation of atrazine in southwestern Ontario Great Lakes tributaries: 1981-1989", Environ. Toxicol. Chem., 10, 1105-1121.
Bodo, B.A. (1991b) TRENDS: PC software, user's guide and documentation for robust graphical time series analysis of long term surface water quality records, Environment Ontario, Toronto.
Bodo, B.A., and Unny, T.E. (1983) "Sampling strategies for mass-discharge estimation", ASCE J. Environ. Eng. Div., 109, 812-829; (1984) "Errata and discussion", 110, 867-871.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatterplots", J. Am. Stat. Assoc., 74, 829-836.
Cleveland, W.S., and Devlin, S.J. (1988) "Locally weighted regression: An approach to regression analysis by local fitting", J. Am. Stat. Assoc., 83(403), 596-610.
Cleveland, W.S., and Grosse, E. (1991) "Computational methods for local regression", Statistics and Computing, 1, 47-62.
Cleveland, W.S., Devlin, S.J., and Terpenning, I.J. (1979) "SABL: A resistant seasonal adjustment procedure with graphical methods for interpretation and diagnosis", In A. Zellner [ed.] Seasonal Adjustment of Economic Time Series, U.S. Dept. of Commerce, Bureau of the Census, Washington, D.C.
Cochran, W.G. (1977) Sampling Techniques, 2nd ed., Wiley, New York, NY.
Coote, D.R., MacDonald, E.M., Dickinson, W.T., Ostry, R.C., and Frank, R. (1982) "Agriculture and water quality in the Canadian Great Lakes Basin: I. Representative agricultural watersheds", J. Environ. Qual., 11(3), 473-481.
Dolan, D.M., Yui, A.K., and Geist, R.D. (1981) "Evaluation of river load estimation methods for total phosphorus", J. Great Lakes Res., 7, 207-214.
El-Shaarawi, A.H., Esterby, S.R., and Kuntz, K.W. (1983) "A statistical evaluation of trends in water quality of the Niagara River", J. Great Lakes Res., 9(2), 234-240.
Eubank, R.L. (1988) Spline Smoothing and Nonparametric Regression, Dekker, New York.
Frank, R., Braun, H.E., Holdrinet, M.V.H., Sirons, G.J., and Ripley, B.D. (1982) "Agriculture and water quality in the Canadian Great Lakes Basin: V. Pesticide use in the 11 agricultural watersheds and presence in stream water, 1975-1977", J. Environ. Qual., 11(3), 497-505.


Hastie, T.J., and Tibshirani, R.J. (1990) Generalized Additive Models, Chapman and Hall, London.
IJC (1983) Nonpoint source pollution abatement in the Great Lakes Basin: an overview of post-PLUARG developments, International Joint Commission, Great Lakes Regional Office, Windsor, Ontario.
IJC (1987) Revised Great Lakes Water Quality Agreement of 1978 as amended by Protocol signed November 18, 1987, International Joint Commission, Great Lakes Regional Office, Windsor, Ontario.
Miller, M.H., Robinson, J.B., Coote, D.R., Spires, A.C., and Draper, D.W. (1982) "Agriculture and water quality in the Canadian Great Lakes Basin: III. Phosphorus", J. Environ. Qual., 11(3), 487-493.
Neilsen, G.H., Culley, J.L.B., and Cameron, D.R. (1982) "Agriculture and water quality in the Canadian Great Lakes Basin: IV. Nitrogen", J. Environ. Qual., 11(3), 493-497.
McLeod, A.I., Hipel, K.W., and Camacho, F. (1983) "Trend assessment of water quality time series", Water Resour. Bull., 19, 537-547.
McLeod, A.I., Hipel, K.W., and Bodo, B.A. (1991) "Trend analysis methodology for water quality time series", Environmetrics, 2(2), 169-200.
Mosteller, F., and Tukey, J.W. (1977) Data Analysis and Regression: A Second Course in Statistics, Addison-Wesley, Reading, MA.
Preston, D.S., Bierman Jr., V.J., and Silliman, S.E. (1989) "An evaluation of methods for the estimation of tributary mass loads", Water Resour. Res., 25, 1379-1389.
Richards, R.P., and Holloway, J. (1987) "Monte Carlo studies of sampling strategies for estimating tributary loads", Water Resour. Res., 23, 1939-1948.
Richards, R.P., and Baker, D.B. (1987) "Trends in nutrient and suspended sediment concentrations in Lake Erie tributaries, 1975-1990", J. Great Lakes Res., 19(2), 200-211.
Tin, M. (1965) "Comparison of some ratio estimators", J. Am. Stat. Assoc., 60, 294-307.
Wall, G.J., Dickinson, W.T., and Van Vliet, L.P.J. (1982) "Agriculture and water quality in the Canadian Great Lakes Basin: II. Fluvial sediments", J. Environ. Qual., 11(3), 482-486.
Young, T.C., and DePinto, J.V. (1988) "Factors affecting the efficiency of some estimators of fluvial total phosphorus load", Water Resour. Res., 24, 1535-1540.

DE-ACIDIFICATION TRENDS IN CLEARWATER LAKE NEAR SUDBURY, ONTARIO 1973-1992

BYRON A. BODO¹,² and PETER J. DILLON³
¹Byron A. Bodo & Associates, 240 Markham St., Toronto, Canada M6J 2G6
²Department of Statistical and Actuarial Sciences, The University of Western Ontario, London, Ontario, Canada N6A 5B7
³Dorset Research Centre, Ontario Ministry of Environment, P.O. Box 39, Dorset, Ontario, Canada P0A 1E0

Historically, Clearwater Lake on the Precambrian Shield near Sudbury, Ontario, has received significant acid deposition from local smelters and remote sources. From 1.46×10⁶ tonnes (t) in 1973, local SO₂ emissions fell to 0.64×10⁶ t in 1991. To assess lake response, temporal trends were examined for 26 water quality variables with records beginning between 1973 and 1981. But for brief drought-induced reversals, aqueous SO₄²⁻ fell steadily from 590 µeq/L in 1973 to under 320 µeq/L in 1991-92, while pH rose from 4.3 in 1973 to exceed 5 in May 1992 for the first time on record. Disproportionate lake response to local SO₂ emission reductions suggests that remote-source acid deposition is an important determinant of Clearwater Lake status. Chloride adjusted base cation, Al and Si trends mirror SO₄²⁻ trends, indicating that geochemical weathering is decelerating as acid deposition declines. Lake levels of the toxic metals Cu and Ni derived from local smelter emissions seem to have fallen appreciably in recent years, and there has been a small surge in biological activity that may have declined abruptly in 1991. With its unique long term record, continued sampling in Clearwater Lake is advised to monitor the success of local and remote SO₂ emission reduction commitments in the U.S. and Canada.

INTRODUCTION
For nearly a century, terrestrial and aquatic ecosystems in the Sudbury area of northeastern Ontario have suffered the consequences of mining and smelting of nickel and copper bearing sulphide ores. Reports in the 1960s of soils and acidic surface waters (Gorham and Gordon, 1960a,b) with elevated levels of the metals Cu and Ni (Johnson and Owen, 1966) prompted comprehensive limnological studies of Sudbury area lakes begun in 1973 (Dillon, 1984; Jeffries, 1984; Jeffries et al., 1984). Clearwater Lake, in a small headwater catchment 13 km south of the large nickel smelter in Copper Cliff (Figure 1), was maintained as an unmanipulated control for neutralization and fertilization experiments conducted in downstream lakes. The past two decades have seen substantial reductions in local smelter emissions of the acid precursor sulphur dioxide (SO₂) and the metals Cu and Ni. Pre-1972 mean annual SO₂ emissions of 2.2×10⁶ tonnes (t) fell to a mean of 1.41×10⁶ t/yr over 1973-1977 (Figure 2). Low emissions over 1978/79 and 1982/83 reflect extended smelter shutdowns.



Figure 1. Location map, Sudbury and Clearwater Lake.

SO₂ emissions have declined gradually from 0.767×10⁶ t/yr in 1984 to 0.642×10⁶ t/yr in 1991, an average yearly decrease of 17,900 t, at which rate target emissions of 0.47×10⁶ t/yr, or 1/2 the 1980 level, should be reached in about 10 years. The Sudbury area also receives acid deposition from long range southerly air flows. Though quantitative estimates vary, eastern North American sulphur emissions peaked about 1970, declined significantly to about 1982 and then stabilized through 1988 (Dillon et al., 1988; Husar et al., 1991). Bulk deposition data south of Sudbury in Muskoka-Haliburton for 1975-1988 (Dillon et al., 1988; OMOE, 1992) and west of Sudbury in the Algoma highlands for 1976-1985 (Kelso and Jeffries, 1988) reflect the general eastern North American emission trends. As local and remote SO₂ emissions have declined, water quality has improved in the Sudbury region's acid-stressed lakes (Dillon et al., 1986; Gunn and Keller, 1990; Keller and Pitblado, 1986; Keller et al., 1986; Hutchinson and Havas, 1986); however, some lakes remain beset by chronic acidity and heavy metal concentrations that would likely prohibit the recovery of desirable aquatic biota. Recently, Keller et al. (1992) noted that ongoing de-acidification trends in Sudbury area lakes reversed following drought in 1986/87. In addition to dry acid deposition that accumulates steadily throughout dry periods, drying and exposure to air induce re-oxidation of reduced sulphur retained in normally saturated zones of the catchment. Thus, when it resumes, normal precipitation can generate substantial acid loads. Keller et al. present data for Swan Lake, in the watershed neighbouring Clearwater


Lake, showing that drought related acidification began as the drought ended in late 1987, was most prominent in 1988, and had begun to subside in 1989. Spanning nearly 20 years, the Clearwater Lake records offer a unique, ongoing chronicle of de-acidification processes. The primary objectives of the present work were to rigorously review the entire historical data base from 1973, to update the chemical status of Clearwater Lake, last reported by Dillon et al. (1986), and to examine trophic indicators for signs of incipient biological recovery.

SITE CHARACTERISTICS AND WATER QUALITY DATA BASE
Clearwater Lake hydrology and morphometry are summarized in Table 1. The lake stratifies near the end of May and turns over in late September. During the 1970s, the hypolimnion remained oxic, with dissolved oxygen declining to 2-4 mg/L by late summer. Normally, ice cover forms in early December and breaks between mid March and mid April. The influent drainage area comprises 82% exposed alumino-silicaceous bedrock (quartzite, gabbro, gneiss) with the remainder as thin till (10%), peat (5%) and ponds (3%). The lake margin supports some cottage development. Two of the four influent subcatchments are impacted by road salting. Detailed descriptions can be found in Jeffries et al. (1984) and OMOE (1982). Sudbury area annual precipitation trends are shown in Figure 3. The 1986/87 drought is the most significant extended dry period since the early 1960s. Below normal precipitation occurred from July 1986 to October 1987, coincidentally with extraordinarily high temperatures that extended into 1988, one of the wettest years on record. A lesser drought was observed in 1975/76. Cumulative precipitation deficits from long term norms were 147 mm (1976/77) and 286 mm (1986/87). Except for 1982, 1977-1985 precipitation was well above long term norms. Clearwater Lake water quality records extend from June 1973 to May 1992 for the 26 water quality variables in Table 2. Sample counts were determined after averaging quality control replicates. In the 1970s, 13-24 samples per year were obtained, mainly during the open water season. Approximately monthly sampling was maintained through the 1980s. Secchi depth and chlorophyll a are obtained only in the open water season. Due to severe fiscal constraints, sampling has all but ceased as of 1992. In the 1970s, samples were analyzed in Sudbury and Toronto labs (OMOE, 1982). Over 1980/81, the Dorset Research Centre assumed responsibility for perishable parameters including pH, alkalinity, and nutrients, while other analyses continued to be performed in Toronto. Dorset lab and field methodologies are described by Locke and Scott (1986). Some excessively crude 1973-1977 ion (Ca, Mg, Na, K, Cl) measurements were excluded from analysis.

DATA VALIDATION
Ionic species data were examined for consistency with (a) carbonate equilibria, (b) electroneutrality, and (c) equivalent conductance. Validation exercises were generalized as tests of agreement between two independently determined quantities within the framework of robust regression analysis, specifically Rousseeuw and Leroy's (1987) 'high breakdown' Least Median of Squares [LMS] regression, which can withstand contamination by up to 50% outliers. Resistance to outliers imparts greater confidence to the results and a means to assess the frequency of measurement errors - an important aspect of retrospective data analysis.
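The LMS criterion has no closed-form solution and is usually minimized by random elemental subsets, as in this sketch of a straight-line LMS fit (after the resampling idea in Rousseeuw and Leroy, 1987); the trial count and the function names are illustrative, and this is not the PROGRESS program itself.

```python
import numpy as np

def lms_line(x, y, n_trials=3000, rng=None):
    """Least Median of Squares line fit by random elemental subsets:
    draw pairs of points, form the exact-fit line through each pair, and
    keep the line that minimizes the median squared residual."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.asarray(x, float), np.asarray(y, float)
    best = (np.inf, 0.0, 0.0)
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                      # skip degenerate (vertical) pairs
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = np.median((y - a - b * x) ** 2)
        if crit < best[0]:
            best = (crit, a, b)
    return best[1], best[2]               # intercept, slope
```

Points with large residuals from the LMS line can then be flagged as candidate measurement errors, which is how the error frequencies quoted below could be tallied.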


TABLE 1. Clearwater Lake morphometry and 1977-78 water balance.

  Drainage area     342 ha           Precipitation        918 mm
  Lake area         76.5 ha          Evapotranspiration   388 mm
  Total area        419 ha           Mean outflow         64.5 L/s
  Lake volume       6.42×10⁶ m³      Retention time       3.2 yr
  Mean depth        8.4 m
  Maximum depth     21.5 m
  Shoreline length  5.0 km

TABLE 2. Clearwater Lake water quality record summary.

  Variable          1st year  Samples   Variable        1st year  Samples
  Conductance       1973      238       Fe              1975      193
  pH                1973      231       Mn              1975      193
  Gran alkalinity   1980      114       Cu              1975      150
  Total alkalinity  1979      95        Ni              1975      150
  Ca*               1977      169       Zn              1975      169
  Mg*               1977      165       DIC             1981      105
  Na*               1974      213       DOC             1981      109
  K*                1975      191       NO3             1973      222
  Cl*               1975      186       NH4             1973      232
  SO4               1973      217       TKN             1973      238
  F                 1983      97        P               1973      233
  Si                1973      226       Chlorophyll a   1973      204
  Al                1975      175       Secchi depth    1973      212

  * early records considered unreliable

Carbonate Equilibria
At the low pHs ranging from 4 to 5 in Clearwater Lake over 1973-1992, the definition of Acid Neutralizing Capacity (ANC) in fresh surface waters simplifies to

ANC ≈ −[H⁺]   (1)

where concentrations are expressed in equivalents. Virtual 1:1 correspondence between concurrent Gran alkalinity (ALKG) and negative hydrogen ion data (Figure 4) confirms that ALKG accurately represents Clearwater ANC. Scatter in the lower left quadrant is due to the earliest ALKG measurements of 1980/81. Robust regression between total alkalinity ALKT and −[H⁺] yields similar results.

Figure 2. Sudbury smelter SO₂ annual emissions, 1960-1991.
Figure 3. Annual precipitation deviations from norms; Sudbury airport, 1956-1990.
Figure 4. Gran alkalinity versus −[H⁺], with LMS regression line.
Figure 5. Equivalent versus measured conductance, 1977-1992.

The relation between the two alkalinity measurements is less precise than their respective relations with hydrogen ion; however, with ALKT as the independent variable, the intercept and slope coefficients [−31.2, 0.731] are remarkably close to the theoretical approximation [−31.1, 1.0] of Harvey et al. (1981, Appendix V) for acidified Precambrian Shield waters.

Electroneutrality and Equivalent Conductance
By electroneutrality, C⁺ = C⁻, where C⁺ and C⁻ are the respective sums of positively and negatively charged species, and ANC may be defined as ANCE = CB − CA, where CB are base cations and CA are strong acid anions. For Clearwater Lake,

C⁺ = [H⁺] + [Ca²⁺] + [Mg²⁺] + [Na⁺] + [K⁺] + [NH₄⁺] + [Alᵐ⁺] + [Fe²⁺] + [Mn²⁺] = [H⁺] + CB   (2a)

and

C⁻ = [SO₄²⁻] + [NO₃⁻] + [Cl⁻] + [F⁻] = CA   (2b)

where all [·] are analytical total concentrations in µeq/L except [Alᵐ⁺], which should be the total monomeric Al concentration. Available data are for total aluminum; however, supplementary speciation data suggest that most Al in Clearwater Lake samples is inorganic monomeric. In (2a), the net charge m of monomeric Al species is unspecified. According to speciation models (e.g., Schecher and Driscoll, 1987), below pH 4 most monomeric Al exists as Al³⁺. As pH rises from 4 to 5 - the change in Clearwater Lake from 1977-1992 - the Al³⁺ fraction gives way to complexes that yield a net Al charge approaching 2. A compromise value of m = 2.5 was selected. Data quality was investigated via the relationships between C⁺ and C⁻, and between ANCE and titrated ALKG. A set of 172 'complete' ionic records for 1977-1992 was prepared by substituting time series model estimates (see Trend Analysis section) for missing observations if no more than two of the major species (Ca²⁺, Mg²⁺, Na⁺, Cl⁻, SO₄²⁻) were absent. Before 1983, fluoride was estimated as [F⁻] = 1.57 + 0.0181[SO₄²⁻], where the [·] are µeq/L. Given the numerous potential sources of measurement error, the Clearwater charge balance for 1977-1991 shows good agreement, with only a few outliers. Because ANCE estimates at Clearwater Lake pH comprise largely the cumulative measurement error of 13 ionic species, the ANCE-ALKG relationship was not especially insightful. Equivalent conductance (Laxen, 1977) was calculated for 167 'complete' 1977-1992 ionic records with concurrent specific conductance measurements. Figure 5, the plot of equivalent conductance against independently measured conductance with robust LMS regression line, shows good agreement but for several, mostly positive, outliers occurring mainly in 1977, when pHs fell to about 4, the lowest recorded observations. Were the true pH 4.2, an incorrect measurement of 4 would overestimate the [H⁺] equivalent conductance by 13 µS/cm, which would explain the conductance discrepancies of 1977. Regression of 111 records for 1981-1992 reveals that since 1981, about 3.6% of ionic records contain potentially erroneous measurements; however, in terms of total measurements performed, the error frequency is well below 1%.
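A sketch of the two checks follows. The limiting equivalent conductances are rounded values from standard tables at 25°C and may differ slightly from the figures the authors used after Laxen (1977); the example sample is hypothetical, and Al is omitted from the sums for simplicity (the text assigns monomeric Al a compromise charge of m = 2.5).

```python
# Limiting equivalent conductances at 25 C in S*cm^2/eq (rounded values from
# standard tables; the exact figures used after Laxen (1977) may differ).
LAMBDA = {'H': 349.8, 'Ca': 59.5, 'Mg': 53.1, 'Na': 50.1, 'K': 73.5,
          'NH4': 73.4, 'SO4': 80.0, 'NO3': 71.4, 'Cl': 76.3, 'F': 55.4}

def equivalent_conductance(ions):
    """ions: dict of species -> concentration in ueq/L.
    Returns calculated conductance in uS/cm: sum(lambda_i * c_i) / 1000."""
    return sum(LAMBDA[s] * c for s, c in ions.items()) / 1000.0

def charge_balance_error(cations, anions):
    """Percent charge-balance error 100*(C+ - C-)/(C+ + C-), ueq/L inputs."""
    cp, cm = sum(cations.values()), sum(anions.values())
    return 100.0 * (cp - cm) / (cp + cm)

# Hypothetical mid-record sample (values illustrative only, ueq/L):
cat = {'H': 40.0, 'Ca': 260.0, 'Mg': 95.0, 'Na': 55.0, 'K': 12.0}
an = {'SO4': 410.0, 'NO3': 8.0, 'Cl': 42.0, 'F': 10.0}
print(round(equivalent_conductance({**cat, **an}), 1), 'uS/cm (calculated)')
print(round(charge_balance_error(cat, an), 2), '% charge-balance error')
```

Comparing the calculated conductance against the independently measured value, record by record, is exactly the consistency test plotted in Figure 5; note that the roughly 350 S·cm²/eq conductance of H⁺ explains why a 0.2-unit pH error near pH 4 shifts the calculated conductance by about 13 µS/cm.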


TEMPORAL TREND ANALYSIS
Trend analyses of the Clearwater Lake records were accomplished by robust, graphically oriented procedures for characterizing nonmonotonic (reversing) trends in irregular quality time series with seasonal periodicity (Bodo, 1991). Data series are conceptualized as either of the linear additive processes

Zᵢ = Tᵢ + εᵢ   (3a)
Zᵢ = Tᵢ + Sᵢ + εᵢ   (3b)

where index i corresponds to sample time tᵢ and Zᵢ is the data series comprising trend Tᵢ representing the temporally local level of the constituent, optionally a seasonal Sᵢ representing stable annually recurrent phenomena, and a residual noise εᵢ. Zᵢ may be either the raw concentrations Cᵢ, or an appropriate skew reducing transform of the raw series. Time trends in the sense of changes in Tᵢ over time, and the significance of the seasonal cycle Sᵢ, are judged by visual inspection of graphics and statistical diagnostics including the seasonal Mann-Kendall test (Hirsch and Slack, 1984). Often, it is most expedient to initially assume seasonal adjustment model (3b), which is solved iteratively for Tᵢ and Sᵢ, and revert to the simple model (3a) if seasonal effects are negligible. With model (3b), plots are generated of (a) trend Tᵢ superimposed on the seasonally adjusted series Zᵢ − Sᵢ, and (b) the relative seasonal Sᵢ against the de-trended series Zᵢ − Tᵢ. Numerically, Tᵢ is fit by modified LOWESS smoothing filters (Cleveland, 1979) controlled to approximate short term year-to-year trends. Better results for a few variables (DOC, Secchi depth, chlorophyll a) were obtained with a companion algorithm that fits Tᵢ as the calendar year medians of de-seasonalized data, in which case Tᵢ plots as a step function. Seasonal Sᵢ is a zero centred function fit by a fixed bandwidth running median filter (Bodo, 1989) applied to the de-trended series reordered by date within the year. Filter design minimizes distortions introduced by seasonally variable sampling density.
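The iterative solution of model (3b) can be sketched as below: alternately estimate T by LOWESS on the de-seasonalized series and S by a fixed-bandwidth running median on the de-trended series reordered by date within the year. This is an illustration, not the tuned TRENDS software; the window settings are assumptions, and the sketch presumes roughly monthly sampling so that no seasonal bin is empty.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def decompose_3b(t, z, n_iter=3, frac=0.3, n_targets=100, bw=1.0 / 12.0):
    """Iterative fit of model (3b), Z = T + S + eps, at the sample times t
    (decimal years). Returns the trend T and seasonal S evaluated at t."""
    t, z = np.asarray(t, float), np.asarray(z, float)
    season = t % 1.0
    tau = (np.arange(n_targets) + 0.5) / n_targets
    S = np.zeros_like(z)
    for _ in range(n_iter):
        # trend: LOWESS on the de-seasonalized series
        fit = lowess(z - S, t, frac=frac, return_sorted=True)
        T = np.interp(t, fit[:, 0], fit[:, 1])
        # seasonal: running median of the de-trended series on the circle
        r = z - T
        s_grid = np.empty(n_targets)
        for k, tk in enumerate(tau):
            d = np.abs(season - tk)
            d = np.minimum(d, 1.0 - d)        # circular seasonal distance
            s_grid[k] = np.median(r[d <= bw / 2.0])
        s_grid -= s_grid.mean()               # zero-centred seasonal function
        S = np.interp(season, tau, s_grid)    # (year-end wrap handled crudely)
    return T, S
```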

Trend Results: pH, SO₄²⁻ and related minerals
The increasing pH and declining SO₄²⁻ reported by Dillon et al. (1986) to the end of 1985 have continued (Figure 6). Over 1977-1987, SO₄²⁻ declined at a net rate of 23 µeq/L per annum, while pH rose at a net rate of 0.06 units per year. After the 1986/87 drought, trends reversed during 1987/88, and have since continued as before. Sulphate seems to be declining at the same rate as over 1978-1987, but the rate of pH increase accelerated significantly in 1991, so that the final May 1992 sample exceeded pH 5 for the first time in the historical record. Total and Gran alkalinity behave almost identically to pH. Though hydrologically less severe, drought acidification effects during 1976/77 seem to have been stronger than in the 1987/88 episode. Greater dry acid deposition from local and remote sources over 1976/77 was likely responsible. Suspiciously low 1977 pH readings exaggerate the effects on pH. Declining acid deposition should decelerate chemical weathering of the watershed's alumino-silicaceous terrain, which should yield reduced aqueous levels of the primary base cations (Ca²⁺, Mg²⁺, Na⁺, K⁺), Al, Si, and Mn (Dillon, 1983; Jeffries et al., 1984) that mimic the SO₄²⁻ trend. Neutral road salt (NaCl, CaCl₂) applications have masked the expected base cation trends, but a plot (Figure 7) of base cations adjusted by subtracting chloride equivalence, i.e., C = [Ca²⁺] + [Mg²⁺] + [Na⁺] + [K⁺] − [Cl⁻], mirrors the SO₄²⁻ trend, including the 1987/88 drought-induced reversal. Similar trends are strongly evident in Al and Si, and weakly in Mn and F. Collective ionic trends between 1977/78 and 1990/91 are summarized in Figure 8. Substantial decreases in H⁺, Alᵐ⁺ and SO₄²⁻ charge equivalence are compensated by the road salt constituents Ca²⁺, Na⁺ and Cl⁻, so that ionic strength is virtually identical for the two periods.

Trend Results: Heavy Metals
The heavy metals Cu, Fe, Ni and Zn have been significant components of Sudbury smelter emissions (Hutchinson and Whitby, 1977; Chan et al., 1984). While Cu, Fe and Ni are emitted as coarse particulates that deposit near the source, Zn emissions are fine particulates prone to wider dispersal. Concern focuses mainly on the 'toxic metals' Cu and Ni, which are present at significantly elevated levels in precipitation near the Sudbury smelters (Jeffries and Snyder, 1981; Chan et al., 1984). Cu and Ni trends (Figure 9) generally parallel the declining SO₄²⁻ levels except for the 1987/88 drought-induced reversal. Recent 1990/91 data show substantial decreases in lake concentrations to 15-20 µg/L Cu and 70 µg/L Ni; however, since 1989, measurements are too few to consider these encouraging results conclusive. Zn shows a similar but less dramatic decline, as levels have fallen from near 50 µg/L in 1976 to 10-15 µg/L in 1991. Other than a late 1970s decline that may be related to reduced smelter emissions, Fe shows neither appreciable trend nor correlation with any other time series.

Trend Results: N, P, DOC, Chlorophyll and Secchi depth
Trends for total N [TN = TKN + NO₃⁻], organic N [ON = TKN − NH₄⁺], and inorganic N [IN = NO₃⁻ + NH₄⁺] are overlaid on Figure 10. Over 1973-1985, TN fell from 250 to 100 µg/L, driven by declining IN as NO₃⁻ dropped from 120 to 25 µg/L and NH₄⁺ fell from 50 to 15 µg/L. The cause is unclear, as N has not been associated with smelter emissions (Chan et al., 1984) and no significant long range emission or deposition trends have been noted since the early 1970s (Dillon et al., 1988; Husar et al., 1991; OMOE, 1992). From 1984, TN climbed back to over 150 µg/L, mostly due to a rise in ON to 100 µg/L over 1989/90, indicating increased biological fixation that declined somewhat in 1991. Organic N correlates strongly with DOC, whose 1981-1987 median level of 0.4 mg/L rose to 0.75 mg/L over 1988-1991, a small but statistically significant increase (Figure 11). Phosphorus has shown little significant trend except for a decline from a pre-1979 mean of 4.5 µg/L to a mean of 2.7 µg/L since then, which is likely the artefact of improved measurement technology. Echoing recent organic N trends, after the 1986/87 drought P rose slightly in mean to 3.3 µg/L over 1989/90 and declined to 1.7 µg/L in 1991. Except for a 1973 high of 1.25 µg/L and a 1991 record low of 0.2 µg/L, annual median chlorophyll a concentrations have varied from 0.5-0.85 µg/L about the long term mean of 0.66 µg/L with no appreciable trend. Historically, annual median Secchi depth varied in the range 7-10 m about the long term mean of 8.6 m without perceptible trend; however, the 1991 annual median rises to 11.1 m, matching a previously recorded high in 1973. Collectively, this suite of variables suggests that, in the late 1980s, there was a modest increase in primary biological productivity that may have fallen off abruptly in 1991.


Figure 6. pH and SO₄²⁻ (dashed line) trends, 1973-92.
Figure 7. Chloride adjusted base cation and aluminum (dashed line) trends.
Figure 8. Ion distribution diagram; 1977-78 and 1990-91.
Figure 9. Ni and Cu (dashed line) trends, 1975-92.


Figure 10. Organic, inorganic and total nitrogen trends, 1973-92.
Figure 11. DOC trends, 1981-92.
Figure 12. Total chlorophyll a trends, 1973-92.
Figure 13. Inorganic N seasonal plot.


SEASONAL CYCLES
For most Clearwater time series, strong chronological trends overwhelm seasonality; nonetheless, examination of Sᵢ provides some useful insights into seasonal biogeochemical cycling. Figure 13 shows the strong, distinctive seasonal function for inorganic N, which peaks between mid March and mid April (0.2 yr) just before ice break-up, drops quickly to the end of May (0.4 yr) just before stratification, declines more gradually to its annual minimum near the end of August (0.7 yr), and then rises steadily to the late winter peak. The declining phase reflects biomass assimilation during the growing season, and the rising phase reflects IN release by decaying organic debris. The rapid loss of IN between ice break-up and stratification is mainly as NH₄⁺, which then remains relatively stable until autumn overturn. In contrast, NO₃⁻ declines gradually from ice break-up to late August. Similarly, NH₄⁺ was preferentially consumed over NO₃⁻ early in the growing season during fertilization experiments (Yan and Lafrance, 1984) in downstream lakes. The net IN seasonal amplitude for 1973-1992 is 60 µg/L, to which NH₄⁺ and NO₃⁻ contribute equally. The seasonal amplitude of the IN species is most likely level dependent. Analysis of the log transformed IN series suggests that at recent IN levels of 50 µg/L the expected seasonal amplitude is about 35 µg/L. Other variables also show perceptible seasonal variation. The mineral ions (Ca²⁺, Mg²⁺, Na⁺, K⁺, Mn²⁺, SO₄²⁻, Cl⁻, F⁻) and conductance exhibit a unimodal pattern with a broad peak extending from ice formation (mid December) to break-up (mid March to mid April), followed by a rapid decrease to the annual low about mid May that is likely the result of dilution by low ionic strength snowmelt water. Though the net amplitude is only 0.4 mg/L, silica has a strong cycle with a broad peak from ice break-up to stratification, followed by a decline to a late summer low. Aluminum has shown a generally similar pattern. Fe has a weak bimodal seasonality, with a first, higher peak that occurs after ice break-up and persists into stratification before falling to a mid summer low, followed by a smaller secondary peak at autumn turn-over.

SMELTER SO₂ EMISSIONS AND CLEARWATER LAKE STATUS
The link between lake status and smelter SO₂ emissions was examined with simple regression analogues of time series transfer function models of the forms (a) Yᵢ = α₀ + α₁Xᵢ₋ₖ, and (b) Yᵢ = β₀ + β₁Yᵢ₋₁ + β₂Xᵢ, where Xᵢ is either SO₂ or log₁₀ SO₂ emissions lagged k = 0-2 years, and Yᵢ is annual median H⁺, pH or SO₄²⁻. For model (a), the current year's SO₂ emissions gave the best overall predictive capability; however, predicted lake response disagreed significantly with observed response for 1977 - a year of drought effects and suspiciously low pH data - and the smelter shut down years of 1978/79 and 1982/83. Using the previous year's SO₂ emissions as the independent variable gives better predictions during smelter shut down years, but the quality of predictions declines in other years. In particular, lake pH is increasingly underpredicted since 1985. Regression with both current and previous years' SO₂ emissions is not significant. Model form (b), using the previous year's lake response as an independent variable, yields better results, but again, abnormal years were identified: (1) 1974, for which surficial grab sample mean pH and SO₄²⁻ concentrations from OMOE (1982) were employed, (2) 1978, for which pH is underpredicted due to suspiciously low 1977 pH, (3) 1987, for which pH and SO₄²⁻


are poorly predicted due to drought effects, and (4) 1991, for which pH is underpredicted. Forecasts of future steady state lake response under constant annual target Sudbury smelter SO₂ emissions of 0.47×10⁶ t degenerate to implausible predictions 2-6 years forward of the 1990-91 Clearwater Lake median concentrations used as initial values. At 19 years, the annual lake concentration series are too brief to reliably fit a forecasting model; however, some tentative conclusions emerge. To varying degrees, statistical model residuals and plots of lake concentrations against SO₂ emissions suggest that the long term decline in Clearwater H⁺ and SO₄²⁻ levels has been greater than expected if lake status were governed by a simple linear dynamical response to local SO₂ emissions. Aerometric and bulk deposition data of the late 1970s (Scheider et al., 1981; Chan et al., 1984) suggested that beyond the immediate vicinity of the smelters (>5 km, Jeffries, 1984) acid deposition was dominated by remote sources. Thus the regression results suggest that much of the drop in Clearwater SO₄²⁻ and H⁺ over 1973-1985 was attributable to the general decline in North American SO₂ emissions over 1970-1982. Reasons for the continuing recent decline in Clearwater SO₄²⁻ and H⁺ cannot be assessed until concurrent bulk deposition figures become available at sites remote from the Sudbury smelters.
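For concreteness, model form (b) and its implied steady state under constant emissions can be coded as below; this is an illustration of the regression analogue described in the text, not the authors' exact computation.

```python
import numpy as np

def fit_model_b(y, x):
    """OLS fit of model (b): Y_i = b0 + b1*Y_{i-1} + b2*X_i, with Y an annual
    median lake quantity (H+, pH or SO4) and X the annual SO2 emission."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return beta                      # (b0, b1, b2)

def steady_state(beta, x_target):
    """Fixed point Y* = (b0 + b2*X)/(1 - b1) under constant emissions X.
    Meaningful only if |b1| < 1; a fitted b1 near or above 1 is one way the
    forward iterates can degenerate, as reported for the 19-year record."""
    b0, b1, b2 = beta
    return (b0 + b2 * x_target) / (1.0 - b1)
```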

SUMMARY
Clearwater Lake continues to respond favourably to declining acid deposition from local and remote sources, and to declining heavy metal emissions from local smelters. In May 1992, a pH reading above 5 was observed for the first time in the recorded water quality history, and the lake is poised to experience significant biological recovery as further emission controls are implemented. Declining levels of chloride adjusted base cations, aluminum, silica, manganese, and fluoride confirm that mineral weathering rates have decelerated. Concentrations of the toxic metals Cu and Ni have fallen appreciably over 1990/91. Since 1988, a small surge in biological activity occurred that appears to have declined abruptly in 1991, as indicated by DOC, organic N, P, chlorophyll and Secchi depth data. Though the droughts of 1975/76 and 1986/87 induced brief reversals of de-acidification trends, Clearwater Lake is relatively drought resistant. Comparable data for neighbouring Swan Lake (Keller et al., 1992) reveal that some Sudbury area waters may remain at serious risk from episodic drought induced re-acidification and metal toxicity for some time after acid emission targets are achieved. Clearwater Lake acid-base status has improved disproportionately relative to local smelter SO₂ emission reductions, supporting indications that remote source acid deposition is an important determinant of surface water status in the Sudbury area and that further improvements depend on reduced acid deposition from both local and remote sources. Maintaining adequate surface water monitoring in an era of severe fiscal restraint presents an immediate challenge. Metal analyses for 1990/91 are so sparse that the ability to characterize ambient levels and time trends within practical time horizons is severely jeopardized. With its unique long term record of de-acidification processes in an unmanipulated headwater catchment, Clearwater Lake ranks foremost among Sudbury area sites for continued surveillance to judge the success of remedial actions implemented in Canada and the U.S. through the forthcoming decade.


ACKNOWLEDGEMENT
The first author's efforts were supported by a research grant from the Limnology Section, Water Resources Branch, Environment Ontario.

REFERENCES
Bodo, B.A. (1989) "Robust graphical methods for diagnosing trend in irregularly spaced water quality time series", Environ. Monitoring Assessment, 12, 407-428.
Bodo, B.A. (1991) TRENDS: PC software, user's guide and documentation for robust graphical time series analysis of long term surface water quality records, Ontario Ministry of the Environment, Toronto.
Chan, W.H., Vet, R.J., Ro, C., Tang, A.J., and Lusis, M.A. (1984) "Impact of Inco smelter emissions on wet and dry deposition in the Sudbury area", Atmos. Environ., 18(5), 1001-1008.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatterplots", J. Am. Stat. Assoc., 74(368), 829-836.
Dillon, P.J. (1983) "Chemical alterations of surface waters by acidic deposition in Canada", p. 275-286, In Ecological Effects of Acid Deposition, National Swedish Environment Protection Board, Report PM 1636.
Dillon, P.J. (1984) "The use of mass balance models for quantification of the effects of anthropogenic activities on lakes near Sudbury, Ontario", p. 283-347, In J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley, New York.
Dillon, P.J., Reid, R.A., and Girard, R. (1986) "Changes in the chemistry of lakes near Sudbury, Ontario following reductions of SO₂ emissions", Water Air Soil Pollut., 31, 59-65.
Dillon, P.J., Lusis, M., Reid, R., and Yap, D. (1988) "Ten-year trends in sulphate, nitrate and hydrogen ion deposition in central Ontario", Atmos. Environ., 22, 901-905.
Gorham, E., and Gordon, A.G. (1960a) "Some effects of smelter pollution northeast of Falconbridge, Ontario", Can. J. Bot., 38, 307-312.
Gorham, E., and Gordon, A.G. (1960b) "The influence of smelter fumes upon the chemical composition of lake waters near Sudbury, Ontario, and upon the surrounding vegetation", Can. J. Bot., 38, 477-487.
Gunn, J.M., and Keller, W. (1990) "Biological recovery of an acid lake after reductions in industrial emissions of sulphur", Nature, 345, 431-433.
Harvey, H., Pierce, R.C., Dillon, P.J., Kramer, J.R., and Whelpdale, D.M. (1981) Acidification in the Canadian Aquatic Environment, Pub. NRCC No. 18475 of the Environmental Secretariat, National Research Council of Canada, Ottawa.
Hirsch, R.M., and Slack, J.R. (1984) "A nonparametric trend test for seasonal data with serial dependence", Water Resour. Res., 20, 727-732.
Husar, R.B., Sullivan, T.J., and Charles, D.F. (1991) "Historical trends in atmospheric sulfur deposition and methods for assessing long-term trends in surface water chemistry", p. 65-82, In D.F. Charles [ed.] Acid Deposition and Aquatic Systems, Regional Case Studies, Springer-Verlag, New York.
Hutchinson, T.C., and Havas, M. (1986) "Recovery of previously acidified lakes near Coniston, Canada following reductions in atmospheric sulphur and metal emissions", Water Air Soil Pollut., 29, 319-333.


Hutchinson, T.C., and Whitby, L.M. (1977) "The effects of acid rainfall and heavy metal particulates on a boreal forest ecosystem near the Sudbury smelting region of Canada", Water Air Soil Pollut., 7, 123-132.
Jeffries, D.S. (1984) "Atmospheric deposition of pollutants in the Sudbury area", p. 117-154, In J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley, New York.
Jeffries, D.S., Scheider, W.A., and Snyder, W.R. (1984) "Geochemical interactions of watersheds with precipitation in areas affected by smelter emissions near Sudbury, Ontario", p. 195-241, In J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley, New York.
Jeffries, D.S., and Snyder, W.R. (1981) "Atmospheric deposition of heavy metals in central Ontario", Water Air Soil Pollut., 15, 127-152.
Johnson, M.G., and Owen, G.E. (1966) Report on the biological survey of streams and lakes in the Sudbury area, 1965, Ontario Water Resources Commission, 46 pp.
Keller, W., Pitblado, J.R., and Carbone, J. (1992) "Chemical responses of acidic lakes in the Sudbury, Ontario, area to reduced smelter emissions, 1981-89", Can. J. Fish. Aquat. Sci., 49(Suppl. 1), 25-32.
Keller, W., and Pitblado, J.R. (1986) "Water quality changes in Sudbury area lakes: a comparison of synoptic surveys in 1974-76 and 1981-83", Water Air Soil Pollut., 29, 285-296.
Keller, W., Pitblado, J.R., and Conroy, N.I. (1986) "Water quality improvements in the Sudbury, Ontario, Canada area related to reduced smelter emissions", Water Air Soil Pollut., 31, 765-774.
Kelso, J.R.M., and Jeffries, D.S. (1988) "Response of headwater lakes to varying atmospheric deposition in north central Ontario, 1979-1985", Can. J. Fish. Aquat. Sci., 45, 1905-1911.
Laxen, D.P.H. (1977) "A specific conductance method for quality control in water analysis", Water Res., 11, 91-94.
Locke, B.A., and Scott, L.D. (1986) Studies of Lakes and Watersheds in Muskoka-Haliburton, Ontario: Methodology (1976-1985), Ontario Ministry of the Environment, Data Rep. DR-86/4, Dorset, Ontario, Canada.
OMOE (1982) Studies of lakes and watersheds near Sudbury Ontario: final limnological report, supplementary volume 10, Sudbury Environmental Study, Ontario Ministry of the Environment, Toronto.
OMOE (1992) Summary: some results from the APIOS atmospheric deposition monitoring program (1981-1988), Environment Ontario, Toronto.
Rousseeuw, P.J., and Leroy, A.M. (1987) Robust Regression and Outlier Detection, Wiley, New York.
Schecher, W.D., and Driscoll, C.T. (1987) "An evaluation of uncertainty associated with aluminum equilibrium calculations", Water Resour. Res., 23(4), 525-534.
Scheider, W.A., Jeffries, D.S., and Dillon, P.J. (1981) "Bulk deposition in the Sudbury and Muskoka-Haliburton areas of Ontario during the shutdown of Inco Ltd in Sudbury", Atmos. Environ., 15, 945-956.
Yan, N.D., and Lafrance, C. (1984) "Responses of acidic and neutralized lakes near Sudbury, Ontario, to nutrient enrichment", p. 457-521, In J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley, New York.

PART VI
SPATIAL ANALYSIS

MULTIVARIATE KERNEL ESTIMATION OF FUNCTIONS OF SPACE AND TIME HYDROLOGIC DATA

U. LALL, Utah Water Research Laboratory, Utah State Univ., Logan, UT 84322-8200
K. BOSWORTH, Dept. of Mathematics, Idaho State Univ., Pocatello, ID 83209-8500

A nonparametric methodology for exploring multivariate hydrologic data is presented. A multivariate kernel estimator for estimating the joint probability density function of a set of random variables is developed. A multivariate Gaussian kernel is used, with its covariance matrix specified through a truncated singular value decomposition of a robust, local data covariance matrix. Estimators for conditional probabilities and expectations are also presented. An application to data from the Great Salt Lake is presented.

INTRODUCTION
Statistical estimation problems of interest to hydrologists include spatial interpolation (e.g., of groundwater levels or rainfall), assessment of space/time trends (e.g., in contaminant concentration data), functional dependence between different parameters (e.g., rainfall and runoff), and generation of stochastic time series with attributes similar to the data (e.g., monthly streamflow into a reservoir). A basic building block for such analyses is the joint probability density function (p.d.f.) of two or more variables. Traditionally, hydrologists have explicitly or implicitly fit parametric (usually in a Gaussian framework) probability models to the available data. Such an approach can be parsimonious, expedient and efficient if the correct parametric structure is chosen fortuitously. The latter is not easily verified. Techniques that reveal the structure of the observed spatial or temporal field from the relatively sparse data are desirable. A refreshing alternative to the traditional parametric approaches was provided in the applications of nonparametric regression and density estimation to hydrologic problems by Yakowitz (1985); Yakowitz and Szidarovsky (1985); and Karlsson and Yakowitz (1987a and b). A perusal of the statistical literature shows that nonparametric statistical estimation using splines, kernel functions, nearest neighbor methods and orthogonal series methods is one of the most active and exciting areas in the field, with major developments still unfolding. Most of the statistical literature on the subject has a theoretical flavor and is concentrated on the univariate case. Exceptions are Silverman (1986) and Scott (1992). A pragmatic, multivariate kernel function estimator that is effective in moderate (3 to 5) dimensional settings for the estimation of the joint p.d.f. of a set of random variables, as well as for the estimation of functions related to the conditional p.d.f. of a subset of the variables, is presented here. Nonparametric estimation schemes are weighted moving averages of the data and thus provide "local" estimates of the target function, as opposed to parametric methods that are inherently global in their assumptions and application. While the local nature of the nonparametric estimates is attractive, since it allows the procedure to adapt to the underlying function, it also leads to their suffering from a "curse of dimensionality" in multivariate settings. An exponentially increasing number of points is needed to densely populate


Euclidean spaces of increasing dimension. Consequently, the estimation variance increases dramatically for a fixed sample size as the dimension of the data set increases. Global methods (e.g., generalized additive models, projection pursuit, average derivative estimation, sliced inverse regression) to address this issue are reported in Scott (1992). These methods try to project the data into a subspace of smaller dimension determined by some criteria. We pursue a similar strategy here, but with locally varying projections. The rationale is that even where the underlying structure is complex and high dimensional, it may be dominated locally by a few variables. Ability to reproduce Gaussian relationships was also of concern. This is similar in spirit to the locally weighted linear regression approach of Cleveland and Devlin (1988) and to the kernel estimation framework (NKERNEL) used by NSSS (1992). An application of the techniques that explores the inter-relationship between annual inflow (q) into the Great Salt Lake (GSL), and its annual precipitation (p) and evaporation (e), is used to illustrate the techniques developed. Scatterplots of p and e, q and p, and q and net precipitation (p-e), for the 136 year record are shown in Figure 1. A cubic smoothing spline fitted to each data set, with smoothing parameter chosen by Generalized Cross Validation (GCV, see Craven and Wahba, 1979), is also shown. The nonlinearity of these relationships is apparent. The correlations are -0.8 for p and e; 0.15 for q and p; -0.06 for q and e; and 0.26 for q and (p-e). All correlations and scale parameters referred to in this paper are estimated using the robust procedures described in the Appendix. The GSL is in an arid region where e exceeds p on the average. Based on the q and (p-e) data, one can speculate whether there is a critical intermediate response regime related to aridity. Inflow seems to have little dependence on (p-e) during this regime. Outside this regime, q is responsive to (p-e). We shall revisit this question during the course of our exploration of this data.

METHODOLOGY
Kernel density estimators (k.d.e.) can be motivated as smoothed histograms, or as an approximation to the derivative of the cumulative distribution function of the data. A local approximation to the underlying density is developed through a weighted average of the relative frequency of data. This is achieved by averaging across (the convolution of) weight functions centered at each data point. Say we have data uᵢ ∈ Rᵈ, i = 1...n. Then the k.d.e. is:

f̂(u) = (1/n) Σᵢ₌₁ⁿ h⁻ᵈ K((u − uᵢ)/h)

The pointwise bias and variance of f̂(u) depend on the sample size, the underlying density, the scale parameter or bandwidth h, and the kernel function. If the weight or kernel function K(·) is a valid p.d.f., then f̂(u) is also a valid p.d.f. Typical choices for K(·) are symmetric Beta p.d.f.'s and the Gaussian p.d.f. In terms of the asymptotic Mean Square Error (MSE) of approximation, it has been shown that there is little to choose between the typically used kernels. The critical parameter is then h, since it governs the degree of averaging. Increasing h reduces the variance of f̂(u) at the expense of increasing bias. Objective, data based methods for choosing h are available (based on minimization of a fitting metric with respect to the class of smooth p.d.f.'s and data based measures of their attributes, or with respect to a parametric p.d.f. considered as a viable choice); see Scott (1992).
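As a concrete illustration of the basic estimator, the short sketch below evaluates a d-variate k.d.e. with a Gaussian kernel and a single bandwidth h. It is a minimal rendering of the formula above, not the partition-based algorithm developed later in this paper; the function name and the synthetic data are our own.

```python
import numpy as np

def kde(u, data, h):
    """Evaluate f(u) = (1/n) sum_i h^{-d} K((u - u_i)/h) at a query point u,
    using a standard d-variate Gaussian kernel K."""
    n, d = data.shape
    # squared distances between the query point and each observation, scaled by h
    z2 = np.sum(((u - data) / h) ** 2, axis=1)
    # Gaussian kernel: K(v) = (2*pi)^(-d/2) * exp(-|v|^2 / 2)
    kvals = (2.0 * np.pi) ** (-d / 2.0) * np.exp(-0.5 * z2)
    return np.mean(kvals) / h ** d

# hypothetical example: 136 "annual" observations in 2 dimensions
rng = np.random.default_rng(1)
sample = rng.normal(size=(136, 2))
print(kde(np.array([0.0, 0.0]), sample, h=0.4))
```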


[Figure 1: Relationships between GSL data (solid line is a cubic smoothing spline). Three scatterplot panels: GSL annual precipitation vs. evaporation; GSL annual inflow vs. precipitation; GSL annual inflow vs. net precipitation.]


Minima of MSE based criteria with respect to h tend to be broad rather than sharp. Thus, even when using an objective method for choosing h, it is desirable to examine the estimate at a range of h values in the neighborhood of the optimum. In the multivariate setting, two popular choices for K(·) are:

"The Product Kernel":

$$\hat{f}(u) = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{d}\frac{1}{h_j}\,K_j\!\left(\frac{u_j-u_{ij}}{h_j}\right)$$

with independent bandwidths h_j and univariate kernels K_j(·) located at u_{ij}; and "Sphering":

$$\hat{f}(u) = \frac{\det(S)^{-1/2}}{n\,h^{d}}\sum_{i=1}^{n} K\!\left(h^{-2}(u-u_i)^{T} S^{-1}(u-u_i)\right)$$

where S is a d×d data covariance matrix, and K(v) = (2π)^{-d/2} e^{-v/2}, i.e., a Gaussian function. The motivation for these choices is that the attributes of the p.d.f. are likely to vary by j, and that a radially symmetric, multivariate kernel (i.e., h invariant with j) would be deficient, in the sense that it may simply average over empty space in some of the directions j. Consequently, at least variation of h by j is desirable. The product kernel has the disadvantage that it calls for the specification of d bandwidths by the user. Where a Gaussian kernel is used, the product kernel is equivalent to the sphering kernel with an identity covariance matrix. If the variables of interest are correlated, sphering is attractive since it aligns the kernel with the principal axes of variation of the data. The advantage of this is striking if the rank of the data covariance matrix S is r ≪ d, i.e., the data cloud in d dimensions can be resolved completely in r linearly independent directions. Further, the bias of f̂(u) is proportional to the Hessian matrix of f(u). The matrix S is proportional to an approximation to the Hessian. Thus sphering can help reduce the bias in f̂(u) by adjusting the relative bandwidths. Wand and Jones (1993) show that while sphering offers significant gains in some situations, it can be detrimental where the underlying p.d.f. exhibits a high degree of curvature, and/or has multiple modes with different orientations. It is clear that it is worth exploring locally adaptive bandwidth variation. For example, suppose the underlying p.d.f. has two distinct modes that have different directions for their principal axes, and perhaps involve different subsets r1 and r2 of the d variables. A reasonable adaptation of the bandwidth would be achieved by partitioning the raw data into the two appropriate subsets, and using matrices S1 and S2 that are aligned with the principal axes of the two modes. This is indeed the strategy followed here. Our k.d.e. algorithm is outlined in Table 1; a manuscript with a more formal presentation is in preparation. First, it is desirable to scale the data so that all variables are compatible. This can be done by mapping each coordinate to the interval [0,1], by "normalizing" (i.e., subtracting the mean and then dividing by the standard deviation), or through an appropriate logarithmic or other transform. Then the scaled data is recursively partitioned into a k-d tree (Friedman, 1979). An illustration of this process with the q, p data is shown in Figure 2. The first partition was for p, at 0.34. The two resulting partitions were then split along q, as shown. Another iteration takes us to the eight partitions shown, each with 17 points.


TABLE 1. K.D.E. ALGORITHM

1. Scale the data so that each column has the same scale.

2. Partition the data u_i, i = 1..n, into a k-d tree (npart partitions):
- Split each partition at the median of the coordinate with the greatest spread.
- Stop splitting a partition if the resulting number of points in the partition would be less than mink, where mink = max(d+2, √n).
- Define the indicator function I_ij = 1 if u_i is in partition j, and 0 if it is not.

3. Compute a (d×d) robust covariance matrix C_j for each partition j (see Appendix).

4. Perform a Singular Value Decomposition (SVD) of C_j = E_j Λ_j E_j^T, where E_j is an eigenvector matrix (E_j^T E_j = I) and Λ_j is an ordered (descending) diagonal eigenvalue matrix corresponding to E_j. All matrices are (d×d).

5. Retain the r_j leading components of the decomposition, such that

$$\sum_{l=1}^{r_j} \lambda_{l,j} \;\Big/\; \sum_{l=1}^{d} \lambda_{l,j} \;\ge\; crit$$

where crit is 0.95 or 0.99.

6. Form S_j = E_j Λ_j E_j^T, where E_j is (d×r_j), Λ_j is (r_j×r_j), and E_j^T is (r_j×d).

7. Define the multivariate p.d.f. estimate as:

$$\hat{f}(u) = n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{npart} I_{ij}\,(2\pi)^{-r_j/2}\, h^{-r_j}\,\det(S_j)^{-1/2}\, e^{-0.5\,(u-u_i)^{T} S_j^{-1}(u-u_i)/h^{2}}$$

where h is a specified bandwidth, det(S_j) = Π_{l=1}^{r_j} λ_{l,j}, and S_j^{-1} = E_j Λ_j^{-1} E_j^T.

8. For a partition of u defined as (y, x), where the dimension of y is q and that of x is (d−q), the conditional p.d.f. f(y|x) is estimated as:

$$\hat{f}(y|x) = \left(\sum_{i=1}^{n}\sum_{j=1}^{npart} I_{ij}\,G_{ij}(y)\,\hat{f}(x_i)\right)\Big/\sum_{i=1}^{n}\hat{f}(x_i)$$

where f̂(x_i) is the p.d.f. of x evaluated at x_i, and G_ij(y) is a q-variate Gaussian p.d.f. with mean ŷ_i = y_i + S_{yx,j} S_{xx,j}^{-1}(x − x_i) and covariance S_{yy·x,j} = S_{yy,j} − S_{yx,j} S_{xx,j}^{-1} S_{xy,j}.

9. The regression function r(x) = E[y|x] of y (q = 1) on x is then estimated as:

$$\hat{r}(x) = \left(\sum_{i=1}^{n}\sum_{j=1}^{npart} I_{ij}\,\hat{y}_i\,\hat{f}(x_i)\right)\Big/\sum_{i=1}^{n}\hat{f}(x_i)$$
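The following Python sketch is our reading of steps 2 through 7 of Table 1, included only to make the algorithm concrete. For brevity it substitutes the ordinary sample covariance for the robust covariance of step 3 (see the Appendix) and keeps all d components instead of truncating the SVD at r_j; the data are synthetic.

```python
import numpy as np

def kd_tree_partition(data, mink):
    """Step 2: recursively split at the median of the widest coordinate."""
    parts, done = [np.arange(len(data))], []
    while parts:
        idx = parts.pop()
        if len(idx) // 2 < mink:          # stop: children would fall below mink
            done.append(idx)
            continue
        spread = data[idx].max(axis=0) - data[idx].min(axis=0)
        j = int(np.argmax(spread))        # coordinate with greatest spread
        order = idx[np.argsort(data[idx, j])]
        mid = len(order) // 2
        parts += [order[:mid], order[mid:]]
    return done

def kde_partitioned(u, data, h):
    """Steps 3-7: a sum of Gaussians oriented by each partition's covariance."""
    n, d = data.shape
    mink = max(d + 2, int(np.sqrt(n)))
    total = 0.0
    for idx in kd_tree_partition(data, mink):
        C = np.cov(data[idx].T)           # step 3 (robust version in the paper)
        Sinv = np.linalg.pinv(C)          # step 6, via the pseudo-inverse
        det = np.linalg.det(C)
        for i in idx:                     # step 7: convolution of kernels
            q = (u - data[i]) @ Sinv @ (u - data[i]) / h**2
            total += (2*np.pi)**(-d/2) * h**(-d) * det**-0.5 * np.exp(-0.5*q)
    return total / n

rng = np.random.default_rng(0)
x = rng.normal(size=(136, 2))             # with n=136, d=2: 8 partitions of 17
print(kde_partitioned(np.zeros(2), x, h=1.0))
```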


[Figure 2: K-d tree development for the Great Salt Lake data. Scaled annual precipitation is plotted against scaled annual inflow; full sample correlation 0.15 (136 years), 17 points per partition. The robust p,q correlation within each of the eight partitions is shown in the corresponding box (values range from -0.20 to 0.55).]

Note that requiring each partition to have the same number of points leads to partitions that are large in data sparse regions. The partition variance is larger, leading to a larger effective bandwidth. A natural adaptation of the bandwidth to tails and modes of the data is thus effected. The emphasized numbers in each box report the robust correlation between q and p values in the box. We find the clustering of data shown by a k-d tree to be a useful data exploratory tool as well. Here, note that the partition corresponding to the highest p,q values has the highest p,q correlation; correlation varies with partition; and most partitions have p,q correlations that are higher than the full sample value. As the number of points in a partition approaches the dimension of the space, d, the matrix C_j becomes singular. This brings up the need to decide on an optimum data partitioning. Our approach is exploratory. The number of partitions is a smoothing parameter: bias decreases and variance increases as the number of partitions increases. Arguments for parsimony suggest using a minimum number of partitions. Our experiments (also NSSS (1992)) with a variety of synthetic data suggest that a value of mink greater than d, and somewhere between √n and n^{4/(4+d)}, works well. Since the variation in mink is by factors of 2, consistent results upon partitioning are significant. So the strategy is to form a sequence of estimates, starting with the full sample. A robust covariance matrix is computed for each partition, and a truncated singular value decomposition of the matrix is performed. The resulting matrix S_j is then used for multivariate k.d.e. as outlined. Robustness is important, since the effect of an outlier is pronounced as the sample size shrinks upon partitioning. It can force the covariance matrix to be near singular, and lead to the specification of a wild eigenvector sequence, and hence


of kernel orientation. One can choose the bandwidth h "automatically" by optimizing a criterion such as maximum likelihood or mean integrated square error (MISE) through cross validation. However, it is known that such estimators have very slow convergence rates (O(n^{-1/10}) for d = 1), high variance (induced by fine scale structure), and are prone to undersmoothing. The recent trend in the statistical literature (Scott (1992)) is to develop data based methods for choosing h that use recursive estimation of the terms (involving derivatives of f(u)) needed in a Taylor series expansion of MISE, thereby developing an estimate of the optimal h. In the univariate case, this can get close to the theoretically optimal convergence rate of O(n^{-1/2}). Similar developments in the multivariate case are forthcoming. In the data exploration context, choosing h by reference to a known parametric p.d.f. is reasonable. It has Bayesian connotations: correct choice under proper specification, and adaptation under mild mis-specification. Since we are interested in discerning structure in the data, it is desirable to oversmooth to avoid focusing on spurious structure. Choosing h by reference to a Gaussian p.d.f. is attractive in this context, since for a given variance one would oversmooth relative to a mixture, and the multivariate framework is well known. In our context, Silverman (1986) gives the MISE optimal h with reference to a multivariate Gaussian as:

$$h_{opt} = \left\{\frac{4}{d+2}\right\}^{1/(d+4)} n^{-1/(d+4)}$$
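A one-line helper for this reference bandwidth (a sketch; the figures in this paper use nearby values such as h = 0.42):

```python
def h_opt(n, d):
    # Silverman (1986) multivariate Gaussian reference bandwidth
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))

print(h_opt(136, 2))   # about 0.44 for n = 136, d = 2
```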

It was pointed out earlier that the number of partitions is a smoothing parameter as well. We have not yet determined the optimal h taking that into account. The use of mink instead of n in the above has been used with some success. NSSS (1992) uses a strategy similar to our k.d.e., and simply takes h = 1, regarding the estimator as a convolution of Gaussians with locally estimated covariance structure. At this point, recall the insensitivity of MISE to h, and note that increasing h, or reducing variance, smooths out structure and vice versa, without affecting MISE very much. If the goal is data exploration, varying h and noting data structures resistant to h variations is desirable. We shall illustrate the effect of such choices by example in the next section. The conditional p.d.f. estimator (item 8, Table 1) is based on a weighted convolution of conditional p.d.f.'s centered at each observation, with weights proportional to the estimated density at the point. One can also estimate this as f̂(u)/f̂(x), or equivalently by taking the appropriate slice out of an estimate f̂(u) and normalizing it. The regression estimator (item 9, Table 1) is presented in Owosina et al. (1992), and compared with other nonparametric regression schemes for spatial estimation. NKERNEL by NSSS (1992) has the same framework as described here, except for data partitioning and the treatment of the covariance matrix (no SVD, no robustness).

APPLICATIONS

Selected k.d.e.'s for the data set introduced earlier are presented in Figures 3 through 8. In each case, the variable referenced first is on the x-axis, and the second on the y-axis. The k.d.e. estimate of the p and e p.d.f. (Fig. 3), with 1 partition and h = 0.4 (Gaussian reference), appears consistent with a bivariate Gaussian density with a correlation coefficient of -0.8. The k.d.e. of the q and p p.d.f. (Fig. 4), constructed with 8 partitions (as in Fig. 2) and h = 1 (oversmoothed), clarifies the features apparent from the clusters in Figure 2.


[Figure 3: Joint p.d.f. of p and e, npart=1, h=0.42 (axes: precipitation, evaporation).]


[Figure 4: Joint p.d.f. of q and p, npart=8, h=0.42 (axes: inflow, precipitation).]


K.d.e.'s of the p.d.f. of q, p and e were also constructed and evaluated along a slice defined through the line segment between (0, 0.72) and (0.72, 0) on the p,e axes (i.e., scaled p−e = −0.72). This line segment corresponds approximately to the principal axis of the p,e p.d.f. in Figure 3. Figure 5 was constructed with 1 partition and an h chosen by reference to a Gaussian p.d.f. The kernel orientation and the bandwidth are globally prescribed. We see two weakly separated modes in the conditional p.d.f., in contours that suggest a skewed p.d.f. with principal axes consistent with the eigenvalues and eigenvectors of the (q,p,e) covariance matrix. In Figure 6, we worked with four partitions and h = 1. With this modification we see one mode in the p.d.f., but more complex structure in the joint p.d.f. than in Figure 5. Finally, in Figure 7, we worked with 4 partitions, but with h = 0.4 (Gaussian reference). The structure of Figure 6 has now sharpened into 2 modes with major principal axes that are nearly orthogonal to each other. This is consistent with Figure 1, where we saw low dependence between q and (p−e) in the middle range of the data. The antimode between them may reflect an instability in the q, (p−e) relationship, speculated about in Figure 1. Correlation between q and the other variables is weak, and n is small for estimating a trivariate k.d.e., let alone its conditionals. The correlation between p and e is relatively high. One would expect k.d.e. to perform poorly for conditionals of q on the other variables. A direct k.d.e. of q and (p−e) (Figure 8) is similar to the p.d.f. for the slice from (q,p,e) for npart = 1 and 4 (Figures 5 and 7), with h by Gaussian reference, but d = 2 instead of 3. The purpose of this application has been to illustrate the k.d.e.'s potential for revealing the underlying structure in hydrologic processes, as an aid to better understanding. We find the nonparametric estimates (regression in Fig. 1, scatterplot organization in Fig. 2 and p.d.f.'s in the others) to be very useful tools for data exploration. The application shows the effect of varying the bandwidth and the number of partitions on the resulting k.d.e. As expected, partitioning affects the orientation of the kernels, as well as the degree of local smoothing and hence of the resulting p.d.f., while the bandwidth controls the degree of smoothing and the ability to see modes. Varying both clarifies the underlying structure.

SUMMARY

The utility of multivariate k.d.e. for using data to improve our understanding of hydrologic processes is obvious. Model specification, as well as estimation of hydrologic variables, can be improved. Pursuit of k.d.e. to generate vitriolic debates on the multimodality of a data set, or to justify one's favourite parametric p.d.f., is counterproductive. Choices of models and estimation procedures are always subjective at some level. The spirit of k.d.e. is to highlight features of the data set at the expense of estimation efficiency (in the classical parametric sense). Clearly, artifacts of the particular realization at hand are likely to be highlighted, as well as genuine underlying features. Implementations of multivariate k.d.e. need to be carefully designed to balance such a trade-off. The k.d.e. algorithm presented here was shown to be effective in circumventing the detrimental effect of sphering with heterogeneous data, as shown by Wand and Jones (1993). The p.d.f.'s in Figures 5 through 8 appear to be precisely of the type that would be obscured by global sphering. We noted that this was the case, and that the structure was resolved upon partitioning. Further work on improving parameter specification is needed and is in progress.


[Figure 5: Joint p.d.f. of q and slice of p,e along (0, 0.72) to (0.72, 0), npart=1, h=0.42.]


[Figure 6: Joint p.d.f. of q and slice of p,e along (0, 0.72) to (0.72, 0), npart=4, h=1.]


[Figure 7: Joint p.d.f. of q and slice of p,e along (0, 0.72) to (0.72, 0), npart=4, h=0.45.]


[Figure 8: Joint p.d.f. of q and (p−e): (a) npart=1, h=0.42; (b) npart=4, h=0.42.]


APPENDIX

Robust estimators of scale and covariance as suggested by Huber (1981) are used. The pairwise covariance C_ij is computed as r_ij t_i t_j, where t_i is a robust estimator of the standard deviation obtained as 1.25 × (mean absolute deviation of u_i), and r_ij is a robust estimator of the correlation, given as:

$$r_{ij} = \frac{t^{2}[a u_i + b u_j] - t^{2}[a u_i - b u_j]}{t^{2}[a u_i + b u_j] + t^{2}[a u_i - b u_j]}$$

where a = 1/t_i and b = 1/t_j. Huber indicates that this estimator has a breakdown point of 1/3, i.e., up to 1/3 of the data can be contaminated without serious degradation of the estimate.
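A minimal sketch of this estimator in Python (our rendering of the Appendix; the synthetic data are for illustration only):

```python
import numpy as np

def t_scale(x):
    # robust scale: 1.25 * mean absolute deviation (Huber, 1981)
    return 1.25 * np.mean(np.abs(x - np.mean(x)))

def robust_corr(ui, uj):
    a, b = 1.0 / t_scale(ui), 1.0 / t_scale(uj)
    sp = t_scale(a * ui + b * uj) ** 2   # robust variance of the scaled sum
    sm = t_scale(a * ui - b * uj) ** 2   # robust variance of the scaled difference
    return (sp - sm) / (sp + sm)

rng = np.random.default_rng(2)
p = rng.normal(size=136)
e = -0.8 * p + 0.6 * rng.normal(size=136)
print(robust_corr(p, e))   # close to -0.8 for this synthetic pair
```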

ACKNOWLEDGEMENTS

The work reported here was supported in part by the U.S. Geological Survey through Grant No. 14-08-0001-G1738, and in part through his 1992-93 assignment with the Branch of Systems Analysis, WRD, USGS, National Center, Reston, VA, while on sabbatical leave.

REFERENCES

Cleveland, W. S. and S. J. Devlin (1988). "Locally weighted regression: an approach to regression analysis by local fitting." JASA 83(403): 596-610.
Craven, P. and G. Wahba (1979). "Smoothing noisy data with spline functions." Numerical Mathematics 31: 377-403.
Friedman, J. H. (1979). "A tree-structured approach to nonparametric multiple regression." Smoothing Techniques for Curve Estimation 757: 5-22.
Huber, P. J. (1981). Robust Statistics. John Wiley, New York.
Karlsson, M. and S. Yakowitz (1987a). "Nearest neighbor methods for nonparametric rainfall-runoff forecasting." Water Resources Research 23(7): 1300-1308.
Karlsson, M. and S. Yakowitz (1987b). "Rainfall-runoff forecasting methods, old and new." Stochastic Hydrol. Hydraul. 1: 303-318.
NSSS (1992). N-Kernel User's Manual. Non-Standard Statistical Software, Santa Monica, CA.
Owosina, A., U. Lall, T. Sangoyomi, and K. Bosworth (1992). Methods for Assessing the Space and Time Variability of Groundwater Data. NTIS 14-08-0001-G1738.
Scott, D. W. (1992). Multivariate Density Estimation. John Wiley and Sons, New York.
Silverman, B. W. (1986). Density Estimation. Chapman and Hall, London.
Wand, M. P. and M. C. Jones (1993). "Comparison of smoothing parametrizations in bivariate kernel density estimation." JASA 88(422): 520-528.
Yakowitz, S. J. (1985). "Nonparametric density estimation, prediction, and regression for Markov sequences." JASA 80(389): 215-221.
Yakowitz, S. J. and F. Szidarovszky (1985). "A comparison of Kriging with nonparametric regression methods." Journal of Multivariate Analysis 16(1): 21-53.

COMPARING SPATIAL ESTIMATION TECHNIQUES FOR PRECIPITATION ANALYSIS

J. SATAGOPAN¹ and B. RAJAGOPALAN²
¹Department of Statistics, University of Wisconsin, Madison, WI 53706
²Utah Water Research Laboratory, Utah State University, Logan, UT 84322

Precipitation data from the Columbia River Basin were analyzed using different spatial estimation techniques: kriging, locally weighted regression (lowess) and smoothing spline ANOVA (SS-ANOVA). Log(precipitation) was considered as a function of easting, northing and elevation; the kriging analysis considered precipitation as a function of easting and northing only. Various quantitative measures of comparison were considered, such as the maximum absolute deviation, the residual sum of squares and the scaled variance of deviation. The analyses suggested that SS-ANOVA and lowess performed better than kriging. Residual plots showed that the distribution of residuals was tighter for SS-ANOVA than for lowess and kriging. Precipitation showed an increasing trend with elevation, but seemed to stabilize above a certain elevation. The analysis was also done for the Willamette River Basin data, with similar results.

INTRODUCTION

Spatial estimation of precipitation is of fundamental importance and a challenging task in hydrology. It has significant application in flood frequency analysis and in the regionalization of precipitation parameters for various watershed models. The irregularity of sampling in space, and the fact that precipitation exhibits substantial variability with topography (i.e., nonstationarity), make the spatial estimation task more difficult. Kriging is the most popular geostatistical technique used by hydrologists for spatial estimation. It assumes an a priori specification of the functional form of the underlying function that describes the spatial variation of the parameter of interest. Most often this assumption is not satisfied in nonstationary situations, resulting in possible errors in the estimates. Akin (1992) has extensively compared kriging with other nonparametric techniques on a large number of data sets, and found that kriging was inferior to all the other methods. Yakowitz and Szidarovszky (1985) compared the theoretical properties of kriging and kernel function estimation and gave comparative results from Monte Carlo simulations for one and two dimensional situations. The kernel estimator was superior in their theoretical and applied analyses. These serve as a motivation for our exploratory data analysis. In this paper, we present results from a preliminary analysis of precipitation data from a mountainous region, the Columbia River Basin, on the relative performance of three methods for spatial interpolation. The methods considered are kriging, locally weighted regression (lowess) and smoothing spline analysis of variance (SS-ANOVA). The rest of the paper is organized as follows. A brief discussion of kriging, SS-ANOVA and lowess is presented first, followed by a note on the study area, data set and statistical models. Comparative results and discussion are presented at the end.



KRIGING

Kriging is a parametric regression procedure due to Krige (1951) and Journel (1977). It has become synonymous with geostatistics over the last decade and represents the state of the art for spatial analysis problems. Isaaks and Srivastava (1989) present a comprehensive and applied treatment of kriging, while Cressie (1991) provides a comprehensive treatment that covers much of the recent statistical research on the subject. Most of the work has been largely focused on ordinary kriging. The model considered is

$$y = f(x) + \epsilon \qquad (1)$$

where the function f(x) is assumed to have a constant but unknown mean and stationary covariance, y is the observed vector, and ε is the vector of i.i.d. noise. Most often the assumptions for the function f are not satisfied, especially in the case of mountainous precipitation. In our data analysis we have looked at ordinary kriging only. Cressie (1991), Journel (1989) and de Marsily (1986) have detailed discussions on the various types of kriging and their estimation procedures as applied to different situations. Kriging is an exact interpolator at the points of observation, and at other points it attempts to find the best linear unbiased estimator (BLUE) for the underlying function and its mean square error (MSE). The underlying function f(x) is assumed to be a random function; f(x) and f(x + h_x) are dependent random variables, leading to ergodicity and stationarity assumptions. The kriging estimate f̂_k of f is formed as a weighted linear combination of the observations as

$$\hat{f}_k(x_0) = \sum_{i=1}^{n} \lambda_{0i}\, y_i \qquad (2)$$

where the subscript k stands for the kriging estimate. The weights are determined through a procedure that seeks to be optimal in a mean square error sense. The weights relate to the distance between the point at which the estimate is desired and the observation points, and to the degree of covariance between the observations as a function of distance, as specified by the variogram r(h). The variogram is given as

$$r(h) = Var(y(x)) - Cov(y(x),\, y(x+h)) \qquad (3)$$

where h is the distance. The weights λ_{0i} are determined by solving the normal equations for kriging, which are

$$\sum_{j=1}^{n} \lambda_{0j}\, r(x_i - x_j) + \mu = r(x_i - x_0), \qquad i = 1, \cdots, n \qquad (4)$$

$$\sum_{j=1}^{n} \lambda_{0j} = 1 \qquad (5)$$

where μ can be interpreted as a Lagrange multiplier for satisfying the constraint that the weights sum to unity, in an optimization problem formed for minimizing the mean square


error of estimation. The λ_{0i}'s are obtained by solving the above two equations. The ideas of Gaussian linear estimation are thus implicit in the kriging process. The MSE of the estimator f̂_k is given by Cressie (1991) as

$$MSE(\hat{f}_k(x_0)) = \sum_{i=1}^{n} \lambda_{0i}\, r(\delta x_i) + \mu \qquad (6)$$

where r(δx_i) is the variogram and δx_i = x_i − x_0. The above estimation procedure presumes that the variogram is a known function. In practice, the variogram is never known a priori. In reality the observations are unequally spaced; hence a direct estimate of r(h) from the data is not feasible. Therefore, the data are grouped into distance categories and a parametric function (e.g., exponential or spherical) is fit to the estimated or raw variogram. This is called variogram fitting, and it is the central issue in kriging. Fitting the variogram is the most difficult and important part of kriging, more so in the case of nonstationarity. The lack of objective methods to fit the variogram results in a poorly fit variogram, and consequently the estimates are likely to be significantly in error. For details on variogram fitting we refer the reader to Cressie (1991). Yakowitz and Szidarovszky (1985) argue that there is no consistent variogram estimator, even for the case where the data are noise-free. Wahba (1990) also shows that no consistent estimators of the variogram parameters from the data are readily available as part of the kriging estimation process. Journel (1989) discusses the demerits of kriging, and stresses that stationarity assumptions are made for ease of analysis and are not necessarily properties of the process studied. Akin (1992) has studied kriging with known data sets and also groundwater data, and found that kriging performed very poorly in almost all the cases as compared to other techniques. Bowles et al. (1991) compared kriging with thin plate smoothing splines on precipitation data from a mountainous region and made similar inferences. These results support the argument by Yakowitz and Szidarovszky (1985). Universal kriging, co-kriging and the intrinsic random function hypotheses attempt to deal with non-stationary situations, but fitting variograms in these cases is even more tenuous, which affects the estimates, and they are difficult to implement. We have analyzed the precipitation data using ordinary kriging with the public domain software GeoEAS, widely used by government regulating agencies and consulting firms.
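To make equations (2) and (4)-(6) concrete, the sketch below solves the ordinary kriging system for one prediction point. The exponential variogram and its parameters are assumed for illustration; in practice the variogram must be fitted, which is exactly the difficulty discussed above.

```python
import numpy as np

def variogram(h, sill=1.0, rng_=50.0):
    # an assumed exponential variogram model r(h)
    return sill * (1.0 - np.exp(-h / rng_))

def ordinary_krige(xobs, yobs, x0):
    n = len(xobs)
    D = np.linalg.norm(xobs[:, None, :] - xobs[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(D)
    A[n, n] = 0.0                       # Lagrange-multiplier row/column
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xobs - x0, axis=1))
    sol = np.linalg.solve(A, b)         # normal equations (4) and (5)
    lam, mu = sol[:n], sol[n]
    est = lam @ yobs                    # equation (2)
    mse = lam @ b[:n] + mu              # equation (6)
    return est, mse

rng = np.random.default_rng(3)
pts = rng.uniform(0, 100, size=(30, 2))
vals = np.sin(pts[:, 0] / 20.0) + 0.1 * rng.normal(size=30)
print(ordinary_krige(pts, vals, np.array([50.0, 50.0])))
```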

SMOOTHING SPLINE ANOVA

Smoothing spline analysis of variance (SS-ANOVA) is a semiparametric procedure for fitting models. The model considered in this case is similar to the kriging model. The SS-ANOVA method decomposes the function f into various components, as in any analysis of variance model; i.e., the function f is split into main effects and interaction terms. This is useful because one can find out how the observed data are affected by each variable. Consider the model

$$y_i = f(x_i) + \epsilon_i, \qquad i = 1, \cdots, n \qquad (7)$$

where y_1, y_2, ···, y_n are observations, f is the function to be estimated, x_1, x_2, ···, x_k are variables such that the jth variable x_j ∈ X_j, some measurable space, and ε_1, ···, ε_n are i.i.d.


with Ei '" N(O, 0- 2 ),0- 2 unknown. Usually the space considered is Xj = [0,1]. Whenever the variables are not in the range [0,1], we can rescale them to lie in this range. Wahba (1990), Gu (1989) and Gu and Wahba (1992) give an overview of the SS-ANOVA models. They discuss applications to polynomial splines, tensor product splines and thin plate splines. The SS-ANOVA model is described briefly in what follows. The assumption in this model is f E H, where H is a Hilbert space. The function f is required to be smooth in its domain with f, f(1) absolutely continuous, f(2) E £2, where f{i) denotes the ith derivative of f, and f(t)dt = O. The space H is uniquely decomposed into a tensor sum as

J:

H

= 1 E9 LHi E9 LHi 0Hj E9 ... i<;

f

is decomposed uniquely as

fi;

+ ...

Based on this representation of H, the function f

= J1- + L

fi

+L

i<;

(8)

(9)

where μ is a constant, f_i ∈ H_i are the main effect terms, f_ij ∈ H_i ⊗ H_j are the two-factor interaction terms, and so on. The space H is decomposed in such a way that the resulting subspaces are unique and orthogonal in the tensor product norm induced by the original inner products. This decomposition is very similar to the decomposition in any analysis of variance problem. The Hilbert space H can further be decomposed into polynomial and smooth spaces. Let H_i have an orthogonal decomposition H_{πi} ⊕ H_{si}, where H_{πi} is the polynomial or parametric space and H_{si} is the smooth space. The f_i's satisfy

$$\int_0^1 f_i(x_i)\, d\mu_i = 0 \qquad (10)$$

where μ_i is the Lebesgue measure on [0,1]. The f_ij's satisfy the analogous conditions

$$\int_0^1 f_{ij}\, d\mu_i = \int_0^1 f_{ij}\, d\mu_j = 0 \qquad (11)$$

and so on. These are similar to the side conditions in any analysis of variance model. The SS-ANOVA procedure obtains f̂_λ as an estimate of f which minimizes the penalized least squares quantity

$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2 + \lambda\left[\sum_i \theta_i^{-1} J_i(f_i) + \sum_{i<j} \theta_{i,j}^{-1} J_{i,j}(f_{ij}) + \cdots\right] \qquad (12)$$

where λ, θ_i, θ_{i,j}, ··· are smoothing parameters and the J's are smoothness penalty functionals. The polynomial space does not have any penalty. Kimeldorf and Wahba (1971) and Wahba (1990) discuss different penalty functionals. Gu and Wahba (1991) discuss the idea of thin plate splines and Bayesian confidence intervals for the main effect and interaction terms for a thin plate spline. For further details, we refer the reader to Gu and Wahba (1991). The publicly available code RKPACK by Gu (1989) enables one to fit tensor product splines, polynomial splines and thin plate splines for SS-ANOVA models. We have used the thin plate spline program in RKPACK for our data analysis.
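RKPACK itself is Fortran code, so as a self-contained numerical illustration of the penalized criterion (12), the sketch below fits a single one-dimensional smooth term, replacing the exact spline penalty J(f) by a discretized second-difference penalty on a grid. This is a common numerical surrogate, not the RKPACK algorithm; the data are synthetic.

```python
import numpy as np

def penalized_smooth(x, y, lam, m=100):
    """Minimize (1/n) * sum (y_i - f(x_i))^2 + lam * ||second differences of f||^2
    over f represented by its values on a uniform grid."""
    grid = np.linspace(x.min(), x.max(), m)
    # incidence matrix: each observation is assigned to a bracketing grid node
    N = np.zeros((len(x), m))
    N[np.arange(len(x)), np.searchsorted(grid, x).clip(0, m - 1)] = 1.0
    # second-difference operator standing in for the smoothness penalty J(f)
    D = np.diff(np.eye(m), n=2, axis=0)
    f = np.linalg.solve(N.T @ N / len(x) + lam * D.T @ D, N.T @ y / len(x))
    return grid, f

rng = np.random.default_rng(4)
elev = np.sort(rng.uniform(0, 3000, 200))
logp = 1.0 + 0.0005 * np.minimum(elev, 1500) + 0.1 * rng.normal(size=200)
grid, fit = penalized_smooth(elev, logp, lam=1e-4)
```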


LOCALLY WEIGHTED REGRESSION (LOWESS)

Locally weighted regression, or lowess, is a local regression procedure that is a quick and versatile tool for exploratory data analysis and regression surface estimation. Cleveland et al. (1988) and Cleveland and Devlin (1988) developed lowess. Under certain conditions, Muller (1987) shows an equivalence between lowess and nonparametric kernel regression estimation. The estimates proposed by Cleveland et al. (1988) consider local linear or quadratic fits using the k nearest neighbors of the point at which the estimate is desired. Also, Cleveland et al. (1988) consider the standard multivariate nonparametric regression situation defined through the general model

$$y_i = f(x_i, \beta) + \epsilon_i, \qquad i = 1, \cdots, n \qquad (13)$$

where f(x, β) is the smooth regression function or conditional expectation with parameter β, and the ε_i's are i.i.d. N(0, σ²) variables, with σ² usually unknown. An estimate f̂(x) of f(x) is developed nonparametrically (in the global sense) by a weighted local linear or quadratic least squares regression in the neighborhood of x defined by its k nearest neighbors. The quadratic model fitted locally is

$$f(x, \beta) = D(x)\,\beta \qquad (14)$$

where D is a design matrix, and the coefficients β are determined as the solution to a weighted least squares problem defined as

$$\min_{\beta} \sum_{i \in k(x)} \left(y_i - f(x_i, \beta)\right)^2 w_i(x) \qquad (15)$$

where k(x) is the index set of the x_i that are the k nearest neighbors of x, and w_i(x) is the weight function defined as

$$w_i(x) = W\!\left(\frac{\rho(x, x_i)}{d_k(x)}\right) \qquad (16)$$

where ρ(·) is the Euclidean distance function and d_k(x) is the Euclidean distance from x to the kth nearest neighbor among the x_i. W(·) is usually taken as the tricube weight function. The number of nearest neighbors, k, acts as a smoothing parameter: as k increases, bias increases, but variance decreases. The fit determined by lowess depends on the choice of the number, k, of nearest neighbors and the order, r, of the fit. Cleveland et al. (1988) propose a graphical method, based on analyzing an M-plot, for the selection of both these parameters. For further details we refer the reader to Cleveland et al. (1988). For the present analysis, the span (the fraction of the sample used to compute the local regression) and degree (the order of the local approximation, where 1 represents linear and 2 represents quadratic) were chosen by using the F-statistic to compare alternate values of span and degree. A lowess surface is fit for a span and degree and the residual sum of squares is computed. The F-statistic, using the proper number of degrees of freedom in each case, is used to compare the residual sums of squares (RSS) at a significance level of 0.01. If the residual sums of squares are significantly different (at the 0.01 level), the span/degree with


the lowest RSS is selected; otherwise the span/degree with the lower number of equivalent parameters (higher degrees of freedom) is selected. Cleveland (1979) suggests that local linear fitting produces good results, especially in the boundary region. Akin (1992) has shown that the scheme works well even for a small number of independent variables and, from comparisons of lowess with kriging on a large number of known data sets, found that lowess performed very well in reproducing the known functions.
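A compact sketch of the local fit of equations (13)-(16), using tricube weights and a local linear (degree 1) polynomial; this is our reading of the method, not the authors' implementation:

```python
import numpy as np

def tricube(t):
    t = np.clip(np.abs(t), 0.0, 1.0)
    return (1.0 - t ** 3) ** 3

def lowess_point(x0, X, y, k):
    d = np.linalg.norm(X - x0, axis=1)        # Euclidean distances rho(x0, x_i)
    nbr = np.argsort(d)[:k]                   # k nearest neighbours
    w = tricube(d[nbr] / d[nbr].max())        # weights w_i(x0), eq. (16)
    B = np.hstack([np.ones((k, 1)), X[nbr] - x0])   # local linear design matrix
    W = np.diag(w)
    beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y[nbr])  # weighted LS, eq. (15)
    return beta[0]                            # fitted value at x0

rng = np.random.default_rng(5)
X = rng.uniform(size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=200)
print(lowess_point(np.array([0.5, 0.5]), X, y, k=40))
```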

DESIGN AND MODEL

Study Area and Data Set

The application area is the Columbia River Basin in the states of Washington and Oregon, with an area of 57 million hectares, subdivided into 9 subregions corresponding to the USGS sub-basin classification. The data consisted of annual precipitation obtained from 491 gauges spread over the entire basin. By subregion, the numbers of gauges are 82, 41, 12, 77, 77, 50, 75, 25 and 52 respectively. The gauges are denoted in three dimensions by northing, easting and elevation. Figure 1 gives the topographical map of the study area. The study area is very mountainous, and this could cause non-stationarity in the precipitation process. Phillips et al. (1991) applied kriging, detrended kriging and co-kriging to the annual precipitation data from the Willamette River Basin (region 9). They found that co-kriging with elevation worked better than ordinary kriging for region 9. However, Phillips (personal communication) indicated that it gave poor results when applied simultaneously to the entire Columbia River Basin. This is likely because the data span both sides of the mountain range, which results in non-stationarity in the data set. This motivated our study of alternative techniques. We analyzed the data set from the Willamette River Basin and the data from the entire Columbia River Basin.

Model

We have used log(precipitation) throughout our analysis. This is a very common transformation; one of its advantages is that it stabilizes the variance. A basic requirement in any analysis is the assumption of constant error variance. This assumption is usually violated when the response y follows a probability distribution in which the variance is functionally related to the mean. Since the inclusion of elevation as a third variable in the model only makes estimation of the variogram more complicated, we have considered precipitation as a function of easting and northing only for kriging. For lowess and SS-ANOVA the response was considered as a function of easting, northing and elevation. The model considered for SS-ANOVA was

$$y = \mu + f_{elev} + f_{east,north} + f_{elev*(east,north)} + \epsilon \qquad (17)$$

where y = log(precipitation), μ is the constant term, f_elev denotes the effect of elevation, f_east,north denotes the joint effect of easting and northing, f_elev*(east,north) denotes the interaction between elevation and (easting, northing), and ε denotes the i.i.d. Gaussian error. This model is very similar to any analysis of variance model.

[Figure 1. Topographical maps of the Columbia River Basin and the Willamette River Basin.]

Instead of considering the effects of easting and northing separately, we have looked at them as a two-dimensional variable (easting, northing) and used a thin plate smoothing spline (Gu and Wahba (1991)) approach. Gu and Wahba (1991) show simulation results and suggest that for geographical data such an approach is appropriate.

RESULTS AND DISCUSSION

Visual examination of the estimated surfaces is very helpful in assessing the performance of different estimators. However, measures of accuracy are also important in decision making, as they are based on quantitative criteria. We estimated a few measures that quantify the local and global fit, for comparing the three techniques: the maximum absolute deviation (MAD), mean absolute deviation (MNAD), residual sum of squares (RSS), scaled residual sums of squares (SRSS1 and SRSS2), and scaled variance of deviation (SVD). These measures are given below.

• MAD = max_{i=1,···,n} |y_i − ŷ_i|
• MNAD = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
• RSS = Σ_{i=1}^{n} (y_i − ŷ_i)²
• SRSS1 = (1/(n·var(y))) Σ_{i=1}^{n} (y_i − ŷ_i)²
• SVD = var(ŷ)/var(y − ŷ)

where the y's denote the response (log(precipitation)) and ŷ the estimated values. MAD signifies how far the estimate is from the true underlying function at the point where the fit is poorest, but does not ensure the best fit over the entire range of the function. MNAD represents the average nearness of the fit to the true function over the entire range of the function. SRSS1 and SRSS2 measure the nearness of the fit to the true function over the entire range of the function and ensure no regions of relatively poor fit. SVD would return the squared signal to noise ratio if the estimate matches the true function; this also tells how well the techniques discriminate the noise in the function. Akin (1992) has a detailed discussion of these measures. Table 1 gives the various measures for comparing the three spatial techniques for both the Columbia and Willamette River Basins. SS-ANOVA and lowess seem to do better function estimation than kriging for these data. Kriging is comparable to lowess on the Columbia data. This substantiates the fact that kriging performs better as a global estimator, while on the local scale (Willamette River Basin) it performs poorly compared to the other methods. Lowess and SS-ANOVA are less sensitive to heterogeneous discontinuity in the data, unlike kriging.
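These measures are straightforward to compute from observed and fitted values; a sketch (the SVD line uses the variance-ratio form we infer from the discussion above):

```python
import numpy as np

def fit_measures(y, yhat):
    r = y - yhat
    return {
        "MAD":   np.max(np.abs(r)),
        "MNAD":  np.mean(np.abs(r)),
        "RSS":   np.sum(r ** 2),
        "SRSS1": np.sum(r ** 2) / (len(y) * np.var(y)),
        "SVD":   np.var(yhat) / np.var(r),   # inferred signal-to-noise form
    }

# usage: y = observed log(precipitation), yhat = fitted surface at the gauges
```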


TABLE 1. Measures of comparison for the three spatial techniques

                      Columbia                          Willamette
Measure    SS-ANOVA    Lowess     Kriging     SS-ANOVA    Lowess     Kriging
MAD          0.1772     1.6769     1.2823       0.1412     0.2861     0.6777
MNAD         0.0402     0.2256     0.2657       0.0366     0.0657     0.1532
RSS          1.3691    55.5073    59.7683       0.1251     0.4133     2.1886
SRSS1        0.0060     0.2444     0.2632       0.0280     0.0925     0.4899
SRSS2        0.0002     0.0063     0.0429       0.0001     0.0003     0.0017
SVD        160.9792     3.8041     2.8387      31.8066    10.1950     1.2468

Figure 2 gives the histogram of residuals for the three techniques for the Columbia River Basin. The residual distribution for the SS-ANOVA method is tighter than for lowess and kriging. For kriging, precipitation was considered only as a function of easting and northing, whereas for SS-ANOVA and lowess precipitation was a function of easting, northing and elevation. Figure 3 gives the histogram of residuals for the Willamette River Basin. The fit for the Willamette River Basin was not as good as for the Columbia River Basin. The distribution of residuals for the SS-ANOVA method was again tighter than for lowess and kriging. Even though the figures in Table 1 suggest that lowess would be a better approach than kriging for precipitation data, the distribution of residuals for kriging was tighter than for lowess in the case of the Columbia River Basin. The contour plots of the estimated function obtained from the three methods for the Columbia River Basin and the Willamette River Basin are given in Figures 4 and 5 respectively. Though precipitation was obtained as a function of easting, northing and elevation for SS-ANOVA and lowess, we have shown precipitation against easting and northing only in the plots for these two methods, for ease of comparison with kriging. For the Columbia River Basin data, SS-ANOVA and lowess were found to handle the boundary points better than kriging. Also, SS-ANOVA and lowess did more smoothing than kriging for these data. Figures 6 and 7 give plots of the effect of elevation on precipitation, obtained from SS-ANOVA. The plots also give a 95% (Bayesian) confidence interval for the effect of elevation. An increasing trend in precipitation with elevation can be observed. From the plot corresponding to the Columbia River Basin it can be seen that though precipitation has an increasing trend with elevation, it seems to level off beyond a certain elevation.


[Figure 2. Histogram of residuals of the three methods (SS-ANOVA, lowess, kriging) for the Columbia River Basin.]

[Figure 3. Histogram of residuals of the three methods for the Willamette River Basin.]

[Figure 4. Contour plot of the estimated function for the Columbia River Basin.]

[Figure 5. Contour plot of the estimated function for the Willamette River Basin.]


[Figure 6. Elevation effect for the Columbia River Basin obtained from SS-ANOVA, with 95% confidence intervals.]

[Figure 7. Elevation effect for the Willamette River Basin obtained from SS-ANOVA, with 95% confidence intervals.]


So far we have presented an exploratory data analysis which used three spatial estimation techniques for precipitation analysis. Kriging has so far been the most widely used geostatistical technique among hydrologists; as mentioned earlier, it has become synonymous with geostatistics over the last decade. Results and research by various authors, such as Yakowitz and Szidarovszky (1985) and Akin (1992), motivated us to look at other spatial estimation techniques, and we found that the other techniques, SS-ANOVA and lowess, better estimated the precipitation function than kriging. These methods suggest a possible alternative for spatial interpolation of precipitation data, especially when the process is non-stationary. Future work is required in terms of more data sets and other nonparametric techniques.

ACKNOWLEDGEMENT

We would like to thank Dr. Upmanu Lall for valuable suggestions and for providing us with relevant manuscripts. We also thank Mr. D.L. Phillips, Mr. J. Dolph and Mr. D. Marks of USEPA, Corvallis, OR, USA for providing us with the precipitation data sets and also the results of their analysis, which motivated our study. We would also like to thank our numerous friends for conversations and discussions on spatial estimation techniques.

REFERENCES

Akin, O.O. (1992) "A comparative study of nonparametric regression and kriging for analyzing groundwater contamination data", M.S. Thesis, Utah State University, Logan, Utah.
Bowles, D.S., Bingham, G.E., Lall, U., Tarboton, D.G., Al-Adhami, M., Jensen, D.T., McCurdy, G.D., and Jayyousi, E.F. (1991) "Development of mountain climate generator and snow pack model for erosion predictions in the western United States using WEPP", Project report III.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatter plots", Journal of the American Statistical Association 74, 368, 829-836.
Cleveland, W.S., and Devlin, S.J. (1988) "Locally weighted regression: an approach to regression analysis by local fitting", Journal of the American Statistical Association 83, 403, 596-610.
Cleveland, W.S., Devlin, S.J., and Grosse, E. (1988) "Regression by local fitting", Journal of Econometrics 37, 88-114.
Cressie, N. (1991) Statistics for Spatial Data, John Wiley and Sons, New York.
de Marsily, G. (1986) Quantitative Hydrogeology, Groundwater Hydrology for Engineers, Academic Press, California.
Gu, C. (1989) "RKPACK and its applications: fitting smoothing spline models", Technical Report 857, University of Wisconsin - Madison.
Gu, C. and Wahba, G. (1992) "Smoothing spline ANOVA with component-wise Bayesian confidence intervals", Technical Report 881 (rev.), University of Wisconsin - Madison.
Isaaks, E.H., and Srivastava, R.M. (1989) An Introduction to Applied Geostatistics, Oxford University Press, New York.
Journel, A.G. (1977) "Kriging in terms of projections", Journal of Mathematical Geology 9, 6, 563-586.


Journel, A.G. (1989) Fundamentals of Geostatistics in Five Lessons, American Geophysical Union, Washington, D.C.
Kimeldorf, G., and Wahba, G. (1971) "Some results on Tchebycheffian spline functions", Journal of Mathematical Analysis and Applications 33, 82-95.
Krige, D.G. (1951) "A statistical approach to some mine valuations and allied problems in the Witwatersrand", unpublished Master's Thesis, University of Witwatersrand, South Africa.
Muller, H.G. (1987) "Weighted local regression and kernel methods for nonparametric curve fitting", Journal of the American Statistical Association 82, 397, 231-238.
Phillips, D.L., Dolph, J., and Marks, D. (1991) "Evaluation of geostatistical procedures for spatial analysis of precipitation", USEPA Report, Corvallis, Oregon.
Wahba, G. (1990) Spline Models for Observational Data, SIAM Series in Applied Mathematics, Pennsylvania.
Yakowitz, S., and Szidarovszky, F. (1985) "A comparison of kriging with nonparametric regression methods", Journal of Multivariate Analysis 16, 1, 21-53.

PART VII SPECTRAL ANALYSIS

EXPLORATORY SPECTRAL ANALYSIS OF TIME SERIES

ANDRZEJ LEWANDOWSKI
Wayne State University, Detroit, MI 48202, U.S.A.

INTRODUCTION

Over 20 years have passed since the publication of the book by Box and Jenkins (Box and Jenkins, 1970) in which the principles of time series analysis and modeling were formulated. During this period the Box-Jenkins methodology has been successfully applied to numerous practical problems and has become the standard procedure for time series modeling and analysis (Pankratz, 1983). Most commercial statistical computer packages support this methodology. According to the Box-Jenkins approach, the process of model building consists of three steps:

1. Model identification, during which preliminary analysis is performed and an initial version (structure) of the model is determined;
2. Parameter estimation, during which exact values of the model parameters are computed;
3. Model validation, during which the quality of the resulting model is examined.

The model building procedure is interactive and iterative in nature: the procedure is repeated until the model attains the desired form and accuracy. Although several methods and algorithms for model estimation, constituting the second stage of the above procedure, have been developed, the identification and validation stages are somewhat more diffuse in nature. The original identification method proposed by Box and Jenkins is based on visual inspection of the autocorrelation and partial autocorrelation functions. This approach is very exploratory in nature but rather difficult to apply. Although in several cases the structure of the model generating the time series can be deduced relatively easily from the shape of the autocorrelation function, in many other cases it is difficult or not possible at all. Moreover, since the dependence between the values of the autocorrelation function and the parameters of the model generating the time series is rather complicated, the values of the model coefficients cannot be easily determined without performing complicated numerical calculations. Several attempts have been made to classify possible patterns of the autocorrelation function and to build a catalog of such


functions (Polasek, 1980). Unfortunately, this approach leads to a catalog containing hundreds of patterns, making it difficult to match the sample autocorrelation function with one of the catalog entries. An up-to-date review of identification methods has recently been published by Choi (1992). In addition to the above-mentioned analysis of the autocorrelation function, he discusses four other groups of methods: penalty function methods, innovation regression methods, pattern identification methods and testing hypothesis methods. With the exception of the pattern identification methods, these methods do not possess an exploratory character: the data is given to some "black box" mechanism which produces information about the order of the model generating the time series. The pattern identification methods discussed by Choi use the fact that if a time series has an ARMA(p,q) structure, then the theoretical autocorrelation function for lags greater than q satisfies a linear difference equation of order p. Therefore, to determine the structure of a model it is sufficient to check whether a subsequence of the autocorrelation function satisfies a linear difference equation. Several algorithms based on this test are discussed by Choi. Most of these algorithms are based on testing the rank of a Hankel matrix. The basic difficulty in applying these methods is connected with the fact that the computation of the rank of a matrix is an ill-defined problem, and that instead of visual inspection of the autocorrelation function the analyst must use the same visual inspection procedure to analyze patterns of columns of several matrices. An early analysis performed by De Gooijer and Heuts (1981) shows that these methods have limited applicability.
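To make the Hankel-matrix idea concrete, the sketch below forms the Hankel matrix of sample autocorrelations beyond lag q for a simulated AR(1) series and inspects its singular values; a numerical rank of p signals an ARMA(p,q) structure, and deciding that rank is precisely the ill-defined step noted above. The helper names are our own.

```python
import numpy as np

def sample_acf(x, maxlag):
    x = x - x.mean()
    c0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-k or None], x[k:]) / len(x) / c0
                     for k in range(maxlag + 1)])

def hankel_singular_values(r, q, m):
    # Hankel matrix built from autocorrelations r_{q+1}, r_{q+2}, ...
    H = np.array([[r[q + 1 + i + j] for j in range(m)] for i in range(m)])
    return np.linalg.svd(H, compute_uv=False)

# AR(1) example: the Hankel spectrum should show one dominant singular value
rng = np.random.default_rng(6)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.7 * x[t - 1] + rng.normal()
r = sample_acf(x, 20)
print(hankel_singular_values(r, q=0, m=5))
```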

TIME SERIES ANALYSIS AND EXPLORATORY STATISTICAL ANALYSIS

The role of the model identification step in time series analysis is frequently underestimated. Model identification is not simply a procedure for determining the structure of a model; it is a scientific process which leads to a deeper understanding of the phenomena being investigated. In many practical cases this increased knowledge of the system is more important than the resulting model. This is one of the main principles of Exploratory Data Analysis (Tukey, 1977). For these reasons a new approach to Box-Jenkins model identification is proposed in this paper. In contrast to the existing methods, this approach is based on spectral methods and involves frequency analysis of ARMA models. However, it differs from the standard spectral approach presented in textbooks on time series analysis, since it provides a way of understanding and interpreting the spectrum and hence the model itself. The theoretical background of this method is well known in control engineering and circuit theory and has been applied successfully in these fields. According to experience gained in electrical engineering and control engineering, spectral methods are in general more useful than time domain methods. The spectral characterization of a model is easier to interpret, analyze and understand than time domain characteristics such as the autocorrelation function.


LINEARIZED TRANSFER FUNCTIONS OF ARMA MODELS

Before presenting the proposed method for time series analysis, it is necessary to discuss the basic concepts from Z transform theory (Elliott, 1987). Let

$$\{x_t\}, \qquad t \in (-\infty, +\infty) \qquad (1)$$

be a real sequence. The Z transform of this sequence is the complex function

$$X(z) = \sum_{-\infty}^{+\infty} x_i z^i \qquad (2)$$

which is denoted by

$$X(z) \sim \{x_t\} \qquad (3)$$

Under certain conditions this transformation is invertible and X(z) uniquely characterizes the series {x_t} (Elliott, 1987). Let two time series {x_t} and {u_t} be connected by a linear relationship (operator) G:

$$\{x_t\} = G(\{u_t\}) \qquad (4)$$

The complex function

$$G(z) = \frac{X(z)}{U(z)} \qquad (5)$$

will be called the transfer function of the operator G. It is easy to calculate the transfer function of an ARMA model. Taking (2) and multiplying both sides by z, the following relationship can be obtained:

$$z X(z) = \sum_{-\infty}^{+\infty} x_i z^{i+1} = \sum_{-\infty}^{+\infty} x_{i-1} z^i \qquad (6)$$

Thus, if

$$X(z) \sim \{x_t\} \qquad (7)$$

then

$$z X(z) \sim \{x_{t-1}\} \qquad (8)$$

It follows from the above that the complex variable z can be formally interpreted as the shift operator B used by Box and Jenkins. Therefore, in order to obtain the transfer function for this model it is sufficient to replace the operator B by the complex variable z. The term spectral or frequency transfer function will be used to describe the following formula:

$$G(j\omega) = \frac{P(e^{-j\omega})}{Q(e^{-j\omega})} \qquad (9)$$

where P and Q are respectively the numerator and denominator of the operator transfer function (from now on we will consider only rational operator transfer functions). The spectrum of the output signal x generated by the model (which is actually the time series being analyzed) is equal to the modulus of the transfer function:

$$f(\omega) = |G(j\omega)| \qquad (10)$$
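A direct rendering of (9)-(10) for an ARMA model, with MA polynomial P and AR polynomial Q evaluated on a frequency grid; the helper is our own:

```python
import numpy as np

def arma_spectrum(theta, phi, omega):
    """|G(jw)| = |P(e^{-jw})| / |Q(e^{-jw})| for
    P(z) = 1 - theta_1 z - ..., Q(z) = 1 - phi_1 z - ... (Box-Jenkins form)."""
    z = np.exp(-1j * omega)
    P = 1.0 - sum(t * z ** (k + 1) for k, t in enumerate(theta))
    Q = 1.0 - sum(p * z ** (k + 1) for k, p in enumerate(phi))
    return np.abs(P / Q)

w = np.linspace(0.01, np.pi, 200)
print(arma_spectrum([0.5], [0.7], w)[:3])   # ARMA(1,1) with theta=0.5, phi=0.7
```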

336

A. LEWANDOWSKI

Formula (10) shows where most of the basic difficulties in interpreting the spectrum arise. Although the structure of the transfer function itself is rather simple, the spectrum is a highly nonlinear function of frequency w since it is the modulus of the transfer function evaluated on the unit circle and therefore includes trigonometric functions. This leads to an important question: does the modulus of a complex function evaluated on the unit circle characterize this function uniquely? The answer is generally no. If a function G(z) is given such that its modulus evaluated on the unit circle is (11)

then the function

-

G(z)

=

Z-D(

G(z)-I-

(12)

--D(

z

will have the same modulus on the unit circle (Hannan, 1970). But if the function G(z) has the minimum phase property (Le .. has no poles or zeros inside the unit disc), the modulus evaluated on the unit circle will characterize the function uniquely. This is one of the most important results in the theory of signal processing, electrical circuit theory and automatic control (Robinson, 1981). Another problem can also be formulated: is it possible to evaluate the modulus of G (z) on a curve other then the unit circle to produce a simpler form off (w) ? The real and imaginary axes appear especially attractive for this purpose. In this case a similar result can be obtained: ifG(z) has the minimum phase property (Le., it has no zeros or poles in the right half-plane) it is sufficient to know the modulus of G(z) evaluated at z = jw to determine this function uniquely. Moreover, when the real and imaginary axes are used instead of the unit circle, the function IG(jw)1 has a much simpler structure than IG(e-jW)1 since it contains no trigonometric components. Unfortunately, computing the spectrum on imaginary axis is of little immediate use in time series analysis since it is unreasonable to expect that the transfer function of a model generating the time series under study will have no poles or zeros in the right half-plane. Fortunately, there is a way to bypass this difficulty by transforming the region outside the unit circle into the left half plane and evaluating the modulus of the resulting (unction on the imaginary axis. This can be achieved by applying the following transformation (called the A transformation)

,\=l-z l+z This transformation has the following properties: 1. It is invertible

1-,\

Z=--

1+,\

(13)

(14)

2. It transforms the region outside the unit circle in the complex plane z into the left half of the complex plane ,\ and the unit circle into the imaginary axis (Silverman, 1975),

337

EXPLORATORY SPECTRAL ANALYSIS OF TIME SERIES

3. If this transformation is applied to the rational transfer function then the resulting function will also be rational. If the ARMA model is stable and invertible then all poles and zeros of the transfer function of a model generating this time series will be located outside the unit circle. After applying the A transform to this transfer function, all the poles and zeros of the transformed transfer function will be located in the left half of the A complex plane. This means that the resulting transfer function has the minimum phase property (Robinson, 1981). The minimum phase property is important since it allows one to work with the modulus of the function rather than with the function itself. The rational function (A)

o

=

P

(1- A)

p(A) G(1I+A -A) = Q(I-A) "i""+X = q(A)

(15)

I+A

will be called the linearized transfer (unction. Similarly,

W)

(·w)=p(jw) =G(I- i OJ q(jw) 1 +jw

(16)

will be called the linearized frequency transfer (unction, and ](00) =

lo(jw) 1

(17)

will be called the linearized spectral density (unction (LSDF). Comparing the definition of the standard spectrum and the linearized spectrum (see 16) it is possible to conclude that to calculate the linearized spectrum instead of making the substitution z = e- jw (18)

the following substitution must be used

z

I-jw l+jw

= ---"--

(19)

which is the Pade approximation (or linearization) of exponential function (18). This is the source of the name linearized spectrum. It is not difficult to find the exponential form of (19) 1 - jw = e-2jarctanw

l+jw

(20)

Comparing this result with (18) it is possible to conclude that the linearized spectrum can be interpreted as a standard spectrum with a distorted frequency scale. Therefore, no special tools are necessary to calculate the linearized spectrum since it is sufficient to have a plotting chart with a suitably scaled frequency axis. In contrast to the standard frequency transfer function, the linearized transfer function (16) is a rational function of frequency. This explains why analysis based on a linearized spectrum is Simpler than for the standard spectrum. The transformation (13) is widely used in control engineering for the design and analysis of sampled data control systems (Bishop, 1975) and has been used by Hannan (1970) as a tool for investigating continuous-time stochastic processes.

A. LEWANDOWSKI

338

ASYMPTOTIC FREQUENCY RESPONSES OF ARMA MODELS It is possible to provide a simple graphical procedure for constructing the approximation of the linearized spectrum for general ARMA models. This procedure uses the notion of an asymptotic linearized spectrum which is the standard linearized spectrum plotted on a logarithmic plotting chart. If the logarithmic scale is used, the multiplicative factors of the linearized transfer (unction became additive. Therefore the linearized spectrum of the general ARMA model can be constructed by adding components generated by all poles and roots of the transfer function calculated independently. linearized spectrum of MA(1) model

In this section the moving average MAW model will be considered G(z) = 1- 9z

(21)

After applying the A transformation, the above transfer function will have the following form l-i\

9 (i\)

or

= 1 - 9 1 + i\ = (.00) = Kl

9 J

where

(1-9)(I+~~:i\)

+ j9'oo

1 + jw

1 + i\

(22)

= p(joo)

(23)

q(jw)

K=1-9

(24)

1+9 1-9

(25)

9'

=

The logarithm of the modulus of the numerator (23) has the form log Ip(jw)1 =

~log(l + 9'200 2 )

It follows from the above formula that for low frequencies such as 9' 00

log Ip(jw) I

::: 0

(26)

« 1 (27)

In the opposite situation, for high frequencies it is possible to write 10glp(jw>1 :::::log9' + log 00

(28)

It follows from (28)-(29) that if suitable axes are chosen, the function (26) can be approximated by 2 asymptotes: a line of zero slope for 00 < W and a line of slope + 1 for w > ii:J where ii:J = ; , (29)

The asymptotic construction provides a reasonably good approximation of the modulus of the function (26). This construction (the asymptotes of a frequency response function) is known as the Bode plot. The analysis based on the Bode plot constitutes

EXPLORATORY SPECTRAL ANALYSIS OF TIME SERIES

Always pole for

loglf(j",)1

339

"'= 1 log(1+8)

o ~----~----+---~~----------~----------~-----2

1

log(i-S) -1

t

log(cu)

- - Asymptotic - - Exact

Always zero

Figure 1: The Bode plot for MA(1) model one of the basic tools in electronics and control engineering (Sage. 1981). The Bode plot for MA(l) model can be easily constructed using the asymptotic representation presented above. If the linearized transfer function of the MA(1) model is presented in logarithmic form log ID(jw) I

=

10gK + log I p(jw) I -log Iq(jw) I

(30)

then a simple graphical operation can be performed to obtain the asymptotic representation of the function (30). The shape of the Bode plot depends on the value of 9; however the Bode plot of MA(l) model always has a pole at w = 1 and zero at ~

1- 9 1+9

w =----

(31)

The Bode plot for MA(1) model is presented in Figure 1. Unearized spectrum of AR(l) model The transfer function of the autoregressive AR(1) model has the following form 1 1- 9z

G(z)=--

(32)

A. LEWANDOWSKI

340

After applying the A transformation, the following linearized transfer function will be obtained 1 1+~ g(~) = (33) 1 1 - 9 - ~ - (1 - 9) (1 + 1 + 9~) 1+~ 1-9 and, analogously . l+joo (34) g(Joo) = K(l + 9' joo) where

1

K=1_9

(35)

1+9 (36) 1-9 The methodology presented in the previous section can be used to construct a Bode plot for AR(l) model. The only difference is that the transfer function of the AR(1) model always has zero at 00 = 1 and pole at 9'

A

=

1- 9 1+9

00=--

(37)

The Bode plot for AR(1) model is presented in Figure 2. linearized spectrum of ARMA(l.l) model

Using the methodology described in the previous sections, it is easy to construct a Bode plot for the autoregressive moving average ARMA(1,l) model. When the A transformation is applied to the transfer function of the ARMA(1,l) model G(z) = 1- 9z

1- cpz

(38)

the following linearized transfer function will be obtained 9 (~) = ( 1 - 9 )

1-cp

1 + 1 + 9~ 1- 9 l+l+cp~ 1- cp

(39)

This function has one pole and one zero. The corresponding Bode plot can be easily constructed applying a procedure similar to that used in building the Bode plot for MA(1) and AR(1) models (Figure 3). linearized spectrum of AR.(2) model with complex roots

This situation is more complicated than those dealt with previously. The following is the standard form of the transfer function of a general AR(2) model 1

G(z) = 1 - 91Z - 92Z2

(40)

341

EXPLORATORY SPECTRAL ANALYSIS OF TIME SERIES

Always zero for ('.0)= 1

loglf(j('.o)) I -log(1+0)

- - Asymptotic --- Exact

('.0)=(1-0)/(1+9)

o r---~-----T----~----~----~--------~~----1 1 log «('.0)) -2 o -log (1-9) -1

Always pole

Figure 2: The Bode plot for AR(1) model After applying the A transformation this transfer function reduces to the following form (i\.) 1 _ K (1 + ;\)2 (41) B (I-i\.) (1_i\.)2- 1+9~i\.+9;i\.2 91 + i\. - 92 + i\.

1- 1

where

1

K=

1

(42)

1- 91 - 92

9' _ -2(1 + 92) 1 - 1 - 91 - 92

(43)

9' _ 1 + 91 - 92 2 - 1 - 91 - 92

(44)

It is more convenient to use the canonical form of the denominator of (41)

i\.9 1 i\.2 2~i\. (i\.)2 1-9 9 = 1+ (_1_) ~ + (_1_) = 1+ + ,

'2

12 -

22

2

~

~

W

y

W

y

(45)

A. LEWANDOWSKI

342

loglf(jw)1 log(1-e)/(1-~)

Zero generated by MA(l) part

w=(1-9)/(1 +9)

o r---~------r---~----~----~----r_--~------2 -1 1 log(w) o w=(1-~)/(1 +~)

- - Asymptotic --- Exact -1

Pole generated by AR( 1) part

log (1 +9)/(1 +~)

Figure 3: The Bode plot for ARMA(l,l) model The value Wy =

1 .J9I

(46)

is known as the resonance frequency while the parameter

~=

9' zF;

(47)

is the damping factor. These two factors determine the resonance properties of the transfer function. If ~ > 1 then the quadratic polynomial has real roots and can be represented as the product of two first order factors. For 0 < ~ < 1 the roots are complex and more careful analysis is required since for small values of ~ (low damping) and for frequencies close to the resonance frequency, asymptotic approximation will not be very accurate. However, experience has shown that asymptotic analysis can be useful even in these cases and therefore the asymptotic behavior of (41) must be investigated. Making the substitution .\ =

jw

(48)

343

EXPLORATORY SPECTRAL ANALYSIS OF TIME SERIES

and considering the canonical form (45), the following expression will be obtained

+ joo)2 J 2~joo 1+ - - - (00)2 ooy ooy The logarithm of the modulus of this function is as follows g( '00) = K

Joglg(joo)1

(49)

(1

~ 10gK + 10g(1+ 00') - ~Iog{[1-

c:rr

+4~' (.:;'J'}

(50)

The first two terms of this formula are similar to those for the MA(l) model, and their asymptotic approximation has been discussed in previous sections. The results obtained previously can be used in this case. The only difference between this and MA(1) model is that in this case the slope of the asymptote of the second term is +2. Analysis of the third term is also straightforward. For low frequencies 00 (51) while for sufficiently large 00 the term (oo/ooy)4 is dominant and (52)

Comparing this result with (SO) we observe that the third term of (SO) has two asymptotes: for low frequencies the slope is equal to zero, while for high frequencies the slope is -2. The asymptotes cross at the vertex corresponding to resonance frequency 00 = ooy. The following component of the frequency response (SO) (53)

generates a peak for small values of~. The amplitude and frequency at which this peak occurs are given by the following expression M = log

( 1) 2~~1- ~2

(54)

oom = ooy~1 - 2~2 (55) It follows from (55) that this peak exists only for sufficiently small values of the damping factor, namely for

, <

V;

(56)

It is now not difficult to construct a Bode plot for (49) (see Figure 4). The only difference between the asymptotic frequency response presented in Figure 4 and the frequency response of the AR(1) model is that the slope of the asymptote is equal to - 2. The AR(2) model with complex roots always has a double zero at 00 = 1 and a double pole at 00 = OOy.

344

A. LEWANDOWSKI

Always zero for c.J= 1

loglf(jc.J) I

o ~----~----------~----------~----------~----1 log(c.J) -2 o --- Asymptotic Exact ~=O.9 - - - Exact ~=O.3

-1

t

-2

Always double pole

Figure 4: The Bode plot for AR(2) model SPECTRAL ANALYSIS OF THE HYDROLOGICAL TIME SERIES

The procedure presented in this paper has been applied to several artificially generated time series (Box and Jenkins, 1970) as well as to time series describing real phenomena. In all cases, the structure of the model generating the time series was determined correctly and the parameters estimated using the linearized spectrum were close to values obtained using a time series analysis package. Selected results have been presented by Lewandowski (1983) and more results will be presented in the forthcoming publication. The model identification procedure presented in this paper has been used to identify the structure of a model which generates the daily flows of the Little White River, measured during the year 1979. The Little White River is a tributary to the Missisagi river in Ontario with drainage area 1960 km2 . The spectrum of the time series consisting of 358 points has been estimated using the ARSPEC method by Jones (1978) based on an autoregressive approximation technique. Other spectrum estimation techniques give similar results. The identification procedure can be easily applied to this data. Although a plotting chart, a pencil and a ruler are sufficient tools to perform this procedure, it can be automated using the EXSPECT program developed by the author (Lewandowski, 1993). The procedure consists of the following steps:

EXPLORATORY SPECTRAL ANALYSIS OF TIME SERIES

345

Figure 5: Bode plot for daily flow data 1. A horizontal line is plotted. This line is the asymptote to the linearized spectrum for w = O. This asymptote determines the amplification factor K of the transfer

function,

2. A line with slope -1 is plotted to approximate the spectrum for frequencies close to w = 0.06. When the exact frequency response is calculated and plotted then it becomes clear that the line with slope -2 must be used to approximate the linearized spectrum for w > 0.5, 3. A horizontal line must be plotted to approximate the linearized spectrum for w> 1 Therefore, the asymptotic Bode plot consists of 4 line segments (Figure 5). The linearized transfer function has poles for w = 0.06 and w = 0.6 and 2 zeros for w = 1. Comparing these results with standard patterns of AR(1) model it is possible to conclude that the model generating this time series must be AR(2) with real roots. Since there is one-one relationship between the frequency w and values of roots and zeros of the linearized transfer function it is possible to determine the parameters of the model directly from the plot. These values for the model under study are 0.53 and

A. LEWANDOWSKI

346

0.94. Therefore, the transfer function of the model generating this time series is 1 G(z) = (1 - 0.53z)(1 - 0.94z)

1 1 -1.47z + 0.498z2

(57)

Parameters of the ARMA(2,O) model have been also estimated using the statistical package MINITAB. The transfer function of the model generating the time series and obtained from MINITAB is as follows G(z) =

1

i - 1.54z + 0.593z2

(58)

The coincidence of coefficients in (57) and (58) is satisfactory.

REFERENCES Bishop, A B. (1975) Introduction to Discrete Unear Controls: Theory and Application. Academic Press. Box G. E. P. and G. M. Jenkins (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco. Choi B. S. (1992) ARMA Model Identification. Springer-Verlag, New York. De Gooijer, j.G. and R.M.J. Hents (1981) "The corner method: an investigation of an order discrimination procedure for general ARMA processes". University of Amsterdam, Faculty of Agricultural Science and Econometrics, Report AE 9/81. Eliott, D. F. (1987) "Transforms and Transform Properties" in D. F. Elliot (ed.) Digital Signal Processing: Engineering Applications, Academic Press, New York. Hannan, E. J. (1970) Multiple Time Series. John Willey and Sons, New York. Jones, R.H. (1978) "Multivariate autoregression estimation using residuals", in D.F. Findley (ed.), Applied Time Series Analysis, Academic Press. Lewandowski, A (1983) "Spectral methods in the identification of time series", International Institute for Applied Systems Analysis, WP-83-97, Laxenburg, Austria. Lewandowski, A (1993) "EXSPECT - computer program for exploratory spectral analysis of time series", to be published. Pankratz, A (1983) Forecasting with Univariate Box-Jenkins models: Concepts and Cases, j. Willey, New York. Polasek, W. (1980) "ACF patterns in seasonal MA processes", in O. D. Anderson (ed.), Time Series, North-Holland Publ. Co., Amsterdam. Robinson, E. A (1981) "Realizability and minimum delay aspects of multichannel models", in E. A Robinson (ed.), Time Series Analysis and Applications. Goose Pond Press. Sage, AP. (1981) Linear System Control. Pitman Press. Silverman, H. (1975) Complex Variables. Houghton-Mifflin. Tukey, j. W. (1977) Exploratory Data Analysis. Addison-Wesley Publishing Company, Inc, Reading, Massachusetts.

ON THE SIMUlATION OF RAINFALL BASED ON TIlE CHARACfEruSTICS OF FOURIER SPECTRUM OF RAINFALL

U. Matsubayashi 1, S. Hayashi2 and F. Takagjl 1 Department of Civil Engineering Nagoya University, Chikusa-ku, Nagoya 464-01, Japan 2 Kawasaki Heavy Industries Noda City, Chiba 278,Japan Design rainfall is usually detennined by magnifying historical data to an amount corresponding to a certain return period. However, the spatial distribution of the precipitation is usually not considered in the design rainfall computation. With this point of view, in this paper we aim to discuss the spatial characteristics of rainfall. Precipitation occurs with various turbulences, i.e. convective cells, rain bands, cyclones, etc., which can be successfully expressed bya Fourier series. Moreover, a Fourier series formulation can easily relate the volume of rainfall through its linear term and various other shapes through its trigonometric terms. We apply the Fourier analysis to the rainfall data of radar observations and discusses the Fourier coefficient and phase angle. INTRODUCTION Design rainfall is the most basic quantity in the planning of flood control projects. This design rainfall is based on the frequency analysis of historical records of point rainfall, i.e., T-year return period. Because rainfall varies in time and space, the design rainfall is sometimes reproduced from historical rainfall data by adjusting the total amount of rain to be equal to the T-year rainfall depth. This method however, gives umealistic results when the magnitude of the rainfall referred to is different from the design rainfall depth. To improve the determination of the design rainfall, a stochastic simulation procedure is needed wherein the design rainfall is distributed in time and space and corresponds to a certain return period. As for the simulation of rainfall distribution in time and space, there are several procedures which can be used (i.e., Amorocho and Wu 1977, Cortis 1976, Meija and Rodriguez-lturbe 1974, Bras et al. 1986, Kavvas et al. 1987 and Matsubayashi 1988). However, these methods do not succeed in introducing the return period. With this point of view, this paper utilizes the Fourier series to express the rainfall field subjected to a given magnitude (i.e. the areal averaged rainfall). 347 K. W. Hipel et al. (eds.j, Stochastic and Statistical Methods in Hydrology and Environmental Engineermg, Vol. 3, 347-359. © 1994 Kluwer Academic Publishers.

U. MATSUBAYASHI ET AL.

348

THEORETICAL DISCUSSION

The rainfall field is a typical random field which originates from the fluctuation in the meteorological properties of the atmosphere such as vapour pressure, temperature, phase change of matter and wind field. However it is also recognized that the rain field consists of several scales of phenomena individually known as extratropical cyclone, rain band, cell cluster and convective cells where a number of smaller scale phenomena are included in a larger phenomenon (Austin and Haze 1972). This hierarchical characteristics is a kind of regularity in the randomness found in turbulent flow. These characteristics can be expressed by a Fourier series which consists of various scales of waves expressing each scale of rainfall phenomena stated above. The Fourier series for a two-dimensional field can be expressed by equation 1.

r(x,y)

m

n

=aoo + ,.z;;,';1 Aijsin ((JZlj + kz,x) x sin ((Jyij + ky jY)

(1)

_iz.. k z;L x

It is worthwhile to discuss the one-dimensional Fourier series shown in equation 2 because it

inherently posses almost all the important properties of the two-dimensional case, i.e. significance of the constant term a00' distribution characteristics of wave number, kxi and kyj' and phase angle difference (Jx ij' ij'

e,

t

a m . rex) = 2° + ~ Aj sin (k;x +(Jj) where kj =

(2)

In equation 1, ao is the areally averaged rainfall in a region of the length L, Ai and (J; are amplitude and phase angle of the sinusoidal function with a wave number ki. In these parameters, randomness is included in ao,Ai and (J;. As mentioned above, the rainfall field is

not simply a random process but also a deterministic process shown in the hierarchical structure of the rain band and convective cell and others. Therefore, to simulate the rainfall field, these variables should be properly evaluated based on their stochastic and deterministic properties.

Tbe Fourier spectrum A,.2 Among three variables, the amplitude Ai, or the Fourier spectrum A,.2 has two aspects for treatment in the analysis. One is from the knowledge about the turbulence. Kolmogorov derived a relationship between the energy spectrum Ek of turbulence and wave number k by

CHARACTERISTICS OF FOURIER SPECTRUM OF RAINFALL

349

means of dimensional analysis as shown in equation 3, (3)

This equation explains the distribution of energy of turbulence where energy is transferred from a long wave to a short wave like a cascade, and finally dissipates as heat of molecular shear stress. By analogy with turbulence, the rainfall field may also be considered to have a similar mechanism wherein rainfall has large scale of rain band and numerous small convective cells where water is transferred to larger scale of phenomena to a smaller scale and falls as rainfall. Based on this consideration and applying the achievement in turbulent studies, the Fourier spectrum of rainfall field is expected to have a form shown in equation 4 for the one dimensional case, corresponding to Eq. (3) for turbulence. (4)

On the other hand, the Fourier spectrum ,Ai2 , is theoretically obtained by applying the Fourier transformation to the auto-correlation function of the rainfall field. The characteristics of the auto-correlation function of the rainfall have been studied by many researchers (1.1. Zawadzki 1974, Ohshima 1992) reporting that the A.C.F. of the rainfall decreases linearly or exponentially. The Fourier spectrumAi2 derived by applying the inverse Fourier transform to the exponential A.C.F. is shown as, (5)

(6)

In this research equation 6 is also used to express the Fourier spectrum of the rainfall, which can express both equations 4 and 5. That is, equation 6 with k=2 is reduced to equation 5 and equation 6 with as=O means equation 4. These expressions of the Fourier spectrum give clues about the discussion of the rainfall simulation.

Areal average rainfall ao Various rainfall simulators mentioned before can produce rainfall in time and space. However they cannot properly simulate the rainfall field corresponding to a certain return period T. The return period, an average recurrence interval of rainfall, is usually defined for point rainfall depth of an event. In considering the spatial distribution of rainfall, the return period should

U. MATSUBA YASHI ET AL.

350

be analyzed for rainfall of a specified area because rainfall has centrifugal characteristics originating from the hierarchical structure of the rain band, the convective cell, and etc. Therefore the maximum value of spatially averaged rainfall for a certain area is strongly dependent on the area considered. These characteristics are usually discussed as DAD analysis. In the Fourier series, arJ2 in equation 3 is the spatial average value of rainfall within the area where the Fourier series is spanned. Therefore from the above discussions the ao characterizes the total amount of rainfall in the area and is evaluated for a certain return period through DAD analysis. In other words, through the parameter ao, the statistical characteristics (i.e., return period of the occurrence of rainfall) are explicitly introduced in the simulation model. On the other hand, the parametersAj and 8i determine how the total rainfall (a0l2)L should be distributed in space and time. So these parameters should be carefully evaluated to mimic the stochastic characteristics of rainfall. In this research, however, we concentrate on the discussion of the spatial distribution of rainfall and the time changing of rainfall is not treated.

RAINFALL DATA

Domain of

the Analysis

Date

Duration Type of rainfall

6(29186

1:0012:00 0:0012:00 5:008:00 1:0011:00 4:0014:00

7/10/86 6/30/88 9(20/88 9(25/88

Figure 1. Nagoya radar site and region of rain data

warm frontal rainfall stationary frontal rainfall warm frontal rainfall warm frontal rainfall stationary frontal rainfall

Table 1. Characteristics of rainfall data

Rainfall data used to analyze the Fourier spectrum and the phase angle are the radar rain gage data obtained at the Nagoya Meteorological Observatory (Nagoya City, Aichi, Japan) shown in Figure 1. PPI data are given at grid points of 2.5 km by 2.5 km mesh in a 500 km by 500 km square region at every 7.5 minutes. We analyze five storms which occurred during the years 1986 and 1988. The characteristics of the rainfall are listed in Table 1. Because we do not discuss the time changing of rainfall, spatial distribution of hourly rainfall are analyzed independently with time. In the case of one dimensional rainfall field, rainfall data along a certain grid line are used.

CHARACTERISTICS OF FOURIER SPECTRUM OF RAINFALL

351

150

~

.§.

~

100 50

iii

a:

0

10

2.5

17.5

25

100

Figure 2-a. One dimensional rainfall distribution of a single peak rain (September 20, 1988, 6:~7:(0) •

~1000~~~--------~--------.

~

~

~

!

.....~............ .. 10

Pha.. Angle Phase AnOIa Difle.. nc:.

e 1 00· 1-

~·~r~
EQ.(6)

'I....... ~

········r·...···················..·_··

....;~~:~~~....................-. ! ....: .::::-..:-._ .::.-:.

+.. . . . ..... . .. ..... -..+~.

0.1 ....

.~

&.:J 0.001

)...3.3 (1.0 .134 ~._._.u~ ....... u..~:. ~s -3.9 a. -0.04

1. ~ i--____..L..:" - '

.l..-~_ _ _ _ _ _ _ _ _

0.1

2

1

Figure 2-b. Fourier spectrum distribution

2.5

10

17.5

25

3

456

7

Wave Number, 'k (km" )

Wave Number. k (km")

32.5

40

Figure 2-c. Distribution of the phase angle and the pbase angle difference

47.5

55

82.5

Distance (km)

70

71.5

85

_I

92.5 100

Figure 3-a. One dimensional rainfall distribution of uniform rain (July 10, 1988, 5:~:OO) • •

180

Phase Angle Phase Angle ~c:e

• "! ....

• ~ _ "'! _

0

81-I 56

'f • '!

•i ..l e i ·· l ••~ i! II", .,···i'J+·· ..······. ·.·..···~··i· ... ~·.......···· . 1 .,. r. ~.. !II. ! - .1

J

: ,

~ .

,. •a!i:

o

=a

•• !: .! ;. ' I · i ~ 1 ! i. ! •• ~ ..... ·.,·--·i.......·i·........!-.......1 .. .~ i " i .. k !

,180.•• : : - - :"!II" !II. :

o

Figure 3-b. Fourier spectrum distribution

:

i : I · •• ~ : -.. ,~ ; . : • ' ri .\. '---1JI..a"i· i·..·..l-r·····~··········!,....···

,90 c,..._·..

Wave Ntmb8r, k (km")

. :.



r "'.· .,T"i ..... R

t

1

~:

234

.,;,,'" :

:

5

6

..... .

7

Wave Ntmber, k (km" )

Figure 3-c. Distribution of the phase angle and the phase angle difference

U. MATSUBA YASHI ET AL.

352

RESULTS OF THE ONE DIMENSIONAL ANALYSIS

In the following discussions, one dimensional rainfall fields are analyzed because the simplified treatment is helpful in understanding the fundamental characteristics of the rainfall. Figure 2 and 3 show typical examples of rain events and the results of the analysis. Figure 2a is an example of uniformly distributed rainfall and Figure 3-a, a rainfall with a single peak. These two types of events are called as "uniform rainfall" and "single peak rainfall" in the following discussions. In between these typical cases, there is a type of rainfall with several peaks in one event and is called "complex rainfall." Both lengths of the rain field analyzed in these figures are 100 kIn.

Fourier spectrum Ail Figures 2-b and 3-b shows the Fourier spectrum A,.2 in relation with the wave number ki for two rainfalls. From these figures, it can be seen that Ai2 decreases with an increase of the wave number ki for both cases. However, the different characteristics are also found for the two types of rainfall, that is, the single peak rainfall has rather large values of Ai2 in almost the whole region of ki and is convex in the low wave number range. On the other hand, the uniform rainfall shows smaller Ai2 values and linear recession in a log-log axis. In the modeling of the Fourier spectrum, three types of relationships (equations 4 to 6) are used based on the previous discussion. One of these relationships is from the analogy of turbulence, where the equation is assumed from Kolmogorov's power law. Another is from the exponential auto-correlation function of rainfall. The other is a combined form of these two relations. These relationships are applied to the observed spectrum and is shown as a solid line for equation 4, as a broken line for equation 5 and as a dotted line for equation 6 in Figures 1b and 2-b. From these figures, it may be concluded that equation 4 is applicable to the spectrum of uniform rainfall for a wide range of wave numbers, but for the single peak rainfall, the solid

SO-r--------------. 40



Sharp Rain

III

Complex Rain

90~--------------------------~ •

m

67.5

~ ~

~ II.. 10

o

0.83

1.7

2.5

3.3

4.2

C

,

Sharp Rain Complex Rain Uniform Rain

45 ~ 22.5 '

0.075

0.15

a Figure 4. Histogram of A.

Figure S. Histogram of a

0.225

0.3

CHARACTERISTICS OF FOURmR SPECTRUM OF RAINFALL

353

line shows a remarkable deviation from the observed data in the lower range of kj. Because the vertical axis is logarithmic, the deviation cannot be ignored. This difference may originate from the non-uniform characteristics of the minfall in nature while equation 4 is derived on the assumption of a unifonn field. Figure 4 shows the histogram of parameter A in equation 4 for 363 cases of rainfall. In this figure, rainfall events are classified into three types, namely, single peak rain, uniform rain and complex rain. This figure shows that a single peak rain has a large value of A compared to uniform rainfall. It is interesting to note that the overall average value A(2.28) is almost comparable to Kolmogorov's theoretical value of 5/3. On the other hand, equation 5 seems to be applicable especially in the lower range of kj. Although the apparent errors in the larger range of kj are big, real errors of the speCtrum are small. Figure 5 shows the histogram of the parameter a. This figure shows that a is large for the single peak rain event and almost zero in the uniform rain event which converges to equation 4. Compared to equation 4 and equation 5, equation 6 is most applicable for both single peak rain and uniform rain because of its high degree of freedom to express the Fourier spectrum. The estimated dotted line has a good fitting in both small and high range of kj. In addition to the deterministic parts discussed above, random fluctuations in the Fourier spectrum is also observed in Figures 2-b and 3-b. The deviation of plots from the deterministic trend is not clear, but it seems to depend on the type of rainfall whether it is a single peak rain, a complex rain or a uniform rain.

Phase angle 8i Figure 2-c and 3-c show the phase angleq; and the phase angle difference .18;(=8;-8;-1). From these figures, an obvious difference can be seen between the single peak rainfall and uniform rainfall. The uniform rainfall shows random scatter in both 8; and .18;. On the other hand, the phase angle 8; of the single peak rainfall shows almost a linear change and consequently ..18; takes a constant value, to understand these differences, it should be recognized that the phase angle 8; determines the location of the peak of the basic wave, and that the other phase angle 8; has meanings with respect to 81. In addition, for the Fourier series where the peak of the sine curve coincides at a certain point which produces a single peak rain, it can be proved theoretically that .18; should satisfy the relation .18j = 81 -nl2. These characteristics can be found in figure 2-c of the single peak rainfall. Although the deterministic component is dominant in the pbase angle of the single peak rain, a random component is also observed around the deterministic part. The randomness in the phase angle, however, is found to be strong in the order of uniform rain, complex rain and weak in the single peak rain.

The magnitude of the stonn and tbe amplitude of tbe Drst term As explalJ1ed above, the spatial distribution of rainfall is described by the average rainfall aol

U. MATSUBAYASHIET AL.

354

2 and the Fourier coefficient Ai. Among these properties, afl2 is determined from DAD analysis and Ai can be determined by equation 4 to equation 6 for adequate parameters. Parameter Ai, however, also cannot be independent of the rainfall magnitude. Here we focus on the AI, the amplitude of the filst term as the representative parameter, and relate it to the areal average rainfall. Figure 6-a and figure 6-b show the relationship between Al2 and the square of the areal average rainfall for sharp and uniform rainfall. Although some scatter was

300

300

200

200

100

100

o

o

50

100

150

Square of Average Rain «mm/h~)

Figure 6-a. Relationship of A/ to r2 at a single peak rain

o

200

o

50

100

150

20(

Square of Average Rain «mm/h)2)

Figure 6-b. Relationship of A/ to r2 at a uniform rain

observed, almost linear relationships can be assumed. It can also be agreed with that the slope of the relationship for single peak rainfall is steeper than that for uniform rainfall. Because Figure 6-a and Figure 6-b are plotted by selecting typical homogenous and single peak rain events, complex rain plots may fall in between these two linear relationships. For the purpose of simulation these results promise to determine not only aol2 but also Ai based on the return period of the design rainfall. CHARACI'ERISTICS OF TWO DIMENSIONAL RAIN FIELD

The two-dimensional field is described by equation 1. Among the parameters in equation 1 the Fourier spectrum Aij2 is expressed in equation 7 by multiplying equation 5 applied to two directions. This expression is based on the assumption of independence of the process in two directions. This simplified approach, however, can express heterogeneity of the field which is reported by Zawadzki (1974), Matsubayashi et al. (1987) and Oshima (1972). A2 _

16 r20

2 X 2 k2) U-( ax2 + k xl a y + yl

(7)

Figure 7-a and 8-a are the typical examples of the single peak rainfall and uniform rainfall respectively. Figure 7-b and 8-b are the obsetved Fourier spectrum distributions and Figure 7c and 8-c are the distribution estimated by equation 7. In Figure 7-a and 7-c, the Fourier spectrum in the small kx zone are relatively large which means that the rainfall field is prolonged in the N-S direction. This assures that equation 7 can express heterogeneous

CHARACTERISTICS OF FOURIER SPECTRUM OF RAINFALL

355

Figure 7-a. Two dimensional (2-D) rainfall distribution of single peak rain (September 20, 1988, 6:00-7:00) 82.0

78.71

;::: E

§

a.=O.060 a ,=0.33

E 2 U

"c.

(/)

Wave Number,k,(km,1)

Wave Number,k,(km")

Figure 7-c. Estimated 2-D spectrum

Figure 7-b. Observed 2-D spectrum Wave Number,k y (km

-I

) 0.961

Wave Number,k y (km- 1) .26

~~~~

'E :!!.

Figure 7-d. 2-D phase angle

(J'
Figure 7-e. 2-D phase angle

(JYij

356

U. MATSU BAYASH IET AL.

Figure 8-a. Two dimensional (2-D) rainfall distribution of uniform rain (September 25, 1988, 11:00-12:00)

a.=O.0l2 a, .... O.070

23.14

.1

21.47

Wave Number,ky(km"'J

Figure 8-b. Observed 2-D spectrum

Figure S-c. Estimated 2-D spectrum

Wave Number,k y (km- 1 )

~~~~~~.26

-

Figure S-d. 2-D phase angle

()"ij

Figure S-e. 2-D phase angle

()yij

357

CHARACTERISTICS OF FOURIER SPECTRUM OF RAINFALL

characteristics of the field. It can also be seen that they have similar characteristics with the onedimensional analysis. These results shows the applicability of equation 7 to express the twodimensional Fourier spectrum. As for the phase angle, Figure 7-d, e and Figure 8-d, e shows the distribution of 8xij and Byij of single peak rainfall and uniform rainfall. The thick and thin lines are contour lines of positive and negative phase angles. A cyclic change of the phase angle is observed in the single peak rainfall, on the contrary, homogenous rainfall shows a random distribution. These characteristics correspond to the ones observed in the one-dimensional case.

SIMUlATION OF THE ONE·DlMENSIONAL RAINFALL FIELD Based on the characteristics of the rainfall described above, the simulation procedure for the 120

~

.s

88.75

cr

26.25

~ iii

)..-3.9 a s -O.04

57.5

-5

Averl9t of Rat na 13.6 mm/h

2.5

10

17.5

25

32.5

40

47.5 55 62.5 Oi stence (lem)

70

n.5

85

92.5

100

Figure 9. Example of simulation for a single peak rain

~

120

). -Z.7

s as-O.OZ

E 88.75

.s ~iii

cr

57.5

Averl9t of Rain-9 .ee mm/h

26.25 -5

2.5

10

17.5

25

32.5

40

47.5 55 62.5 Dlstence (km)

70

n .s

85

92.5

.-100

Figure 10. Example of simulation for uniform rain one-dimensional rainfall field is proposed here. The two-dimensional case is not presented because the data analyzed are not sufficient to obtain reliable characteristics of the parameters. The procedures are as follows: 1) Evaluate the areally averaged rainfall aol2, the parameter of spectrum distribution as, A.s and the phase angle (JJ of the first term. 2) Evaluate Ai2 from the relationship between Ai2 and average rainfall.

358

U. MATSUBAYASHIET AL.

3) Calculate Al.2 distribution based on equation 4,5 or 6 with random noise which will express the scatter observed in Figures 2b and 3b. 4) Calculate the 8; by using ti(Ji 8; - 8;-1 81 - 9lr and a certain scatter for single peak rainfall and by using uniform random distribution for uniform rainfall. 5) Calculate rainfall distribution by equation 2. Figures 9 and 10 show two examples of simulation for single peak rainfall and uniform rainfall respectively. In these figures, especially in Figure 9, it is found that rainfall is negative in some points for the single peak rainfall case. This is an intrinsic characteristic of the Fourier series and is difficult to remove. But the effect of these negative values is small enough in practice. These two results, compared to the observed rainfall events, are feasible as the natural rainfall.

=

=

CONCLUSIONS In this paper, the Fourier series is utilized to simulate the design rainfall. It is easy to introduce the return period to the simulated rainfall. The characteristics of the Fourier coefficient and phase angle are discussed in which they play an important role in reproducing the spatial distribution of the rainfall. Results obtained here are summarized as follows; 1) The return period can be explicitly included in arJ2 through DAD analysis. 2) The Fourier spectrum for uniform rainfall can be formulated by a similar relationship as Kolmogorov's law of turbulence. On the other hand, the spectrum for the single peak rainfall with a single dominant peak can be derived from the exponential auto-correlation function. 3) The formulation incorporated both Kolmogorov type and exponential auto-correlation type can be adopted to almost all rainfall. 4) The Fourier coefficient of the first term can be linearly related to the areally averaged rainfall. Therefore, these coefficients can also be related to the return period. 5) The phase angle varies randomly for uniform rainfall and linearly change with the number of terms for the single peak rainfall. 6) It is shown that the proposed simulation procedure can reproduce rainfall field with similar characteristics with natural rainfall.

REFERENCES 1) Austin, P.M. & Houze, R.A.(1972) "Analysis of the structure of precipitation patterns in New England", Jour. of Applied Meteorology, Vol. 11 ,926-935 ,. 2) Amorocho, J. & Wu, B.(1977) "Mathematical models for the simulation of cyclonic storm sequences and precipitation fields", Jour.Hydrology ,32,329-345. 3) Bras, R.L. & Rodoriguez-Iturbe, 1.(1984) "Random Functions and Hydrology", Adison Wesley Publication, Menlo Park, California. 4) Corotis, R.B.(1976) "Stochastic considerations in thunderstorm modeling", Jour.of the Hydroulic Division ,ASCE , HY7 , 865-879.

CHARACTERISTICS OF FOURIER SPECTRUM OF RAINFALL

359

5) Hobbs, P.V. (1978) "Organization and structure of clouds and precipitation on the mesoscale and microscale in cyclonic storm", Review of Geophysics and Space Physics, 16, No.4,741-755 6) Kavvas, M.L., Saquib, M.N. & Puri , P.S.(1987) "On a stochastic description of the timespace behavior of extratropical cyclonic precipitation field", Stochastic Hydraulics, Vol.1, 3752. 7) Matsubayashi, U. and Takagi F. (1987) "On the probabilistic characteristics of point and areal rainfall", Proc. of International Conference on Hydrologic Frequency Modelling, 265275. 8) Matsubayashi, U.(1988) "On the simulation of the rainfall of the extratropical cyclones", Bull. Nagoya University Museum, 4,81-94. 9) Meija, J.M. & Rodriguez-lturbe, 1.(1974) "On the synthesis of random field sampling from the spectrum: An application to the generation of hydrologic spatial process", Water Resour.Res., Vo1.10, No.4, 705-711. 10) Oshima, T.(1992) "On the statistical spatial structure of rainfall field", Graduation thesis, Civil Engineering Department, Nagoya University. ll)Waymire, E. , Gupta, V.K. & Rodriguez-Iturbe, 1.(1984) "A spectral theory of rainfall intensity at the mesoscale", Water Resour.Res., Vo1.20, No.10, pp.1453-1465. 12) Zawadzki, 1.1.(1973) "Statistical properties of precipitaion patterns", Journal of Applied Meteorology, Vo1.12 ,459-472.

PART VIII TOPICS IN STREAMFLOW MODELLING

CLUSTER

BASED

PATTERN

RECOGNITION

AND

ANALYSIS

OF

STREAMFLOWS

T. KOJIRI1, T.E. UNNY2, U.S. PANtP lDepartment of Civil Engineering, Gifu University, Gifu, Japan, 501 11 2Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G 1 3Department of Civil Engineering, Lakehead University, Thunder Bay, Ontario, Canada P7B 5E1 Traditional methods of streamflow analysis and synthesis are based upon information contained in individual data. These methods ignore information contained in and among groups of data. Recently, the concept of extracting information from data groupings through pattern recognition techniques has been found useful in hydrology. For streamflow analysis, this paper proposes several objective functions to minimize the classification error encountered in currently used techniques employing minimum Euclidean distance. The relevance of these functions has been tested on the streamflow data at the Thames river at Thamesville. Specifically, three objective functions considering the properties of shape, peak, and gradient of streamflow pattern vectors are suggested. Similar objective functions can be formulated to consider other specific properties of streamflow patterns. AIC, intra and inter distance criteria are reasonable to arrive at an optimal number of clusters for a set of streamflow patterns. The random initialization technique for the K-mean algorithm appears superior, especially when one is able to reduce initialization runs by 20 times to arrive at optimal cluster structure. The streamflow synthesis model is adequate in preserving the essential properties of historical streamflows. However, additional experiments are needed to further examine the utility of the proposed synthesis model.

INTRODUCTION Various kinds of data related to hydrologic variables such as precipitation, snow fall and streamflows are measured at a number of points using various equipment with the_view of assessing and controlling the water resources systems. Several techniques exist for separately handling the time sequenced and multi-point data. However, one can extend 363 K. W. Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 363-380. © 1994 Kluwer Academic Publishers.

T. KOJIRI ET AL.

364

such techniques to time sequenced multi-point data by considering all kinds of data at a measurement point to form time sequenced data vectors. Such vectors can then easily be treated and analyzed as pattern vectors [panu et al. (1978) and Unny et al. (1981)]. Based on the consideration of spatial and temporal correlations, one can classify time sequenced spatial pattern vectors corresponding to precipitation or streamflows for the extraction of representative reference vectors. Similarly, the differences among pattern vectors can be utilized to classify them by incorporating information related to precipitation, meteorology, geology, physiography etc. Consideration of groups of data makes the process of estimation and prediction easier. It is in this vein that the capability of pattern recognition techniques in handling the time sequenced multi-point data becomes readily useful. A pattern recognition system (PRS) utilized by Panu et al (1978) and Unny et al (1981) for streamflow analysis and synthesis was based on the minimum Euclidean distance concept. In their investigations, it was found that in some cases, the minimum distance concept tends to misclassify streamflow patterns. In such cases, additional constraints were invoked to minimize the classification error. It is noted that the misclassification of streamflow patterns by PRS is caused by the consideration of the entire shape of patterns in the minimum distance concept. To overcome such difficulties in the PRS for classification of streamflow patterns in particular and hydrologic patterns in general, the following classification functions are suggested.

OBJECTIVE FUNCTIONS FOR CLASSIFICATION Streamflow patterns inherently possess several characteristics. Among them, the most obvious is the peak flow. Other characteristics could be long periods of low flows. The following objective functions are based on some of these characteristics. Each of these functions considers a specific property of streamflow patterns. These functions are later shown to be effective in dealing with specific problems of streamflow analysis and synthesis such as in cases of flood and drought conditions.

Objective function-one [OFl] This function examines the shape aspect of streamflow patterns in the form of individual distances as follows. OFl [Xi,

zj]

(1)

Where, Xi(t) is observed or transformed data value at time, t of the ith pattern vector; zj(t) is the value of the jth reference vector (or cluster centre) at time, t. The absolute

CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

365

difference at each element is normalized. A small value of OFI indicates strong similarity between the observed pattern vector and the reference vector. It is noted that OFI determines the degree of similarity based on the variations in each element rather than the entire shape of the pattern vector.

Objective function-two [OF2)] The peak discharge is the single most important characteristic influencing any flood flow analysis and planning of flood control structures. The function defined below examines the peak flow component of a pattern vector and consequently facilitates the classification of all the patterns related to flood flows. OF2[Xi, zj]

=

Ixi pet) - zj pet) z j p (t)

(2)

1

The subscript, p indicates location of the peale

Objective function-three [(OF3] The streamflows tend to rise and fall sharply (Le., steep gradient) in response to rain or snowmelt conditions. Such rises and falls are more pronounced in daily rather than in monthly streamflows. Further, the rises and falls are milder (Le., mild gradient) during the low streamflow periods. The gradient can thus distinguish the streamflow patterns with sharp fluctuations from those with low fluctuations, while such streamflow patterns may have the same value of the OFl function. The objective function OF3 based on normalized gradient is given below. OF3[Xi,Zj]

=max[ I(Xi(t)-xi(t:l»-(~j(t)-zj(t+l» t

l

P(zJ(t)-zJ(t+l»

I]

(3)

Where, {3 represents the normalizing factor for comparing the above three functions (OFI, OF2 and OF3) in the same order of magnitude. In some situations, one may need all the above functions collectively to improve upon the classification process. In such cases, one can formulate an aggregate objective function (OFa) as follows. OFa [X i, Z j] =max[oFl (x i , Z j) , OF2 (x i , Z j) , OF3 (x i , Z j)]

(4)

The aggregate function collectively involves all the above three functions. Therefore, this function can be used for simultaneously classifying streamflow patterns corresponding to different events, measurement points, and seasonal variations.

366

T. KOJIRI ET AL.

CLASSIFICATION PROCEDURE FOR STREAMFWW PATTERNS The objective functions OFt, OF2, OF3, and OFa are used in the K-mean algorithm for classification of streamflow patterns. The manner in which the K-mean algorithm is applied for classification is described in the Appendix. Further, the bias in selection of initial centres of clusters is avoided by using the random initialization of K-mean algorithm suggested by Ismail and Kamel (1986). STRUCTURAL RELATIONSHIPS IN MULTIVARIATE DATA For processes obeying linear transition among observed data at different time periods, one would obtain the same number of clusters (or reference vectors) containing the same number of pattern vectors within a specified time period. Most of hydrologic processes are inherently non-linear and as a result in an actual process, one may obtain a different number of clusters, or the clusters may have different combinations of pattern vectors. The structural relationships among various clusters within a process or among processes are evaluated through the concept of goodness of fit. Secondly, one defines the conditional probability of occurrence, p(j/j') of a reference vector [Suzuki (1973)] as follows: k(j)

p(j/j') = n(j/j')/L, n(j/j')

(5)

j=l

Where, n(j/j') is the number of pattern vectors associated with the cluster, j given j' has occurred. k(j) is the number of clusters considered for the analysis. It is advantageous to develop structural relationships among clusters exhibiting higher correlations within a process or among processes. Such structural relationships are, in tum, utilized in the prediction or simulation of streamflow patterns. The Markovian structure among clusters is obtained as follows: k(u)

p(j/j') > L, p(u/u')

V u excluding j

(6)

u

k(j)

k(u)

k(j)

k(u)

j=l

u, u~j

j

u, w.j

i.e., [n(j/j')/L, n(j/j')] > L, n(j,u)/L, L, n(j,u)

(7)

CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

367

DEVEWPMENT OF SIMULATION AND PREDICTION ALGORITHMS

Simulation algorithm A process having no correlation structure among measurement points, events, and/or seasons can be simulated independently. However, the streamflow patterns exhibit correlation among them and therefore, can be synthesized by following a procedure suggested by Panu and Unny (1980a, 1980b), where the conditional probability of occurrence of pattern vectors and the normal distribution of the intra distance are utilized. In this paper, streamflow patterns are considered to belong to two seasons and are simulated as follows: Step 1:

Generate a sequence of clusters according to the Markovian probability of their occurrence.

Step 2:

Synthesize each cluster to its pattern vector by using a multivariate normal distribution.

Step 3:

Test whether the elements of a synthesized pattern vector lie within their specified limits. If not, synthesize another pattern vector until its elements are found within limits.

Step 4:

Return to Step 3, until an acceptable pattern vector corresponding to each cluster in step 1 is found.

Prediction algorithm Assuming the membership functions to be exponentially distributed and utilizing the concept of fuzzy inference, the pattern vectors are predicted [Kojiri and Ikebuchi (1988)]. Beyond the observed season, the serial sequences are predicted by combining the fuzzy inference with the expectation method. In general, a real time prediction of a pattern vector is used for forecasting flood or drought events. A pattern vector is forecast based on the value of OFI between the actual observed pattern vector and its representative reference vector as follows: (8)

(9)

368

T. KOJIRI ET AL.

Further, assuming that the fuzzy membership function of each cluster has the same weights as the frequency of occurrence of a cluster, the membership function is represented as follows.

Vj

=exp { (-a j

k(i)

h j D&'served) /

Ei h (i) }

(10)

Where hj denotes the frequency gained in the classification procedure and a;; is a constant depending on the logic situations of the distance i.e., large, medium, and small related to l)iobserv.... One can then predict the pattern vector based on the fuzzy inference technique [Kojiri et al (1988)] as follows:

Predicted Pattern Vector =

k(j)

k{j)

j

j

E Xiredicted/ E

Vj

(11)

APPUCATION OF THE METHODOWGY: A CASE STUDY The Thames river basin covering 4300 krn2 area at Thamesville was selected to test the applicability of the proposed pattern synthesis and forecasting procedures. The monthly discharge and precipitation records are available from October 1952 to September 1967. The mean monthly discharge values are used in the analysis. Based on the correlogram and spectral analysis, the discharge data was divided into two seasons: a dry season from October to March, and a wet season from April to September. In general, every seasonal segment appears to be different from the rest, and the variation in standard deviation for some months is very large. The seasonal segments (or pattern vectors) are now clustered into groups to derive the structural relationships among them. The K-mean algorithm is used for grouping the seasonal segments. A random initialization technique [Ismail and Kamel(1986)] is used to achieve the global optimum. Because, the behaviour of the K-mean algorithm is influenced by several factors such as the choice of initial cluster centres, the number of cluster centres, the order in which seasonal segments are considered in clustering process, and the geometrical properties of seasonal segments. Several test runs indicated that four clusters would be adequate to capture the relationships among and within various seasonal segments. In general, there exists ISC4 combinations to group 15 seasonal segments into four clusters in each season. To find out the minimum possible

CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

369

run of the K-mean algorithm for optimal cluster configurations, 200 runs of the K-mean algorithm were made to group 15 seasonal segments into four clusters. The value of OF1 was evaluated for each run [Figure 1]. From this figure it is apparent that a significantly small value of OF1 has occurred twice in 200 initial runs of the K-mean algorithm. These significantly small values can be attributed to a situation when four clusters have attained optimal cluster configuration, i.e., a condition when the intra distance DK(K) is minimum and inter distance EK(K) is maximum. Therefore, the number of initial conditions could be appreciably less [Table 1] for various combinations. Further, this table also contains the values of the intra distance DK(K), inter distance EK(K) and the Akaike Information Criteria (AIC) [see; Appendix] for the OF1. The values of DK(K), EK(K) and AIC are plotted against the number of clusters in Figure 2. An examination of the figure and the table indicates that for a case of four clusters, and a reasonable number of 100 initialization runs, the value of AIC is minimum, the intra distance is continuously decreasing up to four clusters and the rate of decrease is very small from four to eight clusters, and the inter distance is fluctuating but is maximum for the fourcluster case. Based on such considerations, it was assumed reasonable that four clusters sufficiently describe the variability of pattern vectors in both the seasons. Considerations of intra and inter distances and the values of AIC provide a useful but somewhat inflexible method of obtaining optimal number of clusters for a set of given pattern vectors. 4 3.5

....... .....

u..

Q. c 0

3 2.5

:p (.)

c

::J

u..

2

4)

~ (.)

1.5

4)

"JS 0 0.5 0

0

20

40

60

80

100 120 Run Number

140

160

180

Figure 1. Sequence of Values of Objective Function One [OF1].

200

T. KOJIRI ET AL.

370

Table 1. Summary of AIC, Intra, and Inter Distances Using OF!

Number of Clusters

Run Number

Intra Distance

Inter Distance

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

2 50 70 100 150 250 300 300 250 150 100 70 50 2 1

2.811 2.373 2.007 1.250 0.942 0.942 0.942 0.942 0.436 0.436 0.436 0.299 0.203 0.181 0.000

0.000 0.399 0.735 0.876 0.716 0.686 0.472 0.681 0.967 0.433 0.433 0.433 0.344 0.344 0.344

Ale n/a

20.966 16.199 14.084 15.124 16.383 17.489 18.792 18.880 20.741 22.406 24.243 26.128 28.059 30.000

4

30

3.5 en

3

5i

2.5

-...
en

25 20

i5

~ "0 fa

i

15 0

2

<

1.5

10

1 5

0.5 0

0

2

4

6 8 10 Number of Clusters

12

14

0

16

Figure 2. Optimal Number of Clusters as Function of DK(K.), EK(K.), and AIC

CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

371

Another method of obtaining optimal number of clusters is through multioptimization technique. In this technique, the target (goal) intra distance (DK(K» is defined as minimum and the target inter distance (EK(K» as maximum of the intra and inter distances of all the feasible clusters. The values of intra and inter distances and the associated target distances when plotted [Figure 3] define the transformation curve (TC). Because, all the conditions for the multiplication factor ('Y) are not satisfied, the indifference curve (IC) becomes a straight line parallel to the line passing through the points defined by the cluster-l and cluster-IS. Incidently, these points also exist on the ends of the transformation curve. The optimum solution again lies at the four-cluster case.

3~-----------------------------------' C (Goal: de, eo)

2.5

a>

2

0

c:

!9 Ul

0

1.5 OplimaJ Poinl

L-

...-c:a>

/

---~---- ___41

1

Transformotlon Clrve (TC)

~

--------3~

'2 - - - - _ _ _ .!..rId~ferent::e Ctrve QC)

B

o o

(~,

ell

0.5

A (d l l e 1 ) - - :- - - - - - - ____ ~"

1

1.5 2 Intra Distance

----~ 2.5

1 3

Figure 3. Optimal Number of Clusters as Function of Multi-Optimization Criterion.

372

T. KOJIRI ET AL.

Based on the results of above considerations, the constraints for the K-mean algorithm are obtained [Table 2]. The values of constraints are found to be less than seven tenth of the maximum value of DK(K) and greater than two tenth of the inter distance EK(K). The optimum number of clusters is obtained as the minimum number satisfying the constraints that such a number should not be greater than half the total number of pattern vectors. In cases, where neither the inter distance nor the intra distance satisfies the constraints, the inter distance takes priority over the intra distance. Table 2. Constraints Used in AIC, Intra and Inter Distances for Obtaining Optimal Number of Clusters Classification Method

Optimal Cluster Number

Constraints

> K-means Algorithm

Intra-constraint < 0.7 x max {intra (1 - IS)}, and half the total number of pattern Vectors.

4 Inter-constraint > 0.2 x [max {inter (2- IS)} - min {inter (2-1S)}] + min{inter (2-1S)}

Multi-Optimization

4

None

AlC

4

None

The same optimum number of clusters was obtained using the K-mean algorithm for various cases of the objective functions [Table 3]. The objective functions OFa and OF! render the same structure for optimal number of clusters because the resulting values of OFa are strongly influenced by the function OFl. However, OF2 function related to peak and OF3 function related to gradient give different structure to optimal number of clusters. These functions evaluate properties of pattern vectors such as occurrence of peak flows or gradient between successive events and as a result, deals with properties which are least correlated. It is in this vein that these functions will provide optimal structure of clusters in specific situations such as flood or drought analysis. The Markovian transition from one cluster to another is summarized in Table 4. The cluster centres in each season are exhibited in Figure 4. As each reference vector is unique, the OFa function has been effective in classifying streamflow data, especially for peak considerations. It is noted that if one were to consider drought characteristic, one would replace the OF2 function to reflect the low flow characteristics.

CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

373

Table 3. Configuration of Optimal Number of Clusters for Various Objective Functions Discharge Data Set

Objective Function

Number of Clusters

Dry Season

OFa

4 (optimum)

Wet Season

OFa

4 (optimum)

Dry Season

OFI

4 (optimum)

2 (optimum) Dry Season

OFJ 4

2 (optimum) Dry Season

OF2 4

Cluster Configuration Cluster-I: Cluster-2: Cluster-3: Cluster-4:

13 3,6,8,9 1,2,4,5,7,11,14 10, 12, 15

Cluster-I: Cluster-2: Cluster-3: Cluster-4:

13, 14, 15 11 1,2,3,5,6,8,9,10,12 4

Cluster-I: Cluster-2: Cluster-3: Cluster-4:

13 10,12,15 3,4,6,8,11 1,2,5,7,9,14

Cluster-I: 1,3,4,6,9,10,11,12,13,14,15 Cluster-2: 2,5,7,8 Cluster-I: Cluster-2: Cluster-3: Cluster-4:

1 2,4,6,7,9,10,11,12,14,15 3,5,8 13

Cluster-I: 1,2,3,4,5,6,7,10,13,15 Cluster-2: 8,9,11,12,14 Cluster-I: Cluster-2: Cluster-3: Cluster-4:

8,9,12,14 11 1,2,5,6,10,15 3,4,7,13

Based on the above cluster configuration and their intra and inter structural relationships, streamflow patterns were synthesized for the Thames River at Thamesville. The observed and synthesized Markovian transition probabilities for various clusters are summarized in Table 5. In this table, the variation between the observed and synthesized Markovian structure is less than 5 % . In other words, the Markovian structure is preserved in the synthesized streamflow patterns. A few sample realizations of synthesized streamflow patterns are exhibited in Figure 5. The variations in these realizations indicate the flexibility of the proposed procedure in synthesizing the extreme as well as the normal streamflow characteristics. The results of the forecast model are given in Figure 6. The forecast sequence at three sequential time stages from April, May and June 1966 are made on the assumption that these data points are not known. The forecast model needs further improvements.

T. KOJIRI ET AL.

374

120

Ca) 100

:@" CO)

80

<

.s (I)

01

60

la

.s::. 0

.!!!

0

40

20

0

"'I" ............... ..

0

2 3 4 5 Elements of Pattern Vectors

120

. ., l \

(b)

" !\

!

100

~

. .

:§: CO)

.s.

.

~

\

.. !

80

, \

~

f

. ,

~

: :

60

" \

!

\

.'

'

2" :

40

',' '0'

t·"

: 1"

I •

'"

'"

3 ."......... ...... ,

20

.;

..

"

'-

\ \ , \ 1

\

'" •••••

0

0

.\

\

:

.c. 0

is

~

!

(

CD

6

~"'"

I

\ . - - .. - - - - - .... -

"""

/

................... J-#"

...... -...... -~-:.-:.:--:.::---::-:.--..-.-.--.............. ..

3 4 5 2 Bleaents of Pattern Vectors

6

Figure 4. Representative Reference Vectors for (a) Dry Season [Oct. to March] and (b) Wet Season [April to September].

CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

375

Table 4. Summary of Historical and Simulated Markovian Probability of Occurrence of Various Clusters in a Season Case (a): Probability of Occurrence: Dry Season to Wet Season /"

Clusters of Wet Season /"

1

2

3

4

~

1

0.0 (0.0)

0.0 (0.0)

1.0 (1.0)

0.0 (0.0)

.... tI:l

2

0.496 (0.5)

0.0 (0.0)

0.26 (0.25)

0.244 (0.25)

0.135 (0.143)

0.0 (0.0)

0.602 (0.571)

0.263 (0.286)

0.0 (0.0) 0.330 (0.333) 0.316 (0.333) 4 HistOrIcal values of probalJihties are given in parenthesIs.

0.354 (0.334)

.... c:

o

~ ~

~~ 3

Uel ~ote:

Case (b): Probability of Occurrence: Wet Season to Dry Season Clusters of Dry Season

/'

/'

1

2

3

4

.... §

1

0.0 (0.0)

0.660 (0.667)

0.340 (0.333)

0.0 (0.0)

2t1:l

~ ~

2

0.0 (0.0)

0.0 (0.0)

0.0 (0.0)

1.0 (1.0)

='

3

0.134 (0.143)

0.144 (0.143)

0.563 (0.571)

0.159 (0.143)

0.259 (0.250) 0.502 (0.500) 4 0.0 (0.0) Historical values of probabliities are given in parenthesis.

0.239 (0.250)

o

rIl

rIl

....

0

O~ -lote:

Table 5. Summary of Observed and Simulated Probability of Occurrence of Clusters of Wet Season given that the Occurrence of Cluster-3 of Dry Season Cluster Number

Observed Probability

Simulated Probability

Percent Error

1

0.143

0.136

4.8

2

0.143

0.138

3.5

3

0.571

0.577

1.1

4

0.143

0.149

4.2

...

Note. The values of Simulated probablhtles 10 thiS table are based on 1000 synthesized realizations of streamflows.

T. KOJIRI ET AL.

376

120 100 '0

80

Cli

(

S

60

Q)

Cl

"-

co

.J::. 0

UI

40

is

20

'\ ..', " ,,, '

~

0

-6

3rd Cluster

2nd Cluster

-4

-2

--

Observed

0

2

4

6

8

10 12 14 16 18 20 22 24 Simulated

lime in Monltls

Figure 5. A Sample of Synthesized Realizations of Streamflows.

50 45

~ (') (

S Q)

~

.J::. 0

UI

is

U:GENO

40

-+--

35

-)«

30

~~

1=5

..·., · ··· ,,,, ··· ...,. · ·· .,.,

Predlellon TIme

,,

1=6

-~-

---7

25 20

,

............. ,

,

15

,, ,

,

10

,, ~

0

2

://

\"

,, ,

,,

5

0

Observed

1=.4

~'

---. '. -

3 Season 2

4

5

6

7

8

TIme in Monltls

Figure 6. A Sample of Forecasted Streamflow Patterns.

9 Season 1

10

11

CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

377

CONCLUSIONS Several objective functions are proposed to improve upon the existing pattern recognition (PRS) system for streamflow pattern analysis and synthesis. Specifically, three objective functions considering the properties of shape, peak, and gradient of streamflow pattern vectors are proposed. Similar objective functions can be formulated to consider other specific properties of streamflow patterns. AlC, intra and inter distance criteria are reasonable to arrive at optimal number of clusters for a set of streamflow patterns. The random initialization technique for the K-mean algorithm appears superior, especially when one can reduce 20 times the initialization condition runs to arrive at an optimal structure of clusters. The streamflow synthesis model is adequate in preserving the essential properties of historical streamflows. However, additional experiments are needed to further examine the utility of the proposed synthesis model. REFERENCES Ismail, M.A. and M.S. Kamel (1986) Multidimensional Data Clustering Using Hybrid Search Strategies. Unpublished report, Systems Design Engineering, Univ. of Waterloo. Kojiri, T., Ikebuchi, S., and T. Hori (1988) "Real-Time Operation of Dam Reservoir by Using fuzzy Inference Theory". A paper presented at the Sixth APD/IAHR Conf. held at Kyoto, July 20-22, 1988. Panu, U.S., Unny, T.E. and Ragade, R.K. (1978) "A Feature Prediction Model in Synthetic hydrology Based on Concepts of Pattern Recognition". Water Resources Research, Vol. 14, No.2, pp. 335-344. Panu, U.S., and Unny, T.E. (1980a) "Stochastic Synthesis of Hydrologic Data based on Concepts of Pattern Recognition: I: General methodology of the Approach". Journal of Hydrology, VoL, 46, pp. 5-34. Panu, U.S., and Unny, T.E. (1980b) "Stochastic Synthesis of Hydrologic Data based on Concepts of Pattern Recognition: I: Application to Natural Watersheds". Journal of Hydrology, VoL, 46, pp. 197-217. Suzuki, E. (1973) Statistics in Meteorology, Modern Meteorology No.5, Chijin-syokan Co., 4th Edition, pp. 254-261, [in Japanese]. Unny, T.E., Panu, U.S., MacInnes, C.D. and Wong, A.K.C. (1981) "Pattern Analysis and Synthesis of Time-dependent Hydrologic Data". Advances in Hydroscience, Vol. 12, Academic Press, pp. 222-244.

T. KOJIRI ET AL.

378

APPENDIX Classification procedure

By using the K-mea.ns algorithm, the reference vectors at K cluster centres are obtained as follows: (i)

define K initial cluster centres, in other words, the tentative reference vectors. The arbitrary pattern vectors are available for these centres. The following matrices are defined to explain the procedure. z(j,1) z(j,2)

(12)

Z(j,u) =

z(j,6) x(i,1) x(i,2)

(13)

X(i,t) = x(i,6)

Where, Z(j,u) is the jth reference vector at uth iterative step in the K clusters, and X(i) is the pattern vector consisting of data point x(i,t), t=I,2, ... ,6. (ii)

at uth iterative step, if OFa[X(i) ,Z(j,u)]


,Z(n,u)]

for n=1, 2, •. ,Kandn~ j.

(14)

then the pattern vector, X(i) belongs to the cluster, j. (iii) calculate the maximum distance of the cluster, j as follows. DK(j, u)

= :(1) [OFa (X(i)

, Z(j, u)]

(15)

The new centre at the cluster, j is decided among the pattern vectors belonging to the cluster, j as follows. Z(j, u+1)

=1/N(j)

N(j)

E x(i) i=l

(16)

CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

379

Where, NO) is the number of the pattern vectors in the rearranged cluster, j.

= Z(j,u), the iteration stops.

(iv)

if Z(j, u+ 1) iteratively.

(v)

calculate the maximum intra distance, DK(K) and the distance between the centres i.e., the inter distance, EK(K) according to DK(K) at K clusters. EK(K)

=. ~i~""/[OFa(ZK(jIU)IZK(jl,u»] J , ] IJ""J

DK(K) =

(vi)

Otherwise, go back to the step (ii),

mr [

DK(j 1 u) ]

(17)

(18)

go back to the step (i) with the next number of clusters, (K + 1). Otherwise, the iterations are completed.

Upon completion of the above procedure, the optimum number of clusters should be decided using criteria as described below. [1] The first criterion is to give the thresholds to the objective function and choose the minimum number of clusters. For example, (i)

the intra distance, DK(K) which is similar to the objective function at cluster centres, is less than 3.

(ii)

the inter distance, EK(K) is greater than 1.

(iii)

the number of clusters is less than half the total number of the pattern vectors.

The inter distance is calculated as the mean values of distances for same combinations of the reference vectors, because any reference vector is possible to be the centre of the objective function [Equation (10)]. [2] The second criterion is to decide the optimum number of clusters through the multioptimization technique. The objective function is formulated as the vector as follows.

--min]

[DK(K) EK(K) "max

(19)

When the restriction of EK(K) is relaxed in the optimization and the value of EK(K) is calculated according to the value of DK(K) in the same cluster number K, each combination [DK(K), EK(K)] is plotted in Figure 3.

T. KOJIRI ET AL.

380

Considering the arc AB as the pareto optimum, (also called as transformation curve (TC», one can convert this curve on the equivalent coordinates. The arc AB represents one part of TC, which is defined by the limited number of clusters. Coordinate of EK(K) is multiplied by the rate II BC II1II AC II, because the points A and B have the same weight and the same difference from the goal, which has been found to be the most desirable position [Kojiri et al (1988)]. The multiplication factor 'Y is calculated as follows. y="/«e2-eO)2_ (el-eo)2)/«dl-dO)2- (d2-dO) 2) subject

to e2> e1 and d1 >d2, or e2 <e1 and d1
(20)

(21)

Where, (eO, el, e2) and (dO, dl, d2) are the elements of EK(K) and DK(K) coordinates of the points A, B and C. On an equivalent coordinates, the shortest point from the goal becomes the optimum. The optimum number of clusters is decided by the corresponding cluster number. The element dO is taken as the minimum value among all the probable intra distances or the value zero such that all pattern vectors are same. The element eO is taken as the maximum value among the probable inter distances or the intra distance in the case of cluster number 1, where all pattern vectors are in the significant range of reference vectors. If the necessary conditions are not satisfied, the indifference curve (IC) is drawn parallel to the line which passes through points A and B. [3] The third criterion is to estimate the distribution of the pattern vectors using the Akaike Information Criterion (AlC). Assuming that the K-means algorithm gives the optimum value of the objective function for pattern vectors distributed normally around the centre, the maximum log-likelihood of the cluster, j is represented as follows.

I:

N(j)

w(j) =

[(log21t/4) (OFa(x(i),Z(j,u»2]

(22)

~

As the whole information is given by summarizing W(j) against j, the optimum number of the clusters is decided as the number which denotes the minimum value of the following equation among all the clusters. K

AIC= ~ w(j) + 2K

- min

(23)

Where, the second term means the number of the parameters treated in the objective function. In the multi-optimization and the AIC, it is not necessary to consider the constraints to arrive at the optimum solution. However, the K-means algorithm has the flexibility according to the observed pattern vectors.

REMUS, SOFTWARE FOR MISSING DATA RECOVERY

PERRON, HI, BRUNEAU, p.2, BOBEE, B.I, PERREAULT, L.I I INRS-Eau, University of Quebec, Institut national de la recherche scientifique, Carrefour Molson, 2800, rue Einstein, Quebec (Quebec), Canada, G1X 4N8 2 Hydro-Quebec, Division Hydrologie, Place Dupuis (17 e etage), 855, Ste-Catherine east, Montreal (Quebec), Canada, H2L 4P5

INTRODUCTION In order to manage adequately their water resources, Hydro-Quebec often uses simulation of energy at different points of the hydrological network. This simulation is based on monthly means computed from daily observed flows. However, it is possible that some daily values are missing. Hydro-Quebec will then reject the calculated monthly mean flow when four or more daily observations are not available or if they seem incorrect. Furthermore, the monthly means may be rejected for many consecutive months at certain sites. Also, when a basin is large, more than one station is needed to obtain a good estimation of flows at sites of reservoirs. As very few stations have complete series on a long period, it is therefore very important to be able to estimate these missing values in order to obtain reliable estimates from the energy production simulation models. The missing values for a given site are estimated by multiple regression using data from other sites. Until recently, Hydro-Quebec used the software REMUL, which is an adaptation ofHEC-4, developped by the US Army Corps of Engineers (Beard, 1971). HEC-4 and REMUL suffer from many weaknesses mainly due to the theoretical hypotheses which must be verified prior to using the regression models. Some of these are: • • • •

No correction or alternative method is available to treat multicollinearity problems; No multivariate method is available to consider correlations between dependent variables; No procedure for model validation through residual analysis is available; It is assumed, without validation, that observations follow a log-normal distribution;

If theoretical assumptions are not fulfilled, data may be incorrectly recreated and the interpretation of the results may be fallacious. Therefore, in order to overcome those weaknesses, ReMuS has been developed at INRS-Eau, within a partnership project with Hydro-Quebec and NSERC. The first part of this paper shows how it is possible to make data recovery using multiple regression. In the second part, we discuss problems caused by multicollinearity and the solution proposed in ReMuS, that is the ridge regression. Also, we explain the procedure that gives an optimal value of the ridge parameter k. In the third part, we 381 K. W. Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 381-393. © 1994 Kluwer Academic Publishers.

H. PERRON ET AL.

382

present the multivariate regression. A procedure that considers the relation between dependent variables. We show how the parameters are estimated and the data reconstituted. Finally, in the last part, we present the software ReMuS and all its characteristics.

THE MULTIPLE REGRESSION In order to reconstitute missing data, Hydro-Quebec uses multiple regression on data from neighboring stations. The main purposes of the regression are • •

To establish a functional relationship between a variable Y and a set of explanatory variables X}, X2,··, Xn. To predict the unknown variable Y from known values of X]> X 2, .. , Xp

The estimation of parameters and the reconstitution of data (prediction of Y) are described in sections 1.1 and 1.2.

Estimation of parameters In the multiple regression procedure used by Hydro-Quebec for extending data series, the relation between the dependent variable Yand the explanatory variables Xl, X2, .. , Xp is linear, i.e. given by the following expression: (l.) The parameters, Pi, must be estimated from observed historical data series. This is done by the least square (LS) method, which consists in minimizing the squared residuals ei :

Ifwe let:

y = nxl

n Y2

;n

X

nx(p+l)

[11

X 21

:

:

x"

= ..

1 x nl

x\2 X 22

:

x n2

I n

... ... xx" 2p

b

=

(p+l)xl

xnp

PI :

and

(2.)

e

nxl

=

m

(3.)

Pp

it is well-known that the LS-estimators of Pi are given by :

b = (X'Xr 1 X'y

(4.)

Prior to making the data reconstitution, it is necessary to examine the residuals to verify that basic assumptions are fulfilled, especially that :

REMUS, SOFfWARE FOR MISSING DATA RECOVERY

• • •

383

residuals are normally distributed; residuals are independent; the variance of the residuals is constant.

If the analysis of residuals indicates that the underlying assumptions are violated, it is usually possible to correct the problem by an appropriate transformation of the variables. Two common situations where a transformation is needed are when : 1- the relation between the dependent variable and the explanatory variables is not linear. In this case, one may transform the explanatory variables to linearize the relation; 2- the residuals are not normally distributed and/or their variance is not a constant. One may apply a transformation (for instance Box-Cox, 1964) to the dependent variable Reconstitution of data

Not only the prediction of the dependent variable is important for Hydro-Quebec, but also extensions of data series, which preserve the mean and the variance of the observed series. To this end, one computes first the value predicted by the model: (5.) It can be shown that the mean of the estimated value ofY is equal to that of the observed series. However, the variance is not reproduced by this estimator. Fiering (1923) proposed to add the random term given by :

(6.) where ui is a standard normal variate and:

=L (Yi - yy . n

SeE

(7.)

;=1

The data obtained by the equation : (8.) preserve the mean and the variance of the n observations Yj. Y2• ...• Yn used in the regression.

384

H. PERRON ET AL.

RIDGE REGRESSION Ridge regression (Hoerl and Kennard, 1970a) is a technique used to circumvent problems caused by multicollinearity. The estimators obtained by ridge regression are biased, but more stable than those determined by ordinary regression when multicollinearity is present. In that case, bias is not necessarily a disadvantage.

Estimation of parameters The problems of inverting the matrix X'X in the classical regression model (eqn I) when multicollinearity is present can be transposed to the matrix rxx constructed from the standardized variables XI,.Xn. Ridge regression estimators are obtained by introducing a constant k?O in the normal equations of the LS procedure:

(9.) where b R = (b~, b:, ... , b:) is the vector of standardized parameters estimated by ridge regression, Ip is the p x p identity matrix, and ry.x the vector YX obtained from the standardized variabels Xj, .. X.IJ. and Y. A constant k IS thus added to each element on the diagonal in the matrix rXX' This facilitates the inversion of the matrix. The solution of the system of equations depends now on the constant k. (10.) The value of the k is related to the bias of the estimators. If k=O, equation (10) is equivalent to (4), and the ridge estimators correspond to those obtained by ordinary LS. When k>O, the estimators are biased, but more stable than the LS-estimators. As for an ordinary regression, the analysis of residuals can be done, the data can be transformed if necessary, and finally missing data can be reconstituted by ridge regression following the same procedure as in the case of ordinary LS analysis.

Determination of k It can be shown (Hoerl and Kennard, 1970a) that increasing the value of k leads to increased bias of b R , but its variance decreases. In fact, it is always possible to find a value of k such that the ridge estimators have smaller mean square error than the ordinary LS estimators. However, the choice of the optimal value of k is difficult. A commonly used method for determining k is based on a graphical inspection of traces of the estimates of the p parameters as function of k. Figure 1 is a typical example of ridge traces for a model with three independent variables. In general, the values of the estimated parameters can fluctuate considerably when k is close to zero, and can even change sign. However, as k increases, the values of the estimated parameters stabilize. In practice, one examines the ridge traces and chooses graphically the smallest value of k in the zone where all traces show reasonable stability (Hoerl and Kenard, 1970b). However, Vinod (1976) showed that this procedure may lead to an overestimation of k, and devised an alternative method by which k is estimated automatically. This procedure uses the index ISRM defined by :

REMUS, SOFfW ARE FOR MISSING DATA RECOVERY

ISRM =

385

L [S;A -k ) P

;=1

2

L~ j=1 Aj

2

-1 1

(11.)

+k

where AI' A2 , ... , A p are the eigenvalues of the matrix r.\X. The index is zero if the explanatory variables are uncorrelated. Vinod (1976) suggests to use the value of k which corresponds to the smallest value of the index ISRM.

b

o

k Figure 3.1. Example of a trace for a model with 3 variables.

MULTIVARIA TE REGRESSION If one is interested in reconstituting monthly means at q neighboring sites (several dependent variables), then one could perform q independent regression analyses. Proceeding as in section 1, one could thus obtain q values which preserve the mean and the variance. However, this so-called parallel approach does not reproduce the correlation that may exist between the q sites. This can lead to important errors and to loss of information if the results are used for decision making in a regional context (Bernier, 1971). To avoid these kind of problems and to extract as much information as possible from observed data, the reconstituted data should reflect the relationship among them. This can be done using a multidimensional model which considers the q sites simultaneously. The multidimensional regression technique is implemented in ReMuS, allowing for conservation of the structural correlation when data are reconstituted at several sites in a region. However, the method has two major constraints which may limit its practical applicability: • •

The q variables must be function of the same set of explanatory variables. Only concomitant values of the q dependent variables can be used.

H. PERRON ET AL.

386

The multivariate regression model Let ~, li, ... , ~ be a set of q dependent variables, and XI' X 2 , ••• , X p be p explanatory variables. Assu~e that we have n .correspon~ing measurements of J:'li' Y2i' ... , Yqi and Xli' X 2i ' ... , X pi , 1 =1, 2, ... , n (for mstance, discharges measured dunng n years at p+q sites). Moreover, the values of the explanatory variables are assumed known exactly. Hence, using matrix notation, the multidimensional regression model can be written in the following form:

y=

qxn

b

X

qx(p+l)(p+l)xn

+ qxn e

(12.)

where

Y

qxn

YI2

=[Y" Y:I

YI3

Y22

Y23

Yql

Y q2

Y'"]

Y2n

:

:

Y q3

=[ (p+l)xn

XII 1

=

qx(p+l)

[P,"

/320

qxn

:

x p2

Xp3

/311

/312

/321

/322

[."8:

=

8 21

1

/3ql

813

...

8 22

8 23

...

8 q3

•••. Y n]

XI

x2

...

bl

'"

xn ]

bpJ

/3qp

8 12

8 q2

-

P" ] /3~p = [b o

/3q2

:

Y2

xpn

: /3qo

e

.:.]-[

xl3

: x pl

b

XI2

YI

Yqn

1

X

_ [

-

&,.]

8 2n

:

_ [

- el

e2

...

en]

8 qn

The matrix Y contains n column vectors YI' Y2' ... , Yn whose q elements correspond to measurements of the q dependent variables for a given period. The X matrix contains n column vectors xl> x2 ' ... , xn with p+ 1 elements. The first element of each of these vectors is equal to one, whereas the others correspond to the p explanatory variables for a given period. The b matrix contains the column vectors /30' /31 , ... , /3P' The first vector corrresponds to the intercept and each of the following vectors corresponds to a given

387

REMUS, SOFTWARE FOR MISSING DATA RECOVERY

explanatory variable (f3ij' j -::1= 0, is the parameter of the explanatory variable Xj for the dependent variable Yi). Finally, the e matrix contains the n vectors &1' &2' ... , &n of error terms. We assume that each of the residual vectors is multi dimensionally normal distributed with zero mean and covariance matrix I:

e; ""

N( 0, s), qxl

q>
Vi

therefore, there is, for this model, a non-zero correlation between error terms corresponding to different dependent variables.

Estimation of parameters In order to estimate the parameter matrix b, i.e. to determine the matrix B

flu

flIP

fl20

flll fl21

fl22

fl2p

flqo

flql

flq2

flqp

flJO B

qx(p+l) =

A

-- [b 0

bl

... b p]

(13.)

the LS method is once again invoked. Srivastava and Carter (1983) show that the estimator ofB is given by

B=[b o b l

bp ]

=)T)(,(){){,)-I

(14.)

The estimators /3; of flij are jointly distributed according to a multidimensional normal distribution with ~ean and covariance matrix given by and

(15.)

where a;- and ail are elements of the matrix (xx't l ,and CTu and CTij are elements ofthe matrix 1:. Note that the parameter estimators are unbiased and correlated, reflecting the correlation that exists between the dependent variables. If q independent multiple regressions had been made, this correlation would have been zero, that is, for the two variables 1; and ~, we would have:

Cov{Pij

,P.J

= 0, j = 0, 1, ... , p

(16.)

Once the parameters for each dependent variable have been obtained, their significance should be tested. The software ReMuS permits one to test, for each explanatory variable, if the corresponding parameter vector is equal to the null vector. For a given variable X k , the following hypotheses are tested:

H. PERRON ET AL.

388

=P2k =.. , =Pqk = 0

Ho :Plk

HI: at least one Pik

against

:1;

0

The test is based on the statistic F;, :

(n-p-q-l) (I-U) k F.k -q Uk

(17.)

where:

IVkl IXI is the determinant of the matrix X; -IVk+Wkl' =IqY[In - X'( XX' t X ]Y'I In is the identity matrix of dimension n;



U -



Vk

k

q,

• Wk =IqBCk[C~(xx'tckxr C~B'Iq,

C k is a column vector of dimension p+ 1 having the kth element equal to 1 and 0 elsewhere.

For a given significance level, a, the null hypothesis is : • •

accepted if Fk ~ ~.n-p-q+1 (1- a) rejected if F;, > ~.n-p-q+1 (1- a)

where ~n_p-q+I(I- a) is the (I-a)-quantile in an F distribution with n-p-q+1 degrees of freedom.'A comprehensive review of this test is given by Srivastava and Carter (1983). In addition to the F;, (k = 0, 1, ... , p) statistics, ReMuS gives the exceedance probability associated with those values (p-values). The test permits one to examine the importance of a given explanatory variable, X k • Acceptance of the null hypothesis implies that there is no relation (at an a-level) between the variable X k and the q dependent variables. There is thus no reason to use that variable in the model. On the other hand, if the hypothesis is rejected, there is a relation between X k and at least one of the dependent variables and X k should be used in the model. Reconstitution of data In a given period, for example a year, where the dependent variables are missing, we want to reconstitute the vector YI = (YII' Y21, ... , Yql ) from the observed vector XI =(XII' X21' ... , Xpl) by means of multidimensional regression in a way that preserves the mean, the variance, and the correlation structure observed in the series of dependent variables. The same principle as in the case of multiple regression is used here. In fact, we first compute the prediction, YI , and then add a random vector drawn from a multidimensional normal distribution to obtain the reconstituted data vector, YI' More precisely, we have the following expression :

389

REMUS, SOFrWARE FOR MISSING DATA RECOVERY

f310 f31l

f312

f3IP

y = B X +d = f320 f321 f322 qx~ qx(p+l)(p+l)xl qx~

f32P

[X'']

f3qp

X pl

f3qo f3 q l

f3 q 2

X2/

+

Y21 0"]··· - [~,,] .. .

021

Oql

(18.)

Yql

where the vector 0 1 has a multidimensional normal distribution with zero mean and covariance matrix 1:6 defined by L" = -1-Y[In - X'(XX't qxq ll-P

X]y,

(19.)

In ReMuS, the vector 01 is generated using a technique described in detail in Devroye (1986, pp. 563-566), and in Law and Kelton (1982, p. 262). The introduction of the random vector (used among others by Bernier, 1971) ensures that the mean, the variance, and the correlation of dependent variables are preserved. This, of course, is only true if the basic assumptions are valid. In fact, the remarks on the validity of the assumptions made in the section on multiple regression also apply to the multidimensional case. THE SOFTWARE REMUS

The software ReMuS was developed in order to overcome some of the deficiencies in REMUL, which was previously used by Hydro-Quebec. The following improvements have been introduced in the new software:

- Ridge regression. In order to cope with possible multicollinearity between the independent variables, ReMuS allows the use of the ridge regression (Hoer! and Kennard, 1970 a). This procedure permits the user to study the value of the parameters as a function (trace) of a positive constant k, which is added to the diagonal of the correlation matrix. The user may toggle quickly between the choice of the independent variables and the graph of the traces. A theoretical help feature is added to guide the user in the choice of variables. - Automatic optimal value of "k". If the user wishes so, ReMuS suggests an optimal value for k (in ridge regression), depending on the chosen variables. - Mutil'ariate regression. When generating the missing values, a conventional multiple regression does not take into account the correlation between the dependent variables. This is why we have included multivariate regresssion in this software. This type of regression will simultaneously find p different models corresponding to p dependent variables. Random numbers generated from a multvariate normal distribution are subsequently added to the predictions to preserve the correlation between the p dependent variables. - Testing the hypotheses. One of the important weaknesses ofREMUL and HEC-4, is that they do not check the following hypotheses:

390

H. PERRON ET AL.

• residuals follow a normal distribution; • residuals are independent random variables; • residuals possess constant variance.

ReMuS includes many graphical tests allowing the user to examine those hypotheses. Transformations. Residuals may be normalized by a Box-Cox transformation of the dependent variable. ReMuS also gives the user a choice of transformations (the most widely used in practice) of the independent variables in order to obtain a linear relationship with the dependent variable. Moreover, ReMuS is a user-friendly software that provides many other tools to help the user in the modeling phase: • • • •

correlation matrix; the Y vs X graphic; the graphic of concomitances; on-line theoretical and technical help.

Multiple regression in ReMuS As implemented in ReMuS, mUltiple regression permits the user to determine the appropriate model for reconstituting the variable Y. ReMuS provides two different methods for choosing independent variables:

• •

manual, where the explanatory variables are chosen by the user; automatic, where the explanatory variables are introduced by stepwise regression.

After the estimation of a given model, its adequateness can be graphically visualized. The output of the regression is : • • • •

the value of the parameters Pi; test of significance of the parameters; analysis of variance; regression model.

It is also possible to perform an analysis of residuals.

Ridge regression in ReMuS In order to cope with possible multicollinearity between the independent variables, ridge regression (Hoer! and Kennard, 1970a) has been introduced in ReMuS. Its application in ReMuS is similar to that of manual multiple regression. The user can at any time visualize the ridge traces on the screen. This will help him make the appropriate choice of independent variables and of the ridge parameter, k. The user can, with a single touch, switch between the menu in which the choice of independent variables is made and the ridge traces.

REMUS, SOFfWARE FOR MISSING DATA RECOVERY

391

ReMuS contains a procedure which permits the user to automatically select the optimal choice of the constant k as a function of the chosen independent variables. Having chosen the explanatory variables and the constant k, the user can invoke the model computation procedure, examine the results graphically, and analyse the residuals just as in the case of multiple regression. If k is set equal to zero, one obtains the same result as with mUltiple regression. Analysis of residuals In multiple (or ridge) regression, it is important to verifY the basic assumptions concerning the residuals: • • •

residuals are normally distributed; residuals are independent random variables; residuals have constant variance.

ReMuS produces graphics of : • • •

residuals on probability paper (normal); residuals as functions of predicted variables; residuals as function of time.

It can also show the residuals as function of an independent variable which permits the user to identifY the variables that convey information to the model.

Reconstitution of data The reconstitution of data is based on known explanatory variables. Equation (8) permits one to preserve the mean and the variance of the observed data series. However, the user can choose to omit the random term, 0;. In this case, the reconstituted data are those obtained directly from the regression. ReMuS produces the following output, which can be used to verify the quality of the reconstitution: • • •

mean of the original data and of the reconstituted data; variance of the original data and of the reconstituted data; coefficient of variation of the original data and of the reconstituted data.

We have included the Wilcoxon's (1945) test in ReMuS which can be used to verify if the means of observed and predicted data are significatively different. Likewise, Levenne's (1960) test for equal variance in two data sets is available in ReMuS.

Multivariate regression in ReMuS We have introduced a multidimensional regression procedure in ReMuS in order to treat the case where correlation exists between several dependent variables. This Kind of regression permits to estimate several models simultaneously and to preserve the correlation structure, when data at several sites are reconstituted. The choice of variables is done as in the other types of regression (manually), with the exception that here the user has the option to introduce several dependent variables at the same time.

392

H. PERRON ET AL.

The output produced by ReMuS consists of the parameters corresponding to a given independent variable and given explanatory variable. We have introduced a test to verifY that at least one of the parameters corresponding to a given explanatory variable is significantly different from zero.

Some other characteristics of ReMuS

ReMuS contains several tools which can help the user in his choice of variables, models, etc.:

graphic of Y versus X, that is the relation between the dependent variable and each of the explanatory variables graphic of concomitance, which may help to choose the explanatory variables correlation matrix Box-Cox transformation, which permits the user to normalize the residuals other classical transformations : • 1/ X •



11 X 2

10gX

• .JX 2

• X which permits the user to linearize the relation between the explanatory variables and the dependent variables technical as well as theoretical on-line help; can be used for monthly, annual, weekly (or other time period) means and for other type of data (ex. flood flows vs rain, basin area, etc ... ).

A note on the implementation The software ReMuS has been implemented in PASCAL for use in a DOS environment. Advanced programming techniques (object oriented, graphical librairies, etc ... ) has permitted to make it highly user-friendly.

CONCLUSIONS As discussed in the introduction, HEC-4 and REMUL are based on hypotheses witch are not always valid. This may result in incorrect data reconstitution. The software ReMuS, which includes basic functions similar to those in HEC-4 and REMUL, has been developed in order to cope with the following problems : • • • • •

multicolinearity between explanatory variables (corrected with Ridge regression); correlation between dependant variables (corrected with multivariate regression); model validation (corrected with residual analysis and graphics); assumption that the observations follow log-normal distribution (corrected with the possibility to use several transformation methods); difficult to visualize relation between variables (corrected with graphics).

These additions make ReMuS a powerful tool to reconstitute missing data and to extend short data series in hydrology as well as in many other domains.

REMUS, SOFfWARE FOR MISSING DATA RECOVERY

393

REFERENCES Beard, L.R (1971). HEC-4 Monthly Streamflow Simulation. The Hydrologic Engineering Center, Corps of Engineers, US Army, Davis, California 95616. Bernier, J. (1971). Modeles probabilistes a variables hydrologiques mUltiples et hydrologie synthetique. International Symposium on Mathematical Models in Hydrology, Warsaw. _ Box, G.E.P., Cox, D.R (1964). An analysis of transformations. Journal of the Royal

Statistical Society, Ser. B, 211-252.

Devroye, L. (1986). Non-uniform random variate generation. Springer-Verlag, New York. Feiring, M.B. (1963). Use of Correlation to Improve Estimates of the Mean and Variance. United States Geological Survey, Professionnal Paper 434C. Hoerl, A.E. and Kennard, RW. (1970a). Ridge regression: Biased estimate for non orthogonal problems. Technometrics, 12, 55-67. Hoerl, A.E. and Kennard, RW. (1970b). Ridge regression: Applications to non orthogonal problems. Technometrics, 12, 69-82. Law, A.M. et Kelton, W. (1982). Simulation Modeling and Analysis. McGraw-Hill, Inc. Levene, H. (1960). Robust tests for the equality of variances, in Contributions to Probability and Statistics, ed. I. Olkin. Palo Alto, Stanford University Press, 278-292. Srivastava, M.S. and Carter, E.M. (1983). An Introduction to Applied Multivariate Statistics. North-Holland, New York. Vinod, H. D. (1976). Application of new ridge regression methods to a study of Bell system scale economies. Journal of the American Statistical Association, 71, 835-841. Wilcoxon, F. (1945). Individual comparison by ranking methods. Biometries, 1, 8083.

SEASONAliTY OF FLOWS AND ITS EFFECf ON RESERVOIR SIZE

R.M. PHATARFOD 1 and R. SRIKANTHAN2 lDepartment of Mathematics, Monash University Oayton, Victoria, Australia 3168 2Hydrology Branch, Bureau of Meteorology Melbourne, Victoria, Australia 3000 In this paper we consider the effect of seasonality of streamflows on the reservoir size for within-year systems. We give two measures to quantify the seasonality evidenced by the mean seasonal flows, and show how these enable us to group the various variations in seasonal means into a small number of groups. The l?robability of emptiness of the reservoir for different reservoir sizes IS determined analytically by using the technique of the bottomless dam, for a range of Cv's and two draft ratios. Regression equations linking probability of emptiness and the two measures of seasonality are derived. 1.

INTRODUCTION

Ever since Rippl's (1883) pioneering work on the determination of the size of a reservoir able to meet a constant demand of water from it, a substantial amount of literature has ~rown on this problem, referred to as the Reservoir size-yield-reliability relationship. Of course, Rippl's method - The Mass Curve Method - did not consider reliability of supply. The implicit but erroneous assumption was that with the determmed reservoir size, supply was guaranteed all the time, i.e. the reliability would be a hundred percent. Later on, however, it was realized that statements about supply would be necessarily probabilistic, and reliability of supply was explicitly involved, say e.g. in the works of Hazen (1914) and Sudler (1927). Considerable amount of l?rogress in this area was made by the Russian enBineers Savarenskiy (1940) and Kritskiy and Menkel (1940), Moran (1959) and his followers, and through the idea of synthetic hydrology - Fiering (1967). Currently, storage size-yield-reliability relationships take many forms. At one end of the spectrum is the determination of the reliability of supply for a specific reservoir situation, with specific streamflow characteristics and specific demand values. Typically, this involves simulation of the water supply system behaviour using either the historical record or synthetic streamflow traces. These methods are extremely flexible, and are able to take into account factors such as evaporation, seepage, sedimentation, as well as draft varying from month to month. However, they do not 395 K. W. Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 395-407. © 1994 Kluwer Academic Publishers.

396

R. M. PHATARFOD AND R. SRIKANTHAN

impart much knowledge about overall reservoir system behaviour. At the other end are results - normally analytical and employing probability results pertaining to cumulative sums of random variables - which are of general applicability but do not take into account the specific characterIstics of the situation in hand; these methods, generally referred to as "back-of-the-envelope" methods, are useful for a preliminary estimation of reservoir size as well as for giving an overall view of the behaviour of the system. One such result is (see Gould (1964)), S P -

t 2 C2

P v 4 (I-a)

(1)

where Sp is the storage size in terms of the mean annual flow J.I., a is the annual yield as a fraction of the mean annual flow (also called the draft ratio), Cv is the coefficient of variation of the annual flows, and tp is the pth percentile of a standard normal as the steady-state probability of the reservoir variable with P being empty, i.e. the probability of failure of being able to deliver supply (giving (1 - P)100 as the percentage reliability of the system). Another result of the type is (see Phatarfod (1986)), S(p)

= ¥-;

S(O)

(2)

which gives the inflation factor required when the annual flows have a serial correlation p , as compared to when they are independent. The above two results are valid for over-year systems where the effect of seasonality of flows is damped out. There are some situations, however, when we are interested only in a preliminary estimate but simple and elegant expressions such as (1) and (2) are not available. In such situations, one takes recourse to charts or tables, quantitatively relating the various quantities of interest. A prime example of such results is the charts of Svanidze (see Kartvelshvili (1969») giving the reservoir size against the coefficient of variation of annual flows, for different draft ratios, reliability and annual serial correlation coefficients, for flows having the general three-parameter gamma distribution - the Kritskiy and Menkel distribution. In this paper we consider the effect of seasonality of flows on the reservoir SIZe for within-year systems, and for reasons given below, our results belong to the middle category just mentioned, i.e. are of fairly general aPflicability yet not simple enough to be expressed in analytical forms 0 the type (1) and (2). It is known, of course

that a reservoir with flows which have a seasonal variation would need to be larger than one without seasonal variation for the same reliability, yield and annual flow

SEASONALITY OF FLOWS AND ITS EFFECT ON RESERVOIR SIZE

397

characteristics such as the mean and the coefficient of variation. However, to the authors' knowledge there hasn't been any study which quantitatively relates reservoir size to seasonality of flows. It is not difficult to see why this should be so. First, methods which have been successful in deriving relation of the type (1) and (2) cannot be applied here. The reservoir content process is a random walk process between two barriers - the empty reservoir and the full reservoir - and for over-year systems, these barriers are fairly wide apart, so that the random walk process is effectively a cumulative sum process, and thus results such as the Central Limit Theorem are applicable. The tp in (1) and the factor (l+p)/(I-p) in (2) are, indeed, due to this effect. Such is not the case for within-year systems where these .barriers are rather close together. Secondly, it is difficult to quantify seasonality since a measure chosen would depend not only on the individual seasonal values but also on their order. For fivers with seasonal flows, the seasonality is reflected not only in their means, but also in their variances, skewnesses and It is obvious that the means would have the serial correlations. greatest effect, and in this paper we shall concentrate only on the means. We simplify the treatment somewhat by considering seasonal, i.e. three-monthly flows rather than monthly ones. We consider two measures to quantify seasonality. This is done in section 2. Assuming, for simplicity, that the seasonal flows have the negative binomial distribution we calculate for draft ratios of 67% and 50% and Cv of the annual flows, the steady-state for various values of probabilities of emptiness of the reservoir by using the method of the bottomless dam - see Phatarfod (1980). This is done in section 4. We then fit regression equations for these probabilities against the two measures. The fits are extremely good. These regression equations thus form the defming relationships between probabilities of emptiness, reservoir size and the seasonality of flows. 2. MEASURES OF SEASONALITY One of the ways to quantify seasonality of mean flows would be to take the mean flow of the ith season (say a month), i = 1,2,3,...,12, as #Ii

=

n

+ A cos (1fi/6 + 9)

(3)

9 can be taken to be zero by suitable The value of the phase adjustment of the water-year. A then remains the sole measure of seasonality of flows. However, very few mean seasonal flows follow a sinusoidal pattern. We therefore take another approach in this paper. It should be stressed that the paper is of an exploratory nature giving an outline of a possible line of approach, and does not work out a full solution.

398

R. M. PHATARFOD AND R. SRIKANTHAN

First we take the number of seasons to be four rather than the customary twelve. Since the method of obtaining the probability of failure used in the paper requires that we take the constant seasonal draft as the unit of volume, (thus making the total annual draft equal to four units) we have the mean annual flows for the two cases of draft ratios o( 0.67 and 0.50 , to be 6 and 8 units respectively. Let· us consider in detail the case of the mean annual flow of 6 units. Let a,b,c,d be the mean seasonal flows of the four seasons, so that a+b+c+d = 6. Theoretically, a,b,c,d can assume any non-negative values, subject to the condition a+b+c+d = 6. However, to keep the number of cases to be considered fairly manageable we need to put some restrictions on the values of these quantities. Accordingly, we assume that a,b,c,d can be zero or a multiple of 1/2. In effect, we are assuming, for example, that if a seasonal mean is less than a 1/4 (in terms of mean annual flow equal to 6) we can take it to be zero. To avoid dealing with fractions, however, we shall assume, in the discussion dealing with seasOnality in this section, that the mean annual flow is 12 units. Thus calling _the four seasonal means A,B,c,n, the means form a sequence A,B,c,D such that A+B+C+O = 12 where now A,B,C,O can be zero or a positive integer. It can be shown that - see Feller (1967) - that the number of such possible sequencies is ISS = 455. At the two extremes are the cases 3,3,3,3 (the non-seasonal case) and 12,0,0,0 (the extremely seasonal case). They also include sequences such as 7,5,0,0 and 6,1,3,2 etc. Note, the actual sequences of seasonal means for the four cases above are (1.5,1.5,1.5,1.5), (6,0,0,0), (3.5,2.5,0,0) and (3,0.5,1.5,1) respectively. What is required is to reduce the number of these separate cases to manageable proportions, by grouping them into a smaller number of groups having common values of suitable measures of seasonality. First we use a device which effectively reduces the number of cases to about a quarter of the total number, 455, of cases. From the point of view of reliability, we need to consider only that season for which the probability of emptiness of the reservoir at the end of that season is the maximum, and since the method of derivation of probability of emptiness used in the paper, is geared to deriving this probability at tile end of the last season, we can effectively rearrange a sequence of seasonal means such that the last element corresponds to that season. For example, if in terms of a conventional water-year, the sequences of mean flows are 1,6,2,3 or 2,3,1,6 or 3,1,6,2, we shall, in fact, take our sequence as 6,2,3,1 the cyclic permutation of the three sequences. This device reduces the number of cases to 116. To devise measures relevant to the probability of emptiness at the end of the last season, let A' ,B' ,C' ,0' be the excess or deJ>letion from the mean of the seasonal means 3, i.e., let A' = A - 3 etc. Let u1 = A' , u2 = A' +B' , u3 = A' +B' +C' , u4 = A' +B' +C' +0' = O. Let M

399

SEASONALITY OF FLOWS AND ITS EFFECT ON RESERVOIR SIZE

and m be the maximum and the minimum respectively of the sequence u1,u2,u3u4. Then the Range R' = M-m, and T' = C' +0' are two measures which seem to be relevant to the present problem. For example, for the sequence 3,3,3,3, we have R' =T' =0 , whereas for the sequence 12,0,0,0, we have R' = 9, T' = -6. The values of R' ,T' for other sequences fall in the ranges (0,9) and (0,-6) respectively. We show that the 455 different- sequences fall into 28 groups, with sequences in each group having the same values of R' and T' . For reasons of limitation of space, we do not show all the sequences for all values of R' and T' . Table 1 shows the sequences for values of R' = 3,4,5 and T' = -2,-3,-4. For each sequence of seasonal means we work out the probability of emptiness of the reservoir for different reservoir sizes. This is done in the next section. Let us now consider the case- of draft ratio 50%, i.e. when mean annual flow is 8 units. One way to deal with this case is to follow the same procedure as above, i.e. to take seasonall~eans A,B,c,o such that A+B+C+D = 16. This entails considering <; = 969 separate cases. Instead, we adhere to the 455 cases considered before. Since the value 12, (A+B+C+D = 12) taken as the mean annual flow there actually means that the mean annual flow is 8, (a+b+c+d = 8) the actual seasonal means are two-thirds the values shown for any sequence A,B,C,D, i.e. a = 2A/3 , etc. This means that the sequence of (5,3,2,2) actually represents the sequence seasonal means shown as (10/3,2,4/3,4/3), giving the total 8. 3. INFLOW MODEL We assume distribution:

that

the

seasonal

flows

have

Pr[X = r] = -nCrpnqr r = 0,1,2,...; q = I-p

the

negative

binomial

(4)

with the same value of p for all the seasons. The values of n for the four seasons are denoted by nl,n2,~,n4' The annual flow also is negative binomial with the same value of p and with n = nl+~+n3+n4' to be denoted by k. The mean of X for the above distribution is nq/p and Cv is (nqrl.l2. The probability generating function (p.g.f.) is P(,) = pn/(1-q,)n. We fust take values of k and p such that the annu8i flows have a specific mean " and Cy- Table 2 gives the cases considered. The p.g.f. of the seasonal flows are n. n. Pi(I) = P 1/ (1 - q,) 1 i = 1,2,3,4,

(5)

R. M. PHATARFOD AND R. SRIKANTHAN

400

TABLE 1 SEQUENCES OF SEASONAL MEANS FOR VALUES OF R',

T' SHOWN

3

4

-2

6240, 6231, 6222 5340, 3540, 4440

7130, 7140

8040, 8031

-3

6330, 6303, 6321 3621, 6312, 3612 4521, 5412, 4512

7230, 7221 2721, 2712

8130, 8121

T'

R'

6420, 4620, 5520, 7320, 3702,

-4

6402, 6411, 5502, 3720, 7311,

5

4602 4611 5511 7302 3711

8220, 8202 2820, 2802 8211, 2811

where ni are so chosen as to give the required seasonal means. For example, for the case where p = 0.33 , k=4 , if the seasonal means n 1 = 2, n2 = 1, n3 = 2/3, form the sequence 3,1.5,1,0.5, we have n4 =

1/3.

It should be noted that for the method of determining the probability

of emptiness given in this paper it is not necessary to have the annual flows or the seasonal flows following a negative binomial distribution. This distribution was chosen because it is analytically easy to deal with as well as being the discrete analogue of the gamma distribution, a distribution which is commonly chosen to model and streamflows. Note, however, that from the relations, IS = kq/p kq = we have Cv = (ISP 1/2; thus there are lower bounds for the value of Cv ' namely 0.4082 (for IS = 6) and 0.3536 (for IS = 8). There is however, no upper bound.

lIe; ,

r

4. PROBABIUTY OF EMPTINESS OF TIIE RESERVOIR The exact steady-state probability of emptiness of the reservoir, of various sizes, fed by seasonally dependent inflows can be obtained by a method due to Moran (1959). However, this method which deals with matrix operations involves fairly' large scale computations since these matrices are different for different reservoir sizes. Instead, we derive here an approximate value of the probability of emptiness obtained by assuming the reservoir to be bottomless and taking the probability of depletion greater than or equal to an integer K to be

401

SEASONALITY OF FLOWS AND ITS EFFECT ON RESERVOIR SIZE

TABLE 2 Values of parameters p and k for given p

0.667

k

12

Cv 0.5000

0.400

0.333

0.200

0.100

6

4

3

1.500

0.667

0.6455

0.7069

0.9129

1.2910

0.5774

0.667

Cv

Ii

=6

0.500

16

8

0.4330

0.5000

k

and Cv

0.500

(i) Annual mean p

Ii

(ii) Annual Mean

, Draft ratio 67%

0.385

0.333

0.200

0.111

5

4

2

1

0.5701 Ii

=8

0.6124

0.7906

1.0607

, Draft Ratio 50%

the probability of emptiness of the reselVoir of size K. It was shown by Phatarfod (1979) that the probability of emptiness so obtained is fairly close to the exact value obtained by considering the reselVoir to be finite. The requisite result - see Phatarfod (1980) - is as follows: Suppose we have N seasons. Let the inflow during the ith season Xi with probability generating function (p.g.f.), Pi (8), be and let the release be one unit per season. If the i = 1,2,...N, N mean annual flow, E(r i Xi) is greater than N, it is known that the equation

Jt =IPi(8)

= 8N

has

N

roots

81'8 2,... 8N

in the unit

circle. Denote the depletion from the top of the reselVoir at the end of the Nth season by Y, and let uj = Pr.(Y = j]. Then,

uJ" =

N

r

k=1

.

~8~ , j = 0,1,2,...

where the C's are uniquely given by,

(5)

R. M. PHATARFOD AND R. SRIKANTHAN

402

(6) An approximate varue of the probability of emptiness of a reservoir of

.

SIZe

K·IS given . b Y P E =I - 't'K-l t. j =0 uj .

To use the above result for our case, (N = 4), with the seasonal flows having negative binomial distributions, we take, for given values of p and k, values of nl,~,n3n4 such that the seasonal means form a specified sequence. Using these Pi(8) in (6), the probability of emptiness (or failure) of the reservoir was calculated for all the 116 sequences of seasonal means for all the cases considered in Table 2. It was found that within each group with the same values of R' and T'. they were fairly close to each other. 5. EFFECT OF SEASONALITY ON PROBABIllTY OF FAILURE

First, to remove the dependence of the two measures on the unit of measurement, as well as for convenience, we introduce the adjusted measures R = R'!9 and T = T' /-6. The range for both R and T is (0,1). Note, because of the way we selected the unit of measurement in Section 2, the above relations are valid for both the cases of draft values, and in general we have R = 4R' /3", and T = -zr, /"" with R' and T' calculated as in Section 2. To obtain an effective relation between PE , the probability of emptiness or failure of the reservoir and Rand T, (for a given combination of p and k and reservoir size S) we tried fitting regression equations of PE on various functions of R and T. The best fit obtained was for a relation of the form PE = fJ + '1T + sR2 . Table 3 give the required regression coefficients fJ, '1, S for all the cases considered. Figure 1 gives the probability PE plotted as a function of R, for different values of the reservoir size Sf"" Cv and draft ratio 67%, and Figure 2 gives similar graphs for the draft ratio 50%.

403

SEASONALITY OF FLOWS AND ITS EFFECT ON RESERVOIR SIZE

TABLE 3 Values of parameters fJ , .., , 6 for various cases.

0.5000 0.5774 0.6455 0.7069

0.0249 0.0567 0.0952 0.1360

0.0032 0.0080 0.0127 0.0170

0.0726 0.1004 0.1183 0.1288

(i) Draft Rat. 0.67 S/p=1.0

0.5000 0.5774 0.6455 0.7069

0.0134 0.0350 0.0642 0.0975

0.0014 0.0035 0.0056 0.0071

0.0376 0.0602 0.0786 0.0923

(ii) Draft Rat. 0.67 S/p=1.167

S/p 0.5000 0.5774 0.6455 0.7069

0.0071 0.0214 0.0427 0.0690

0.0011 0.0033 0.0058 0.0083

0.0192 0.0345 0.0479 0.0584

(iii)Draft Rat.0.67,S/p=1.33

2.000 2.167 2.333 2.500

0.0801 0.0646 0.0521 0.0421

0.0060 0.0047 0.0037 0.0030

0.0345 0.0284 0.0230 0.0185

(iv)Draft Rat.0.67,Cv=0.9129

S/p 2.833 3.000 3.167 3.333

0.1570 0.1400 0.1250 0.1120

0.0060 0.0053 0.0047 0.0043

0.0283 0.0254 0.0227 0.0201

0.4330 0.5000 0.5701 0.6124

0.0091 0.0241 0.0508 0.0710

0.0076 0.0127 0.0187 0.0224

0.0530 0.0826 0.1069 0.1200

(v) Draft Rat.0.67, Cv =1.2910 (vi)Draft Rat.0.5, S/p=0.50

0.4330 0.5000 0.5701 0.6124

0.0023 0.0088 0.0229 0.0350

0.0021 0.0049 0.0092 0.0121

0.0254 0.0467 0.0711 0.0899

(vii)Draft Rat.0.5,S/p=0.625

0.5701 0.0100 0.0023 0.0398 0.6124 0.0165 0.0036 0.0563 0.7906 0.0740 0.0155 0.0996 (viii)Draft Rat.0.5,S/1'=0.75

R. M. PHATARFOD AND R. SRIKANTHAN

404

Cv

(i)

0.30

0.707

0.24

P

0.646

0.18

0.577

0.12

0.500

0.06 0.00

0.0

0.2

0.4

0.6

0.8

1.0

R

Cv

(ii)

0.20

0.707

0.15

0.646 p

Q.10

0.577

0.05 0.00

0.0

0.500 0.2

0.4

0.6

0.8

1.0

R

Cv

(iii)

0.15

P

0.12

0.707

0.09

0.646

0.06

0.577

0.03

0.500

0.00

0.0

0.2

0.4

0.6

0.8

1.0

R Fig 1: Probability of failure. p. as a function of R (Draft ratio = 0.670) (The upper curve corresponds to T=1. the lower to T=O). (i) Reservoir Size S/1'=1.000 (ii) Reservoir Size S/p.=1.167 (iii) Reservoir Size S/p.=1.333

405

SEASONALITY OF FLOWS AND ITS EFFECT ON RESERVOIR SIZE

Cv

(i)

0.20

P

0.612

0.15

0.570

0.10

0.500 0.433

0.05 0.00 0.0

0.2

0.4

0.6

0.8

1.0

R

Cv

(iI)

0.15

P

0.12

0.612

0.09

0.570

0.06

0.500

0.03

0.433

0.00

0.0

0.2

0.4

0.6

0.8

1.0

R

Cv

(ill)

0.20

0.791

Ol5 P

OlO

0.612 0.570

0.05 0.00

0.0

0.2

0.4

0.6

0.8

1.0

R Fig 2: Probability of failure, p. as a function of R (Draft ratio=O.50) (The upper curve corresponds to T=1. the lower to T=O). (i) Reservoir size S/,,=0.5 (ii) Reservoir size S/" =0.625 (iii) Reservoir size 51,,=0.75

R. M. PHATARFOD AND R. SRIKANTHAN

406

6. CONCLUSIONS The seasonality of flows, for the case of four seasons, is quantified by two measures, R and T; if one considers the situation where we have more than four seasons, say months, then the measure R would remain the same, but, perhaps, T would be modified to be the depletion of the last three seasons. For different values of these two parameters and draft-ratios of 50% and 67%, and for six values of Cv ' the probabilities of emptiness of the reservoir are calculated The method used for this calculation for different reservoir sizes. was analytical, using the al?proximating technique of the bottomless reservoir. These probabilities enable(l regression equations to be formulated, linking probability of emptiness to R, for various values of T, Cv ' draft-ratio, and reservoir size. REFERENCES

Feller, W. (1967) "An Introduction to Probability theory and applications, 3rd ed. YoU, John Wiley, New York. Fiering, M.B. (1967) Cambridge, Mass.

Streamflow

Synthesis,

Harvard

Univ.

its

Press,

Gould, B.W. (1964) Discussion of Paper by Alexander, In Water Resources Use and Management, Melbourne University Press, Melbourne; 161-164. Hazen, A. (1914) "Storage to be J>rovided in impounding reservoirs for municipal supply". Trans. Am. Soc. Civ. Engrs. 77, 1539-1640. Kartvelshvili, NA (1969) Theory of Stochastic Processes in Hydrology and River Runoff regulation, Israel Program for Scientific Translation, Jerusalem. Kritskiy, S.N. and Menkel, M.F. (1940) "A generalized approach to streamflow control computations on the basis of mathematical statistics" (in Russian) Gidrotekhn. Stroit., 2, 19-24. Moran, PAP. (1959) The Theory of Storage, Methuen, London. Phatarfod, R.M. (1979) "The Bottomless dam" J. Hydrol., 40, 337-363. Phatadod, R.~. (1980) ''The Bottomless dam with seasonal inputs" Austral. J. Statist., 22, 212-217. Phatadod, R.M. (1980) ''The effect of serial correlation on Reservoir size" Water Resources Research, 22, 927-934. Rippl, W. (1883). ''The capacity of storage-reservoirs for water-supply" MID. Proc. Instn. CIV. Engrs. 71, 270-278.

SEASONALITY OF FLOWS AND ITS EFFECT ON RESERVOIR SIZE

Savarenskiy, AD. (1940) Gidrotekhn. Stroit., 2, 24-28.

Metod

tascheta

407

regulirovaniya

Stoka.

Sudler, C. (1927) "Storage required for the regulation of streamflow" Trans. Amer. Soc. Civ. Engrs. 91, 622-660.

ESTIMA TION OF THE HURST EXPONENT hAND GEOS DIAGRAMS FOR A NON·STA TIONARY STOCHASTIC PROCESS

GERMAN POVEDA and OSCAR J. MESA Water Resources Graduate Program Facultad de Minas, Universidad Nacional de Colombia Medellin, A.A. 1027 Colombia

The Hurst effect is approached from the hypothesis that there is a fundamental problem in the estimation of the Hurst exponent, h. The estimators given throughout the literature are reviewed, and a test is performed for some of those estimators using i.i.d. and a nonstationary stochastic processes. The so-called GEOS diagrams (R,,*/nO.5 vs. n) are introduced as very powerful tools to determine whether a given time series exhibit the Hurst effect, depending on the value of the scale of fluctuation. Various cases of the test model are presented through both the GEOS and GEOS-h diagrams. Results indicate that indeed there are problems in estimating h, and in some cases it could be due to an erroneous estimation when using the classical estimators. A proposed estimator gives better results which confirms the pre-asymptotic behavior of the Hurst effect.

INTRODUCTION The Hurst exponent, h, has become one of the most important scaling exponents in hydrology, transcending its old presence in hydrology and reaching status in the recent literature on chaos and fractals (Mandelbrot, 1983; Feder, 1988; Schroeder, 1991). In hydrology the whole paradox of the Hurst effect (Hurst, 1951) has received a renewed attention due to the implications and the physical significance of its existence in geophysical and paleo-hydrological time series (Gupta, 1991; Poveda, 1992), and also because we have shown that the existence of the Hurst effect is not such a widespread universal feature of time series, neither geophysical nor anthropogenic (Poveda, 1987; Poveda and Mesa, 1991, Mesa and Poveda, 1993). In part 2 we make a brief introduction on the Hurst effect. In part 3 some of the approaches given to explain the paradox are mentioned. In part 4 we review the estimators of h, and the hypothesis of the Hurst effect as the result of an incorrect estimation of h, is developed. Section 5 presents the so-called GEOS and GEOS-H diagrams for a non-stationary stochastic process. And part 6 presents the conclusions. 409 K. W. Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 409-420. © 1994 Kluwer Academic Publishers.

410

G. POVEDA AND O. 1. MESA

THE HURST EFFECT The Hurst effect has been extensively studied in hydrology since the original paper by Hurst (1951), and therefore its classical definition will not be developed here (see Mesa and Poveda, 1993 or Salas et al., 1979, for detailed reviews). Let us define the Hurst effect as an anomalous behavior of the rescaled adjusted range, Rn", in a time series of record length n. For geophysical phenomena and "anthropogenic" time series Hurst (1951) found the power relation Rn" = anh, with a=0.61 and the mean value of h=O.72. For processes belonging to the Brownian domain of attraction it can be shown that the expected value and the variance of the adjusted range is (Troutman, 1978; Siddiqui, 1978, Mesa and Poveda, 1993): (1)

(2)

I

where the scale of fluctuation, 9, is given as (Taylor, 1921; Vanmarcke, 1983)

tp(~)d~

8

=

w: T-

Tr(T)

(3)

n8(0)

where p is the autocorrelation coefficient, r is the variance function of local averages, and g(O) is the normalized one-sided spectral density function at zero frequency. The discrepancy between the average value of h=O.72 obtained by Hurst for different time series and the asymptotic value of h=0.5 for i.i.d. processes (9=1) is known as the Hurst effect. There is a more precise definition of the Hurst effect (Bhattacharya et al. 1983) in terms of the functional central limit theorem that suggests examining the behavior of sample values of Rn"/nb with n, which we have called as GEOS diagrams, developed later on.

APPROACHES TO THE PROBLEM Different types of hypotheses have been set forth to explain the paradox are reviewed in Mesa and Poveda (1993), and a brief review of the models proposed to mimic the Hurst effect (preserve h>O.5) is made by Boes (1990) and Salas et al.(1979b). Basically, the problem has been explained as the result of violations of the functional central limit theorem hypotheses: a) the correlation structure of geophysical processes, b) a preasymptotic transient behavior, c) non-stationarity in the mean of the processes, d) selfsimilarity, e) fat tail distributions with infinite second moments. In addition to these, we have examined the possibility of an incorrect estimation of the Hurst exponent.

ESTIMATION OF THE HURST EXPONENT hAND GEOS DIAGRAMS

411

Non-stationarity of the mean of geophysical time series has been found in several phenomena (Potter, 1976). This means that either their central tendency changes in time, or it exhibits sudden changes (shifting levels). This last idea has re-emerged in the context of climate dynamics (Demaree and Nicolis, 1990), trying to explain climatic variability in Central Africa as a result of recurrent aperiodic transitions between two stable states whose dynamics is governed by a non-linear stochastic differential equation. Bhattacharya et al. (1983) showed that the Hurst effect is asymptotically exhibited by a process X(n) formed by weakly dependent random variables perturbed with a small trend, as the following: K(n)

= Yen) + c(m+n)p

,

(4)

where Y(n) is a sequence of iid random variables with zero mean and unit variance, and c and m are integer constants. The value of Pis tightly linked to the asymptotic value of the Hurst exponent, h, in the following way: for - 00 < P : ; -0.5 then h = 0.5; for -0.5< P < 0 then h = 1 + P; for P = 0 then h=0.5; and for P> 0 then h=1.

ESTIMA TORS OF THE HURST EXPONENT Different estimators have been proposed in the literature in order to determine the value of h in finite time series. Each of them has been linked to the hypotheses presented to explain the paradox. In this section we make a review of these estimators in order to test their performance for both iid and non-stationary processes (equation 4). In his original work, Hurst (1951) used three different estimators: Estimator 1. The slope of the line passing through the point (log 2, log 1) and the center of gravity of sample values of log ~. vs. log n. Estimator 2. For each of the i sample values of ~', an estimator Kj is given as Kj

= log

R; 1 10g(nj2)

(5)

Estimator 3. For all sub-samples of length n, the average value of ~. is used in K

= log

R; 1 log ( n/2 )

(6)

Chow (1951) questioned the adequacy of the linear relationship between log ~. and log n passing through the point (2, 1) in logarithmic space. He proposed an estimator of h (our estimator number 4) as the least-squares slope regression for sample values of log ~. vs. log n. That procedure applied to Hurst's data led to the relationship ~·=0.31no.87. Estimator 5. Mandelbrot and Wallis (1969) suggested an estimator H defined as the least-squares slope regression including all subsets of length j, 5 ::;; j ::;;n. Estimator 6. Wallis and Matalas (1970) proposed a modified version of the estimator 5, using the averaged values of ~'. Both the estimators 5 and 6 are biased in the sense that they exhibit a positive asymmetrical distribution that diminishes as n increases, and they also exhibit a large variance. Estimator 7. In the chronology of the h estimation history, Gomide (1975) gives a turning point because his estimator, YH, does not deal with least-squares slope

G. POVEDA AND O. J. MESA

412

regression. It is based on the expected value of R,,* for iid processes, in such a way that YH '" (log

R; -

log (itfi.)

/ log n

(7)

Estimator 8. Using the functional central limit theorem, Siddiqui (1976) introduced the asymptotic result for the expected value of R" * for ARMA (p,q) process. Based on that result he suggested the SH estimator SH '" (log

a'"

fiT2

R; -

yo-l{1.

log

a) / log n

(I-te,) (l-t4lJ)-1 '~I

(8)

(9)

J~I

where 'Yo is the ratio of theoretical variance of the process and the noise variance. The i •S and (j>;'s are the paraeters of the corresponding ARMA process. From (S) a similar estimator, J, can be given as

a

[",(logR; -log(e1t/2»)/logn

(10)

Poveda (1987), showed that a large set of geophysical time series exhibit values of SH which are close to O.S. This result is in agreement with analyses developed in the context of Bhathacharya et al.'s (1983) definition: a value of SH=O.5 (SHO.S) implies that, for the largest value of n, the sample value of R,,*/nO.5 is exactly (below) (above) its expected value, which can be derived from (1). As a result, it turns out that the Hurst effect is not such a widespread feature of geophysical time series. Estimator 8 is also tantamount to the slope SH of the regression line of R,,* vs. n (log space), impossing a fixed intercept: log (a1t/2)0.5. Therefore, this seems to confirm our hypothesis of the incorrect estimation of the exponent as the cause of the Hurst effect. As a matter of fact, these results confirm a pre-asymptotic behavior in the relation R,,* vs. n before than the behavior h=O.5 settles. Estimator 9. Anis and Lloyd (1976) introduced an estimator for h in the case of iid normal distributed processes, as a function of the sampling interval n, as h (n) '"

logE(R:+ 1 ) - logE(R:_ 1 ) --:----:--=-:---:-----..,.-~

(11)

log (n + 1) - log (n - I )

and they showed that, for these processes, the expected value of R,,* is

E(n -r)lf2

E(R:) = [P(n-l/2)] nO.s r( n /2 ) ,=1

r

(12)

Estimator 10. Sen (1977) developed an analytical procedure to evaluate the expected value of the estimator 1 and 6, for the case of small samples of normal independent random variables as

ESTIMATION OF THE HURST EXPONENT h AND GEOS DIAGRAMS

E(K)

=

=

413

E(R*) II

log(nI2)

(13)

1 2 log(n!2) [1t n(n -l)f.5

r[(n+l)/2] E(n-r)lf2 r

(n /2)

,...1

r

Estimator 11. McLeod and Hipel (1978) proposed two estimators of h. One is based on the result obtained for E(R,.*) by Anis and Lloyd (1976), which is K" = log E(R;) I log (nI2)

(14)

where the value of E(R,.*) is evaluated according to (12). Estimator 12. The second estimator proposed by McLeod and Hipel consists of a modified version of Gomide's (1975) YH estimator, as follows

YH' = (log E(R;) - logJ1t/2) I log n

(15)

in this case E(R,.*) is also obtained from (12). Estimator 13. Salas et al' (1979a, b) introduced an estimator similar to that one of Anis and Lloyd (1976), in the form = log E(R;+j) - log E(R;-J)

H II

10g(n+J) - log(n-J)

(16)

Evaluation of the h estimators for i.i.d. processes Some of the estimators of h that have been proposed have been evaluated, and results appear in Table 1. According to those results, the following conclusions can be drawn: - Sen (1977, p.973, Table 1) presents an erroneous result for estimator 10 (Table 1, column 3) and the corrected values are shown here in Table 1, column 5. Also, estimator 12 shows differences in Poveda's (1988) results compared with those of McLeod and Hipel (1978), as can be seen in columns 6 and 7 of Table 1. - Note that similar results are obtained with estimators 10 and 11 for values of n ~ 200. Despite the fact that estimator 10 was introduced for small samples , it produces the same results as estimator 11 for n large. It is too simple to show their analytical equality as n goes to infinity. - Results obtained with estimators 9 and 13 differ for n=250, 500, and 2500. The differences are due to simulated samples used to evaluate the latter one, as the former one is an exact estimator.

G. POVEDA AND O. J. MESA

414

TABLE 1. Evaluation of h estimators for Li.d. processes n

Estimator 9 Anis and Lloyd (1976)

Estimator 10 Sen (1979)

Estimator 10 Poveda (1987)

Estimator 11 Mcleod and Ripel (1978)

Estimator 12 Mcleod and Hipel (1978)

Estimator 12 Poveda (1987)

Estimator 13 Salas et al. (1979a, b)

10 25 50 100 250 500 1000 2500 5000 10000 15000 20000

0.627 0.584 0.561 0.543 0.528 0.520 0.515 0.512 0.506 0.506 0.504 0.502

0.69 0.65 0.63 0.62

0.655 0.649 0.635 0.622 0.606 0.596 0.587 0.578 0.572 0.566 0.563 0.561

0.687 0.657 0.639 0.623 0.606 0.596 0.587 0.578 0.572 0.566 0.563 0.561

0.432 0.481 0.497 0.5049

0.382 0.445 0.468 0.481 0.489 0.493 0.496 0.498 0.499 0.499 0.499 0.499

0.627 0.548 0.561 0.543 0.531 0.522 0.515 0.509

-

i

l J

0

• 0

<> 0 0

I

II <>

<>

0

I

0

0

<> 0

-

<> 0°

iS IIi· 118 ,.,':

I 1;0

-

'I

I

<08 0 0

-

<>

81

et

0

o

0

0

~

~,

I

<> <> <><><><>~o

S <> 0 0 egg

0

0 8<>0 0

0

-I

I

I

I

I

d 102

1IIIiI

103

II

10 4

Figure 1. GEOS diagram, Bhatthacharya et al model, c=lO, m=25,

j

!

,

~~5

~=-1.0.

Evaluation of h estimators using a non-stationary stochastic process.

A non-stationary process such as the one given by (4) permits one to know the asymptotic Hurst exponent, depending on the value of ~. We used that model to generate synthetic time series of 20,000 terms to evaluate some of the aforementioned estimators of h, for different values of ~. The estimators of h used are those numbered as 1, 2, 3, 6, 7 and three other estimators described as follows.

415

ESTIMATION OF THE HURST EXPONENT h AND GEOS DIAGRAMS

Estimator 14At-is a modified version of Gomide's (1975) estimator 7, as

YH"

= ~og R;

- log

bell) I log n

(17)

Estimator IS. The least-squares slope regression of all sample values of log R",' vs. log m, taking only values of m larger than or equal to n. Estimator 16. Analogous to estimator 15, but in this case using averaged values of R", * for each value of n. Equation (4) was used to generate simulated non-stationary sequences, with different sets of parameters c and m. Detailed analyses were conducted using two different groups of parameters, the ftrst one for c=1 and m=I,OOO, and the second for c=3 and m=O. For the trend we used the values of /3= -1.0; -0.5; -0.4; -0.3; -0.2; -0.1; 0 and 0.5. As an illustration, Tables 2 and 3 show the results of the different h estimators for the cases of c=l, m=I,OOO, /3=-0.3 (h=0.7), and c=3, m=O, /3=-0.2 (h=0.8), respectively. TABLE 2. Estimators of h. P=-O.3, h=O.7, e=1, m=l,OOO n

5 10 25 50 100 250 500 1,000 2,500 5,000 10,000 15,000 20,000

Estimator 1 Estimator 2 Estimator 3 Estimator 6. Hurst (1951) Hurst (1951) Hurst (1951) WaJlis and Matalas (1970) 0.482 0.680 0.617 0.751 0.704 0.544 0.565 0.594 0.577 0.541 0.571 0.586 0.581

0.706 0.678 0.647 0.628 0.613 0.599 0.590 0.572 0.554 0.547 0.563 0.586 0.581

0.717 0.690 0.657 0.637 0.620 0.605 0.594 0.575 0.556 0.547 0.564 0.586 0.581

Estimator 7. Gomide (1975)

Estimator 14.Poveda (1987)

0.654 0.622

0.134 0.377 0.414

0.268 0.384 0.445

0.604 0.589 0.576 0.566

0.560 0.549 0.479 0.466

0.467 0.477 0.488 0.491

0.551 0.535

0.502 0.497

0.485 0.478

0.522 0.525 0.537 0.542

0.471 0.504 0.520 0.518

0.476 0.497 0.520 0.518

.

Estimator Estimator 15. Poveda 16. Poveda (1987) (1987) 0.574 0.553 0.532 0.521 0.521 0.497 0.505 0.551 0.662 0.797 0.826 0.453 .

0.541 0.537 0.536 0.536 0.540 0.551 0.576 0.626 0.709 0.800 0.792 0.453 .

For values of /3 different from 0 (existence of trend), the obtained results confirm a poor performance of all the estimators, except for the estimator 16 which reproduces with good accuracy the asymptotic results of h according to the value of /3, although the pre-asymptotic interval is variable. With the ftrst set of parameters c=l, m=1000 the estimator gives very good results for values of n ~ 2500, and in the second simulation there is a variable interval n, from which the asymptotic result of h is reached. Again, these results seem to confirm the hypothesis of the Hurst effect as a pre-asymptotic effect. DIAGRAMS GEOS AND GEOS·H

As it was mentioned before, the more precise definition of the Hurst effect deals with the convergence in distribution of Ru"nb, with h > 0.5 (see Bhatthacharya et al., 1983).

G. POVEDA AND O. J. MESA

416

Recently, based on that definition we have introduced the so-called GEOS diagrams (R,,"nO. 5 vs. n) and the GEOS-H diagrams (R,,"nb vs n, with h > 0.5) (Poveda,1987; Mesa and Poveda, 1993). Based on those diagrams is a statistical test of the existence of the Hurst effect in a given time series. The asymptotic distribution of R,,"nO. 5, for processes belonging to the Brownian domain of attraction have a mean (I..t') and a standard deviation (a') derived from (1) and (2) (Siddiqui, 1976; Troutman, 1978; Mesa and Poveda, 1993). Convergence of sample values of R,,"n°.5 into the asymptotic interval given by 11' ± 2a' permits one to accept the hypothesis of non-existence of the Hurst effect. Thus, the estimation of e becomes a fundamental issue for processes with a finite scale of fluctuation (see Vanmarcke, 1983; Poveda and Mesa, 1991; Mesa and Poveda, 1993). On the other hand, divergence of sample values of R,,*'nO.5 from that interval does not permit the rejection of the hypothesis of non-existence of the Hurst effect in a time series. TABLE 3. Estimators of 11. n

Estimator I Hurst (1951)

Estimator 2 Hurst (1951)

Estimator 3 Hurst (1951)

5

0.482

0.706

0.717

10

0.680

0.678

0.690

~=-0.2,

11=0.8, c=3, m=O

Estimator 6. Wallis and Matalas (1970) ..

Estimator 7. Gomide (1975)

Estimator 14. Poveda (1987)

0.1343

0.2680

0.5739

0.5668

0.3773

0.3843

0.5544

0.5665

Estimator 15. Poveda (1987)

Estimator 16. Poveda (1987)

25

0.616

0.647

0.657

0.654 0.621

0.4137

0.4454

0.5361

0.5688

50

0.750

0.628

0.638

0.604

0.5601

0.4665

0.5273

0.5228

100

0.703

0.613

0.619

0.589

0.5486

0.4772

0.5217

0.5815

250 500

0.593

0.599

0.604

0.576

0.4780

0.4881

0.5157

0.6027

0.564

0.589

0.593

0.5661

0.4652

0.4913

0.5354

0.6407

1,000

0.596

0.572

0.575

0.5517

0.5035

0.4849

0.5996

0.7099

2,500

0.585

0.556

0.558

0.5359

0.5848

0.4800

0.7538

0.8275

5,000

0.539

0.547

0.548

0.5235

0.4687

0.4769

0.9586

0.9588

10,000

0.596

0.574

0.577

0.5320

0.5269

0.5088

0.9909

0.9105

15,000

0.612

0.6\2

0.613

0.5528

0.5449

0.5449

0.2905

0.2905

20,000

0.602

0.602

0.602

0.5626

0.5374

0.5374

For the case of the simulated sequences obtained using (4), the estimation of the scale of fluctuation, e, makes no sense because its value is a function of time, and the ergodic property fails to hold. Nevertheless, the qUalitative behavior of the sample values of R,,"nO. 5 in the GEOS and GEOS-H diagrams was examined, for different values of c and m. Some of the obtained results are shown in Figures 1 to 5. Figure 1 shows GEOS diagram for the case c=lO, m=25 and ~=-1.0 (h=0.5). It is clear that sample values of R,,*'n°.5 converge into the asymptotic interval 11' ± 2a'· For the case of iid processes (11'=1.2533, a'=0.2733). The effect that trend produces on the iid process is clearly observed in Figure 2 (GEOS for c=10, m=25, ~=-0.3, h=0.7). Sample values of R,,"nO. 5 are contained within the asymptotic interval 11* ± 2a' corresponding to the underlying iid process, except by a notorious bifurcation due to the trend itself. In this case there is a clear evidence of the Hurst effect. This shows the power of GEOS diagrams.

ESTIMATION OF THE HURST EXPONENT hAND GEOS DIAGRAMS

417

15

00/

000 00

10 0

0

~

I

sL 0

10

'"

l

10

1

0 0000

IIIU_

I

0 10

0

~

~

0

§ ~ $8

0 000

10

10

~

I

lOS

Figure 2. GEOS diagram, Bhatthacharya et al model, c=lO, m=25, /3=-0.3.

~

"l

i "1

o o

o

3

0 0

I

8

0 0 0

0 0

0

g

0 0

0 0

II .~

°T o.L

0

I

o 0

8

0

a8

00

80

0 0

10

n

0 00

e

0 $00

O$C\>oo°""'l,

10

lOS

Figure 3. GEOS diagram, Bhatthacharya et al model, c=l, m=1000, /3=-0.3, h=0.7.

G. POVEDA AND O. J. MESA

418

The parameter c indicates the relative weight of the trend. The stronger the trend the slower the convergence to the asymptotic value of R..·/n h • Figures 3 and 4 illustrate this point: the theoretical limit values are 0.19 and 19, respectively, and the only difference in the parameters is in the value of c. Notice that in Figure 3 the limit is already reached, whereas in Figure 4 not yet.

l

"

'I

jill

i

j

''']

"I

/

6.-

I

00 00 00 00 0 0 0

~ l

I

j

~I

0 0

I

0/

I

2~

0

0

<paP

o~ooe~ 0 og

I I II iii, o

0 10

00

00 00

10

10

n

<> <> 00 0 00

<> <>

o 8

1

--,

j

0

°0 0 10

Figure 4. GEOS diagram, Bhatthacharya et al model, c=lOO, m=lOOO,

105

~=-0.3,

h=0.7.

To illustrate the possibility of error in the estimation of h on the POX diagram, Figure 5 represents the same set of data as in Figure 4. The least-square slope is close to 1.0, whereas the theoretical exponent is 0.7. Evidently, this is due to a pre-asymptotic behavior that does not distinguish between the asymptotic slope and the slope of the pre-asymptotic tendency towards the limit from below.

CONCLUSIONS The non-stationary model given by (4) facilitates the performance of experiments on the estimation of the Hurst exponent, h, and allows one draw conclusions about the existence of the Hurst effect. The estimation of the Hurst exponent, h, has been found to be a delicate an sensitive task. Most of estimators given in the literature make a poor performance according to the experiment developed in this work. Especially for the cases where bO.5. This behavior is explained due to the treatment of the regression intercept in the relation

419

ESTIMATION OF THE HURST EXPONENT hAND GEOS DIAGRAMS

~01_---'----'-L_LLLL~~2---...L-L-L...L...L.LL;~3--L-_L-L~~~~'-L---L-L'..L'~05 n

Figure 5. POX diagram, Bhatthacharya et al model, c=IOO, m=IOOO,

~=-O.3.

~. vs. n (log space). The intercept is not independent of the slope in the relation Rn' =anh. This consideration is violated in most of the estimators proposed in the literature, and in the whole approach to the Hurst effect. Indeed, estimator 16 (Poveda, 1987) provides better results because it is concentrated in the region of values of n where the preasymptotic behavior has disappeared, or at least is less dominant, and therefore the values of both the intercept and the slope (the exponent) are statistically fitted to the limit values. This work has shown that the estimation of the Hurst exponent h is not a trivial problem: there are many difficulties involved, and therefore the tests of hypothesis that use the GEOS and GEOS-H diagrams constitute a more direct and conclusive tool to identify the presence of the Hurst effect. Possible errors in the estimation of h using least-squares slope in the POX diagrams are produced by pre-asymptotic effects, and the shortness of the record. Estimation of e provides an answer to that problem, even in the case of a short record. For in that case nle, a measure of the record length, will be small. This fact is not apparent in the POX analyses. The Hurst effect holds its importance in hydrology and geophysical time series modeling and prediction due to its links with self-similar processes, fractals and multifractals.

420

G. POVEDA AND O. J. MESA

REFERENCES Anis, A.A. and Lloyd, E. H. (1976) "The expected value of the adjusted rescaled Hurst range of independent nonnal summands", Biometrika, 63, 111-116. Battacharya, R. N., Gupta, V. K. and Waymire, E. (1983) "The Hurst effect under trends", Jour. Appl. Probab., 20, 3, 649-662.

Boes, D. C. (1988) "Schemes exhibiting Hurst behavior", in J. N. Srivastava (ed.) Essays in honor of Franklin A. Graybill, Elsevier, 21-42. Chow, V. T. (1951) "Discussion on 'Long-term storage capacity of reservoirs' by H. E. Hurst", Trans. A.S.C.E. pap. 2447, 8~802. Demaree, G. R. and Nicolis, C. (1990) "Onset of the Sahelian drought viewed as a fluctuation-induced transition", Q. J. R. Meteorol. Soc., 116, 221-238. Feller, W. (1951) "The asymptotic distribution of tbe range of sums of independent random variables", Ann. Math. Stat., 22,427-432. Gomide, F. L. S. (1975) "Range and deficit analysis using Markov chains" Hydrol. Pap. No. 79, Colorado State University, Fort Collins, 1-76. Gupta, V. K. (1991) "Scaling exponents in hydrology: from observations to theory". In: Self-similarity: theory and applications in hydrology. Lecture Notes. AGU 1991 Fall meeting, San Francisco, 1991. Hipel, K. W. and McLeod, A. I. (1978) "Preservatiou or the rescaled adjusted range, 2. Simulatiou studies using Box-Jenkins models". Water Res. Res., 14,3, 509-516. Hurst, H. E. (1951) "Long-term storage capacity of reservoirs", Trans. ASCE, 116,776-808. Mandelbrot, B. B. (1983) The/ractal geometry o/nature. Freeman and Co., New York. Mandelbrot, B. B. and Wallis, J. R. (1969) "Some long-run properties of geophysical records", Water Res. Res., 5, 2, 321-340. Mesa, O. J. and Poveda, G. (1993) "The Hurst pbenomenon: The scale of fluctuation approacb", Water Res. Res., in press. McLeod, A. I. and Hipel, K. W. (1978) "Preservation of the rescaled adjusted range. 1. A reassessment of the Hurst phenomenon". Water Res. Res., 14, 3, 491-508. Potter, K. W. (1976) "Evidence for nonstationarity as a physical explanation of the Hurst phenomenon", Water Res. Res., 12, 5, pp. 1047. Poveda, G. (1992) "Do paleoclimatic records exhibit the Hurst effect"" Fourth international conference on Paleoceanography (Abstracts), GEOMAR, Kiel, Germany. Poveda, G. (1987) "El Fen6meno de Hurst" (The Hurst phenomenon, in spanish). Unpublished Master Thesis. Universidad Nacional de Colombia. Medellin, 230 p. Poveda, G., and Mesa O. J. (1991) "Estimaci6n de la escala de fluctuaci6n para la determinaci6n del fen6meno de Hurst en series temporales en hidrologia" (Estimating the scale of fluctuation to determine the Hurst effect in hydrological time series, in spanish). II Colombian Congress on Time series analysis. Bogota, Universidad Nacional de Colombia. Salas, J. D., Boes, D. C., Yevjevich, V. and Pegram, G. G. S. (1979a) "On the Hurst phenomenon", in H. J. Morel-Seytoux (ed.) Modeling hydrologic processes. Water Resources Publ., Fort Collins, Colorado. Salas, J. D., Boes, D. C., Yevjevich, V. and Pegram., G. G. S. (l979b) "Hurst phenomenon as a pre-asymptotic behavior", Jour. 0/ Hydrology, 44, 1-15. Siddiqui, M. M. (1976) "The asymptotic distribution of the range and other functions of partial sums of stationary processes", Water Res. Res., 12,6, 1271-1276. Taylor, G. I. (1921) "Diffusion by continuous movements". Proc. London Math. Soc. (2),20, 196-211. Troutman, B. M. (1978) "Reservoir storage with dependent, periodic net inputs". Water Res. Res., 14, 3, 395-401. Vanmarcke, E. (1983) Randomjields: analysis and synthesis. The M.1.T. Press, Cambridge. Wallis, J.R. and Matalas, N.C. (1970) "Small sample properties of H and K estimators of the Hurst coefficient h", Water Res. Res., 6, 1583-1594.

OPTIMAL PARAMETER ESTIMATION OF CONCEPTUALLY·BASED STREAMFLOW MODELS BY TIME SERIES AGGREGATION

P. CLAPSl and F. MURRONE2 IDept. of Environm. Engineering and Physics, University ofBasilicata Via della Tecnica, 3 Potenza 85100 - Italy 2Dept Hydraul., Wat. Resour Manag. and Environm. Eng., Univ. of Naples "Federico II" Via Claudio, 21 Napoli 80126 - Italy In the framework of an integrated use, among different scales, of conceptually-based stochastic models of streamflows, some points related to efficient parameter estimation are discussed in this paper. Two classes of conceptual-stochastic models, ARMA and Shot Noise, are taken under consideration as equivalent to a conceptual system transforming the effective rainfall into runoff. Using these models, the possible benefits of data aggregation with regards to parameter estimation are investigated by means of a simulation study. The application made with reference to the ARMA(1,1) model shows advantageous effects of data aggregation, while the same benefits are not found for estimation of the conceptual parameters with the corresponding Shot Noise model. INTRODUCTION

Streamflow time series modeling is generally intended as the closest possible reproduction of the statistical features displayed by the phenomenon under investigation. This is certainly what is needed in the majority of the practical cases for which time series are analyzed, for instance planning and management of water resources systems. Practical needs have led, in the last decades, to a prevailing "operational" approach to time series modelling, in which little space has been left to the analysis of physical, observable aspects in riverflow series. On the other hand, a physically based approach to this problem addresses the reproduction as well as the interpretation of the features of the phenomenon. One of the requirements for a correct reproduction of the runoff process is that the model related to a given scale must be compatible with the models referred to smaller or aggregated scales. Even out of a conceptual approach, the problem of determining stochastic models for aggregated data has received so far limited attention. Among the few papers in this field, Kavvas et al. (1977),Vecchia et al. (1983), Obeysekera and Salas (1986) and Bartolini and Salas (1993) are worth mentioning. With regard to the above requirement, using conceptually based models allows the basic advantage that the information related to a conceptual parameter can be transferred from larger to smaller scales, because its conceptual meaning does not depend on a particular time scale. Therefore, derivation of stochastic models from a general conceptual representation of the runoff process is a first step towards integration of models among different scales. 421 K. W. Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 421-434. © 1994 Kluwer Academic Publishers.

422

P. CLAPS AND F. MURRONE

Claps and Rossi (1992), and Murrone et al. (1992) identified stochastic models of streamflow series over different aggregation scales starting from a conceptual interpretation of the runoff process. In this conceptual-stochastic framework there are conceptual parameters common to models related to different scales. The point of view characterizing this framework, which is summarized in the next section, is that the analysis of streamflow series should be extended beyond the scale at which data are collected, taking advantage of information available from models of the aggregated data. The question arise if there is a particular time scale (and, consequently, a particular model) leading to an optimal estimation of a given parameter. The choice of an optimal time scale is important because aggregation of data tends to reduce correlation effects due to runoff components with small lag time with respect to the effect produced by components with high lag time. At the same time aggregation reduces the number of data and, consequently, the quality of estimates. In the above approach, Claps and Rossi (1992) and Murrone et al. (1992) considered a limited number of time scales, such as annual, monthly, and T-day (with T ranging from 1 to 7) and showed that conceptual parameters of models of monthly and T-day runoff are more efficiently estimated using different scales of aggregation. An attempt to introduce a more systematic procedure in the selection of the optimal time scale for the estimation of each parameter is made in this paper. In this direction, simulation experiments are performed with regards to ARMA (Box and Jenkins, 1970) and Shot Noise (Bernier, 1970) stochastic models equivalent to a simple conceptual model of the runoff process.

CONCEPTUAL-STOCHASTIC MODELS AND TIME SCALES

The rationale of conceptualization In the approach by Claps and Rossi (1992) and Murrone et al. (1992), formulation of a conceptual model for river runoff is founded on the "observation" of riverflow series over different aggregation scales and on the knowledge of the main physical (climatic and geologic) features of basins. Considering Central-Southern Italy watersheds, dominated by the hydrogeological features of Apennine mountains, distinct components can be recognized in the runoff: (1) the contribution provided by aquifers located within large carbonate massifs, that has over-year response time to recharge (deep groundwater runofi); (2) a component, which is due to both overflow springs and aquifers within geological non-carbonate formations, which usually run dry by the end of the dry season (seasonal groundwater runofi); (3) the contribution by soil drainage, having a delay of several days with respect to precipitation (subsurface runofi); (4) the surface runoff, having lag-time that depends on the size of the watershed (for the rivers analyzed by the Murrone et al. (1992), this lag ranges between a few hours to almost two days). In some cases, the deep groundwater component is lacking, reducing runoff components to three. The snowmelt runoff in the region considered is negligible. The above runoff components assume different importance with respect to the time scale of aggregation, leading to conceptual models of increasing complexity moving from the annual to the daily scale. Bases for conceptual-stochastic model building proposed for the monthly scale (Claps and Rossi, 1992; Claps et aI., 1993) and for the daily scale (Murrone et aI., 1992) are essentially: (1) subsurface and groundwater systems are considered as linear

CONCEPTUALLY-BASED STREAMFLOW MODELS

423

reservoirs, with storage coefficients Kl , K2 , Ka, going from the smallest to the largest; (2) runoff is the output of a conceptual system made up of the above reservoirs in parallel with a zero-lag linear channel reproducing the direct runoff component; (3) when a storage coefficient is small with respect to the time scale considered, the related groundwater component becomes part of the direct runoff term, which is proportional to the system input; (4) the effective rainfall, i.e. total precipitation minus evapotranspiration, is the conceptual input to the system; this variable is not explicitly accounted for in the models, which are univariate; (5) effective rainfall volumes infiltrate into subsurface and groundwater systems at constant rates (recharge coefficients Cl, C2, Ca, respectively) over time. The main issues of model identification for annual, monthly and daily scales are summarized below.

Annual scale Rossi and Silvagni (1980) first supported on conceptual basis the use of the ARMA(l,l) model for annual runoff series, based on the consideration that the correlation structure at that scale is determined by the deep groundwater runoff component. The use of this model for annual runoff modelling was proposed by O'Connell (1971) in virtue of its capacity of reproducing the long-term persistence displayed by annual runoff data. Salas and Smith (1981) showed how a conceptual system composed by a linear reservoir in parallel with a linear channel fed by a white noise input behaves as an ARMA(1,!) process. Given an effective rainfall input It which infIltrates with a rate Calt and whose part (l-ca)It goes into direct runoff, based on the hypothesis that the input is concentrated at the beginning of the interval [t-1, t], the volume balance equations produce

D I - e- lIK 3 D 1-1 -- (1- c 3 e - lIK 3 ) I I - e - lIK 3 (1- C3 ) I 1-1

(1)

where Dt is the runoff in year t. This hypothesis can be removed considering different shapes of the within-year input function (Claps and Murrone, 1993). The hypothesis that It is a white noise process leads to an ARMA(1,l) model

(2) in which dt equals Dt - E[Dtl, cI> and e are the autoregressive and moving average coefficients, respectively, and Et is the zero-mean model residual. Conceptual and stochastic parameters in (1) and (2) are related by: 9 = cI>(1 - c3) (1 - c3cI»

K3 = -1/ In(cI>);

cI>-9 c3 = 1-9

The expression of Ca for uniform within-period distribution of input is

(3)

424

P. CLAPS AND F. MURRONE

<1>-6 (1-6) K3 (1_e- IIK3 )

(4)

The ARMA model residual is proportional to the effective rainfall by means of:

(5) In absence of significant groundwater runoff, Rossi and Silvagni (1980) showed that annual runoff in the hydrologic year is an independent process that follows a Box-Cox transformation of the Normal distribution. The notion of hydrologic year, which starts at the end of the dry season, is important because if a wet season and a dry season can be distinguished, the absence of significant runoff in the dry season determines absence of correlation in the hydrologic year runoff series. Monthly scale

The assumptions recalled above on the role of the different components in streamflow lead to consideration that correlation effects in monthly runoff are due both to long-term persistence, due to the deep groundwater runoff, and to short-term persistence due to the seasonal groundwater runoff. The conceptual system identified by means of these considerations consists of two parallel linear reservoirs plus a zero-lag linear channel. This latter accounts for the sub-monthly response components included into the direct runoff. The share C3It of the effective rainfall is the recharge of the over-year groundwater, with storage coefficient Ka, while C2It is the recharge of the seasonal groundwater, with storage coefficient K2. All Cj and Kj parameters are kept constant. Approximations determined by the latter assumption are compensated for by parsimony in the number of parameters and by the significance given to the characteristics of the input It, considered as a periodic-independent process. Periodic variability of the recharge coefficients C2 and C3 is substantially due to variability in soil moisture, which is a product of rainfall periodic variability. Claps and Rossi (1992) and Claps et al. (1993) have shown that volume balance equations for the conceptual model under exam are equivalent to an ARMA(2,2) stochastic process with periodic-independent residual (PIR-ARMA), expressed as

(6) with dt and Et having zero mean. The formal correspondence between the stochastic and conceptual representations is obtained through the relations:

(7)

(8)

CONCEPTUALLY -BASED STREAMFLOW MODELS

425

(9)

C2

=

-(8 1 - 8 2 ) N + (I - <1>2) M + (1 + 2 e- IIK2 ) (N - M) 2 M (e- 1/K3 _e- 1/K2 )r2

(10)

where N = (1 - cl>1 - cl>2), M = (1 - 81 - 82), r3 = Ka (l-e-11K3 ) and r2 = K2 (1-e-1IK2). In addition, in the conceptual scheme the residual £t; is proportional to the zero.-mean effective rainfall it according to the relation (11)

If the over-year groundwater component is negligible, as for instance in practically impermeable basins, the conceptual system reduces to one reservoir in parallel with a linear channel, underlying a PIR-ARMA(l,l) stochastic process. Probability distribution of monthly effective rainfall is assumed by Claps and Rossi (1992) as the sum of a Bessel distribution (Benjamin and Cornell, 1970, p. 310), arising from the sum of a Poissonian number of exponentially distributed events, and a Gaussian error term. A Box-Cox transformation of non zero data was also proposed by Claps (1992). To preserve the formal correspondence between the conceptual and stochastic representations of the process, neither deseasonalization nor transformation procedures are applied to recorded data.

T-day scale: multiple Shot Noise model The Shot Noise (Bernier, 1970) is a continuous-time stochastic process representing a phenomenon whose value, at a certain time, is determined additively by the effects of a random number of previous point events. This process is determined by knowledge of: (1) the occurrence times of events, 'ti; (2) the input impulse intensity related to the events, Yi; and (3) the response function ofthe system, h(·), describing the propagation in time of the effects of each impulse. The hypotheses made for this kind of process are: (a) the h(·) function is continuous, infmitesimal for t tending to infinity, and integrable; (b) intensities Yi are random variables independent and identically distributed, with finite variance; and (c) event occurrence times 'ti are generated by a homogeneous Poisson process. The process is stationary if its origin tends to -00, meaning that the origin must be far enough from the time under consideration. Runoff D can be thus expressed, in continuous time, as

D{'t)

=

N(1)

'LYih(t-t)

(12)

N(-~)

where N('t) is the counting function of the Poisson process of occurrences. In the conceptual framework considered (Murrone et aI., 1992), the response function hO is a linear combination of responses of the conceptual elements. If the surface network is considered to behave as a linear reservoir, h(·) is expressed as

426

P. CLAPS AND F. MURRONE s/Ko + elK e- s/KI + elK e- s/K2 + elK e- s/K3 h(s) - 0 elK 0 eII 22 33

(13)

with s = 't-'t;.. The basin response is dermed by 8 parameters: the four storage coefficients, Kj , and the four recharge coefficients, Cj, of which only 7 are to be estimated given the volume continuity condition, ~=1. The Cj coefficients represent the share of runoff produced, in average, by each component. To limit the number of parameters and to take advantage by the linearity hypotheses, coefficients Cj and K; are considered constant, i.e. the response function h(·) is kept constant. The process (12) has wmite memory, which represents the current effect of previous inputs to the system. This effect can be evaluated at a fIxed initial time, to = 0, by knowing the groundwater runoff quota at that time. At the beginning of the hydrological year (October 1 in our case), the seasonal and subsurface groundwater contributions are negligible relative to the deep groundwater runoff. Therefore the value Do of discharge at that time can be a good preliminary estimate of the groundwater runoff amount, thus expressing (12) as:

D(t)

= Do

N(t)

e-t/K3 + I,Yj h(t-t j )

(14)

N(O)

The discretized form of the continuous process (14) is obtained by its integration over the interval [(t-l)T, tTl, where t = 1,2,... is the index describing the set of sampling instants and T is the sampling time interval. If the aggregation occurs on a T-day scale and integration is applied according to the linearity and stationarity hypotheses, the following discretized formulation is obtained:

D t -- K 3 e-tT/K3(eT/K3_1)X0

yt

t

h

+~y. ~ t-s+1 s=1

(15)

s

represents the sum of impulses occurred during the interval [(t-l)T, tTl and where the integrated response is expressed as:

(16) 3 c. -Kj [T/K. h = I, e +e-T/K· I

SIT

j=O

1_

2] e -T(s-I)/K· I

'

s>1

The function hs represents the response of the system determined by a unit volume impulse of effective rainfall, uniformly distributed within the interval. When the scale of aggregation T is chosen as considerably larger than the surface runoff lag-time, the surface runoff component can be considered as the output of a zerolag linear channel, which has response function col)(O), with ~(.) as the Dirac delta function. This reduces to six the number of parameters to be estimated. The structure of daily effective precipitation has been represented as uncorrelated, like in Poisson white noise models (Bernier, 1970) or characterizes by Markovian arrival

CONCEPTUALLY-BASED STREAMFLOW MODELS

427

process (e.g. Kron et aI., 1990) or described by models based on the arrival of clusters of cells, such as the Neyman-Scott instantaneous pulse model (e.g. Cowpertwait and O'Connell, 1992). The distribution considered by Murrone et al.(1992) is a Bessel distribution, corresponding to a Poisson white noise probabilistic model. SIMULATION STUDY

Prerequisites to the simulation For the reasons expounded in the introduction, the simulation study undertaken here aims primarily to set a number of basic points in evaluating theoretically the effects of aggregation on parameter estimation. The problem here is not to identify the most correct model (as, for instance, in Jakeman and Homenberg, 1993) but to understand if there are peculiar scales for estimation of parameters of a given model with pre-determined structure, as in Claps et al., 1993. Simple hypotheses in terms of input and system structure were adopted for the simulation, to grasp the basics of the positive or negative effects of aggregation in time. A linear system was considered, which consisted of one linear reservoir, with storage coefficient K, and one linear channel, in parallel, with lag zero. As shown with reference to annual runoff, this system, fed by a stochastic input, is equivalent to an ARMA(1,I) model when the input is a continuous process. For input as a point process this system is equivalent to a single Shot Noise model (as compared to the multiple version arising from the presence of more than one reservoir). For each set of "true" parameters c and K (written in bold) of the linear system, 20000 output data were generated. On the data obtained from Gaussian input, parameters of the ARMA(1,I) model were estimated and expressed, through (3), in terms of conceptual parameter estimates cand K. Shot Noise model parameters were estimated on data generated from Bessel input. The fll'St 10000 synthetic runoff data were not considered in the estimation, as a warm-up length (Salas et aI., 1980, p.356). This length was set well beyond the suggested limits, to deflnitely eliminate possible "starting condition" effects. The recharge coefficient c, indicating the amount of input entering the reservoir, ranged from 0.5 to 1. In the model of annual runoff c is less than 1 while the case c=1 corresponds to the model of a spring (see Claps and Murrone, 1993). The storage coefficient K was set in a range from 2 to 120 time units (t.u.). The 'time unit' is one unit of the time scale at which input and output data are generated and is also called the reference scale. There is no need to express the storage coefficient in terms of hours or days because what is important is to indicate the value of the parameter in terms of a multiple of the scale of generation. Accordingly, time scales at different levels of aggregation are identifled in number oftime units. In a preliminary set of simulations, the effect of the input standard deviation 0 was recognized as null for Gaussian data and practically negligible for Bessel data. For this reason, only one level of input variability was considered for each distribution, namely 0=113 for Gaussian input and 0=3 for Bessel input. For both cases the mean was set to 1. To allow comparison of parameter estimates made on data obtained with the same "true" values but in different conditions, standard errors of parameters and the explained variance R2 were used. R2 is defmed as 1-0£2/02 , where 0£2 indicates the

P. CLAPS AND F. MURRONE

428

residual variance (taken as the variance of the surface runoff component in the Shot Noise model) and (J2 indicates the variance of the synthetic runoff series. Application

Main points to focus with the aid of simulations are: (1) In which manner the resolution of a linear reservoir depends on the relative mean (coefficient c) of its output with respect to total runoff? (2) Is there a preferential scale for the estimation of the storage coefficient? Results of parameter estimation on simulated data, reported below, suggest a number of comments. ARMACl,l) model

c

The following comments arise from estimation of and K through the ARMA(l,l) model: 1. Results of parameter estimation, reported in Table 1a to 1c (referred to c=O.5, c=O.8 and c=l, respectively), show that aggregation reduces the variance of the fraction of input not entering the reservoir, producing higher values of the explained variance R2. The obvious exception is the case c=l, in which there is no pure white noise component. For the case c=l, the model fitted to the data is still the ARMA(1,l), for it is the most general model of a single linear reservoir with generic within-period form of the input function (Claps and Murrone, 1993). This adds information in providing an estimate of c, obtained through (4). 2. A progressive increase in the standard error of the estimates also occurs with the aggregation, due uniquely to the decrease in the number of data. Table 2 shows that estimations made on the reference scale (1 t.u.) over limited samples produce standard errors greater than the corresponding standard errors for data aggregated on 7,15 and 30t.u. 3. For cd, K is clearly underestimated. This tendency becomes more noticeable with increasing K and with decreasing c. Values found for R2, which decreases in the same circumstances, reflect the poor estimate of K. A tendency toward a preferential scale for parameter estimation is not recognizable from the results shown in Table 1a and lb. 4. More understandable results are obtained by estimating K and C on scales aggregated in unit steps, from 1 to 15 t.u. In this regard, Figures 1-3, clearly show that when K is much greater than the reference scale, aggregation produces better conditions for parameter estimation. The progressive increase in K and c up to a sill (Figure 1), give sufficient indications of this benefit. Therefore, the preferential scale, must be the one in correspondence of which the sill is reached (7 t.u. for this case), as a trade-off between the increase in R2 and the increase in the standard error of estimates. Based on the results reported above, it seems that the preferential scale decreases with increasing c and with decreasing K (in general one should speak in terms of nondimensional preferential scale, Le. divided by K). Figure 2, with c=O.5 and K=15 that would confirms this tendency showing a substantial constancy both in K and indicate that the sill is reached at the reference scale. On the other hand, when c=l the reference scale is the best one for estimation regardless of K, since quality of estimates

c,

429

CONCEPTUALLY ·BASED STREAMFLOW MODELS

degrades with aggregation (see the decrease of c and R2 in Table Ie and the decrease of c in Figure 3). TABLE la. Estimations from: ARMA(I,I) model, Gaussian input, c = 0.5 (scale in t.u.) :t{(t.u.)

C

1 7

10.49 104.5 96.18 83.31 84.12

0.227 0.490 0.508 0.528 0.538

29.08 67.72 61.82 57.12 57.82

0.461 0.522 0.537 0.565 0.557

13.14 14.78 18.63 25.99

0.515 0.492 0.505 0.483

15 30 60 1 7

15 30 60 1

7

15 30

e

A

scale

A

K=120 0.909 0.935 0.856 0.698 0.490

K=60 0.966 0.902 0.785 0.592 0.354

K=15

0.927 0.623 0.447 0.315

e

A

:t{(t.u.)

C

0.884 0.877 0.727 0.457 0.171

0.002 0.029 0.063 0.100 0.111

17.01 89.53 81.30 72.00 73.20

0.322 0.508 0.524 0.548 0.553

0.938 0.805 0.588 0.272 0.012

0.008 0.051 0.098 0.135 0.111

22.74 34.42 36.08 36.30 36.67

0.509 0.513 0.537 0.559 0.523

0.855 0.380 0.154 0.041

0.033 0.090 0.098 0.078

7.22 6.33 7.63

0.529 0.491 0.471

A

K=90 0.943 0.925 0.832 0.659 0.441

0.917 0.853 0.677 0.383 0.102

K=30

0.957 0.914 0.816 0.657 0.660 0.393 0.438 0.092 0.195 -0.084

0.004 0.037 0.078 0.117 0.117 0.018 0.074 0.117 0.128 0.069

K=7

0.871 0.744 0.059 0.331 0.048 0.083 0.140 -0.083 0.046

TABLE lb. Estimations from: ARMA(I,I) model, Gaussian input, c = 0.8 (scale in t.u.) :t{(t.u.)

C

1

83.24 119.9 120.4 109.8 109.6

0.784 0.799 0.811 0.816 0.791

55.27 64.34 67.23 67.80 62.46

0.809 0.806 0.812 0.812 0.750

15.57 14.36 15.57 27.79

0.813 0.780 0.733 0.596

7

15 30 60 1 7

15 30 60 1 7

15 30

e

A

scale

A

C

0.067 0.273 0.398 0.456 0.386

71.23 95.48 96.02 90.78 87.66

0.795 0.806 0.814 0.820 0.781

0.982 0.910 0.126 0.897 0.565 0.369 0.800 0.266 0.448 0.643 -0.039 0.436 0.383 -0.1920.273

29.45 30.44 33.87 40.41 36.44

0.811 0.796 0.787 0.747 0.654

7.59 6.59 5.51

0.814 0.743 0.762

K=120

0.988 0.946 0.943 0.746 0.883 0.507 0.761 0.164 0.578 -0.079

K=60

K=15

0.938 0.707 0.300 0.614 -0.008 0.384 0.381 -0.168 0.255 0.340 -0.039 0.137

e

A

:t{(t.u.)

A

K=90

0.986 0.934 0.929 0.683 0.855 0.413 0.719 0.068 0.504 -0.141

0.089 0.317 0.430 0.462 0.350

K=30

0.967 0.835 0.205 0.795 0.289 0.414 0.642 0.017 0.399 0.476 -0.109 0.303 0.193 -0.188 0.126

K=7

0.877 0.481 0.398 0.345 -0.207 0.255 0.066 -0.265 0.093

P. CLAPS AND F. MURRONE

430

TABLE lc. Estimations from: ARMA(I,I) model, Gaussian input, c = 1.0 (scale in t.u.) scale

A

I{(t.u.)

~

«i>

C

1 7 15 30 60

128.8 124.1 132.7 135.2 115.7

1.000 0.985 0.968 0.938 0.878

1 7 15 30 60

66.11 62.92 65.88 73.47 63.00

1.000 0.971 0.937 0.884 0.791

1 7 15 30

16.14 14.97 15.08 29.16

1.000 0.893 0.793 0.602

K=120 0.992 0.945 0.893 0.801 0.596

K=60 0.985 0.895 0.796 0.665 0.386

K=15

0.940 0.627 0.370 0.358

R2

A

I{(t.u.)

«i>

C

-0.981 -0.301 -0.260 -0.252 -0.285

0.996 0.937 0.863 0.752 0.546

98.27 94.22 99.47 104.7 89.46

1.000 0.980 0.957 0.920 0.848

-0.980 -0.299 -0.255 -0.221 -0.253

0.993 0.878 0.747 0.581 0.326

32.74 30.81 32.13 41.82 36.34

1.000 0.943 0.879 0.781 0.663

-0.969 -0.303 -0.266 -0.030

0.969 0.585 0.309 0.144

7.46 6.82 4.98

1.000 0.812 0.858

K=90 0.990 0.928 0.860 0.751 0.511

K=30 0.970 0.797 0.627 0.488 0.192

~

R2

-0.980 -0.300 -0.258 -0.242 -0.277

0.995 0.917 0.823 0.691 0.458

-0.977 -0.299 -0.253 -0.158 -0.196

0.985 0.767 0.555 0.348 0.134

K=7

0.875 -0.947 0.933 0.358 -0.304 0.330 0.049 -0.304 0.103

TABLE 2. ARMA(1,I) model: standard errors of estimates made on different scales compared to standard errors of estimates made on limited samples (c=0.5, K=60 t.u.) n. of data

«i>

1448 666 333

0.902 0.785 0.592

aggregated std. err. std. err. ~ «i> ~ 0.805 0.588 0.272

0.0309 0.0636 0.1180

0.0429 0.0838 0.1407

«i> 0.6473 0.4936 0.2146

limited sample std. err. std. err. ~ ~ «i> 0.6061 0.4596 0.1833

0.1826 0.345 0.5067

0.1906 0.3526 0.5124

Shot NQilm mQd~1 The fust consideration arising from the observation of Tables 3-4 and Figure 4 is that aggregation has quite different effects on the Shot Noise model estimates than for the ARMA model. With the Shot Noise model there are no evident benefits arising from aggregation, since best estimates are always obtained at the reference scale. The increasing bias of the estimated values of both parameters with aggregation does not leave much room for other considerations. This outcome could be due to the alterations that aggregation induces in the impulse occurrence and intensity and reflects some peculiar characters of this class of models. The positive aspect of this behavior is that even large storage constants can be identified (with some bias) at the reference scale.

431

CONCEPTUALLY-BASED STREAMFLOW MODELS

Another interesting aspect is that the increase of c reduces negative effect of This could be due to the reduced alteration of the white aggregation on the estimate noise component, with aggregation. occurring when c increases.

c.

0.6 <.>

!l

I

0.5 . 0.4 0.3 0.2 0

4

6 8 10 aggregation scale (t.u.)

12

14

16

4

6 8 10 aggregation scale (tu.)

12

14

16

150 ~

100 .

~:J

50

i

0 0

2

Figure 1. Arma(1,l) model: Parameter estimates on aggregated data (c=O.5, K=120 t.u.).

0.54r----,.---..---~--,..----___,_--__._--_,_-___,

<.>

i

~

0.52 0.5·

:J 0.48 O.46L-----'-----'---.L-----'----'-----'---":----'

o

~

aggregation scale (l.u.) 30 25 ~

'0

~ il

•••••• ,

••••••••••••••• _.! ••••••••••••••~ •••••••••••••• ,"

20 .......... .. _....

15 10 ~L-----'-2--~4~-~6--~8~--1~0--~12~-~1~4-~16

aggregation scale (l.u.)

Figure 2. Arma(1,l) model: Parameter estimates on aggregated data (c=O.5, K=15 t.u.).

P. CLAPS AND F. MURRONE

432

u

0.99

...

J"

0.98

.........

0.97 0.96 0

4

2

6

10

8

12

14

16

12

14

16

aggregation scale (t.u.)

135 ~

J"

130 125 ............. -: ... 120 115 0

4

2

6 8 10 aggregation scale (t.u.)

Figure 3. Arma(1,l) model: Parameter estimates on aggregated data (c=l, K=120 t.u.). TABLE 3. Estimations from: Shot Noise model, Bessel input, conceptual model of one reservoir with a linear channel (K=60 t.u.) c=0.8

c=0.5 A

scale

C

1 7 15 30

0.532 0.704 0.756 0.837

it

48.52 49.54 82.36 214.76

R2

C

0.026 0.210 0.242 0.263

0.813 0.872 0.896 0.912

A

it

55.19 77.68 112.24 261.86

c = 1.0 R2

C

0.092 0.386 0.474 0.419

0.993 0.969 0.953 0.942

A

:It

66.39 110.56 172.23 229.29

R2 0.933 0.796 0.707 0.613

TABLE 4. Estimations from: Shot Noise model, Bessel input, (c=O.5, K=120 t.u.) c=O.5 scale 1 7 15 30

A

C

0.533 0.695 0.766 0.837

it 87.11 69.35 162.39 339.92

R2 0.018 0.158 0.175 0.216

CONCEPTUALLY-BASED STREAMFLOW MODELS

433

0.8 u

~

.§'" ~

0.7 0.6 0.5 0

4

6

8

10

12

14

16

12

14

16

aggregation scale (t.u.) 100

:.: ~

.'E"~"

80 60 40 20 0

2

4

6

8

10

aggregation scale (t.u.)

Figure 4. Shot noise model: Parameter estimates on aggregated data (c=0.5, K=60 t.u.).

FINAL REMARKS In a conceptually-based stochastic framework for the analysis of the runoff data at different scales, a simulation study was undertaken to assess possible effects of aggregation on parameter estimation. A simple linear conceptual system, made up of a linear reservoir and a linear channel, was used to generate runoff data from Gaussian and Bessel input, and a conceptually based ARMA(1,1) model and a single Shot Noise model were respectively fitted to the data, providing estimates of the conceptual parameters. Analysis of the results emerging by re-estimation of "true" parameters by means of these models showed that aggregation plays a significant role in achieving correct estimates for the ARMA(1,1) model. In particular, the optimal aggregation scale is the one at which both estimates of the conceptual parameters attain a "sill" level, which is shown to correspond to the least biased value. On the other hand, aggregation does not produce the same effect on the Shot Noise model, for which the scale of generation was found to be the most significant for parameter estimation. Although a more extensive work is needed to test the effect of aggregation on estimation of parameters of more complex systems, these results constitute an interesting starting point as a theoretical support to the use of integrated conceptuallybased models.

REFERENCES Bartolini, P. and J.D. Salas (1993) "Modeling of streamflow processes at different time scales", Water Resour. Res., 29 (8),2573-2587. Benjamin, J.R. and C.A. Cornell (1970) Probability, Statistics and Decision for Civil Engineers, Mc. Graw Hill Book Co., New York.

434

P. CLAPS AND F. MURRONE

Box, G.E. and G. Jenkins (1970). Time Series Analysis, Forecasting and Control. HoldenDay, San Francisco (Revised Edition 1976). Bernier, J. (1970) "Inventaire des Modeles des Processus Stochastiques applicables ala Description des Debits Joumaliers des Rivieres", Rev. Int. Stat. Inst., 38, 49-61. Claps,P. e F. Rossi (1992) "A conceptually-based ARMA model for monthly streamflows", in J.T. Kuo and G.F.Lin (Eds.) Stochastic Hydraulics '92, Proc. of the Sixth IAHR IntI. Symp. on Stochastic Hydraulics, Dept. of Civil Engrg., NTU, Taipei (Taiwan), 817-824. Claps, P. (1992) "Sulla validazione di un modello dei deflussi a base concettuale", Proc. XXIII Conf. Hydraul. and Hydraul. Struct., Dept. Civil Eng., Univ. of Florence, D.91D.102. Claps, P. and F. Murrone (1993) "Univariate conceptual-stochastic models for spring runoff simulation", in M.H. Hamza (Ed.) Modelling and Simulation, Proc. of the XXIV lASTED Annual Pittsburgh Conference, May 10 - 12, 1993, Pittsburgh, USA, 491-494. Claps, P., F. Rossi and C. Vitale (1993) "Conceptual-stochastic modeling of seasonal runoff using Autoregressive Moving Average models and different scales of aggregation", Water Resour. Res., 29(8), 2545-2559. Cowpertwait, P.S.P. and P.E. O'Connell (1992) "A Neyman Scott shot noise model for the generation of daily streamflow time series", in J.P. O'Kane (Ed.) Advances in Theoretical Hydrology, Part A, chapter 6, Elsevier, The Netherlands. Jakeman, A.J. and G.M. Hornberger (1993) "How much complexity is warranted in a rainfall-runoff model?", Water Resour. Res., 29 (8), 2637-2649. Kavvas, M.L., L.J. Cote and J.W. Delleur (1977) "Time resolution of the hydrologic timeseries models", Journal of Hydrology, 32, 347-361. Kron W, Plate E.J. and Ihringer J. (1990) "A Model for the generation of simultaneous daily discharges of two rivers at their point of confluence", Stochastic Hydrol. and Hydraul., (4), 255-276. Obeysekera, J.T.B. and J.D. Salas (1986) ''Modeling of aggregated hydrologic time series", Journal of Hydrology, 86, 197-219. O'Connell, P.E. (1971) "A simple stochastic modeling of Hurst's law", Int. Symp. on Math. Models in Hydrology, Int. Ass. Hydrol. Sci. Warsaw. Murrone, F., F. Rossi and P. Claps (1992). "A conceptually-based multiple shot noise model for daily streamflows", in J.T. Kuo and G.F.Lin (Eds.) Stochastic Hydraulics '92, Proc. of the Sixth IAHR Inti. Symp. on Stochastic Hydraulics, Dept. of Civil Engrg., NTU, Taipei (Taiwan R.O.C.), 857-864. Rossi,F. and G. Silvagni (1980). "Analysis of annual runoff series", Proc. Third IAHR Int. Symp. on Stochastic Hydraulics, Tokio, A-18(1-12). Salas, J.D. and R.A. Smith (1981) "Physical basis of stochastic models of annual flows", Water Resour. Res., 17(2),428-430. Salas J.D., Delleur J.W., Yevjevic V. and Lane W.L. (1980). Applied Modeling of Hydrologic Time Series. Water Resources Publications, Littleton, Colorado. Vecchia, A.V., J.T.B.Obeysekera, J.D. Salas and D.C. Boos (1983) "Aggregation and estimation oflow-order periodic ARMA models", Water Resour. Res., 19(5),1297-1306.

Acknowledgments This work was supported by funds granted by the Italian National Research Council Group for Prevention from Hydrogeological Disasters, grant no. 91.02603.42.

ON IDENTIFICATION OF CASCADE SYSTEMS BY NONPARAMETRIC TECHNIQUES WITH APPLICATIONS TO POLLUTION SPREAD MODELING IN RIVER SYSTEMS

A. KRZYZAK Department of Computer Science Concordia University 1455 de Maisonneuve Blvd. West Montreal, Quebec, Canada H3G 1M8 In the paper identification of nonlinear memoryless cascade systems is discussed. An optimal model of a memory less cascade system is derived and estimated by the kernel regression estimate. Convergence of identification procedures is investigated. Possible extensions to nonlinear dynamical systems of the Hammerstein and Wiener type are also discussed.

INTRODUCTION Identification of pollution spreading in the river systems is an important problem. In order to simplify the modeling process we assume that a river or canal can be divided into segments connected serially and each segment is modeled by a nonlinear mapping describing polution concentration at the begining and the end of the segment. Segment boundaries may correspond to locations of estuaries or industrial dumping sides. In the present model we only describe the steady state behaviour, that is we do not incorporate dynamics into our model, however we will indicate possible extensions in that direction. All measured quantities are assumed to be random with completely unknown distributions. The identified system is assumed to be highly nonlinear. Identification algorithms are nonparamteric and are based on kernel regression estimates. Identification of nonlinear systems remains an important and challenging problem (Billings 1980). Methods based on Volterra and Wiener series expansions are available for identification of general nonlinear systems (Banks 1988). These methods are rather complicated and require selecting the number of terms in the expansion. They also result in highdimensional parameter estimation problem. Nonlinear difference and differential equations require detailed knowledge of the system and often lead to complicated solutions involving phenomena such as bifurcations and chaos. In this context many authors considered specific nonlinear systems and obtained efficient identification algorithms. The simplest configurations include block oriented systems connected in a cascade. Block oriented systems have been studied by Sandberg 435 K. W Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 435-448. © 1994 Kluwer Academic Publishers.

436

A.KRZYZAK

(1991). Identification of a cascade of memoryless subsystems has been studied by Greblicki and Krzyzak (1979) and Krzyzak (1993). The Wiener system consists of a linear dynamic subsystem followed by a nonlinear memoryless subsystem, while the Hammerstein system consists of zero-memory nonlinearity followed by a linear filter. The Wiener model has been applied to signal detection by Masry and Cambanis (1980). The Hammerstein model has been introduced to system identification by Narendra and Gallman (1966) and further studied by Chung and Sun (1988) and Billings and Fakhouri (1979). It has been applied to adaptive control by Kung and Womack (1987) and to identification of biogical systems by Hunter and Korenberg (1986). In the present paper we study in detail optimal modeling and identification of a cascade of memory less subsystems and we briefly mention its extension to dynamical cascade systems such as Hammerstein and Wiener nonlinear systems. A particular version of a cascade memoryless system has been considered by Greblicki and Krzyzak (1979). The authors derived the optimal model of a system consisting of two subsystems in the case when the second subsystem was invertible. They studied weak consistency of identification procedures. We extend these results to a cascade of n not-necessarily-invertible subsystems. We study strong convergence of identification algorithms. Subsequently we apply nonparametric techniques to two configurations of dynamic, nonlinear systems: the Hammerstein system and the Wiener system. We first consider Hammerstein system and then apply the results to the Wiener system by inverting the estimate of memoryless nonlinearity using the estimate of regression function inverse developed for cascade systems. In both systems we identify linear and nonlinear 'components simultaneously from the observations of the inputs and outputs of the whole systems. The linear subsystems are described by the ARMA model, whose coefficients are estimated by the correlation method. The nonlinearities cen be identified by the nonparametric kernel regression estimate. The intermediate signal between the nonlinear and linear components are not measured resulting in the possibility of identyfing both components up to an additive and multiplicative factor. Identification of the Hammerstein system has been studied by many authors. Nonrecursive kernel estimation has been used to identify the system by Greblicki and Pawlak (1986) and Krzyzak (1990). The recursive version has been investigated by Krzyzak (1992). The estimate based on orthogonal expansions has been studied by Krzyzak (1989) and Pawlak (1991). In order to identify Wiener system we use the estimates of linear and nonlinear components similar to the those used in the Hammerstein system. To recover the nonlinear part we utilize the techniques developed for the cascade system identification. The convergence results for the kernel estimate of the Wiener model have been given in Greblicki (1992) and Krzyzak (1993). In nonparametric identification we do not restrict nonlinearities to a class of functions described by a finite number of parameters, such as polynomials or trigonometric functions. The class of nonlinearities we are capable to recover

437

IDENTIFICATION OF CASCADE SYSTEMS BY NONPARAMETRIC TECHNIQUES

z

x~

1

1 xl

Sl

.1

z

z 2

~2

S2

1

Sn

n xn

Figure 1: Identified cascade system

~

u1

Ml 1

~21

M2 1

Mn

~n

Figure 2: Model of identified cascade system

by our kernel estimate includes the class of all Borel measurable functions. This class is too large to be finitely parameterized: therefore a nonparametric approach is chosen in this paper. Standard nonlinearities such as dead-zone limiters, hardlimiters and quantizers are included in the class of nonlinearities we can identify by our techniques.

OPTIMAL MODEL OF CASCADE STRUCTURE We consider a system consisting of a cascade of n (n ~ 2) memoryless subsystems as shown in Figure 1. All signals Ul, ••• ,Un are random and of the same dimension, say d. The model of the system from Figure 1 has the same structure shown in Figure 2. Our goal is to find the optimal Mean Integrated Square Error (MISE) model of the system in Figure 1, that is the model minimizing n

Q = EEllXi i=1

-


(1)

where Xo = Uo = X, Ui =
A. KRZYZAK

438

cI>*(Ui-l)

= lI1i(II1Ll(Ui-l))

(2)

where 111 Ll (Uj-l) is an inverse of a component of'Il containing Ui-l.

=

Remark 1 If the whole function 'Il is invertible then 'Ilt 'Il- l , where 'Il- l is an inverse of 'Il. To clarify the stated results consider the following example. Let d = 1 and 'Ill(u) = u 2 for - 1 ~ U ~ 1 and 'Ill(u) = exp(u - 1) for U ~ 1. Then 'Ilt(u) = -v'U for - 1 ~ U ~ 0, 'Ilt(u) = v'U for 0 ~ U ~ 1 and 'Ilt(u) = In(u + 1) for u ~ 1.

Proof of Theorem 1. Consider quality index n

Q = E EIiXi - Oi(X)W ;=1

(3)

Notice that the mapping OJ(X) uses X as its input therefore its input is the input of the whole cascade system. Consequently minimizing criterion (3) in the class of models OJ, i = 1, ... , n is equivalent to minimizing criterion Q in the class of multivariate models ignoring the cascade structure of the model of Figure 2. On the other hand the class of models in (1) is constrained to the class of cascade models shown in Figure 2. Clearly minimization of Q with respect to cI>; is constrained minimization subject to the cascade structure of the class of models unlike minimization of Qwhich is unconstrained minimization in the class of multivariate models. Hence obviously:

Q* = minQ ~ minQ. It is clear that the minimizer of Qis O;(x)

(4)

= 'Il;(x) = E{XiIX = x}.

By definition of 'IlLl and a simple substitution we can see that

Consequently Q = This implies

Q*

for a particular cascade model cI>i(u;-d

= 'Ili('IlLl(Ui-l)).

minQ ~ minQ.

This together with (4) implies that (1) is minimized by (2) concluding the proof. From now on, for simplicity but without losing generality, we will concentrate on the system and the model consisting of only two components: 8 1, 8 2 and M l , M 2 , respectively. Denote X, Xl, X 2 by X, Y, Z and cI>, 'Ill by cI>, 'Il and Ul , U2 by U and T. Assume moreover that cI>-1 exists.

IDENTIFICATION OF CASCADE SYSTEMS BY NONPARAMETRIC TECHNIQUES

439

In order to estimate the optimal model (2) we assume that we have a sequence of independent observations (Xt, Yi, Z1),"', (Xn' Yn, Zn) of the random vector (X, Y, Z) and X has distribution J.L. We apply the kernel regression estimate to recover cI> and a combination of kernel regression estimate and regression inverse estimate to recover M 2 •

IDENTIFICATION OF SUBSYSTEM S1 The following kernel regression estimate will be applied to identify M1 cI> () n X

'"'~

YoK(X-Xi)

h = L...=1' '"'~_ K(X-X;) L..._1 h n

n

(

)

5

where K is a kernel and {h n } is a sequence of positive numbers. Estimate (5) has been investigated by Greblicki and Krzyzak (1980), Devroye (1981), Greblicki et al. (1984), Krzyzak and Pawlak (1987), Krzyzak (1986, 1990) and Stone (1982). We can regard (5) as a weighted average of output measurements y;, i = 1" .. ,n. The weights K( XI.;i)/ Ei':1 K(XI.;;) are probability weights (that is they sum to one). They depend nonlinearly on the input point at which we calculate the estimate and on the random input measurements. The measurements close to the input point are generally assigned higher weights than the measurements that are farther away. The weights also depend on the kernel and the smoothing sequence. The idea is·to make weights more concentrated around the input point as the number of observations increases. This is achieved by adjusting the smoothing parameter hn which scales kernel K appropriately. In the estimate we have to select two parameters: a kernel and a smoothing sequence. The choice of the smoothing sequence is more critical than the choice of the kernel, however the carelessly selected kernel may introduce some rigidity into the estimate which in turn may adversly affect the rate of convergence. In order to make the estimate converge we must impose some conditions on the kernel and the smoothing sequence (see Theorem 2 below). The best nonparametric estimates currently available have parameters which automatically adapt to the measurements. The theorem below deals with the pointwise consistency of (5). For the proof refer to Krzyzak and Pawlak (1987). Theorem 2 Let E{IIYW} < oo,s fies the following conditions:

> 1.

Suppose that nonnegative kernel K satis-

c1 H(llxlD ~ K(x) ~ C2H(llx1D cl{llxlI:S;r} ~ K(x) tdH(t)-+O as t-+oo

for some positive constants tion H.

C1 ~ C2

(6)

and c and for some nonincreasing Borel func-

A.KRZYZAK

440

If the smoothing sequence satisfies

hn

-t

0

(7)

n(s-1)/8h~/logn - t 00

then

I1>n(x)

-t

l1>(x) almost surely as n

- t 00

for almost all x mod fl.

Remark 2 Convergence in Theorem 2 takes place for all input distributions (that is for ones having density or discrete or singular or any combinations of aforementioned) and we impose no restrictions on regression function 11>. Examples of kernels satisfying (6) are listed below a) rectangular kernel

I«x) = b) triangular kernel

{10/2

for IIxll S; otherwise

1

I«x) = { 1 -lIxll for

IIxli. S; 1 otherwIse

o

c) Gaussian kernel

I«x) =

1

rn=exp{( -1/2)lIxIl 2 }

v21r

d) de la Valee-Poussin kernel

I«X) =

{

I (8in(X/2))2 2". x/2

1/21r

If hn = n- Ct then (7) is satisfied for 0

if x =J 0 if x = O.

< a < l/d.

The next theorem states sufficient conditions for uniform consistency. This type of consistency is essential for convergence of the estimate of M 2 • The result presented here is an extension of Theorem 2 in Devroye (1978).

Theorem 3 Let ess supx E {IIYW IX = x} < constants r, a, b, r S; b such that

00.

Suppose that there exist positive

(8) where A is a compact subset of Rd , SX,T is the ball with radius r centered at x and ). is the Lebesgue measure. Let I< assume finitely many k values. If

IDENTIFICATION OF CASCADE SYSTEMS BY NONPARAMETRIC TECHNIQUES

hn

-t

nh~jlog n

as n

0 - t 00

441

(9) (10)

then

- t 00,

esssup IIn(X) - (X)II-t 0 almost surely A

(ll)

as n - t 00. If K is a bounded kernel satisfying (6) then (ll) follows provided that condition (10) is replaced by nh!d / log n -t 00. (12)

Remark 3 Hypothesis (ll) follows under condition (10) if K is for example a window kernel. Essential supremum is taken with respect to the distribution of the input. Notice that we do not assume that X and Y have densities. Condition (8) guarantees that there is sufficient mass of the distribution in the neighborhood of point x. The condition imposes restrictions on fl and on the shape of A. It also implies that A is contained in the support of fl, that is the set of all x such that fl(Sx,T) > 0 for all positive r. If X has density f then (8) is equivalent to ess infxEA f( x) > O.

Proof of Theorem 3. Define Y = E{YI{lYI:::>nl/2}} and (x) = E{YIX = x}. Let also Kh(x) = K(x/h). Clearly (x) - n(x) = l:i=l (Y; - Y;)Kh(x - X;)/ l:i=l Kh(x - Xi) + l:i=l (Y; - ~(Xi))Kh(X - X i)/ l:i=l Kh(x - Xi) +(l:i=l ~(X;) - (X))Kh(X - X i)/ l:i=l Kh(X - Xi) =I+II+III. By the moment assumption I = 0 a.s. for n large enough. By the result of Devroye (1978, p. 183) there exist finite constants C3, C4 andc5 such that for hn small enough n

P{esssup IIIII > €}:::; P{in(~=Kh(x - Xi) < C3nh:} :::; c4h;;-dexp(-C5nh:). A

A i=l

Term I I needs special care. Suppose that K takes finitely many different values al,"', ak. Vector [K((x - Xd/h),···, K((x - Xn)/h)] as a function of x can take at most (2n )c(d)k values contrary to the intuitive number kn (Devroye 1978, Theorem 1). Constant c(d) is equal to the VC dimension of the class of d-dimensional balls. We thus obtain

P{esssuPA IIII > €}:::; P{(Xl"",Xn) f/. B} +E{IB(XI,"', Xn)ess sup(X1 , ••• ,xn )(2ny(d)k SUPAjEA ,P{I l:i=l (Y; - m(Xi))aj;f l:i=l ajd ~ €IX l , " ' , Xn}

(13)

A.KRZYZAK

442

where B is the set of all (XI,' .. ,xn) E Rdn such that i~f jln(Sz,h n )

;:::

c3h~

and Aj = (ail! ... ,ajn) is a member of partition A induced by the vector [K( (x - X 1 )/ h), ... ,K( (x - X n )/ h)] on n-dimensional space of multivalued vectors each component of which can assume k different values. To bound the second probability on the rhs of (13) we are going to use McDiarmid inequality (Devroye 1991). Let Xl!'" ,Xn be independent random variables and assume that sup If(xI, ... ,Xi, ... Xn) - f(xl! ... ,x~, ... xn)1 ~ c;, 1 ~ i ~ n.

xi,xi

Then n

P{lf(XI,'" ,Xn) - Ef(XI,'" ,Xn)1 ;::: f} ~ 2exp(-2f2 / L:c~). i=l Using this inequality we obtain for the second probability in (13)

P{I Li'=l(l'i - m(X;))aj;!Li'=l ajd ;::: fIXI,'" ,Xn} ~ 2exp( -nf2 / L':=l 2 max; If?la~;/(Li'=l aji)2) ~ 2exp( -nf2 Li'=l aji/ maxi aji) ~ 2exp( -Csnh~) where the last inequality follows from the fact that on set B n

L: aji/ m~x aji;::: i=l



const nh~.

So the second term on the rhs of (13) is not larger than

2(2ny(d)kexp( -Csnh~). The first probability in (13) can be bounded above by P{i~f jln(Sz,h n )

;:::

c3h~} ~ c4h;.d exp( -c5nh~).

Theorem 3 follows from (10) and the Borel-Cantelli lemma. In case when K assumes infinitely many values we can use the stepwise approximation of K and obtain an upper bound for (13)

c7(2ny(d)/h d exp( -c8nh~). The theorem follows from (12).

IDENTIFICATION OF CASCADE SYSTEMS BY NONPARAMETRIC TECHNIQUES

443

ESTIMATE OF \]1 In order to obtain consistent estimate of M2 we need to estimate regression \]1 and regression inversion -1. The estimate of \]1 is given by \]1 ( ) n X

=

,,~_ Z1«X-X i ) L.,.,._1· hn

(14)

l:i':1I«X);:i)

The convergence of (14) follows from Theorem 2.

ESTIMATE OF REGRESSION INVERSION -1 As we shall see later we would need a consistent estimate of -1 in order to identify M 2 • The estimate of -1 will be derived from n. Since n may not be invertible even when is we need to define a psudoinverse.

Definition 1 Let : X -- Y where X is a complete space and let s = infxEx II(x)for some y E y. A function + : Y -- X is called a pseudoinverse of if for any y E y, +(y) is equal to any x E X in the set

yll

00

00

A(y) = Ucl({x:II(x)-yll~s+l/n})= UAn n=1

(15)

n=1

where cl(A) denotes closure of set A.

Remark 4 Since cl(An) are closed and nonempty and X is complete set then set A(y) is nonempty and + is well defined. If is continuous and X is compact then +(y) is equal to any x* such that min II(x) ",EX

yll =

II(x*) -

yll

If is invertible then + coincides with -1. The pseudoinverse depends on the norm. Two versions of + will be useful in applications in the case of a scalar function

= YEA(y) inf y

(16)

+(y) = sup y.

(17)

+(y) and

yEA(y)

The next theorem deals with consistency of ~.

A.KRZYZAK

444

Theorem 4 Let <.I> : A - B be a continuous function, A be a compact subset of Rd and let <.I> A denote an image of A by <.I>. If

sup II<.I>n(x) - <.I>(x)ll_ 0 xEA

then as n -

00,

at every y E <.I>A.

The proof will be omitted.

IDENTIFICATION OF SUBSYSTEM S2 Using equation (2) the natural estimate of S2 is given by (18) where <.I>: is pseudoinverse of <.I>n. The following straightforward result is useful in proving the consistency of (18). Lemma 1 If f is continuous and

sup Ilf(x) - fn(x)ll- 0 x

and xn-x as n -

00,

then

Lemma 1 and Theorem 3 imply the convergence of identification algorithm of S2. Theorem 5 Assume that <.I> and \II are continuous on A and esssupx E{IIYWIX}

<

00,

esssupx E{IIZWIX} <

00.

If [{ assumes finitely values and (8-10) hold then

(19) as n - 00 at every u E <.I>(A). If [{ assumes infinetely many values, then (19) follows when in addition [{ satisfies (6) and condition (10) is replaced by (12).

445

IDENTIFICATION OF CASCADE SYSTEMS BY NONPARAMETRIC TECHNIQUES

n

xn

cp

Sn

r '"'\

wn

{k;}

\....J

Yn

Figure 3: Hammerstein System

HAMMERSTEN AND WIENER SYSTEMS The outline of the MIMO discrete Hammerstein system is given in Figure 3. The nonlinear memoryless subsystem is described by

(20) where Xn is Rd-valued stationary white noise with distribution f.L and ~n is a stationary white noise with zero mean and finite variance No correlation is assumed between ~n and X n • Assume for simplicity that 'IjJ is a scalar function. The linear dynamic subsystem is described by the ARMA model (assumed to be stable):

at.

Sn

+ a1Sn-l + ... + a/Sn_/

=

Yn =

boWn + b1 W n- 1 Sn

+ ... + b/Wn_/

where 1 is unknown order of the system and Sn is its noise-free output. The linear subsystem can also be described by state equations.

(21) where Xn is an I-dimensional state vector and A is assumed to be asymptotically stable. These conditions imply that Xn and Yn are weakly stationary as long as Wn is weakly stationary. By (21)

E{YnIXn} = d1CP(Xn) + a where a = Ecp(X)cT(I - A-1b).

= m(Xn)

(22)

From equation (21) we obtain a weighting sequence representation 00

Yn = EkjWn - j j=O

(23)

A.KRZYZAK

446

TJn un

rn

{b i }

f 1\ \..L/

tn

?jJ

Zn

Figure 4: Wiener System where ko = dl =f:. 0, ki = cT Ai-lb, i = 1,2"" and I:i:o Ik;l < 00 guarantees asymptotic stability of the linear subsystem. lt follows from (23) that

E{YnIXn} = ko¢>(Xn)

+ (3

where (3 = E¢J(X) I:i:l ki • lt is obvious from the above equation that we can use kernel regression estimate to estimate min (22) and consequently recover ¢> (up to multiplicative and additive factors). The only difference with the problem in Section 3 is that now {Xi, Yi} is a sequence of dependent random variables. For identification procedures and their asymptotics refer to Greblicki and Pawlak (1986) and Krzyzak (1990). Let us now consider Wiener system shown in Figure 4. The nonlinear memoryless subsystem is described by

(24) where for simplicity we assume that Tn is one dimensional output of the linear dynamic subsystem and Zn E R. with distribution f.L and ~n is a stationary white No correlation is assumed between ~n noise with zero mean and finite variance and X n . The linear subsystem is described by the ARMA model:

O'l.

Rn

+ CtRn-l + ... + c/R n- 1 = Tn =

doUn + d1Un- 1 Rn + 'TJn

+ ... + d/Un- 1 (25)

Un is a stationary gaussian noise with distribution f.L and TJn is a stationary gaussian noise with zero mean and finite variance O'~. No correlation is assumed between 'TJn and Un. The linear subsystem can also be described by state equations.

X +l n

=

BXn + eRn (26)

where parameters of the system (26) have the similar form as in (21) and B is asymptotically stable. From (26) we get a weighting sequence representation,

Tn = I:~o ljUn_j Zn = ?jJ(Tn)

+ 'TJn (27)

IDENTIFICATION OF CASCADE SYSTEMS BY NONPARAMETRIC TECHNIQUES

447

where 10 = e -I 0, Ii = f Bi-1e, i = 1,2, ... and E~o 11il < 00 guarantees asymptotic stability of the linear subsystem. It can be shown that estimation techniques for inverses of regression functions from Section 5 are be applicable to Wiener system identification (see Greblicki (1992) and Krzyzak (1993)). Wiener and Hammerstein systems can be combined into a cascade of memoryless nonlinear systems interconnected with dynamic linear components. Such models are very general but still simple enough to obtain identification algorithms.

CONCLUSION We considered modeling of pollutants in the river and canal systems by interconnected nonlinear systems. Particular attention has been devoted to cascade of memory less subsystems. Identification algorithms have been given and their strong convergence properties investigated under very mild restrictions on the measurements and parameters. Possible extensions to dynamic systems such as Hammerstein and Wiener systems have been explored. The rates of convergence of the algorithms will be addressed in the subsequent papers.

ACKNOWLEDGEMENTS This research was sponsored by NSERC grant A0270 and FCAR grant EQ 2904.

REFERENCES Banks, S. (1988) Mathematical Theories of Nonlinear Systems, New York, Prentice Hall, 1988. Billings, S.A. (1980) "Identification of nonlinear systems-A survey", Proc. lEE 127, D, 6, 277-285. Billings, S.A. and Fakhouri, S.Y. (1979) "Non-linear system identification using the Hammerstein model", Int. J. Syst. Sci. 10,567-578. Chung, H.Y. and Sun, Y.Y. (1988) "Analysis and parameter estimation of nonlinear systems with Hammerstein model using Taylor series approach", IEEE Trans. Circuits Syst. CAS-35, 1533-1544. Devroye, L. (1978) "The uniform convergence of the Nadaraya-Watson regression function estimator", The Canadian J. Statist. 6, 179-191. Devroye, L. (1981) "On the almost everywhere convergence of nonparametric regression function estimates" , Ann. Statist. 9, 1310-1319. Devroye, L. (1991) "Exponential inequalities in nonparametric estimation", In: Roussas, G. (ed) Nonparametric Functional Estimation, Kluwer, Boston, 31-44. Greblicki, W. (1992) "Nonparametric identification of Wiener systems", IEEE Trans. Information Theory IT-38, 1487-1493.

448

A.KRZYZAK

Greblicki W. and Krzyzak, A. (1979) "Non-parametric identification of a memoryless system with cascade structure, Int. J. Syst. Science 10, 1301-1310. Greblicki W. and Krzyzak, A. (1980) "Asymptotic properties of kernel estimates of a regression function", J. Statist. Planning Inference 4, 81-90. Greblicki, W., Krzyzak A., and Pawlak, M. (1984) "Distribution-free pointwise consistency of kernel regression estimate", Ann. Statist. 12, 1570-1575. Greblicki W. and Pawlak, M. (1986) "Identification of discrete Hammerstein systems using kernel regression estimates", IEEE Trans. Automat. Contr. AC-31, 74-77. Hunter, LW. and Korenberg, M.J.(1986) "The identification of nonlinear biological systems: Wiener and Hammerstein cascade models", BioI. Cybern. 55, 135-144. Krzyzak, A. (1986) "The rates of convergence of kernel regression estimates and classification rules", IEEE Trans. Information Theory IT-32, 668-679. Krzyzak, A. (1989) "Identification of discrete Hammerstein systems by the Fourier series regression estimate" , Int. J. Syst. Science 20, 9, 1729-1744. Krzyzak, A. (1990) "On estimation of a class of nonlinear systems by the kernel regression estimate", IEEE Trans. Inform. Theory IT-36, 1, 141-152. Krzyzak, A. (1992) "Global convergence of the recursive kernel regression estimates with applications in classification and nonlinear system estimation", IEEE Trans. Inform. Theory IT-38, 1323-1338. Krzyzak, A. (1993) "Identification of nonlinear block-oriented systems by the recursive kernel estimate", J. of the Franklin Institute, vol. 330, 605-627. Krzyzak A. and Pawlak, M. (1987) The pointwise rate of convergence of the kernel regression estimate", J. Statist. Planning Inference 16, 1590-166. Kung M. and Womack, B.F. (1984) "Discrete-time adaptive control of linear dynamic systems with a two-segment piecewise-linear asymmetric nonlinearity", IEEE Trans. Automat. Contr. AC- 29, 170-172. Masry, E. and Cambanis, S. (1980) "Signal identification after noisy, nonlinear transformations", IEEE Trans. Inform. Theory IT-26, 50-58. Narendra, K.S. and Gallman, P.G. (1966) "An iterative method for the identification of nonlinear systems using the Hammerstein model", IEEE Trans. Automat. Contr. AC-l1, 546-550. Pawlak, M. (1991) "On the series expansion approach to the identification of Hammerstein systems", IEEE Trans. Automat. Contr. AC-36, 763-767. L W. Sandberg, (1991) "Approximation theorems for discrete-time systems", IEEE Trans. Circuits Syst., CAS-38, 564-566. Stone, C. (1982) "Optimal global rates of convergence for nonparametric regression", Ann. Statist. 10, 1040-1053.

PATCHING MONTHLY STREAMFLOW DATA - A CASE STUDY USING THE EM ALGORITHM AND KALMAN FILTERING

PEGRAMGGS Department of Civil Engineering University of Natal King George V Avenue 4001 Durban, South Africa Water Resource Systems in many parts of the world rely almost exclusively on surface water. Streamflow records are however woefully short, patchy and error-prone, therefore a major effort needs to be put into the cleansing, repair and possible extension of the streamflow data-base. Monthly streamflow records display an appreciable amount of serial correlation, due mainly to the effects of storage in the catchment areas, both surface and subsurface. A linear state-space model of the rainfall-runoff process has been developed with the missing data and the parameters of the model being estimated by a combination of the Kalman Filter and the EM algorithm. Model selection and outlier detection were then achieved by recursively calculating deleted residuals and developing a cross-validation statistic that exploits the Kalman filtering equations. The method used here that relates several streamflow records to each other and then uses some appropriate rainfall records to increase the available information set in order to facilitate data report, can be developed if one recasts the above models in a state space framework. Experience with real data sets shows that transformation and standardization are not always necessary to obtain good patching. "Good" in this context is defined by the crossvalidation statistic derived from the deleted residuals. These in tum are a fair indicator of which data may be in error compared to the remainder as a result of them being identified as possible outliers. Examples of data patching and outlier detection are presented using data from a river basin in Southern Africa.

INTRODUCTION Water resource system designs depend heavily on the accuracy and availability of hydrological data in addition to economic and demand data; the latter being more difficult to obtain. However system reliability is becoming more commonly based on simulation for its assessment. 449 K. w. Hipel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 449-457. © 1994 Kluwer Academic Publishers.

450

G. G. S. PEGRAM

In order to simulate one has to have data to mimic and when records of streamflow in particular are short or patchy, the reliability of the system as a whole can be severely called into question. In southern Africa there has been a strong emphasis on water resource system development and analysis since the savage drought of the early eighties and again in the early nineties until late 1993. The Lesotho Highlands Scheme for hydro-electric generation for Lesotho with a spin-off of water supply bought by the Republic of South Africa bears testimony to the expense and ingenuity required to fuel major economic growth in the region. Part of the water resources analysis programme involved the patching of rainfall data in Lesotho to enable streamflow records to be constructed and in addition the lengthening and repair of existing streamflow records in South Africa. The methodology discussed in this paper was developed to support this endeavour. PATCHING STREAMFLOW VIA REGRESSION Streamflow records are strongly correlated to records of streams which flow in their vicinity and less strongly correlated to rainfall records where the rain gauges are located on or around the appropriate catchments. This is especially so with short-term daily flows but is somewhat better for the monthly data typically used in water resources studies, which primarily involve over-year storage. Where data are missing in a streamflow record it is tempting to use other streamflow and raingauge records to infill the missing data. There are several difficulties which arise when one uses conventional regression techniques. The first and most serious is that there are frequently pieces of data missing in the records of the control stations being used to infill the record of the target station. These gaps may and often do occur concurrently. A conventional regression requires that special steps be taken such as re-estimating the regression for the reduced set by best sub-set selection procedures etc. or alternatively, abandoning the attempt to infill or patch. A second difficulty arises from the strong dependence structure in time, due to of catchment storage, making standard regression difficult to apply to the untransformed flows. The third problem is one of seasonality which violates the assumption of homoscedasticity. A fourth is the problem of nonlinearity of the processes. These problems were addressed by Pegram (1986) but the difficulty of the concurrent missing data was not overcome in that treatment. Most other methods of infilling are defeated by the concurrently missing data problem. THE EM ALGORITHM IN CONJUNCTION WITH THE KALMAN FILTER A powerful method of infilling data and simultaneously estimating parameters of a regression model was suggested by Dempster et al (1977). The EM algorithm exploits estimates of the parameters to estimate the missing data and then uses the repaired data set to estimate the parameters via maximum likelihood; this alternating estimation procedure is performed recursively until maximization of the likelihood function is achieved. Shumway and Stoffer (1982) combined the EM algorithm with a linear state-space model

PATCHING MONTHLY STREAMFLOW DATA

451

estimated by the Kalman Filter to estimate parameters in the missing data context and also as a bi-product to estimate the missing data. This procedure will be referred to as the EMKF algorithm in this paper. It was important to ascertain what characteristics of the rainfall runoff process producing monthly streamflow totals could be maintained without sacrificing the validity of the EMKF approach. Specifically one is concerned about the seasonal variation of the process especially in a semi-arid climate where the skewness of the flows during the wet season and dry season tend to be quite different, as do the mean and variance. The state-space model lends itself to modelling the dependence structure we expect to be in streamflow. Whether the linear aspect of the model is violated can only be tested by examining the residuals of the fitted regression model. Thus the EMKF algorithm has promise in addressing the four difficulties associated with patching monthly streamflow referred to above - the concurrently missing data, the seasonality, the dependence and the nonlinearity. CROSS VALIDATION AND DELETION RESIDUALS The EMKF algorithm of Shumway and Stoffer (1983) provides estimates of the parameters and the missing data and some idea of the variance of the state variables. There still remains the problem facing all modellers as to which records to include in the regression and which model to fit where model is defined by the number of lags in time and the paramaterization. The Ale is a possible answer but presents considerable difficulties in the context of the Kalman Filter when data are missing. An alternative was suggested by Murray (1990) using a method of cross-validation for model selection. This technique has the added advantage that the cross-validation calculation reduces a so-called deletion residual which gives estimates of how good the intact data are in relation to the model, and flags possible outliers for attention or removal. The methodology is fully described with examples in Pegram and Murray (1993). APPLICATIONS OF THE EMKF ALGORITHM WITH CROSS-VALIDATION To demonstrate the efficacy of this approach it was decided to perform the following experiment: take some well-known, intact streamflow and rainfall records hide a section of one of the streamflow records compare the infilled record with the observed, hidden section comment on the appropriate model. Twenty-eight years (1955 to 1983 inclusive) of the streamflow record into the Vaal Dam in South Africa were selected for the experiment together with flow into an upstream dam - Grootdraai - and six rainfall gauges scattered around and in the catchment. Three years of the Vaal Dam flows (1964 to 66) were hidden and re-estimated using the EMKF

452

G. G. S. PEGRAM

algorithm with cross-validation. A variety of combinations of lags varying between 1 and 2 for streamflow and 0 and 1 for the rainfall with between 4 and 6 rainfall gauges being used were the constituents of the suggested model. In Figures 1, 2 and 3 appear three attempts at infilling the missing streamflow data. In Figure 1 the maximum number of lags were used with all available rainguages. The difference between the three figures is in the treatment of the data. The first set is subject to complete standardization i.e. all monthly values were divided by their monthly standard deviations after subtracting the monthly means. In Figure 2, a scaling was performed based on the assumption that the coefficient of variation is reasonably constant throughout the year. Here the series were scaled by their monthly standard deviations without shifting. Figure 3 shows the recorded and estimated flows where the data were untransformed. This version of the model assumes that the parameters in the linear dynamic model are time invariant. It happens that this is the most parsimonious of the three models being specified by only 40 parameters compared to 88 and 64 parameters each for the modelling done for Figures 1 and 2. Comparing the overall estimating performance it appears that the model employing untransformed data depicted in Figure 3 performs best. In Figure 4 are shown the deleted residuals for the 36 months during 1974 to 1976 which were years which most closely corresponded to the "missing" years (deleted residuals are not estimated for missing data but only for intact data). One of the reasons why this section of data was used, is because it includes the largest flow on record - that in February 1974 - which was 2 200 units (millions of cubic metres). The value concerned is shown as a very large deleted residual skewing the regression above the line. This suggests that the nonlinearities have not been satisfactorily handled, however the obvious choice of log-transformation does not eliminate the problem but raises others. CONCLUSIONS A methodology has been suggested in this paper which should provide a useful tool for the water resources practitioner in the search for better ways of repairing and extending streamflow data. ACKNOWLEDGEMENTS Thanks are due to the Department of Water Affairs and Forestry, Pretoria, South Africa, for sponsoring this work and for giving permission to publish it. The software to accomplish the streamflow patching (PATCHS) and the manual to go with it are available from them at nominal cost.

800

f

I Record~.

Months

Series 1

=Standardized)

Figure 1. Comparison of Patched with hidden recorded flows for Vaa1 Dam during 1964-66. The flows are patched using the fully standardised flows in both target and controls.

0

200

it >:E 600 c 0 :E 400

-

~

....

0

I'CII

.! 1000

1200

1400

(Series 1

Comparison - Series 1/Recorded (1964-6)

~ ...,

~

;;

I

;J

>< V,l

I

~

:I:

~

'"0

800

o

200

400

r~ Recorded

Months

- . --Sen~

2I

Figure 2. Comparison of Patched with hidden recorded flows for Vaal Dam during 1964-66. The flows are patched using the scaled (unshifted but divided by monthly standard deviation) flows in both target and controls.

:e

o

c

:;

~600

o u:::

~~

!! 1000

1200

1400

(Series 2 =Scaled)

Comparison - Series 21Recorded (1964-6)

""" """

~

>

~

t:I>

p P

Vt

-

Figure 3.

[ milll __ Recorded



Months

Series 3

(Series 3 = Untransfonned)

Comparison of Patched with hidden recorded flows for Vaal Dam during 1964-66. The flows are patched using the untransfonned flows in both target and controls.

0

200

400

0 ~

:E C

600

800

1000

>-

u::

~

I-

0

'ii

It

1200

1400

Comparison - Series 3/Recorded (1964-6)

~

.l>-

v. v.

I~

;;3

en

~

~

g

II)

-4

·2

0

-



600



,

• 1600

Vaal Monthly Flows

1000

2000



2600

Scatter plot of the deleted residuals and recorded flows for Vaal Dam for the years 1974-76 which include the largest deleted residual in the record. It is positive, thus the data-point concerned has been under-estimated by the model.

+ ....

. - ......

. 2+

6

Figure 4.

>

III

'i

~

Ii

11

D::

II

"';j=

7i

8

10

Deleted Residuals and Vaal Flows (1974-6)

~

~

~

~

~

o o

0\

PATCHING MONTHLY STREAMFLOW DATA

457

REFERENCES Dempster, A. P., N.M. Laird, and D.B. Rubin, (1977) "Maximum likelihood from incomplete data via the EM algorithm", 1. of the Royal Statist. Soc., Ser. B, 39, 1-38. Murray, M. (1990) "A Dynamic Linear Model for Estimating the Term Structure of Interest Rates in South Africa", Unpublished Ph.D. thesis in Mathematical Statistics, University of Natal, Durban. Pegram, G. G. S. (1986) "Analysis and patching of hydrological data", Report PCOOO/OO/4285, by BKS WRA for Department of Water Affairs, Pretoria. 124 pages. Pegram, G. G. S. and M. Murray, (1993) "Cross-validation with the Kalman Filter and EM algorithm for patching missing streamflow data", Resubmitted to Journal of American Statistical Association in January. Shumway, R. H. and D. S. Stoffer (1982) "An approach to time series smoothing and forecasting using the EM algorithm", Journal of Time Series Analysis, Vol. 3,253-264.

RUNOFF ANALYSIS BY THE QUASI CHANNEL NETWORK KDEL IN THE 'l'OYOHIRA RIVER BASIN

H.SAGA

Dept.of civil Eng.,Hokkai-Gakuen Univ.,Chuo-ku,Sapporo 064,Japan T • NISHIMURA

Hokkaido E.P.Co.,Toyohira-ku,Sapporo 004,Japan M.FUJITA

Dept.of civil Eng.,Hokkaido Univ.,Kita-ku,Sapporo 060,Japan INTRODUCTION

This paper describes runoff analysis using the quasi channel network model of the Misumai experimental basin, which is part of the Toyohira River basin. The Toyohira River flows through Sapporo which has a population of 1.7 million. Four multi-purpose dams are located in the TOYohira River basin; thus, it is very important to verify the runoff process not only analytically but also based on field observations. In this study, a quasi channel network model and a three-cascade tank model were adopted as runoff models. Both provide for treatment of the spatial distribution of rainfall. OUTLINE OF THE MISUMAI EXPERIMENTAL BASIN

The Misumai experimental basin is located near Sapporo, Hokkaido, Japan. An outline of the basin is shown in Figure 1. It lies at latitude 42°55' N and longitude 141°16' E. The catchment area of this basin is 9.79km2 and the altitude ranges from 300m to 120Om. The basin contains brown forest soil and is mainly covered with forest. This basin is equipped with five recording raingauges, an automatic water level recorder, soil moisture content meters, a turbidimeter and snow-gauges. RUNOFF MODEL

The physical process model adopted in this No:RalnuuJe research is the quasi channel network model shown in Figure 2. This figure shows flow Misumai exper~ direction as determined from the elevations Figure 1.The mental basin. of adjacent grid points. The smallest 459 K. W. Ripel et al. (eds.), Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 459-467. © 1994 Kluwer Academic Publishers.

H. SAGA ET AL.

460

Figure 2. Quasi channel network

model.

.IrCII)

/

mesh-size is 250x250nl. Each Misuni link of this model has an indeExperilental pendent sub-catchment area. The 100 Basin tank model transforrndng rainfall into discharge in the sub+ catchment is adopted because 80 this model includes the mechanism of rainfall loss. The three-cascade tank model is 80 shown in Figure 3, and the state variables and parameters of this 40 model are defined in the next section. The data from the five raingauges indicate that the 20 distribution of rainfall intensity generally depends on altitude. Figure 4 shows the o 200 400 BOOAltitudeC.) distributions of 9 rainfall FiguEe 4. Distribution of rainfall. events observed in 1989. In a large-scale rainfall, the amount of rainfall is proportional to the altitude, though for a small-scale event, the amount is independent of altitude. The combination of a quasi channel network model and tank model appeared to be effective for treating the spatial distribution of rainfall because a quasi channel network model can treat the rainfall data observed by five raingauges as multi-inputs. The estimation of the velocity of propagation of a flood wave is one of the problems that must be solved in order to apply this model to practical cases of runoff analysis. In this paper, the velocity Vis assumed to be spatially constant along the channel, and not vary with time. Figures 5 and 6 show the

QUASI CHANNEL NETWORK MODEL IN THE TOYOHIRA RIVER BASIN

461

r(II/IO.in)

<:>

0

~IVW"I'T

No.1 l:r=1l3.. E. L=750.

No.1 l:r=51.5 •• E. L=750.

co

co

0

0

T

No.2 l:r=97.5 .. E.L=600.

No.2 l:r=S3.S .. E. L=600.

rr

co 0

0

No.3 l:r=94 •• E. L=510.

rr

co 0

0

No.4 l:r=88 .. E.L=460.

.....

U1U1Ua1

lllllJlIU

No.3 l:r=46.511 E. L=51O.

,..

No.4 l:r=43 .. E. L=460.

<0

0

0

No.5 l:r=49.S .. E. L=390.

No.5 l:r=84.S.. E. L=390. q(II/IOlin)

q(u/lO.in)

V=2.0./sec

V=2.0./sec

Obs er ved

U)

o

o 20 t(hr)

Figure 5.Comparison between

observed and calculated hydrographs.

Calculated

o

20 t(hr)

Figure 6.Comparison between

observed and calculated hydrographs.

H. SAGA ET AL.

462

hyetographs and hydrograph for a heavy storm and weak rainfall respectively. In this calculation, V was assumed to be 2.0 mlsec and the tank model parameters were set by trial and error. In spite of using the same values for the tank parameters and propagation velocity, the calculated results were in good agreement with the observed ones. IDENTIFICATION OF TANK PARAMETERS BY THE EXTENDED KALMAN FILTER

TECHNIQUE

Researchers have developed several mathematical optimization methods for estimating unknown parameters, notably the Powell method and the Davidon-Fletcher-Powell method. In this paper, the unknown parameters are identified by the Extended Kalman Filter technique, which estimates the unknown parameters automatically and requires much less computer memory. The state variables are the discharge $Q$ and the storage depths in each tank, which are denoted by $C_1$, $C_2$ and $C_3$, respectively. The unknown parameters are the coefficients of the lateral runoff holes, $a_{1u}$, $a_{1m}$, $a_{1l}$, $a_2$ and $a_3$, and the coefficients of the permeable holes, $b_1$, $b_2$ and $b_3$. The heights of the holes, $h_{1u}$, $h_{1m}$, $h_{1l}$, $h_2$ and $h_3$, are constant. The continuity equations of the tank model are as follows:

$$\frac{dC_1}{dt} = f_1 = r(t) - q_{1u}Y(C_1,h_{1u}) - q_{1m}Y(C_1,h_{1m}) - q_{1l}Y(C_1,h_{1l}) - i_1(t) \qquad (1)$$

$$\frac{dC_2}{dt} = f_2 = i_1(t) - q_2 Y(C_2,h_2) - i_2(t) \qquad (2)$$

$$\frac{dC_3}{dt} = f_3 = i_2(t) - q_3 Y(C_3,h_3) - i_3(t) \qquad (3)$$

where $q_i = a_i \times (C - h_i)$ with $C$ the storage depth of the tank carrying hole $i$, $i_j = b_j \times C_j$, and

$$Y(C,h) = \frac{1}{\pi}\left\{ \tan^{-1}\frac{C-h}{\varepsilon} + \frac{\pi}{2} \right\}, \qquad 0 < \varepsilon \ll 1.$$

$Y(C,h)$ is a smoothed Heaviside function and $\varepsilon = 10^{-6}$. By introducing this special function $Y(C,h)$ into the continuity equations, the calculation is considerably eased, as there is no need to consider the relationship between the runoff hole height and the storage depth explicitly. The discharge is then calculated from these outflows (eq. (4)). The unknown parameters are identified at each time that the discharge is observed.

(a) State variables and model parameters
state variables: $C_1 = x_1(t)$, $C_2 = x_2(t)$, $C_3 = x_3(t)$, $Q = x_4(t)$
model parameters: $a_{1u} = x_5(t)$, $a_{1m} = x_6(t)$, $a_{1l} = x_7(t)$, $a_2 = x_8(t)$, $a_3 = x_9(t)$, $b_1 = x_{10}(t)$, $b_2 = x_{11}(t)$, $b_3 = x_{12}(t)$
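As a concrete reading of eqs. (1)-(3), the following minimal sketch integrates the three-cascade tank with an explicit Euler step. The symbols follow the paper, but the dictionary layout, the unit time step, and the use of the summed lateral outflows as the sub-catchment discharge (standing in for eq. (4), whose exact form is not reproduced here) are assumptions of this sketch.

```python
import numpy as np

EPS = 1.0e-6

def Y(c, h):
    """Smoothed Heaviside function of eqs. (1)-(3)."""
    return (np.arctan((c - h) / EPS) + np.pi / 2.0) / np.pi

def tank_step(c, r, p, dt=1.0):
    """One explicit-Euler step of the three-cascade tank model.
    c = (C1, C2, C3) storage depths; r = rainfall during the step;
    p = dict of hole coefficients a*, b* and heights h* (illustrative layout)."""
    C1, C2, C3 = c
    i1 = p['b1'] * C1                       # percolation out of tank 1
    i2 = p['b2'] * C2
    i3 = p['b3'] * C3
    q1 = (p['a1u'] * (C1 - p['h1u']) * Y(C1, p['h1u'])
          + p['a1m'] * (C1 - p['h1m']) * Y(C1, p['h1m'])
          + p['a1l'] * (C1 - p['h1l']) * Y(C1, p['h1l']))
    q2 = p['a2'] * (C2 - p['h2']) * Y(C2, p['h2'])
    q3 = p['a3'] * (C3 - p['h3']) * Y(C3, p['h3'])
    C1n = C1 + dt * (r - q1 - i1)           # eq. (1)
    C2n = C2 + dt * (i1 - q2 - i2)          # eq. (2)
    C3n = C3 + dt * (i2 - q3 - i3)          # eq. (3)
    return (C1n, C2n, C3n), q1 + q2 + q3    # total lateral runoff
```

Because $Y$ is smooth, the right-hand sides are differentiable everywhere, which is exactly what the linearization required by the Extended Kalman Filter needs.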


(b) System equation
In this paper, the system equation is described by a set of non-linear ordinary differential equations, which are linearized and transformed into discrete equations. The resulting set of equations in vector form is given as:

$$x_{k+1} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}_{k+1} = \begin{pmatrix} \Phi_1 & \Phi_2 \\ 0 & I \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}_k + \begin{pmatrix} \Gamma \\ 0 \end{pmatrix} r_k + w_k \qquad (5)$$

where
$$X_1 = (x_1\ x_2\ x_3\ x_4)^T, \qquad X_2 = (x_5\ x_6\ x_7\ x_8\ x_9\ x_{10}\ x_{11}\ x_{12})^T,$$
$\Phi_i$, $\Gamma$: discrete transforming coefficients (varying with time); $w_k$: vector of system error.
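The linearization and discretization are only stated, not written out, in the source. One standard form consistent with eq. (5), assuming a first-order (explicit Euler) discretization with time step $\Delta t$, is

$$\Phi_k \approx I + \left.\frac{\partial f}{\partial x}\right|_{x = \hat{x}_k} \Delta t, \qquad x_{k+1} \approx x_k + f(x_k, r_k)\,\Delta t,$$

where $f$ collects the right-hand sides of eqs. (1)-(3) together with the parameter components, which are held constant ($\dot{x}_5 = \cdots = \dot{x}_{12} = 0$), and the Jacobian is evaluated at the current state estimate.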

(c) Measurement equation
The measurement equation in the Extended Kalman Filter is given by the following relationship:

$$y_k = (H_1\ H_2)\, x_k + v_k \qquad (6)$$

where
$$H_1 = (0\ 0\ 0\ 1), \qquad H_2 = (0\ 0\ 0\ 0\ 0\ 0\ 0\ 0),$$
$y_k$: the observed discharge at time $k$, computed by eq. (4); $v_k$: measurement error.
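For readers who want to reproduce the filtering step, here is a minimal augmented-state EKF sketch built on eqs. (5)-(6). It is not the authors' implementation: the user-supplied functions f and jac_f (model right-hand side and its Jacobian), the explicit-Euler discretization, and the noise covariances Qw and Rv are all assumptions of this sketch. The outer loop implements the repeated identification passes described in the next section, re-starting each pass from the previously identified parameters.

```python
import numpy as np

def ekf_identify(q_obs, r, x0, P0, Qw, Rv, f, jac_f, dt=1.0, n_pass=2):
    """Augmented-state EKF for tank-parameter identification (sketch).
    x = [C1, C2, C3, Q, a1u, a1m, a1l, a2, a3, b1, b2, b3] (12 states,
    float array); q_obs = observed discharge series; r = rainfall input;
    Qw (12x12) and Rv (scalar) are process and measurement noise covariances."""
    H = np.zeros((1, 12))
    H[0, 3] = 1.0                              # eq. (6): only Q is observed
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_pass):                    # repeated identification passes
        x[:4] = x0[:4]                         # reset storages/Q, keep parameters
        P = P0.copy()
        for qk, rk in zip(q_obs, r):
            # predict: Euler discretization of the model ODEs, eq. (5)
            Phi = np.eye(12) + jac_f(x, rk) * dt
            x = x + f(x, rk) * dt
            P = Phi @ P @ Phi.T + Qw
            # update with the observed discharge
            S = H @ P @ H.T + Rv               # innovation variance
            K = P @ H.T / S                    # Kalman gain
            x = x + (K * (qk - x[3])).ravel()
            P = (np.eye(12) - K @ H) @ P
    return x                                   # final states and parameters
```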

MODEL VERIFICATION

Through a simulation, under the assumption that the true values of the parameters are known, the approach described in the preceding section is substantiated. The discharge computed using the known parameter values is taken to be the observed data and is drawn as the solid line in Figure 7. The initial values of the parameters were set up to be different from the true values; the symbol "◇" denotes the discharge calculated by using these initial values.

[Figure 7. Results of calculation using the E.K.F.: discharge q(t) vs. t (hr); "◇" discharge from the initial parameter values, "x" discharge after the first identification pass, "O" discharge after the second pass, solid line discharge from the true values.]

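This twin-experiment design maps directly onto the sketches given earlier. A hypothetical driver, with made-up parameter values that are not taken from the paper, might look like:

```python
# Hypothetical twin experiment (all numbers illustrative):
# 1. simulate "observed" discharge with assumed true parameters,
# 2. run the EKF from deliberately wrong initial parameters,
# 3. compare the recovered parameters with the true ones.
import numpy as np

true_p = {'a1u': 0.04, 'a1m': 0.01, 'a1l': 0.02, 'a2': 0.015, 'a3': 0.005,
          'b1': 0.01, 'b2': 0.003, 'b3': 0.001,
          'h1u': 20.0, 'h1m': 10.0, 'h1l': 2.0, 'h2': 5.0, 'h3': 2.0}
r = np.r_[np.linspace(0.0, 5.0, 36), np.zeros(108)]  # synthetic 10-min rainfall

c, q_obs = (0.0, 0.0, 0.0), []
for rk in r:
    c, q = tank_step(c, rk, true_p, dt=1.0)          # from the earlier sketch
    q_obs.append(q)
# q_obs now plays the role of the "observed" series fed to ekf_identify.
```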

[Figure 8. Variation of the parameters for the tank model (a1u, a1m, a2, a3, b1, b2, b3): broken lines, parameters after the first pass; solid lines, parameters after the second pass; dot-dash lines, true values of the parameters (shown in parentheses).]


The identification process by the Kalman Filter was repeated to refine the estimates of the parameters; the parameters obtained by the Extended Kalman Filter were used as the initial values of the parameters for the next identification pass. The curve denoted by the symbol "x" is the discharge obtained by using the parameters identified after the first pass through the Extended Kalman Filter, and "O" denotes the discharge obtained after the second pass. Figure 8 shows the time variations of the parameters of the tank model identified by the presented method. The values in parentheses and the dot-dash lines show the true values of the parameters; the broken lines denote the estimated parameters after the first pass, and the solid lines show those after the second pass. It is clear that the estimates of the unknown parameters converge to the true values.

There is another problem in the application of this technique to the quasi channel network model. A sub-catchment is such a small area that its discharge cannot be observed. Consequently, the model parameters have to be identified by using data observed in a very small basin. This small basin is situated near the Misumai experimental basin, its catchment area is 1.7 km², and its topography is similar to that of the Misumai experimental basin. Figure 9 depicts the hyetograph and hydrograph observed in the small basin; the symbol "x" in this figure denotes the discharge calculated by using the parameters identified by the Kalman Filter.

[Figure 9. Comparison between observed and calculated hydrographs using the E.K.F.: rainfall r (mm/10 min) and observed vs. calculated discharge in the small basin, t (hr).]

Figure 10 depicts the data observed in the Misumai experimental basin; the curve denoted by "x" is the discharge calculated by the quasi channel network model and tank model using the parameters of Figure 9. There is a slight difference during the recession period; however, good agreement is obtained during the rising period.

CONCLUSION

A model based on the combination of a quasi channel network model and a tank model has been shown to be effective for treating the spatial distribution of rainfall. The estimates of the unknown parameters of the model converge well toward the true values in successive, automatic iterations of an Extended Kalman Filter.


[Figure 10. Comparison of the observed hydrograph and the results of the quasi channel network model: hyetographs at raingauges No.1 Σr=39.5 mm (E.L. 750 m), No.2 Σr=42.5 mm (E.L. 600 m), No.3 Σr=45.5 mm (E.L. 510 m), No.4 Σr=48.5 mm (E.L. 460 m), No.5 Σr=43.5 mm (E.L. 390 m); observed and calculated discharge q (mm/10 min) vs. t (hr), V=3.0 m/sec.]


REFERENCES

Kobatake, S. and Ishihara, Y. (1983): "Synthetic runoff model for flood forecasting", Proc. of JSCE, Vol. 337/II, pp. 129-135. (in Japanese)
Hoshi, K. (1985): "Fundamental study on flood forecasting study (2)", Monthly Report of C.E.R.I., No. 386, pp. 48-68. (in Japanese)
Powell, M.J.D. (1964): "An efficient method for finding the minimum of a function of several variables without calculating derivatives", Comp. Jour., pp. 155-162.
Fletcher, R. and Powell, M.J.D. (1963): "A rapidly convergent descent method for minimization", Comp. Jour., Vol. 6, pp. 163-168.
Chiu, Chao-lin (1978): "Application of Kalman filter to hydrology and water resources".
Yasunaga, T., Jinno, K. and Kawamura, A. (1992): "Change in the runoff process into an irrigation pond due to land alteration", Proc. of H.E., JSCE, Vol. 36, pp. 629-634. (in Japanese)

AUTHOR INDEX

A
Alpaslan, N. 135, 177
B
Baciu, G. 149
Bardossy, A. 19
Bass, B. 33
Benzaquen, F. 99
Bobee, B. 381
Bodo, B. A. 271, 285
Bogardi, I. 19
Bosworth, B. 301
Bruneau, P. 381
C
Chiu, C.-L. 121
Claps, P. 421
Cohen, J. 33
Corbu, I. 99
D
Dillon, P. J. 285
Druce, D. J. 63
Duckstein, L. 19
F
Fujita, M. 205, 459
Fuller, J. D. 229
G
Goodier, C. 191
H
Harmancioglu, N. B. 135, 163
Hashimoto, N. 205
Hayashi, S. 347
Hipel, K. W. 245, 271
Huang, G. 33
K
Kapur, J. N. 149
Kesavan, H. K. 149
Kojiri, T. 363
Krzyzak, A. 435
L
Lachtermacher, G. 229
Lai, L. 99
Lall, U. 47, 301
Lee, T.-Y. 87
Lennox, W. C. 77
Lettenmaier, D. P. 3
Lewandowski, A. 333
Liu, C.-L. 87
M
Matsubayashi, U. 347
McLeod, A. I. 245, 271
Mesa, O. J. 409
Murrone, F. 421
Muster, H. 19
N
Nishimura, T. 459
P
Panu, U. S. 191, 363
Pegram, G. G. S. 449
Penn, R. 99
Pereira, B. de B. 105
Perreault, L. 381
Perron, H. 381
Phatarfod, R. M. 395
Poveda, G. 409
R
Rajagopalan, B. 47, 317
S
Saga, H. 459
Sales, P. R. H. 105
Satagopan, J. 317
Singh, V. P. 135
Srikanthan, R. 395
T
Takagi, F. 347
Tao, T. 77, 99
Tarboton, G. 47
U
Unny, T. E. 363
V
Vieira, A. M. 105
W
Watanabe, H. 217
Y
Yamada, R. 217
Yin, Y. 33
Yu, P.-S. 87
Z
Zhang, S. P. 217
Zhu, M.-L. 205

SUBJECT INDEX

A
Abelson-Tukey trend test 245, 248
Akaike information criterion 369, 380, 451
aggregation of data 422
Aras river basin, Eastern Turkey 168, 171
atmospheric variables 3
autocorrelation 279-280
autocorrelation function (ACF) 264, 267, 349, 352, 358
autoregressive (AR) 340-341
autoregressive integrated moving average (ARIMA) 223
autoregressive moving average (ARMA) 106-107, 110, 116, 230, 234, 239, 334-335, 340, 342, 422, 424, 427-431, 434, 445-446
autoregressive moving average exogenous (ARMAX) 106-107, 111, 116
auto-series (AS) model 193-194
B
backfitting 276
'back-of-the-envelope' methods 396
back propagation 229, 231-232
backward propagation 207-208
Bayes/Bayesian 11-12, 149-155, 161
Bayesian decision theory 142
bin filter 276
Bode plot 338-340
Box-Jenkins model identification 334
Box-Jenkins models 229-231, 237
Box-Cox transformation 110, 116
Brillinger trend test 245-248
British Columbia (B.C.) Hydro 63, 65-66, 68-69, 72
Butternut Creek, N.Y. 205, 215-216
C
Canada - U.S. Great Lakes Water Quality Agreement 272, 282
Central Southern Italy watersheds 422
change of information 175
circulation patterns (CPs) and classifications 19-21
circular pipe 121, 124-129
Clearwater Lake, Ontario 285
climatic change 1
clustering/clustering analysis 194, 196, 363
Cold Lake, Alberta 38, 44
Columbia River Basin, Washington and Oregon 317, 322-323
conceptual-stochastic models 422-427, 434
convergence of identification algorithms 436, 440, 448
cost effectiveness of monitoring network 138, 140, 142-143, 147
cross series (CS) model 193, 195
cross validation 451
cumulative periodogram 263, 265, 266, 268
curse of dimensionality 301
D
DAD analysis 350, 354, 358
data vs. information 137, 141, 147, 164
de-acidification trends 285
decision-making 80-81
decision support system for time series analysis 263
dynamic efficiency index (DEI) 179, 187
E
East River watershed, China 82-83
efficiency of water and wastewater treatment plants 177-178
ELETROBRAS 105, 109, 116
EM algorithm 450-451
emptiness of reservoirs 400-402
English River at Sioux Lookout, Ontario 196
entropy 119
error back propagation algorithm 221-222
estimators of Hurst exponent 409-415
European circulation patterns 21-23, 27
expected forecast 99
exploratory data analysis 334
extended Kalman filter 462-463, 465
F
Fei-Tsui reservoir, Taiwan 90, 93
fluctuation analysis 224-225
fluid-mechanics principle 121, 133
forecasting 61, 229-230
forward-propagation 206-208
Fourier series 347-348, 350
Fourier spectrum 348-349, 351-352, 355-356, 358
frequency transfer function 335
frequentist 150
fuzzy inference 205, 213-215, 367
fuzzy rule (FR) 19, 21, 24, 27-30
G
Gaussian kernel 304
general circulation models (GCMs) 34, 38, 44
generalized additive models 276
GEOS diagrams 409-410, 419
GEOS-H diagrams 409, 415-419
global atmospheric general circulation models (GCMs) 4-6, 9, 12-15
global circulation model (GCM) 20
Grand and Saugeen Rivers, Ontario 273, 275-276, 279-282
Great Lakes precipitation 263-265
Great Salt Lake, U.S.A. 302-303, 306
grey prediction model (GPM) 34, 39-40, 44
grey theory 34-38
H
Hammerstein systems 436, 445-447
heuristic forecast 99-102
hidden Markov model (HMM) 11
Hurst effect 409-411, 415-416, 418-419
Hurst exponent 409-415, 418-419
hybrid backpropagation (HBP) 237
hydraulics 121
Hydro-Quebec, Canada 381-383
hysteresis effects 280
I
identification of nonlinear systems 435
infilling of missing data 191
inflow forecasting 99
information based design 142
information content 139
interval runoff prediction 215
J
joint probability density function (joint p.d.f.) 301
K
Kalman filter 449, 465
Kalman filtering theory 223
k-d tree 304, 306
kernel density estimator (k.d.e.) 301-302, 304-305, 307, 310
kernel probability density estimators 48-54
kernel regression estimate 435-437, 439
k-mean algorithm 366, 368-369, 372, 378
knowledge based classification 23
Kolmogorov's power law 348, 352-353, 358
kriging 317-319, 322, 324-325, 329
L
'lag components' 230, 234
learning 229, 231, 233, 235
learning process 221
linearity and nonlinearity 436
linearized spectrum 338-344
linearized transfer functions 335, 337-338
locally weighted regression (lowess) 317, 321-322, 324-325, 329
M
Mann-Kendall trend test 245, 248, 249
marginal cost model 67
Markov process 73, 78-79
Markovian 194
Markovian arrival process 426-427
Markovian probability 367, 372, 375
maximum entropy principle 149, 151, 160-161
mean integrated square error (MISE) 437
memoryless cascade system 436-439
minimum cross-entropy principle 149, 151, 155, 161
minimum Euclidean distance 364
minimum phase property 336
Misumai experimental basin, Japan 459
monotonic trend 245
Monte Carlo method 222, 317
moving average 338-339
multicollinearity 381, 384
multiple regression 382-383
multivariate autoregressive moving average (MARMA) 106-107, 114-116
multivariate autoregressive moving average exogenous (MARMAX) 107
multivariate regression 385-389, 391
N
Nagoya Meteorological Observatory, Aichi, Japan 347
network efficiency and flexibility 141-142
neural network (NN) 19, 21, 25, 27-30, 203
Nile River flows 266
NKERNEL 302, 307
nonhomogeneous hidden Markov model (NHMM) 8, 11-12
non linearity 217, 222
nonparametric estimation 48
nonparametric statistical estimation 301
non point source water pollution 271
non stationary stochastic processes 409, 414-415
numerical evaluation 78, 84
nutrient phosphorus 272
O
Ontario Hydro 99
open-channel 121, 129-133
optimality 138
'order' and 'disorder' 183
P
parameter estimation 421
patching streamflow 449
pattern recognition system 364
periodic independent residual ARMA (PIRARMA) 424-425
pH 287-288, 291, 295
phase angle 348, 351, 353, 355-356
pollution spreading in rivers 435
Porsuk river basin, Turkey 167-168
precipitation 47, 317, 347
prediction 237, 367-368, 383
prediction of daily water demands 217
probabilistic forecast 99, 102
Q
quasi channel network model 459
R
rainfall 347
rainfall forecasting model 87, 92
rainfall-runoff forecasting model 87-89, 92-93
real-time operations 80-81
redundancy of information 164, 167, 169
regression equation 402, 406
ReMus 381-382, 385, 387-392
reservoirs in Brazil 109
reservoir size 395
ridge regression 384-385
risk 33
Rocky Island Lake, the Mississagi River 103
runoff analysis 459
runoff prediction 205
S
Schwartz's Bayesian criterion 90
seasonality 395
seemingly unrelated autoregressive moving average (SURARMA) 106-107, 112-114, 116
Seka Dalaman Paper Factory, Turkey 184
sensitivity analysis 226, 234, 236
shot noise 422, 425, 427, 431-434
simulation 54-57, 64, 347, 357-358, 367, 422, 427-434, 463
smoothing spline ANOVA (ss-ANOVA) 317, 319-320, 322, 324-325, 329
SO₄²⁻ 290-291, 295-296
Southwestern Ontario Great Lakes tributaries 273-274
spatial analysis 299
spatial distribution 350, 355-356
spatial distribution of rainfall 460
spectral analysis 331
spectral transfer function 335
station discontinuance 163
stochastic dynamic programming (SDP) 64, 66-67, 69-72, 74
stochastic precipitation models 3, 6, 9
storage forecasting model 93
storage-runoff forecasting model 88-89, 92-93
streamflow forecasting 78-80, 85
streamflow modelling 361
T
Thames river at Thamesville 368
thermodynamic efficiency 179, 185, 187
three-cascade tank model 460
transfer function models 88
trend analysis 243
two-reservoir system 78, 82
U
uncertainties 121
univariate autoregressive moving average (Univariate ARMA) 110
universal law 125, 128
user interface 99
V
Vaal Dam, South Africa 451
variogram fitting 319
velocity distribution 121-125, 129-131, 133
W
water budget modelling 41
water quality monitoring networks 135
weather classification schemes 6-8
wet/dry spell 48-49
Wiener systems 436, 445-447
Willamette River Basin, U.S.A. 322-323
Williston Lake, British Columbia 63-69, 74
Woodruff, Utah 48, 51, 54
Z
z transform theory 335

Water Science and Technology Library
1. A.S. Eikum and R.W. Seabloom (eds.): Alternative Wastewater Treatment. Low-Cost Small Systems, Research and Development. Proceedings of the Conference held in Oslo, Norway (7-10 September 1981). 1982. ISBN 90-277-1430-4
2. W. Brutsaert and G.H. Jirka (eds.): Gas Transfer at Water Surfaces. 1984. ISBN 90-277-1697-8
3. D.A. Kraijenhoff and J.R. Moll (eds.): River Flow Modelling and Forecasting. 1986. ISBN 90-277-2082-7
4. World Meteorological Organization (ed.): Microprocessors in Operational Hydrology. Proceedings of a Conference held in Geneva (4-5 September 1984). 1986. ISBN 90-277-2156-4
5. J. Nemec: Hydrological Forecasting. Design and Operation of Hydrological Forecasting Systems. 1986. ISBN 90-277-2259-5
6. V.K. Gupta, I. Rodriguez-Iturbe and E.F. Wood (eds.): Scale Problems in Hydrology. Runoff Generation and Basin Response. 1986. ISBN 90-277-2258-7
7. D.C. Major and H.E. Schwarz: Large-Scale Regional Water Resources Planning. The North Atlantic Regional Study. 1990. ISBN 0-7923-0711-9
8. W.H. Hager: Energy Dissipators and Hydraulic Jump. 1992. ISBN 0-7923-1508-1
9. V.P. Singh and M. Fiorentino (eds.): Entropy and Energy Dissipation in Water Resources. 1992. ISBN 0-7923-1696-7
10. K.W. Hipel (ed.): Stochastic and Statistical Methods in Hydrology and Environmental Engineering. A Four Volume Work Resulting from the International Conference in Honour of Professor T. E. Unny (21-23 June 1993). 1994.
10/1: Extreme values: floods and droughts. ISBN 0-7923-2756-X
10/2: Stochastic and statistical modelling with groundwater and surface water applications. ISBN 0-7923-2757-8
10/3: Time series analysis in hydrology and environmental engineering. ISBN 0-7923-2758-6
10/4: Effective environmental management for sustainable development. ISBN 0-7923-2759-4
Set 10/1-10/4: ISBN 0-7923-2760-8
11. S.N. Rodionov: Global and Regional Climate Interaction: The Caspian Sea Experience. 1994. ISBN 0-7923-2784-5
12. A. Peters, G. Wittum, B. Herrling, D. Meissner, C.A. Brebbia, W.G. Gray and G.F. Pinder (eds.): Computational Methods in Water Resources X. 1994. Set 12/1-12/2: ISBN 0-7923-2937-6

Springer-Science+Business Media, B.V.
