Design Engineer

  • Uploaded by: saipavan999
  • 0
  • 0
  • March 2021
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Design Engineer as PDF for free.

More details

  • Words: 57,442
  • Pages: 136
Loading documents preview...
732

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS JOINT WORKING GROUP

D2/C2.41

JUNE 2018

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS JWG D2/C2.41

Members A. DEL ROSSO, Convenor R. JAMIESON M. EIJGELAAR M. HABJA G. ARROYO FIGUEROA G. LAKOTA S. NOURI

US UK NL FR MX SI IR

T. BORST, Secretary T. XIA D. CORTINAS E. PFAEHLER S. CHEN S. RAJAGOPALAN X. CHEN

NL US FR DE US US CN

Contributing Members G. SANTAMARÍA M.Y. HERNANDEZ PÉREZ M. BASTOS

MX MX BR

A. HERNÁNDEZ D. MARAGAL

MX US

Copyright © 2018 “All rights to this Technical Brochure are retained by CIGRE. It is strictly prohibited to reproduce or provide this publication in any form or by any means to any third party. Only CIGRE Collective Members companies are allowed to store their copy on their internal intranet or other company network provided access is restricted to their own employees. No part of this publication may be reproduced or utilized without permission from CIGRE”. Disclaimer notice “CIGRE gives no warranty or assurance about the contents of this publication, nor does it accept any responsibility, as to the accuracy or exhaustiveness of the information. All implied warranties and conditions are excluded to the maximum extent permitted by law”.

WG XX.XXpany network provided access is restricted to their own employees. No part of this publication may be

reproduced or utilized without permission from CIGRE”. Disclaimer notice

ISBN : 978-2-85873-434-4

“CIGRE gives no warranty or assurance about the contents of this publication, nor does it accept any

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

EXECUTIVE SUMMARY Objective The CIGRE Joint Working Group No. D2/C2.41 is a joint effort between the study committees D2 and C2. It has surveyed and examined current practices, industry trends, and new research on the use of various data sources and analytics tools to enhance situational awareness of system operators, as well as on the data-integration and -management technologies to facilitate effective implementation of data-analytics applications in the control room and to support operation engineers. Motivation The increasing complexity and interconnectivity of modern electric grids, in addition to the highly stringent reliability, economic, and environmental constraints, impose the need to provide system operators and operation engineers with better tools for assessing system conditions and to support them on making critical decisions. Fortunately, the large variety of internal and external data sources that are available to electric utilities opens up the possibility to implement advanced data-analytics and -visualization technologies to improve the way the system is operated and controlled. Analytics algorithms capable of synthesizing actionable information from the raw data can be used to provide tools that use real-time data streams to support fast, accurate, and adaptable decisions solving critical problems at the right moment, as well as to plan mitigation actions against anticipated system security issues. Using data to make critical operational and business decisions is certainly not new to the electricity industry. Indeed, techniques for data analysis have been applied to several areas such as load forecasting, predictive asset maintenance, crew scheduling, outage management, and demand response, among others. Nevertheless, the maturity and practical implementation of data-analytics applications to support the operation of power systems remains relatively low compared to other areas and industries. Therefore, it is very valuable to examine how advanced data-analytics technologies can be further used to solve the emerging critical challenges in operating electric systems. Approach The content of the technical brochure is broken down into the major areas that are relevant for the development and implementation of data-analytics tools, which are: data and information sources, data-analytics techniques to interpret this data, applications of data analytics in system operations, data integration and modelling to integrate data into operations, and data quality. This document has six main sections, each of them addressing one of these topical areas. The content in each section is intended to provide the reader with an informed and comprehensive starting point to understand the relevant issues and challenges in each area. The sections discuss latest advances in terms of data-analytics methodologies, data-management and -integration tools, applications development, and new trends and emerging technologies. Value This technical brochure provides useful insight on how advanced data-analytics techniques and tools that integrate various data sources can be used to improve situational awareness of those who operate power systems and to support various operation functions. This work is expected to be useful for Cigre members in the following areas: 

Operators of transmission and distribution systems will gain knowledge on how new dataanalytics and -visualization technologies can help improve situational awareness.



Product vendors will assist in identifying gaps in the market and potentially new uses for existing products.



Application and system developers will better understand what the challenges are for operations and the need for better analytics tools.



Researchers will assist in recognizing new areas for research and the application of this research.



Consultants and project engineers will provide relevant reference material.

3

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Summary of Relevant Conclusions and Takeaway The importance of decision support and situational awareness for system operators is becoming more prominent as significant changes in the way systems need to be operated and controlled occur. Improved situational awareness at all control levels is necessary to ensure that operational decisions are properly made and executed, which is critical for maintaining the integrity of system operations. Therefore, while operators will remain at the core of grid operations, it is becoming more and more important that they are supported by advanced data-analytics and -visualization tools. Like in other complex systems, the amount of automation in power systems to assist operators has increased to a great extent. Even though advanced automation is essential in modern power systems, there is still a human in the loop with the potential for human error, especially if an operator has limited visibility of system conditions. Indeed, one of the risks of increased automation in the grid is that operators may become less aware of the current system conditions. Consequently, the operation support tools should be able to process and exhibit to the operator in a concise an effective manner. Data Sources, Data Management, and Integration Over the past few years, many companies have started their “big data” projects and are competing to bring a set of information and communication technologies that are largely new to the utility industry. Several tools that make use of advanced data-analytics techniques and integrate data for various sources have been developed and implemented. These tools serve a variety of functions to support system operation, including: tools to detect system events, identify and analyze faults, conduct widearea monitoring, monitor and analyze equipment health, trend and forecast load, monitor the conditions of renewables and systems, and make recommendations for system operation. Many dataanalytics techniques have been around for years, many of them appearing in the 1990s. Also, a great deal of research has been devoted to the use of data analytics in operation support. The difference today is the advancements made in handing big data analytics and the adoption throughout the industry. Visual analytics is key is to improving the ability of an operator to have situational awareness and make effective decisions. Providing interactive visual interfaces helps analysts and system operators to get a better impression of possible symptoms and suspicious behavior and to understand the performance of a power system to increase the situational awareness. The way in which system data is presented to the operator can support the strengths and reduce the effects of limitations of human perception and performance. Advanced visualization techniques enable a wider array of situationawareness capabilities to handle the increased complexity of system operation. The way that data and information can be displayed and exposed to operators has evolved to a great extent over the years, as the technology has progressed. Today’s visualization technologies have advanced a long way from old-fashioned static visualization used in the past. Newer visualization platforms include geographic-based dynamic visualization with user-friendly interfaces and real-time measurements and analytical results from measurement-based and model-based tools that populate the system map. Visual aids such as color contouring, 2-D and 3-D bubbles and cones, animation, geo-spatial representation, display profiles, and integrated system views are widely used in newer visualization tools. New trends in visualization in control rooms are based on the concept of integrated space and time, which is intended to help operators to assess current situations in a static fashion, to understand and visualize evolving conditions in a power system, and to get better prepared to implement effective control actions. In general terms, integrated space-time tools include three main functions: situational awareness on first sight, projection of future status, and recommendations for operators. As an example, RTE in France has developed a time-driven concept of situational awareness. The objective is to create an application that will provide the operator a single user interface based on the hyper-vision concept. The application is intended to help system operators to focus on the actions that they must take by presenting at the right time the relevant information that they need to make the right decisions. The designed system—called Apogee—performs security analysis on forecasted system conditions in the near future through data-analytics and modelling tools and displays relevant information to the operator only when it is needed. That is, the hyper-vision user interface remains empty as long as no potential unsecure conditions are detected within the time horizon of the analysis. In that way, the tool reduces the effects of limitations of human perception and improves operator responsiveness and performance.

4

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Barriers and Needs for Research and Development In general terms, it can be stated that adoption of data analytics to support the operation of a power system is gaining momentum, as utilities have started realizing the great potential of the technology. Nevertheless, there is still a long way to go before these technologies are widely used across the various operation functions. One of the barriers for more extensive use of data-analytics tools that integrate various data sources is the shortage of standardized data structures. Well-defined standards-based data models are essential to support advanced applications, analytics, and visualizations used in grid operations. Even though significant progress has been made in this area, more effective and accurate data models and procedures are needed for ensuring data integrity and availability of the right data in the right format. Another aspect that to some extent hinders the implementation of data-analytics solutions in system operation is the lack of understanding of the value and accuracy of these technologies. Traditionally, tools used in control centers and operation engineering have been mostly based on system models and simulations. Because a diverse set of sources of power system and external sensor data are now becoming available, a hybrid approach can be used to developed superior technical approaches and software tools that have the potential to be implemented in system control rooms to support system operators. Those tools will combine conventional analytics techniques based on physical models with heuristic data analytics and decision-making methodologies. For instance, simulation engines would perform contingency analysis and vulnerability/risk analysis across several different possible scenarios that may be built with the help of data collected from a variety of sources.

5

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

CONTENTS EXECUTIVE SUMMARY ............................................................................................................................... 3 OBJECTIVE .................................................................................................................................................................................. 3 MOTIVATION .............................................................................................................................................................................. 3 APPROACH .................................................................................................................................................................................. 3 VALUE ........................................................................................................................................................................................... 3 SUMMARY OF RELEVANT CONCLUSIONS AND TAKEAWAY ..................................................................................... 4

1.

INTRODUCTION AND BACKGROUND...................................................................................... 11

1.1

MOTIVATION ............................................................................................................................................................... 11

1.2

OBJECTIVE .................................................................................................................................................................... 12

1.3

APPROACH ................................................................................................................................................................... 12

1.4

SITUATIONAL AWARENESS....................................................................................................................................... 14

WHY IS IT IMPORTANT....................................................................................................................................................... 16 CONCLUSION .......................................................................................................................................................................... 18

2.

DATA SOURCES IN ELECTRIC POWER SYSTEMS .................................................................... 21

2.1 DATA FROM MONITORING AND PROTECTION DEVICES................................................................................... 21 2.1.1 Digital protective relays .................................................................................................................................... 21 2.1.2 Recorders ............................................................................................................................................................. 22 2.1.3 Revenue meters ................................................................................................................................................... 22 2.1.4 Synchrophasor s .................................................................................................................................................. 23 2.1.5 Remote terminal unit (RTU) ................................................................................................................................ 23 2.1.6 Power quality meters ......................................................................................................................................... 23 2.1.7 SCADA (supervisory control and data acquisition) ....................................................................................... 23 2.2 DATA FROM EQUIPMENT SENSORS ........................................................................................................................ 24 2.2.1 Circuit breaker .................................................................................................................................................... 24 2.2.2 Transformer.......................................................................................................................................................... 25 2.2.3 Distributed generation (solar and wind) ......................................................................................................... 27 2.2.4 BESS: Battery energy storage system ............................................................................................................. 28 2.2.5 New sensors for equipment monitoring ........................................................................................................... 30 2.3

NON-ELECTRICAL DATA SOURCES (EXTERNAL DATA) ......................................................................................... 31

2.4

COMMUNICATION REQUIREMENTS FOR SMART GRID DATA ........................................................................... 32

2.5

REFERENCES .................................................................................................................................................................. 36

3.

DATA-ANALYTICS TECHNIQUES ................................................................................................ 37

3.1 DATA MINING AND ASSOCIATION RULES ............................................................................................................ 38 3.1.1 Brief definition ..................................................................................................................................................... 38 3.1.2 Technical description .......................................................................................................................................... 38 3.1.3 Application domains ........................................................................................................................................... 38 3.1.4 Potential applications......................................................................................................................................... 39 3.2 K-NEAREST NEIGHBOR............................................................................................................................................... 39 3.2.1 Brief definition ..................................................................................................................................................... 39 3.2.2 Technical description .......................................................................................................................................... 39 3.2.3 Application domains ........................................................................................................................................... 39 3.2.4 Potential applications in smart grid ................................................................................................................. 39 3.3 MACHINE LEARNING .................................................................................................................................................. 40 3.3.1 Supervised and unsupervised learning ........................................................................................................... 40

6

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

3.3.2 3.3.3 3.3.4 3.3.5 3.3.6

Linear regression ................................................................................................................................................. 41 Decision and regression trees ........................................................................................................................... 43 Artificial neural network .................................................................................................................................... 45 Support vector machine (SVM) ......................................................................................................................... 47 K-means and clustering ...................................................................................................................................... 49

3.4 PROBABILISTIC NETWORKS ...................................................................................................................................... 51 3.4.1 Bayesian networks .............................................................................................................................................. 51 3.4.2 Bayesian classifiers ............................................................................................................................................. 53 3.4.3 Decision networks ................................................................................................................................................ 53 3.5 DEEP LEARNING ........................................................................................................................................................... 54 3.5.1 Brief definition ..................................................................................................................................................... 54 3.5.2 Technical description .......................................................................................................................................... 54 3.5.3 Potential applications in smart grid ................................................................................................................. 55 3.6 VISUAL ANALYTICS ..................................................................................................................................................... 55 3.6.1 Brief definition ..................................................................................................................................................... 55 3.6.2 Related research areas and challenges ......................................................................................................... 55 3.6.3 The visual-analytics process .............................................................................................................................. 56 3.7

REFERENCES .................................................................................................................................................................. 57

4.

APPLICATIONS OF DATA ANALYTICS IN SYSTEM OPERATIONS ........................................ 61

4.1

INTRODUCTION ........................................................................................................................................................... 61

4.2 DATA VISUALIZATIONS IN REAL-TIME SYSTEM OPERATION .............................................................................. 61 4.2.1 Visualization technologies in control centers .................................................................................................. 62 4.2.2 Example of control room visualization at ISO – ERCOT case ..................................................................... 71 4.2.3 Emerging trends in control room visualization................................................................................................ 74 4.3 DATA ANALYTICS IN SYSTEM OPERATION SUPPORT PROCESSES ................................................................... 77 4.3.1 Real-time situational awareness with PMU data........................................................................................... 77 4.3.2 Fault identification, location, and analysis ..................................................................................................... 80 4.3.3 Real-time stability assessment .......................................................................................................................... 84 4.3.4 Alarm processing and filtering ......................................................................................................................... 85 4.3.5 Renewable energy generation forecasting and storage analytics ........................................................... 85 4.3.6 Damage prediction (weather related or due to other causes) ................................................................... 86 4.3.7 Outage restoration analytics ............................................................................................................................ 87 4.3.8 Power quality analytics (including voltage control) ...................................................................................... 88 4.3.9 Peak load management (via demand-side management analytics) ......................................................... 89 4.3.10 Load research analytics and energy portfolio management analytics ..................................................... 89 4.3.11 Non-technical loss analytics ............................................................................................................................... 89 4.3.12 Physical and cyber security assessment analytics ......................................................................................... 89 4.3.13 Dynamic assessment of transmission line capacity (dynamic line rating) .................................................. 90 4.3.14 Cable thermal monitoring.................................................................................................................................. 93 4.4

SUMMARY OF INDUSTRY SURVEY ........................................................................................................................... 94

4.5

REFERENCES .................................................................................................................................................................. 99

5.

DATA INTEGRATION AND MODELING .................................................................................. 103

5.1 DATA MODELING PROCESSES FOR SYSTEM OPERATIONS ............................................................................. 103 5.1.1 Model information and its usage ................................................................................................................... 103 5.1.2 Model update procedure and lifecycle ........................................................................................................ 106 5.2 DATA MODELS AND OPEN STANDARDS.............................................................................................................. 107 5.2.1 Why do we need a common data model?................................................................................................... 107 5.2.2 IEC standardized data models ....................................................................................................................... 107 5.2.3 Example of harmonization between CIM and IEC 61850 ........................................................................ 112 5.3 IMPACT OF NEW TECHNOLOGIES AND NEW DATA SOURCES ON DATA MODELING ............................ 112 5.3.1 Impact of Synchrophasors on operations data modeling .......................................................................... 112 5.3.2 Impact of renewable energy on operations data modeling..................................................................... 113 5.3.3 Impact of equipment health condition monitoring on operations data modeling .................................. 115 5.4

ADVANCED DATA INTEGRATION MODELING CASE STUDY ............................................................................ 121

7

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

5.5

CONCLUSIONS .......................................................................................................................................................... 123

5.6

REFERENCES ................................................................................................................................................................ 124

6.

DATA QUALITY AND VALIDATION ......................................................................................... 125

6.1

INTRODUCTION ......................................................................................................................................................... 125

6.2

DATA QUALITY PROBLEMS ...................................................................................................................................... 125

6.3 DATA QUALITY ASSESSMENT ................................................................................................................................. 126 6.3.1 Data Interpolation ............................................................................................................................................ 127 6.3.2 Data Profiling .................................................................................................................................................... 128 6.3.3 Data quality Assessment Framework ............................................................................................................. 129 6.4 DATA QUALITY PROBLEM CORRECTION .............................................................................................................. 130 6.4.1 Impact Assessment............................................................................................................................................. 131 6.4.2 Correction and Cleaning ................................................................................................................................. 131 6.4.3 Scavenging of Essential Causes ...................................................................................................................... 133 6.4.4 Monitoring and Prevention .............................................................................................................................. 133 6.5

CONCLUSIONS .......................................................................................................................................................... 133

6.6

REFERENCES ................................................................................................................................................................ 134

7.

CONCLUSION ............................................................................................................................. 135

FIGURES AND ILLUSTRATIONS Figure 1-1: Aspects to address the development and implementation of analytics techniques using various data sources .................................................................................................................. 13 Figure 1-2: levels of situational awareness................................................................................... 15 Figure 1-3: Representation of operator mental model based on training and experience .................. 16 Figure 1-4: Relationship between analytics and visualization complexity ......................................... 17 Figure 1-5: Illustration of the role of system operators in highly automatic environment ......... Erreur ! Signet non défini. Figure 2-1: Basic structure of a battery energy storage system...................................................... 29 Figure 2-2: Requirements of a smart grid network........................................................................ 34 Figure 3-1: k-NN classification of abnormal PMU data ................................................................... 40 Figure 3-2: Supervised learning (upper rectangle) and unsupervised learning (lower rectangle) ....... 41 Figure 3-3: A simple DT model for detecting faults in a transmission line........................................ 44 Figure 3-4: ANN1 schematic diagram of a feed forward NN ........................................................... 46 Figure 3-5: ANN2 information processing in ANN .......................................................................... 46 Figure 3-6: A toy example of a linearly separable problem ............................................................ 48 Figure 3-7: A toy example of clustering transmission lines during storm using K-means ................... 50 Figure 3-8: Example of a simple Bayesian network ................................ Erreur ! Signet non défini. Figure 3-9 : Examples of conditional probability tables .................................................................. 52 Figure 3-10: Deep learning components ...................................................................................... 55 Figure 3-11: The visual analytics process ..................................................................................... 56 Figure 4-1: Overview of a control center monitor display .............................................................. 62 Figure 4-2: Examples of schematic network diagrams ................................................................... 63 Figure 4-3: Contour showing voltage magnitudes with values below 0.98 per unit........................... 64 Figure 4-4: Examples of contour gradients for continuous values ................................................... 64 Figure 4-5: Situational awareness by 2D bubbles ......................................................................... 65 Figure 4-6: 3D display showing bus voltages and generator reserves ............................................. 66 Figure 4-7: Example of situation awareness by 3D cones .............................................................. 66 Figure 4-8: Example of situation awareness by 3D cones .............................................................. 67 Figure 4-9: Example of animated power flow arrows in distribution feeders .................................... 67 Figure 4-10: Visualization of dispersed generation in operator workstation at RTE ........................... 68 Figure 4-11: Visualization of dispersed generation in the general panel at Red Eléctrica del España .. 68

8

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-12: Examples for geopatial network diagrams ................................................................. 69 Figure 4-13: Integrated system view with Icons and Info boxes .................................................... 70 Figure 4-14: Distribution network visualization ............................................................................. 70 Figure 4-15: Distribution network visualization ............................................................................. 70 Figure 4-16: ERCOT control room - 2016 ..................................................................................... 72 Figure 4-17: ERCOT control room - load and generation details display and quick start/non-spin graphs...................................................................................................................................... 73 Figure 4-18: ERCOT control room – wind generation .................................................................... 73 Figure 4-19: ERCOT control room - real-time sequence monitor .................................................... 74 Figure 4-20: ERCOT control room - system voltage overview display .............................................. 74 Figure 4-21: CORESO control room (www.coreso.eu) ................................................................... 75 Figure 4-22: Example of control actions displayed in the main interface of Apogeé .......................... 76 Figure 4-23: Example of the time-based constraint display in Apogeé ............................................ 77 Figure 4-24: Swings in the map .................................................................................................. 79 Figure 4-25: power system status, change time range .................................................................. 79 Figure 4-26: Schematic display with recognized islands................................................................. 80 Figure 4-27: Phase angle difference of the voltages between different PMUs .................................. 80 Figure 4-28: Correlation of a feeder outage and lightning strike..................................................... 82 Figure 4-29: Smart Cable Guard system and web interface, showing the location of increasing partial discharge activity over time ........................................................................................................ 83 Figure 4-30: Example of open PQ Dashboard display .................................................................... 83 Figure 4-31: Example of Synchrophasor -based frequency stability monitoring ................................ 84 Figure 4-32: Framework proposed in [17][18] for real-time dynamic security assessment combining PMU data analytics and high performance dynamic simulation....................................................... 85 Figure 4-33: Wind forecasting and optimization tools .................................................................... 86 Figure 4-34: Digging damage prediction model ............................................................................ 87 Figure 4-35: Smart meter based outage management .................................................................. 88 Figure 4-36: Power quality analytics tool ..................................................................................... 88 Figure 4-37: Analyzing system load ............................................................................................. 89 Figure 4-38: Dynamic powerline capacity assessment ................................................................... 90 Figure 4-39: SUMO architecture .................................................................................................. 91 Figure 4-40: Exceptional weather events ..................................................................................... 92 Figure 4-41: Thunderstorm – lightning activity and rainfall event notification .................................. 92 Figure 4-42: Visualization platform ODIN-VIS screenshot .............................................................. 92 Figure 4-43: National Grid (U.K.) Cable Thermal Monitor............................................................... 93 Figure 4-44: Responses to survey – Section 1, Question 1 ............................................................ 95 Figure 4-45: Responses to survey – Section 1, Question 2 ............................................................ 95 Figure 5-1: Dominion Virginia Power EMS modeling data ............................................................ 105 Figure 5-2: EMS Winter Build Lifecycle ...................................................................................... 106 Figure 5-3: Common Utility EMS modeling update process ......................................................... 107 Figure 5-4: Data modeling on Smart Grid Architecture Model framework ...................................... 108 Figure 5-5: IEC 61 850 modeling approach[7] ............................................................................ 110 Figure 5-6: Sources and actors[8] ............................................................................................. 111 Figure 5-7: RES Data Integration/Modeling Diagram ................................................................... 115 Figure 5-8: Proposed Concept to Incorporate Equipment Condition Information indices into PRA Calculations ............................................................................................................................ 118 Figure 5-90: Overview CIM class model for breaker health integration environment ...................... 120 Figure 5-101: Location of UML diagrams and modifications for the breaker health integration ........ 121 Figure 5-11: Asset and Network Model Integrated Solution Architecture ....................................... 122

TABLES Table Table Table Table Table Table

2-1: Monitored parameters of circuit breakers ..................................................................... 25 2-2: Status of different condition assessment techniques for power transformers ................... 26 2-3: Different sensors and output data ............................................................................... 27 2.4: Solar measurement and description ............................................................................. 27 2.5: Wind turbine sensors and applications ......................................................................... 28 2-4: Example of the minimum required BESS signals for a EMS (SICAM microgrid control) ...... 30

9

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Table Table Table Table Table Table

2-4: 2-5: 2-6: 2-7: 2-8: 2-9:

General requirement of communication in power system............................................... 32 Networks and associated communication requirements ................................................. 33 Communication requirements in terms of latency and data time window ........................ 33 Network requirements for smart grid applications ......................................................... 34 Technology supporting each particular application (L – Low, M – Medium, H – High) ....... 35 Communication technology options ............................................................................. 36

10

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

1. INTRODUCTION AND BACKGROUND 1.1 MOTIVATION The motivation behind this technical brochure is to assist in addressing the growing gap between the challenges arising from an increasingly interconnected world and humans who are still required in the control loop. The topics of situational awareness—especially data management and analytics—are large subject fields in their own right. When combined, perhaps the combination is a challenge too large to solve as a single project. Therefore, this brochure breaks the challenge down into many component parts and tries to focus on the relevant areas within these subjects. This growing gap between the new challenges and humans in the control room is driven by a wide variety of factors, which also have their own internal drivers behind them and do not necessarily consider the direct impact that the gap is having on the electricity utility industry. With increasing complexity and interconnectivity of the grid, the scope and complexity of maintaining an increasing situational awareness have grown. As a consequence, there is a need to furnish system operators and operation engineers with better tools for assessing system conditions and for providing effective and timely decision-making and remedial reactions to an incident. It is not enough to just understand the current state. Situational awareness implies also the ability to anticipate system changes and their impact on system security. These issues will only become more challenging as a wide variety of technologies categorized under the generic “smart grid” concept have been deployed, including advanced control, monitoring systems, and a wide array of new measurement devices on the transmission system, in substations, and on consumer’s premises. These technologies and systems result in the collection of tremendous amounts of data related to the performance and management of the transmission system. In addition, there are many other new sources of data that can be very valuable for planning, operating, and managing the system such as external GIS data, satellite data, weather data, lightning data, and data from renewable resources, storage, and demand response. Some of these diverse sources of data represent tremendous opportunities to operate the transmission system more efficiently and more reliably. The aim of retaining or even increasing situational awareness as the system becomes more complex is to guarantee the quality of decision-making regarding system integrity. This requires combining the aforementioned different areas of data management, with new types of analytics to make use of such data, as well as proper information and computation technologies and procedures to properly integrate and manage the data. Advanced data-analytics techniques have been developed and used in a variety of applications in many different industries and organizations. Using data to make critical operational and business decisions is certainly not new to the electricity industry. Indeed, techniques for data analysis have been applied to several areas such as load forecasting, predictive asset maintenance, crew scheduling, outage management, and demand response, among others. Additionally, big data analytics is being used in distribution systems to convert massive data streams from smart meters and distributed energy resources into actionable information for grid operations. Nevertheless, it is very valuable to examine how data analytics experience from other industries, as well as from former implementations in the utility industry, can be used to solve the emerging critical challenges of electric systems, considering that the power industry is not as mature as other businesses in their use of analytics but has some great opportunities ahead to address the challenges with situational awareness. To ensure that this brochure is relevant to this wide area of users, a survey was prepared (see Section 4) and circulated among as many Cigre members as possible across a diverse geographical area. The results from this survey confirmed that this technical brochure addresses a challenge that many members feel is growing in importance.

11

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

1.2 OBJECTIVE The objective of this technical brochure is to address the increasing importance of situational awareness in grid operation and to give an overview of the most relevant developments in data analytics and data integration associated with situational awareness. It aims to identify future needs by addressing some of the fundamental questions that this growing challenge of maintaining an increasing situational awareness in a complex system implies:           

What does situational awareness mean for the electricity utility industry? What are the future needs for improved situational awareness and better operator decisionsupport tools (including tools for operation engineers, protection, etc.)? Is the new data that is available through smart grid investment useful for accomplishing the future needs and requirements for future solutions? Who are these new data sources for, and do they fulfill their needs? What data-analytics techniques and tools are needed to transform the large inflow of data into actionable information? What is the present status of the use of data analytics to support the operation of power systems? What technologies are needed for handling data and performing integration? What models and data-integration technologies are needed to automate the processes and enable actionable information, based on data from many sources, to reach the appropriate users. How can quality of data be properly assessed and improved to make the analytic solution more valuable and reliable? What are organizations doing in this space currently and in the future? What areas do organizations need to focus on to address this challenge?

Cigre is ideally placed to draw together these different elements because it has a large knowledgeable contributor base and can disseminate any learning to a wide audience. Therefore, this technical brochure is aimed to assist its members in the following areas:     

Transmission and distribution operations: essential for all levels of transmission from distribution level to system operators to gain knowledge on how new data-analytics and visualization technologies can help improve situational awareness. Product vendors: assist in identifying gaps in the market and potentially new uses for existing products. Application and system developers: better understand what the challenges are for operations and the need for better analytics tools. Researchers: assist in recognizing new areas for research and the application of this research. Consultants and project engineers: provide relevant reference material.

1.3 APPROACH This technical brochure aims to present the collective thinking from a wide range of industry experts across a broad range of perspectives to the different challenges involved in providing and maintaining situational awareness by breaking the problem down into the different areas involved in this challenge. Collection of rich data—complemented by system modeling, advanced data analytics, and emerging decision-support tools—has the potential to improve the possibility of predictive analytics that can enhance situational awareness and improve decision-making. In general terms, the development and successful implementation of data-analytics tools involve addressing specific aspects on various domains, as depicted in the Figure 1-1.

12

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Use Cases

Data Sources

New Decission Support Tools

Advanced Analytics Techniques

Data Models and Integration

Data Quality and Validation

Figure 1-1: Aspects to address the development and implementation of analytics techniques using various data sources

The main aspects associated in these areas can be summarize as follows: 

Identify data sources: There are an increasing number and variety of data sources in electric power systems arising from grid modernization and investments in the smart grid. In order to examine data-analytics applications and the best solutions to deploy, it is essential to understand the characteristics and availability of such large volumes of data.



Identify advanced analytics techniques: A suite of data-analytics techniques can be used for different applications to support system operations. Data analytics can reveal patterns, predict the prospective outcomes, and recommend appropriate decisions. In combination with visualization, data-analytics techniques can be effectively used to improve situational awareness of operators. Analytics algorithms can also be used to examine raw event data to provide descriptive analysis of the event. Understanding the underlying theory behind the analytics tools, their common and potential uses, and the advantages and implementation challenges is critical to select the techniques that best suit the problem at hand.



Identify use cases: The first step is to define the use case applications. The tools for operation support that can benefit from advanced analytics are not only tools to be used by system operators in the control room but also applications for engineers who support various operations related processes such as contingency analysis, outage scheduling and management, load and renewable forecasting, protection, models management, components rating calculation, compliance service, and special operation studies. From the technical viewpoint, the applications can be decided based on specific needs and preferences. However, the final decision may require consideration of several other factors such as alignment with the overall enterprise data-analytics strategy and roadmap.



Apply data models and integration: Well-defined standards-based data models are essential to support advanced applications, analytics, and visualizations used in grid operations. These include the need for accurate data models, procedures for ensuring data integrity, and availability of the right data in the right format. Indeed, data interoperability has been one of the main challenges for implementation of data analytics that use multiple data sources. There will be a growing importance of sharing data between parties, and this common approach is vital to ensure that this can take place efficiently and effectively.



Improve data quality and validation: It is clear that the overall quality of the data used in analytics applications significantly impacts the accuracy and trustworthiness of the outcome. It is then essential to put in place quality-assessment and -improvement processes to ensure

13

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

that the data used in the various applications meet the minimum standards of data quality to guarantee meaningful results. This technical brochure is designed to provide the reader with an informed and comprehensive starting point to understand these issues. Each section aims to discuss and incorporate the latest advances in the relevant area in terms of technology and approach to this particular challenge. The structure of this technical brochure is as follows: Section 1 – Introduction and Background: The second part of this introductory section introduces the concept of situational awareness in relation to electricity power networks and discusses why it is important to implement advanced data-analytics applications in the control room and departments that support system operations. Section 2 – Data Sources in Electric Power Systems: This section provides a description of the many data sources that can be found in an electric power system. It covers both traditional sources commonly used for monitoring, protection, and control and new or non-conventional data sources that emerge from smart grid technologies. It also describes data sources that are external to the electric system but can be accessed and used for power system applications and decision-making. It also describes the communication requirements of each dataset type to ensure that data reaches the different data-analytics applications with the required quality, velocity, and availability. Section 3 – Data-Analytics Techniques: The main advanced data-analytics techniques that can be used for a variety of operation-support tools are described in this section. The description of each of these techniques includes a definition, technical description with some mathematical details, common application domains, and potential applications in a smart grid. Section 4 – Applications of Data Analytics in System Operations: This section describes an extensive array of applications in power systems and the various tools and techniques identified in the previous section. A survey of existing practices, tools, and techniques using various sources of data to improve situational awareness and provide operation decision-making support is presented. Section 5 – Data Integration and Modeling: This section examines typical data modeling processes in electric utility transmission organizations to explain how data are assembled in the power industry for secure and reliable grid operation. To illustrate the concepts, it presents an example of an actual data-integration project in a large utility in the U.S. Section 6 – Data Quality and Validation: The importance of good data quality and the methods of validating this data are presented in this section. Section 7 – Conclusions: This section summarizes the main findings and conclusions. It identifies the future states, gaps, and research needs to move the utility industry to a more extensive use of data-analytics technologies to support the operation of a power system. The brochure concludes by discussing and presenting several conclusions, but due to the nature of the challenge, it will not provide “one solution to fit all.” Instead, it aims to leave the reader in a more informed position and with a valuable source of reference and further reading.

1.4 SITUATIONAL AWARENESS Situational awareness: “The perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future” –

Endsley.

The definition by Endsley is considered a classical definition of situational awareness. Although it is a very high level definition, the application in electrical power systems is very relevant. Aspects of situational awareness in power systems are:   

Perception and meaning: What is going on? Comprehension: How does this all relate to each other and the system? Projection: What does this mean for the near future (and what can I do about it)?

14

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

As grid operations become more complex—due to increasing variability in demand and supply balancing through new (and often “intelligent”) types of loads, renewable integration, and crossborder integration of systems—situational awareness becomes more challenging. To cope with this, automation and automated decision-making have become essential for grid operations. However, this creates a new level of complexity and makes the system less intuitive and transparent. So, to increase situation awareness—or, at minimum, keep it on par—new tools for analysis and decision support for grid operators are essential. Figure 1-2 describes the levels of situational awareness required for grid operation under the new conditions. It is a significant challenge to move upwards on these levels. Certainly, it requires an understanding of the various elements within this problem space, hence the need for the problem to be broken down in the different sections that are addressed in this brochure. Therefore, situational awareness from the perspective of electrical power systems can be interpreted as the continual assessment of the current and future state of the system in order to be able to respond with the correct measures to reach a desired goal, such as keeping the operating conditions within the appropriate boundaries, as well as reducing risks and increasing efficiency.

Figure 1-2: levels of situational awareness

This is not limited to the awareness within the central control room but also incorporates “awareness” and response from local equipment, often referred to as “edge processing.” More and more localized control systems are implemented in the grid or at its perimeters, such as in smart inverters from renewable energy sources feeding into the grid. Situational awareness thus includes the awareness of how different active control mechanisms work together. While the local active control in general helps to reach the goals of the central control, it sometimes can work against it, leading to undesired or even dangerous situations. Within the operation of transmission systems, there has been a focus on situational awareness because maintaining system reliability has always been crucial. Now this operational situational awareness becomes increasingly more important for distribution operation as well, increasingly also down to the low voltage levels of the grid. The growing amount and variability of data now available to operators within all levels of control centers is changing the control centers beyond recognition. Operators in control centers routinely receive system-related information, such as voltage, frequency, current, power flows, network topology, etc. However, the knowledge derived from asset data that is accessible by asset managers, equipment subject-matter experts (SMEs), and field staff is now finding its way into operators in control centers and is taken into consideration in operating the grid. A common element with all levels of control centers are the use of human operators. Even though technology has moved on, there currently is still a requirement to keep a human in the loop. Over time, operators build up a mental model of how the network works and behaves under certain conditions based on their training and experience. This will not change in the future, because this

15

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

mental model holds the overall picture that still includes much that cannot be taken over by (selflearning) algorithms. Thus, a mental model remains far more superior and flexible than algorithms (even though operators will be more and more supported by algorithms). Thus, it is very important that the correct models have been learned by the operators, because it is possible to present the correct information to an operator while they still make an incorrect decision.

Figure 1-3: Representation of operator mental model based on training and experience

It is therefore very important to understand that situational awareness is in the mind of the operator (see Figure 1-3), so while addressing all the highly complex analytics and technological challenges, we must not forget that there are also many traditional things that can be done to improve and maintain the situational awareness of the operator, such as:   

Focused training Increase experience What–if simulations

These elements are beyond the scope of this brochure but are addressed by other Cigre working groups. It is important to ensure that the advanced analytics information is presented in the best possible way. This issue can be addressed with appropriate HMI standards applied consistently across various visualizations within different systems, to ensure that when operators swap between systems that they understand what the key information presented means (e.g. using reserved colors: Red means an operator needs to take action now, and Yellow means that a system is moving toward an unsafe state).

Why is it important Situational awareness is becoming increasingly important because of the increasing complexity of power systems. It becomes more and more difficult to completely grasp the dynamics in the grid because of the risks associated with the increasing interaction of technology on power systems.

16

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

As the amount of external data feeds increases, there is a growing need to focus on the inputs into the power system algorithms. It can be inferred from Figure 1-4 that the increasing number of inputs into the system requires new analytics and visualization techniques to be developed and integrated into electricity utilities to create and enhance situational awareness. This is also due to the complexity of the inputs and interactions between them. Whereas the number of outputs in terms of measures has not really increased (e.g. we still measure voltage and frequency), how we visualize these measures does not necessarily require complex visualizations.

Figure 1-4: Relationship between analytics and visualization complexity

Examples of trends that make grid operation more challenging and dynamic are:  Variability of demand, both because of new demand (electric vehicles, heat pumps) and moving demand (demand-side management).  Changing generation mix (increasing reliant on weather, which also impacts demand) requires a more active role of the grid operator.  Market vs physical – the merging of markets across larger geographical/geopolitical zones with unclear impact of different physical power systems (island systems with larger interconnected systems).  Increased cost awareness in the regulated environment: who will pay for decarbonization and the increased system costs (reserves, response of weather variability)?  Tools to actively influence the grid become more easily available (e.g. demand response, gridconnected storage, active switching in the grid, dynamic line rating, grid capacity management, voltage management). Therefore, while situational awareness is becoming more prominent, not only do operators need to become more aware of the situation in their grid, but also equipment itself needs to be aware of the situation outside its direct environment. An example of this is given below:

“Some years ago a smart MV/LV distribution transformer with automatic tap changing based on power electronics with embedded controls was developed, build and fully tested at KEMA (now DNV GL) laboratories, including short circuit capabilities. It was installed in a greenhouse area in western part of The Netherlands to improve the voltage stability and power quality of the local distribution grid.

17

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

The smart transformer functioned according to expectation and it was decided to install a second one, electrically close, that is on the same MV string. As the smart transformers did not communicate with each other and no situational awareness and/or damping control loop was envisioned or implemented they started to react to each other resulting in unstable and oscillating behaviour. The end of the story was that they were removed from the grid.” (quoted from DNV GL white paper power cybernetics -> [https://www.dnvgl.com/energy/publications/download/power-cybernetics.html]) While operators will remain at the core of grid operations, it becomes more and more important that they need to be supported by advanced data analytics and analytics visualization in order to grasp the increased complexity, higher time pressure, and interlocking mechanisms, as indicated in the example below:

In September 2011 the loss of Arizona Public Service’s (APS) Hassayampa-North Gila 500 kV transmission line, effecting over 2.7 million customers. That line loss itself did not cause the blackout, but it did initiate a sequence of events that led to the blackout, exposing grid operators’ lack of adequate real-time situational awareness of conditions throughout the Western Interconnection. More effective review and use of information would have helped operators avoid the cascading blackout. For example, had operators reviewed and heeded their Real Time Contingency Analysis results prior to the loss of the APS line, they could have taken corrective actions, such as dispatching additional generation or shedding load, to prevent a cascading outage. The evaluation report recommends that bulk power system operators improve their situational awareness through improved communication, data sharing and the use of real-time tools. NERC Report 2012. Other sectors experienced the effects of increased complexity of systems in combination with human control. For example, in 88% of aviation accidents, human error was indicated as the cause, 50% of which was caused by air traffic control operational errors [Measurement of situation awareness in dynamic systems, Human Factors, 37(1): 65–84. 1995c.]. Like grid operations, these systems have grown in complexity and time-pressures with an increase in the amount of automation to assist the operators. However, there is still a human in the loop with the potential for human error. A major risk of (the necessary) increased automation in the grid is that operators actually become (relatively) less situationally aware and that automated systems and operators will work against each other, especially in crises and when under high pressure.

Conclusion Situational awareness always was and always will be a major element in maintaining the integrity of the electricity system. However, the importance of situational awareness in growing. As the operational margins are increasingly variable due to the increase of renewables in the energy mix as well as growing amounts of “intelligent” demand based on inverters, the system becomes more decentralized and complex. Therefore, retaining the current level of situational awareness is challenging. The changes in the power system require an increase of situational awareness on all control levels, so that the quality of operational decision-making, which is necessary to maintain system integrity, is kept. This brochure is about state-of-the-art tools and new data sources that enable operators to be aware of the situation in the power system and help them to make optimal decisions in operating it. And, as the complexity of the power system will continue to increase in the future, future needs to increase situational awareness are addressed. Situational awareness is about an integrated picture of the electricity system, including:  Situationally aware automation on a higher level of decision-making of grid operations under high time pressures, taking into account larger parts of the grid instead of the direct environment.  Increasing the situational awareness of operators by visualizing the current situation as well as future situations and scenarios.  Shifting the main focus of the operators to prepare (“prime”) the system for (near) future critical situations, using simulations, (short-term) scenarios, and models. The latter has the additional benefit that operators will gain a much faster and thorough understanding of the system dynamics than they would get based on experience of (hopefully rare) real-life events alone.

18

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

The following chapters address all major elements of situational awareness, starting from data and information sources, data-analytics techniques to interpret these data, applications of these analytics in system operations, data integration and modelling to integrate data into operations, and finally data quality and validation. References: [1]. Endsley, M.R. (1995b). "Toward a theory of situation awareness in dynamic systems". Human Factors. 37 (1): 32–64

19

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

20

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

2.

DATA SOURCES IN ELECTRIC POWER SYSTEMS

In order to examine the advanced data-analytics applications that support system operations, it is necessary to understand the sources and characteristics of the large variety of data that is available in power systems. There is wide range of measurement equipment, sensors, and recorders installed in the power network, and they are capturing a rich amount of data that can be used in analytics applications to extract valuable actionable information. Each recording device has its own characteristics in collecting, processing, and reporting captured data. These devices have specific built-in purposes, but the data that they provide may be used in data-analytics applications for additional objectives. In addition to the data captured by measurement equipment, there is data from non-electrical equipment and data from sources external to the power system that can also be leveraged for power system analytics. This section provides an overview of the many data sources that can be found in power systems. For description purposes, the following provides the classes of data sources [1][2]: 

Data from monitoring and protection equipment



Data from equipment sensors



Non-electrical data sources (external data)

Further, this section provides a description of the communication requirements to make this data available for its use in analytics applications.

2.1 DATA FROM MONITORING AND PROTECTION DEVICES Modern digital and microprocessor-based devices used for various protection monitoring applications are commonly referred as intelligent electronic devices (IEDs) [1][3]. Most of these devices were designed with a very specific, often limited, data-collection function in mind. However, with technological progress, IEDs evolved into more sophisticated devices with new capabilities, including new functionalities and higher quality of data recording. Data from many IEDs could be integrated and used for a variety of analytical applications, provided that standardization, data-recording, and communication issues are properly addressed. The different pieces of equipment are briefly described in the following subsections, including a discussion on the characteristics of recorded data, potential applications, and examples.

2.1.1 Digital protective relays The purpose of the protection function is to continually detect the abnormal/fault condition on a power system and provide a high-speed tripping mechanism to isolate the fault from the rest of the power system. Because the protection function is necessary for safe and normal operation of a power system, protection relays and equipment are considered critical and require high sampling frequency, high accuracy, and low latency data transmission. Hence, the data collection and processing are performed locally and very close to the equipment being protected. The data taken at a high sampling frequency is generally not needed for data-analytics applications. However, the post-event disturbance data may have data-analytics applications for analyzing the behavior of equipment and determining the statistics of events. The equipment that processes the signal information in real time are called protection relays. Although old electromechanical relays are used widely in power systems, these devices are replaced with modern microprocessor-based protection relays in a vast majority of applications, because data in digital format is necessary for data-analytics applications. Only the capabilities of microprocessorbased relays have been highlighted in this report. Protection relays generally measure the voltage and current information on a section of power system. They also are wired with additional alarms and indication signals from power system equipment that they are meant to protect. There exists many types of protection relays within a substation overlooking every section of the power system. In addition to providing high-speed protection, microprocessor-based protection relays also record the signals and status information at the time of disturbance. These are: 

Fault/trip information such as voltage and current magnitudes, angles, circuit breaker status, etc.

21

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

  

Equipment indications/alarms at the time of disturbance. Operating status of protection functions at the time of disturbance. Health of protection relay.

2.1.2 Recorders If a substation has all microprocessor relays, it would be possible to know the condition of a power system at the time of a fault/disturbance event by collecting and analyzing the information triggered in various protection relays. However, because electromechanical relays do not have any capability to record the disturbance information, separate standalone recorders are used to record the disturbance data. Similar to microprocessor-based protection relays the recorders measure voltage and current information from the substation and alarm/indication signals from power system equipment. One of the primary advantages with standalone disturbance recording equipment is that they can be set sensitive to trigger for any abnormal condition, whereas protection relays trigger only during fault conditions. The two types of standalone recorders widely used in power utilities are sequence-of-event recorders and digital transient/fault recorders.

2.1.2.1 SER: Sequence-of-event recorder Large power system equipment such as generators, circuit breakers, and motors have complex operating mechanism, which operate through a sequence of steps. In such equipment, several actuators, sensors, and control elements are connected in a complex configuration. Each of these elements often provides the operating status (0/1) on whether a measuring quantity has exceeded the threshold or an equipment has operated. SERs connect several of these signals and record the status changes with time stamps. Analysis of SER data helps to identify the operation time and performance of each of the control elements and sub-systems. Data-analytics applications can utilize this data to locate a sluggish-performing device and warn about a potential failure event. Proactive steps can be taken to replace the device and help to prevent a catastrophic failure event that causes motor damage.

2.1.2.2 DTR/DFR: Digital transient/fault recorder Digital fault recorders connect the continuous time-varying signals such as voltage, current, pressure, temperature quantities, and provide triggering functionalities to record a disturbance. DFRs continuously monitor these signals and record the transient waveforms on the occurrence of an event. DFRs may also contain few binary signals (0/1) to indicate the status of equipment. Analysis of disturbance snapshot recorded by DFRs provide insight into the transient performance and operational characteristics of power system. The data can be utilized to access the behavior and response of many connected power system equipment. The result of such analysis will help identify the root cause of the disturbance and enable corrective actions. Data-analytics algorithms can utilize DFR to model and access the system-wide health and performance.

2.1.2.3 Dynamic s wing recorders Dynamic swing recorders (DSRs) are especially aimed to capture the dynamic response of the power system as a result of a fault or sudden changes. DSRs exist both as standalone and integrated devices with digital fault recorders. Data is usually stored as RMS or phasor values and sampled from twice a cycle to every ten cycles. DFRs are able to capture swing record lengths from one minute to 30 minutes or pre-post triggering of swing data, and they can be used for several purposes, such as analysis of disturbances, the quantification of power system parameter changes, the investigation of system oscillations, and validation of stability models [1].

2.1.3 Revenue meters Real-time revenue metering and economic dispatch of generation are two of the most important functions in power enabling smooth and efficient operation. Revenue meters are located at the point of interconnection, segregating generators, transmission/distribution owners, and load centers. Metering data consists of capturing highly accurate data at the frequency of the power system, representing magnitudes of voltages, currents, real power, reactive power, and system frequency. The difference between regular meter and revenue meters is the accuracy. Regular meters are used

22

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

for visualization. However, revenue meters are connected to highly accurate revenue-grade instrumentation transformers, and the devices contain filters to selectively choose power frequency components. The real-time meter information is utilized in advanced data analytic algorithms to detect conditions and abnormalities in parts of the power system. Timely analysis can help system operators to take appropriate actions to mitigate these.

2.1.4 Synchrophasors A phasor is the mathematical representation of a continuously time-varying signal in terms of magnitude and angle. A Synchrophasor is a digitized phasor data with a UTC (Coordinated Universal Time) timestamp on each packet. Phasor measurement units (PMUs) are devices that measure the voltage and current quantities in a substation and compute Synchrophasor data of voltages, currents, and real/reactive power flow at a much higher rate than remote terminal units (10 to 120 samples/sec). Because Synchrophasor data utilizes a common time reference, it enables comparing power system state information across a wide geographical area in a common manner. Hence, mathematical operations such as addition, subtraction, multiplication, and division can directly be performed on the Synchrophasor data collected from different sources. This enables access the state of an entire power grid to a much higher granularity and accuracy than previously possible. With the higher penetration of PMUs, complete real-time automated closed loop control from a centralized EMS system becomes viable. Highly accurate Synchrophasor data can reveal the condition of a power system to a greater degree. It is possible to view the power system oscillations and generator dynamic responses in real time.

2.1.5 Remote terminal unit (RTU) To achieve centralized control of a power system, real-time values of voltage, current, real power, reactive power, system frequency, and circuit breaker status information are needed. RTUs connect the circuit breaker status signals and continuous time-varying voltage and current signals and calculate the magnitudes of these quantities. RTUs can be integrated into SCADA systems and connect to wide-area communication networks where the real-time information of these quantities is transmitted to a control center. Further, RTUs are connected to trip and control circuits of generators and circuit breakers to regulate the operation of generators and enable remote connection/isolation of sections of a power system. Hence, RTUs and SCADA systems are annexed in a critical equipment list to achieve centralized control of power to enable efficient and stable operation of a power system. RTU data is utilized in state-estimation algorithms to determine an accurate state of the power system at a given moment. This gives complete visibility of power system depicting its real-time health. Advanced control algorithms are further used to achieve manual and automated close-loop operation. Data-analytics applications can use RTU and other types of data and provide enhanced foresight and situational awareness to the system operator.

2.1.6 Power quality meters Power quality (PQ) meters are designed to record different power quality variations such as impulsive and oscillatory transients, sags/swells, interruptions, under/overvoltage, harmonic distortion, and voltage fluctuations. Usually, the sampling time of PQ meters can be configured according to specific application requirements. The newest generation of PQ meters can sample at rates of 1024 samples per cycle for normal conditions and up to 100,000 samples per cycle for transients [7][8]. Traditionally, PQ meter data has been used by power quality engineers for specific PQ monitoring and assurance purposes. However, the alternative usage of such data has recently been considered and investigated, including condition monitoring of equipment, fault identification, and fault analysis. There is an IEEE working group that specifically focuses on this type of data analytics [5][6][7][8]. Various software applications have been developed to process and analyze power quality databases and automatically combine that data with other power system with data from SCADA, GIS, and network topologies for detection and analysis of events in the grid [9].

2.1.7 SCADA (supervisory control and data acquisition) A power grid is a highly interconnected system between generators and loads, which are spread across wide geographical locations. For efficient and reliable operation of a power system, it is necessary to monitor its state from both a local and central location. SCADA systems connect to RTUs

23

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

in substations to monitor the voltage and current quantities and control the operation of circuit breakers. They also communicate with a central energy-management system for control actions. SCADA provides the following control capabilities: 

Generators: Control the voltage, frequency, and real/reactive power set points.



Transformers: Adjust the tap changers where tap changes are available.



Capacitor/reactor banks: Remotely open/close the banks.



FACTS (Flexible AC Transmission Systems): Control the set points to regulate power flow or system voltage on a section of the transmission system.



Loads: Non-essential loads are operated dynamically to shed at peak load and stressed times as part of demand-response programs.

A SCADA system uses information collected from both binary (0/1) and analog continuous timevarying signals for decision-making. SCADA systems form an important source for data-analytics applications because the data collection and communication infrastructure is already established and readily available at energy-control centers. Operational data collected from SCADA systems from various parts of the power system is continually streamed to a central location.

2.2 DATA FROM EQUIPMENT SENSORS The urgent need to diagnose aging equipment and asset health has led to the development of a variety of sophisticated equipment-based sensors, which enable one to assess the health and performance of different pieces of equipment. Devices that monitor the condition of assets contain equipment-specific intelligence to identify normal and abnormal responses. A condition-monitoring system can be standalone with advanced analytics about specific equipment or it can be part of a multifunction protection relay wherein general health and statistics information is provided. Examples of a standalone condition-monitoring system include vibration-monitoring systems for turbines, partial discharge monitoring systems for generator stators, and dissolved gas analysis (DGA) systems for transformer oil. Common functions embedded in modern multifunction protection relays include circuit breaker monitoring systems, as well as temperature and overload monitoring for transformers, motors, generators, and transmission lines. Data-analytics algorithms can use the information from various condition-monitoring systems to determine the health of major power system equipment in real time and deliver information about equipment health to system operators for situational awareness. In what follows, different types of sensors and condition- and operation-monitoring devices installed in substation and on transmission lines are briefly described. The list is not intended to be exhaustive but rather to exemplify the characteristics and possible uses of sensor data. The descriptions include conventional sensors commonly used in substation equipment, as well as emerging sensors and systems.

2.2.1 Circuit breaker IEDs in circuit breaker (CB) architecture provide precise indications of the CB’s operation condition with an efficient data-logging system. The CB-monitoring (CBM) system encompasses two categories of data collection: real-time and event-based. Breaker relays and CBMs supervise the following parameters in order to provide continuous evaluation of asset health (a detailed list of the monitored parameters is provided in Table 2-1).   

The breaker and trip coil statuses Charging motor conditions SF6 gas quality and heater integrity

During operations, the relays also record concurrent event data that the asset health center (AHC) monitoring system uses to assess breaker performance and maintenance needs. Such events-based data includes:   

Transient recordings of breaker interrupt currents Breaker operation times Trip coil currents

24

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

  

Battery voltages Mechanism charging currents Mechanism charging times Table 2-1: Monitored parameters of circuit breakers Categories

Electric Wear

Accessories

Dielectric

Mechanical

Parameters Contact wear (switch operations) Main nozzle wear Auxiliary nozzle wear Contact resistance Interrupter wear Function of cabinet, mechanism, and tank heaters Number of hydraulic pump starts Total accumulated run hours of the air compressor Total accumulated run hours of the SF6 compressor Insulating oil dielectric strength Rated voltage vs. applied voltage Rated current vs. applied current SF6 moisture content, density, temperature, pressure, and purity High-pressure SF6 moisture content, density, temperature, pressure, and purity Close time and velocity Trip time and velocity Interpole close time and trip time deltas Resistor pre-insertion time Total interrupter travel Mechanical supervision/monitoring (travel curve, times) Energy supervision/monitoring (spring/hydraulic) Motor and coil supervision/monitoring Sensor, heater, and self- supervision/monitoring Remaining energy detection for spring mechanism

The CBM system can support client/server architecture. It consists of the CBM devices attached to the CBs and software running on a central control unit. The main functions of the control unit are:    

Supervise the operating conditions of the circuit breaker. Prevent operation if the circuit breaker is outside its operational capabilities. Execute operating commands when it is safe to do so. Perform data acquisition of signals from the CB control circuit and record sequences of tripping and closing.

When a breaker operates, recorded files are transmitted to the central control unit using wired or wireless technologies. The bandwidth required for real-time data transfer of 15 signals, sampled at 2 kHz, is determined as 576 Kbps. The CBM IED monitors 15 electrical signals from the circuit breaker control circuit. The signals are generated during either tripping or closing of the breaker. Of these 15 signals, 11 are analog and 4 are binary signals. Analog signals include measurement of electrical variables such as phase current, while binary signals indicate the statuses of different components.

2.2.2 Transformer Online monitoring is used continuously during operation and offers possibilities to record the relevant stresses that can affect the lifetime of a transformer. The evaluation of these data offers the possibility of detecting incipient faults early. The addition of an embedded web-server, equipped with powerful data-analysis tools, means that users can manage and interpret information. Table 2-2 illustrates the status of different condition-assessment techniques.

25

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Table 2-2: Status of different condition assessment techniques for power transformers Method Ageing of oil (e.g., color, moisture, and tan δ) Furan in oil analysis

Offline 1 2

Online 1 2

Monitoring 3 N/A

Offsite 1 2

Gas-in-oil analysis (DGA)

1

1

1

1

PD (IEC 60270)

1

2

3

1

Unconventional PD-measurement (e.g., UHF PD measurement)

2

2

3

2

Transfer function (FRA)

1

3

N/A

1

Dielectric diagnostic (PDC and FDS)

2

N/A

N/A

2

Thermal monitoring

N/A

N/A

2

N/A

Degree of polymerization (DP-value)

N/A

N/A

N/A

1

1: Generally accepted or standardized; 2: accepted by different users; 3: under investigation or consideration; moisture measurement.

The control units of modern transformers offer a complete set of communication infrastructure based on IEC 61850-8, including GOOSE messaging, IEC 61850-9-2 Process bus, IEC 60870-5-103 serial communication, and DNP 3.0 slave protocol. The control, embedded webserver, and web-based software units of the transformer work as a SCADA that is used for:       

Incorporation of DGA, PD, and bushing monitoring (BM) in one unit. On-site and online display of DGA, PD, and BM key parameters. Control the operating conditions of the transformer and execute operating commands. Correlation of data from external inputs. Full control and communications via secure, flexible web access. Extensive analysis tools. Full compatibility with asset-management systems.

Dissolved Gas Analysis (DGA): Online DGA represents a vastly improved monitoring process. With online DGA, devices are installed on substation transformers that are capable of:      

Sampling and evaluating dissolved gasses and sending DGA data to back office systems. Integration of online DGA data into operations and maintenance processes. Capturing of data at least once per day and, in some cases, as often as once per hour. Capability of analyzing a larger number of data points, which improves trending analyses. Transmitting online DGA data to an energy-management system (EMS). EMS triggers alarms using a rule engine with preconfigured asset-specific parameters.

Partial Discharge (PD): Electrical discharges appear as various forms of voltage and current impulses that lead to PD and as having a very short duration (nanoseconds). These events radiate electromagnetic energy with a specific spectral signature for which UHF detection is well suited, enabling high levels of refresh rate and accuracy. The online PD indicator and its control unit work as a SCADA system that is used for:     

Radiated electromagnetic energy with UHF detection process, enabling high accuracy. Phase-resolved analysis and UHF detection method based on IEC 60270 rules. Simultaneous operation of PD indicator and SAW temperature monitoring system. Real-time separation of PD events and ambient noise using high-performance algorithms. Sample rate: 100 Mbps; Bandwidth: 16 kHz – 100 MHz or 1 MHz – 35 MHz.

Table 2-3 shows different monitoring techniques, sensor types, possible output data, and the purpose of monitoring.

26

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Table 2-3: Different sensors and output data Monitoring Method and Sensor

Output Data

Purpose of Monitoring

DGA

Analysis of the oil samples (Based on IEEE C57.104 and IEC 60599)

Combustible

Insulation, overheated oil

Spectroscope

Digital

PD UHF sensor Acoustic wave sensor Fiber optic sensor

Digital

Thermal analysis PT100 Thermal camera Fiber

Resistance Digital Digital

Vibration SKF Acceleration sensor

Voltage

Moisture Vaisala Humicap MMT318

Current

Insulation, overheated oil, system leaks, over-pressurization, or changes in pressure or temperature.

Insulation: If there is partial discharge detected, it is possible to locate the fault location accurately by using multiple sensors. Heat can indicate multiple faults. Oil temperature Surface temperatures Temperature directly from windings Loose core clamping or bonding bolts

Insulation

2.2.3 Distributed generation (solar and wind) The increasing number of renewable energy sources such as solar photovoltaic, wind, and microhydro is leading to a substantial generation of electric energy in the form of distributed generation (DG) units within the electric networks. Table 2.4 and Table 2.5 provide some brief descriptions of sensors and measurements for solar and wind power, respectively [16][17]. DGs need a fast and accurate data-transmission system to transfer the measured data and command signals to the relevant central controllers. Therefore, the monitoring of data provides a fundamental operation support of solar and wind power or other DGs. In addition, a proper ICT needs to be developed. This facilitates controlling and monitoring of electricity generation and consumption as well as network remote operation. Table 2.4: Solar measurement and description Monitoring Method and Sensor Pyranometer

Satellite-based irradiance measurement Back-of-module temperature sensor Ambient temperature sensor Current transformers (CTs) Inverter-direct monitoring Inverter temperature sensor Energy meter

Description Measuring irradiance: If there is more than one orientation of the PV array, then a separate pyranometer would be required for each orientation. Taking data from satellites and processing them with models to create an estimate of ground-level irradiance at a site. The temperature sensor with thermal conduction. Ambient temperature. Measuring current from combiner-box home runs measured at the inverter. Measuring production of each inverter. Identify overheating. Reporting at a minimum cumulative energy delivery.

27

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Table 2.5: Wind turbine sensors and applications Sensor Accelerometer sensor Position sensor

Application on Wind Turbine Gearbox monitoring Prop feathering monitoring

Pressure sensor

Dynamic pressure measurement on turbine blade

Temperature sensor Fluid property sensor

Bearing monitoring Gearbox oil monitoring

Level sensor

Gearbox oil level monitoring

Accelerometer sensor

Turbine shroud monitoring

Transformer sensor

Windings temperature monitoring

Temperature sensor

Stator winding monitoring

Vibration sensor

Tower sway monitoring

Position sensor

Tower leveling monitoring

The sensors/meters/actuators in one DG can be connected directly to the local controller through either ADC, GPIO, or serial communication. The received data and any processed outputs can then be transmitted by the reduced function device (RFD), which is connected to the local controller. The transmitted data by the local controller of the DG will be received by the central controller through the full function device (FFD). Alternatively, the sensors/meters/actuators can be connected directly to an RFD, which transmits data to the central controller. This is applicable for measurements from the CB and power distribution lines where no significant computation and control process is required.

2.2.4 BESS: Battery energy storage system Power generation is shifting from large-scale to a highly complex, distributed generation in which cost-efficient integration of renewables is paramount, and the demand for energy is continuing to rise. Therefore, a BESS has to provide energy for a large range of applications to optimize asset performance by stabilizing frequency and voltage and balancing variations in supply and in demand. The typical applications are, but not limited to: Generation  Frequency regulation  Renewable integration  Spinning reserve  Power plant hybridization  Ramp rate management Transmission  Voltage support  Dynamic line rating support  Renewable integration  Dynamic stability support  Loss reduction  Constraint relief Distribution  Residential and industrial backup power  Microgrid and island grid support  Distribution upgrade support  Peak load reduction The data exchange of a BESS can vary because of the different manufacturer structures of a BESS. Figure 2-1 shows the basic BESS elements. Usually, a BESS is conducted by the EMS shown on top.

28

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

EMS Energy Management System

BESS Battery Energy Storage System

SMS Storage Management System

BMS

SCU

Battery Management System

Storage Control Unit

Figure 2-1: Basic structure of a battery energy storage system

Storage Management System (SMS) The SMS of the BESS works as a SCADA system that is used to: 





Provide interfaces to external EMS/SCADAs, along with the appropriate control and communication hardware, to conduct energy-storage applications. Therefore, supported protocols like IEC 61850, IEC 60870-5-104, IEC 60870-5-101, Profibus DP, and Modbus TCP are standard. Control the connected inverters according to the operation mode and its activated control mode, such as: o U/f-Mode: battery as voltage source for reliability, grid improvement, island grid operation, and so on. o P/Q-Mode: battery as current source for energy shifting, energy optimization, and so on. Simultaneously measure, record, and analyze numerous signals such as: o AC voltage and power at POI (point of interconnection) o DC voltage of battery string o Different temperatures of the system o Positions of switching devices o Numerous information about the battery provided by the BMS

Battery Management System (BMS) The BMS is composed of several controllers coordinated to command, protect, and monitor the battery, ensuring maximum longevity and performance of the battery cells.      

Supervision of cell voltage Supervision of module temperatures Calculation of SOC value (state of charge) Calculation of SOH value (state of health) Balancing between the modules Assignment of warning and alarm messages in fault cases (fire, overcurrent, and so on)

29

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Disconnection of batteries from inverters in fault cases

The BMS also tracks and flags the security sensors indicating the state of charge (SoC) and the state of health (SoH) of the storage battery cells. In the fault case, the BMS sends corresponding warning and/or alarm messages to the BESS control unit to react correspondingly to disconnect the battery racks. Storage Control Unit (SCU) The SCU usually controls and coordinates the different inverters according to the operation mode and its activated control mode. Additionally, the system controls various parameters. Start conditions, availability, and operation parameters of each inverter are reported to the SMS Table 2-6: Example of the minimum required BESS signals for a EMS (SICAM microgrid control) Battery

Signal Type

Description

Value

Unit

Name BAT1

DPI (DoublePointIndication) Status

on / off

N/A

BAT1

DPI

Status

ready / failure

N/A

BAT1

SPI

Operating Mode

Grid forming / Grid supporting N/A

Operating Mode

Grid forming / Grid supporting N/A

on / off

(SinglePointIndication) BAT1

CO (Command)

BAT1

CO

Status

N/A

BAT1

AI

Active Power

kW

(AnalogInput) BAT1

AI

Reactive Power

kvar

BAT1

AI

State of Charge

%

BAT1

AO

Active Power Setpoint

kW

Reactive Power Setpoint

kvar

(AnalogOutput) BAT1

AO

2.2.5 New sensors for equipment monitoring A new suite of sensors has been developed to aid utilities in addressing issues with aging transmission infrastructure, increased utilization of existing assets, and optimized maintenance and asset management. Data from those sensors, along with the associated communication and data integration infrastructure, opens the possibility to enrich and expand scope and use of large variety of analytics applications. This new sensor suite includes [10]:     

Conductor – Temperature and Current Sensor: This sensor is used to capture the temperature

and current magnitude of overhead transmission conductors. It uses wireless communication to transmits the data for rating applications. Overhead Insulator Leakage Current Sensor: This sensor measures the level of leakage current of insulators. The main purposes are to aid in determining the right time to wash insulation and to detect insulators at high risk of flashover. Shield Wire – Fault Current Magnitude and Location: These sensors measure the time and magnitude of the fault current that flows through shield wires. The main use is fault identification. Shield Wire – Lightning Sensor: This sensor measure peak magnitude and time of lightning current flowing in the shield wires. It can be employed to validate lightning location. Transmission Line Surge Arrester (TLSA) RF Sensor: This sensor captures the total number of events and charges seen by an arrester. They can be used to provide life expectancy.

30

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Overhead Transmission Structure Sensor System: This system fuses RF sensors with image

processing and environmental data. The data is wirelessly communicated in real time with builtin alarming functions. The system is used to address outages.

2.3 NON-ELECTRICAL DATA SOURCES (EXTERNAL DATA) One of the areas where a variety of non-electrical or data external to the electric system is used for power system analytics is in the area of electric load forecasting. Data sources used for this purpose include electricity demand data from SCADA, weather history, forecasts from weather service vendors, economy history, forecasts from economic analysis firms, end-use information from surveys, industry codes, equipment locations, land-use information from GIS, and urban-development plans from local governments [1]. Data from other sources such as outage information from OMS, logs of demandresponse activities, and records of past and ongoing energy efficiency programs has also been used to increase forecasting accuracy. Other nonconventional data sources have been also used for load forecasting. For instance, satellite images are used for spatial load forecasting to track historical development of cities. High-density weather stations are being used for forecasting rooftop solar generation and electricity demand in micro-climate zones. Cameras are also being installed around local farms to capture the local cloud movement [11]. In recent years, the use of non- electrical data for various applications used to analyze data from power systems are seeing increased popularity. One of the areas where external data provides very valuable information is on the forecasting and mitigation of extreme events. For instance, the resiliency of the power system can be improved by predicting the impacts of weather-related outages by utilizing a variety of weather data and public data sources. Different non-electrical data from various sources are briefly described below: Weather Data [11][12]: Depending on the characteristics and capacity, weather stations provide some or all the following data:         

Lightning characteristics Air and soil temperature Wind speed and direction Precipitation Fog, frost, ice, snow, sleet, and blizzard Snow-water equivalent Air relative humidity Solar radiation Pan evaporation

Weather data, both historical and in real time, can feed analytics to provide useful insight for various types of nature-induced disturbances. For instance, lightning is an important cause of interruptions or damage in almost every electrical system exposed to thunderstorms. The problem is severe mainly for electric utilities that have exposed assets covering large areas. In areas with a high probability of lightning, cloud-to-ground (CG) lightning is the single largest cause of transients, faults, and outages in power transmission systems. Different systems that are in use to detect lightning are: •

Gated wide-band magnetic direction finders (DFs)



Time-of-arrival (ToA) sensors



ToA methods operating at higher frequencies



Interferometric methods

In the U.S., the National Lightning Detection Network (NLDN) gives utilities lightning warnings in real time and information on whether CG strokes are the root of faults, documenting the response of fixed assets exposed to lightning and quantifying the effectiveness of lightning protection systems [14]. The NLDN uses different kinds of lightning sensors located sparsely over the area of about four million square miles of the U.S. territory. Reference [11] provides a comprehensive list weather event data sources, with the name, short description, and the URL from where to access the database.

31

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Geospatial Data: Geospatial information systems (GIS) have been used by utilities for years; nevertheless, the types of data available have increased, which leads to new applications of geospatial data visualization. Weather event data integrated with geospatial information can be applied to advanced power system analytics such as predictive modeling, real-time forecasting, and post-event analysis [10]. Customer Data: Social media data can help the utilities to improve engagement with customers as well as manage outages during major storm events. This data is a prime example of how a combination of utility operation data, weather data, and customer data could be integrated to provide better preparedness in outage management. Moreover, customers can see outage information, as well as receive news (educational and operational) from their utility oftentimes via a mobile application [10]. Geographic Information System (GIS): The GIS data includes two types of data: spatial and attribute. The spatial data presents the absolute and relative location of geographic features, for instance coordinates of location where a substation is situated. The key for effective use of this in power system applications is the combination of GIS and GPS with the model of a power system. GPS provides time references that can be applied to synchronize all events. Most digital measurement devices such as PMUs, traveling wave fault locators, and lightning detectors and locators have integrated GPS units to send precise time stamps with measured data. The GIS model of a system can be correlated with an electrical model, providing a more enhanced geographical characterization of a system [1][15].

2.4 COMMUNICATION REQUIREMENTS FOR SMART GRID DATA Network reliability and coverage, bandwidth, packet jitter, and latency requirements are the most critical issues when developing the technical requirements for the power system. For example, the communications network needs to provide real-time, low latency capabilities for applications such as centralized remedial action schemes (CRAS), tele-protection (less than 10 ms), transmission, substation SCADA and VoIP applications (100 to 200 ms), phasor measurement (about 20 ms), and load-control signaling. These requirements drive the need for high-speed fiber optic and/or microwave communications to support those capabilities. On the other hand, applications such as automatic meter reading (up to a few seconds) and data beyond SCADA, which are more latency-tolerant, could use communications technologies such as unlicensed wireless mesh, broadband wireless, licensed wireless, and satellite. Future trends and applications in generation, transmission, and distribution systems present different class of requirements and challenges (general communication requirements in power system application are provided in Table 2-7). Table 2-7: General requirement of communication in power system Requirement

Description

Performance

Data transmitting in power system needs different network performance and data (bandwidth, latency, and payload) requirements.

Coverage

For wide-area power system, selecting one or more communication technologies must be done after thorough analysis of its characteristics, cost, and other associated operational challenges.

Rural areas have poor cellular coverage and metropolitan areas are deploying high-speed mobile 4G/LTE technologies.

Different communication technologies have different cost structure.

Private communication networks are CapExintensive with low OpEx, while a serviceprovider-based public solution such as cellular or satellite requires higher OpEx with lower upfront CapEx.

A layered networking architecture ensures integration of innovations over the expected lifetime of the deployment. Over the next few years, newer protocols such as IEC 61850 and beyond are expected to be prevalent.

Field infrastructures are deployed with an average lifetime of 15 to 20 years, which may appear incompatible with the pace of evolution in data communications.

Cost

Life Time

32

Example Substation automation GOOSE applications require low-latency communications with latency budgets in order of milliseconds, while a conservation voltage reduction (CVR) application has latency expectation of seconds.

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Data Gathering

Security

The number of devices, the amount of data, and frequency of communications with the devices are necessary. Acceptable latency and required bandwidth for every type of data should also be considered.

Data collection from many sources on the power grid—such as sensors, meters, and voltage detection—in the customer premises— such as sensors for high-consuming appliances and from external sources such as weather—is necessary.

Power system is vulnerable to cyberattacks.

Additional traffic on the network and bandwidth consumption.

In summary, power system communications require the following:      

Security Bandwidth Reliability Coverage Latency Backup

However, each area of communications has different levels of requirements. For applications such as tele-protection circuits and C-RAS fault detection and fast switching to avert transmission grid failure, high bandwidth is not necessary. However, it requires extremely low latency communication, from 3 to 8 milliseconds. A field area network (FAN), on the other end, requires wide geographical coverage and low to medium bandwidth. Table 2-8 presents a more detailed mapping of networks and associated communication requirements in power systems. Table 2-8: Networks and associated communication requirements Categories

Throughput

Latency < 50 ms for Now < 8 ms in the Future

Burstiness*

Inter-Utility Network

10–100 Mbps

High-Speed Backbone Network

~3.3 Gbps

< 150 ms

High

Tele-protection and Other Low Latency Network

< 1 Mbps

< 8 ms

High

Substation Bus Network

10–20 Mbps

< 8ms

High

Field-Area communication

1 Mbps Downstream/ 384 Kbps Upstream Total>384 Mbps

< 150 ms

Medium

Premise Area Network

4 Gbps

< 50 ms

Medium

*Burstiness

High

is a measure of the variability of traffic (i.e. the peaks and lows).

Also, other important applications and related communication requirements in modern power system are provided in Table 2-9. Table 2-9: Communication requirements in terms of latency and data time window Application

Origin of Data/Place Data Is Required

State Estimation

All substation/control center Generating substation/application server

Transient Stability Small Signal Stability Voltage Stability Post-Mortem Analysis

Latency Requirement 1 sec 100 ms

Data Time Window Instant 10–50 cycles (167 ms – 830 ms)

Some key locations/application server

1 sec

Minutes

Some key locations/application server All PMU and digital fault recorder data/historian

1–5 sec.

Minutes Instant and Event Data

NA

Several smart grid applications have already been developed, and some are in the process of development as a future trend in power systems. To understand their communication needs, a brief

33

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

and qualitative survey of some of the most important applications in terms of their data requirement and latency are presented in Figure 2-2 and Table 2-10.

Figure 2-2: Requirements of a smart grid network

Table 2-10: Network requirements for smart grid applications Application

Data Rate/Volume

Latency Allowance (One-Way)

Reliability

Smart Metering Inter-Site Rapid Response

Low/Very Low High/Low

High Very low

Medium Very high

Scada

Medium/Low

Low

High

Operations Data

Medium/Low

Low

High

Distribution Automation Distributed Energy Management & Control Video Surveillance

Low/Low

Low

High

Medium/Low

Low

High

High Medium

Medium

High

Mobile Workforce

Low/Low

Low

High

Corporate Data

Medium/Low

Medium

Medium

Corporate Voice

Low/Very low

Mow

High

To meet all the performance, coverage, cost, and lifecycle requirements of the network, utilities require a combination of multiple communication technologies, because no single communication technology can meet all of their requirements. The dynamic nature and wide range of communication technologies available today provide power systems with numerous options. However, this also creates the multiple challenges of choosing the appropriate technology and networking architecture. Specific technology supporting each particular application varies based on factors such as bandwidth,

34

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

latency, and reliability. Table 2-11 lists some of the modern power system applications and the associated communication technologies that may be employed for each application. Table 2-11: Technology supporting each particular application (L – Low, M – Medium, H – High) Infrastructure/Applications

Network Requirements

Technology Option

Bandwidth

Latency

Reliability

High-speed Backbone

H

Milliseconds to Seconds

H

Inter-utility Area Network

H

Milliseconds to Seconds

M

Phasor Measurements

H

Milliseconds

H

Tele-Protection Network

L

Milliseconds

H

Remedial Action Scheme Centralized Remedial Action Scheme

L

Milliseconds

H

Optical Transport (DWDM, SONET) MPLS and IP-based fabric Wired and wireless carrier/utility company owned wireless networks satellite, microwave Fiber optic, microwave, broadband wireless IEC 61850, hardened routers/switches Fiber optic, microwave

H

Milliseconds

H

Fiber optic, microwave

Protective Relaying

L

Milliseconds

H

Fiber optic, microwave, low latency wireless, copper

Substation LAN

L

Milliseconds

H

Transmission and Substation SCADA

M

Milliseconds to Seconds

H

Field Area Network

M

Seconds to Hours

M

T&D Crew of the Future

H

Seconds

H

Outage Detection

L

Minutes

H

L

Minutes

M

L

Seconds

H

L

Seconds

H

Microwave, satellite

Distributed Generation control

L

Seconds

H

Microwave, satellite

Advanced Metering (meter reading, disconnect, communication to HAN)

M

Seconds to Minutes

M

Unlicensed wireless mesh, PLC, Zigbee

Data Beyond SCADA

M

Minutes to Hours

M

Outage Detection (thru Fault Indicators, Protection systems or advanced meters)

L

Minutes

H

Premise Area Network

H

Seconds to Minutes

M

Wired and carrier owned/utility company owned wireless networks satellite, microwave

Distribution Automation (routine monitoring) Distribution Automation (critical monitoring and control) Distributed Generation monitoring

IEC 61850, hardened routers/switches IP-based fiber optic, microwave, copper lines, satellite Wired and wireless carrier/utility company owned wireless networks satellite, microwave Broadband wireless Fiber optic, microwave, broadband wireless, unlicensed wireless mesh Microwave, satellite, unlicensed wireless mesh Microwave, satellite, unlicensed wireless mesh

Microwave, broadband wireless, satellite Fiber optic, microwave, broadband wireless, unlicensed wireless mesh

Dynamic Pricing

L

Minutes

M

Internet, ZigBee

Plug-in Electric Vehicle

L

Minutes

M

Zigbee, PLC

Demand Response

L

Minutes

H

Zigbee, PLC, paging systems

Home Area Network Interface

L

Minutes

M

Wired or wireless broadband, Zigbee

*

EDISON, Southern California.

35

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Different modern technologies can be used in order to improve the functionalities of power systems and remove associated problems and solve challenges. There is not a finalized architecture for future power system communication infrastructure. However, the following technology options are the most promising at this point: Table 2-12: Communication technology options Characteristic High-speed Backbone

Migration from IPv4 to IPv6

Substation LAN

Solution Multi-protocol label switching (MPLS). Dense wave division multiplexing (DWDM). End-to-end IP-based fabric. Continue to use advanced fiber optic, microwave and satellite networks. To address needs for connecting of millions of end points. IEC 61850 protocol to transform the substation communications networks from serial (i.e. SCADA RTU) to IP-based communications using IEC 61850compliant IEDs and utility-grade rugged IP routers. Hardened and advanced routers and other networking equipment with scalable architectures to enable reliable and secure two-way communication between substation SCADA equipment and the EMS.

2.5 REFERENCES [1]. Advanced Data Analytics Techniques: Analysis and Applications for Power System Operation and Planning Support. EPRI, Palo Alto, CA: 2015. 3002007076 [2]. M. Kezunovic, L. Xie, S. Grijalva, P. Chau, and et al, Systematic Integration of Large Datasets for Improved Decision-Making, PSERC 2015. [3]. Substation Data Integration and Analysis: Study Report. EPRI, Palo Alto, CA: 2011. 1019916 [4]. J. Perez, “A guide to digital fault recording event analysis,” in 63rd Annual Conference for Protective Relay Engineers, 2010, pp. 1-17. [5]. S. Santoso, and D. D. Sabin, “Power quality data analytics: Tracking, interpreting, and predicting performance,” in IEEE Power and Energy Society General Meeting, 2012, pp. 1-7. [6]. W. Strang, and e. al., “Considerations for Use of Disturbance Recorders ” in System Protection Subcommittee of the Power System Relaying Committee of the IEEE Power Engineering Society, 2006. [7]. "Next-generation power quality meters," 2015; Available online [8]. W. Xu. "Working Group on Power Quality Data Analytics Objective & Scope," 2015; http://grouper.ieee.org/groups/td/pq/data/downloads/PQDA-Objective-and-Scope.pdf. [9]. "PQView," 2015; http://www.pqview.com/. [10]. Sensor Technologies for a Smart Transmission System, EPRI, 2009. [11]. Integration of Internal and External Data Sources to Support Transmission Operations, Planning, and Maintenance, EPRI, 2014. [12]. M. Kezunovic, L. Xie, S. Grijalva, P. Chau, and et al, Systematic Integration of Large Datasets for Improved Decision-Making, PSERC 2015. [13]. P.-C. Chen, T. Dokic, and M. Kezunovic, “The Use of Big Data for Outage Management in Distribution Systems,” in Int. Conf. on Electricity Distrib. (CIRED) Workshop, Rome, 2014. [14]. K. L. Cummins, E. P. Krider, and M. D. Malone, “The US National Lightning Detection Network<sup>TM and applications of cloud-to-ground lightning data by electric power utilities,” IEEE Trans. Electromagnetic Compatibility vol. 40, no. 4, pp. 465-480, 1998. [15]. P.-C. Chen, T. Dokic, and M. Kezunovic, “The Use of Big Data for Outage Management in Distribution Systems,” in Int. Conf. on Electricity Distrib. (CIRED) Workshop, Rome, 2014 [16]. https://www.nrel.gov/docs/fy17osti/67553.pdf [17]. http://www.te.com/content/dam/te-com/documents/sensors/global/TE_SensorSolutions_WindTurbines.pdf

36

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

3.

DATA-ANALYTICS TECHNIQUES

Information management in companies is becoming a process of much relevance. The goal is to discover knowledge from raw data generated during operation of the processes. Traditionally, data is used for purposes of process control; sometimes it was processed to get graphs of what was going with the process (situational awareness). Now the decision-makers want the data to be transformed into useful information for decision-making (decision support). In the early 1970s and 1980s, decision-support applications such as administrative information, predictive analytics, and online analytical processing (OLAP) have emerged and expanded the decision-support system domain. In the early 1990s, business intelligence (BI) played a pivotal role to increase value and performance of the enterprise. As a technology-driven process, BI helped corporate users make critical decisions by analyzing data and presenting information. BI involves a variety of tools, applications, and methodologies that collect data, prepare it for analysis, run queries; then analytical results such as reports, dashboards, and data visualizations are available to decisionmakers. The natural evolution of BI is data analytics (DA) or advanced analytics. There is no unique definition for the term “advanced analytics,” but it usually refers to tool types that based on predictive analytics, data mining, statistical analysis, digital signal processing, artificial intelligence, natural language processing, and other mathematical processes that attempt to recognize and validate data patterns and trends and draw conclusions therefrom. Data-analysis techniques can be combined with other analytical disciplines, such as descriptive modelling and decision modelling or optimization with the main objective to provide support for making better decisions. Many of analytics techniques appeared in the 1990s. Today, the data sets are significantly larger than before and most of these techniques adapt well to minimal data preparation. By using advanced analytics, utilities can study electricity usage data to understand and learn the state of the load and operations, and customer behavior. The advanced analytics can help to discover knowledge and facts that benefit business. By examining large volumes of data with details, useful information from hidden patterns and unknown correlations can be extracted to make better enterprise decisions. Data-analytics techniques have been applied across many industries, but the practice in the energy and utility sector is behind the other industries in terms of actual implementation. However, some of the implementation of analytics techniques (EPRI, Jan 28, 2016) used in the utility industry already show promising outputs. In order to enable secure, reliable, and interoperable operation of the power grid, an information-based framework is to be integrated into the electrical transmission grid. A large and heterogeneous collection of data from a multitude of measurements, status, or third-party data in various formats is used in constructing the framework. Data analytics is able to identify its unrevealed patterns, predict the prospective outcomes, and recommend appropriate decisions. Visualizing the current situation as well as future situations and scenarios could help to increase the situational awareness of operators; the visualization has to cooperate with data analytics. There is no unique classification of advanced analytics techniques, but each technique can contribute to data analytics of modern power system operation, especially in the situational awareness. In this brochure, the advanced data-analytics techniques are divided into six categories: 1.

Data mining and Association Rules

2.

k-Nearest Neighbor

3.

Supervised Learning and Unsupervised Learning

4.

Probabilistic Networks

5.

Deep Learning

6.

Visual Analytics

These six well-known categories are described in this section. These data and visual analytics techniques could apply to both real-time data and online and offline simulation data of electrical transmission grids to prepare short- or long-term scenarios and models. Therefore, operators could increase the situational awareness by visualizing the output of important information.

37

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

3.1 DATA MINING AND ASSOCIATION RULES 3.1.1 Brief definition Data mining is a process of analyzing data from varies perspectives and transferring it into useful information. Alternatively, it can be defined as the process of data selection and exploration and building models using vast data stores to uncover previously unknown patterns and using that information to build predictive models. Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. Such data includes:   

Operational or transactional data, such as sales, cost, inventory, payroll, and accounting. Nonoperational data, such as industry sales, forecast data, and macro-economic data. Metadata, which is data about the data itself, such as logical database design or data dictionary definitions.

The patterns, associations, or relationships among all this data can provide useful information. Information can be converted into knowledge about historical patterns and future trends.

3.1.2 Technical description Generally, any of four types of relationships are sought in data mining: 

Classification (Classes): Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.



Clustering (Clusters): Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.



Associations: Data can be mined to identify associations.



Sequential patterns: Data is mined to anticipate behavior patterns and trends and detect deviations (find anomalies). For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer’s purchase of sleeping bags and hiking shoes.

Data mining consists of five major elements: 

Extract, transform and load transaction data into the data warehouse system.



Store and manage the data in a multidimensional database system.



Provide data access to business analysts and information technology professionals.



Analyze the data by application software.



Present the data in a useful format, such as a graph or table.

3.1.3 Application domains Data mining is primarily used today by companies with a strong focus on the consumer, such as in retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among “internal” factors such as price, product positioning, or staff skills, and “external” factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it allows them to drill down into summary information to view detailed transactional data. With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual’s purchase history. By mining demographic data from comments or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments. For example, Netflix mines its database of video rental history to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures. WalMart is pioneering massive data mining to transform its supplier

38

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

relationships. It uses this information to manage local store inventory and identify new merchandizing opportunities.

3.1.4 Potential applications Some examples of data-mining applications for electric power utilities are customer relationship management (CRM) to track behavior; power plant maintenance; electrical transmission grid planning (Chen, Onwuachumba, Musavi, & Lerley, 2017) and operation; human resource management; fraud detection; and finding anomalies. See Section 4 for a more detailed description of applications of data mining in the power industry.

3.2 K-NEAREST NEIGHBOR 3.2.1 Brief definition The k-nearest neighbor (k-NN), which is also referred to as lazy learning, case-based reasoning, and instance-based learning, is a well-established classification method that is based on closest training sets in the feature space. The main idea of k-NN, which could be explained as a sample’s category, is decided by its k most similar samples. The sample falls into a category that contains the largest number of its k most similar samples. The k-NN algorithm is among the simplest of all machine-learning algorithms. The k-NN is a nonparametric learning algorithm because it does not make any assumptions on the underlying data distribution. This feature is very advantageous because most of the practical data do not obey the common theoretical assumptions in the real world. Another feature of k-NN is that it is highly adaptive to local information. A k-NN algorithm utilizes the closest data points for estimation; it is capable of taking full advantage of local information and form highly nonlinear and adaptive decision boundaries for each data point.

3.2.2 Technical description k-NN compares a group of training objects (k) that are closest to the test objects and label the influential class in the neighborhood. Three essential elements are included in this process: 

A set of labeled objects (e.g., a set of stored records (data)).



A distance measurement or a similarity metric.



The number of nearest neighbors, the value of k.

Once an unlabeled object is provided, the distance of this object to the labeled objects is computed. Based on the data, k-nearest neighbors are identified, and the class labels of the nearest neighbors are utilized to determine the class for this unlabeled object. Multiple training and testing sets with random data from different sets could mitigate bias presented by noise or irrelevant data and thus improve the performance of k-NN.

3.2.3 Application domains k-NN is commonly applied to solving classification problems. Offline analysis helps to generate rules for different data classes, and online analysis could initiate decision trees for classification purpose.

3.2.4 Potential applications in smart grid One form of such classification is used for classifying historical load consumption data into three different classes (iTesla (Innovative Tools for Electrical System Security within Large Area), July 29, 2013). The classification is based on training and testing load consumption data, and the training class is prepared based on the cumulative distribution of the target load. Another application of k-NN algorithm is the classification of abnormal data from a PMU. The example in Figure 3-1 shows that k-NN is trained with phase angle difference data, which defines abnormality. If the test data has an abnormal phenomenon, k-NN can detect this phenomenon based on the training provided. It will be one of the online data-mining applications for PMUs. The purpose of such classification could validate the PMU data.

39

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 3-1: k-NN classification of abnormal PMU data

3.3 MACHINE LEARNING 3.3.1 Supervised and unsupervised learning Although the machine learning taxonomy is extensive, the most classical setup establishes two types of machine learning: supervised (SL) and unsupervised (UL). The former infers a model by relating an output y (also called labels) with one or more inputs (i.e. features) x, such that y=f\(x\). The feature vector is denoted in bold to specify that it is composed of several features (i.e. n-dimensional). Labels can be categories (e.g., a failure report on electric infrastructure is true or false), in which case the solved problem is called classification, or they can be continuous values (e.g., the daily power demand on a specific location), in which case the problem is called regression. Further, it is called supervised because we want to model a specific relation, the one that is given explicitly between y and x. On the other hand, unsupervised learning tackles problems related to building probabilistic models from unlabeled data. The goal is to discover hidden patterns within data in the form of hierarchies or groups. These patterns are obtained by making use of the statistical structure of the provided data. However, assessing unsupervised learning performance is difficult due to true patterns and probabilistic distributions where data sources are unknown. The most basic steps to train a learning model are: 1) information gathering and pre-processing 2) Model training 3) Model testing In the first step, information is obtained and processed to reduce noise and enhance model performance and assumptions made over the modelled process. The next step consists of iteratively estimating models by reducing the fitting error. The final step is different for SL and UL. In the former, the model is tested using unlabeled data; in the latter, patterns discovered are subject to additional analysis to extract information. Figure 3-2 succinctly presents a diagram of supervised and unsupervised learning.

40

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 3-2: Supervised learning (upper rectangle) and unsupervised learning (lower rectangle)

Observe that while both SL and UL fits a model to some data according to a vector parameter, the final objective is different. The former uses a model to relate a dependent variable with its explanatory features as is detailed by the labeled data; then, the model is used to predict unknown data. The latter uses unlabeled data. Consequently, the true relation between data is unknown beforehand. Thus, data is grouped, and then the resulting patterns are evaluated. The remainder of this section presents machine-learning techniques and applications used for power systems. Such applications are neither an extensive list nor necessarily the best solutions. Rather, this list aims to show how ML techniques can be used to obtain robust regression models or to extract valuable information about the application problems. First, supervised algorithms as linear regression (LR), decision trees (DTs), artificial neural networks (ANN), and support vector machines (SVM) are presented. Then, an unsupervised learning technique called K-Means is discussed. Additionally, other clustering methods are mentioned, and references are provided where needed. Formally, a generic SL problem can be stated as follows: Given a dataset of the form of {𝒙𝑖 , 𝑦𝑖 }𝑛𝑖=1 ∣ 𝑋 ∈ ℝN × 𝑌 ∈ ℝ, where 𝑋 is an N-dimensional space of features and 𝑌 is the corresponding response, we are asked to estimate the relation between 𝑦𝑖 = 𝑓(𝒙 ∣ 𝜽), where 𝜽 are the function's parameters. In classification, the response variable is binary 𝑌 ∈ {±1}, whereas, in regression, the response is continuous 𝑌 ∈ ℝ. For instance, the problem of forecasting a generator's failure (given measurements of humidity, vibrations, thermic energy, gases, and aging) is a classifying problem (i.e. fails or not), whereas the prediction of the daily wind power generation of a wind farm is a regression problem. It is worth mentioning that, the relation 𝑌~𝑋 is estimated by minimizing an error criterion that ensures that the inferred function generalizes as accurately as possible the true underlying process.

3.3.2 Linear regression 3.3.2.1 Brief definition and technical description The linear regression model is one of the oldest, most renowned, and most used models for statistical and ML applications (James, Witten, Hastie, & Tibshirani, 2013; Hastie, Tibshirani, & Friedman, 2009). This model is simple and leads to robust solutions. It is readily interpretable by non-expert users and is accessible to code. Nonetheless, linear regression makes some naive assumptions about the modelled process (e.g. the process can be approximated by a linear combination of variables, deviations from the model obey a normal distribution, and so on), assumptions that are hardly met by real-world problems.

41

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

However, even while a large list of newer and more robust algorithms exists, still today LR remains the workhorse of several industries. Some of the most important characteristics of LR are its high interpretability (i.e. we know how much the dependent variable will change with respect each feature), while additional analysis can be performed using the trained model itself (i.e. features ranking). For instance, by using LR in a power generation forecasting application, we can know how each of the measured variables (e.g. humidity, rain, mean transformer losses) impact the power generation output in electrical power units. Further, LR has been subject to several extensions to enhance its robustness and precision (Rao, Toutenburg, Shalabh, & Heumann, 2008). Even more, with the advent of big data technologies, LR has regained popularity and predictive power (Ma & Sun, 2015; Ma & Cheng, 2016). It is worth noting that, in literature, LR usually refers to a model with only one explanatory variable, whereas multiple linear regression (MLR) refers to a model with two or more explanatory variables. In this work, we refer to both as LR. Colloquially, an LR model can be understood as RESPONSE = FIT + RESIDUAL. In this expression, RESPONSE stands for the variable of interest; the FIT term represents a linear combination (a summation) of measured features related to the response; the RESIDUAL term represents an unpredictable error/noise of the observed values with respect to the model’s prediction. For illustrative purposes, we will first introduce a one-variable LR model: 𝑌 = 𝑓(𝑋) → 𝑦𝑖 = βi,0 + 𝛽𝑖,1 𝑥𝑖,1 + 𝜖. The former is a line equation where 𝛽i,0 stands for the intercept (i.e. the expected value of Y when X = 0), βi,1 for the slope’s line (the average increase in Y with a one-unit increase in X), and 𝜖 is the irreducible error or noise made in the model (James, Witten, Hastie, & Tibshirani, 2013; Hastie, Tibshirani, & Friedman, 2009; Shalev-Shwartz & Ben-David, 2014). 𝛽 corresponds to the weights assigned to each variable and are the parameters of the model. Then, the problem is reduced to find 𝛽0 and 𝛽1 such that the difference between sample data labels Y and predicted labels 𝑌̂ is minimized. For two or more variables, the LR model is simply defined as: 𝑦𝑖 = β0 + 𝛽1 𝑥𝑖,1 + ⋯ + 𝛽𝑁 𝑥𝑖,𝑁 + ϵ = β0 + ∑𝑁 𝑗=1 𝛽𝑗 𝑥𝑖,𝑗 + 𝜖, (1) Where the vector 𝜷 stands for all the weights of the LR model. Consequently, the training of Equation (1) is reduced to find 𝜷 such that the resulting hyperplane (i.e. line in more than three dimensions) is as close as possible to the data points. So far, we have neglected how to find the parameters of the model. The former requires that LR minimize an error criterion, in its most basic setup, the residual sums of squares (RSS). Let 𝑦̂𝑖 be the prediction of the model for the 𝒙𝒊 sample point, and 𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖 the error between the true and the forecasted values. Then, the RSS is defined as: 𝑛

𝑅𝑆𝑆 = ∑

𝑒𝑖 .

𝑖=1

3.3.2.2 Application domains LR applied to building energy consumption Building energy consumption is the main component of worldwide consumption and carbon dioxide emissions. Nowadays, LR-based models have been successfully proposed for predicting how much and when energy will be consumed for single buildings (Asadi, Shams, & Mohammad, 2014) and building blocks (Ma & Cheng, 2016). On the other hand, understanding the relation between building energy consumption and its components is essential for developing adequate energy-management policies (Hsu, 2015; Walter & Sohn, 2016; Chung, 2012). For instance, a building’s energy consumption used by indoor comfort such as heating, ventilation, and air-conditioning (HVAC) account for 65% (Lam, Wan, Liu, & Tsang, 2010). In this regard, a penalized LR has been used for automatic identification of energy system components such as operational schedule, number of customers, lighting control, employee behavior, and maintenance in commercial buildings (Hsu, 2015). In another instance (Braun, Altan, & Beck, 2014), an LR model was proposed to predict a U.K. supermarket electricity and gas consumption. Given the particular conditions of a supermarket building (i.e. large refrigerated shelves), it was found that climate changes in relative humidity and temperature are expected to increase the electricity consumption by 2.1%, whereas gas will decrease by 13% (Braun, Altan, & Beck, 2014). LR applied to energy policies

42

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Energy policies constitute laws and actions to address energy infrastructure development, production, distribution, and consumption. One of such policies with more significant impact on energy consumption is retrofitting, which improves energy consumption (e.g. lighting, indoor comfort) by replacing older electrical components with newer ones. LR was employed to assess the cost-saving benefits of retrofitting commercial and residential buildings (Walter & Sohn, 2016). Similar results were obtained in the work by Huebner, Hamilton, Chalabi, Shipworth, & Oreszczyn in 2015. Using an LR model, they assessed energy consumption of a U.K. housing stock by categories of predictors such as building variables, socio-demographics, heating behavior, and psychological factors. They found that a building’s electrical components explain by far most of the variability in energy consumption, thus supporting retrofitting policies (Huebner, Hamilton, Chalabi, Shipworth, & Oreszczyn, 2015). Furthermore, the construction of smart buildings and smart energy policies require building energy consumption benchmarks. In this sense, expert knowledge and non-technical regulations need to be integrated into benchmarks. Thus, in (Chung, 2012), a fuzzy-LR model was developed for benchmarking building energy consumption, including expert knowledge. LR applied to utility companies Predicting energy load (Hong, Gui, Baran, & Lee, 2010), demand (Kandananond, 2011), and consumption (Tso & Yau, 2007) plays an important role in decision-making and planning for utility companies. For instance, the long-term load forecasting can be employed for transmission and distribution (T&D) planning, whereas a short-term can be used for the demand-side management (DSM). DSM is particularly important to reduce peak electricity demand while maximizing utility generation capacity (Hong, Gui, Baran, & Lee, 2010). Another LR application for utility companies is the assessing of the reliability and security of a power system (Halilcevic, Gubina, Strmcnik, & Gubina, 2006). In this sense, LR can be employed to identify the critical components of energy transmission in power supply networks. By knowing this, utilities can perform better managerial actions such as power reserving and transmission network reinforcement planning (Halilcevic, Gubina, Strmcnik, & Gubina, 2006).

3.3.3 Decision and regression trees 3.3.3.1 Brief definition Classification and regression trees (DTs) were introduced to the AI area in the mid-1980s. Even though classification trees and regression trees perform different tasks, they are both referenced here as DTs. DTs can be used for supervised and unsupervised tasks. However, later applications are beyond the scope of this document; further, even though DTs can perform regression, binary, and multiclassification, for pedagogical purposes, we constrain DTs algorithm explanation to binary classification. In such a setup, DTs build a rule model for separating two classes (e.g. 𝑦 = {±1}) graphically presented in the form of a tree, thus their name. More precisely, DTs perform a partition of the feature space into subspaces where a simple model (e.g. the most common class) is fitted (Hastie, Tibshirani, & Friedman, 2009). DTs have positive and negative characteristics: on one hand, they are interpretable as rules providing an explanation between x measurements and target value y, they can handle different types of data (e.g. numerical, categorical, nominal) and missing data at the same time, and they are computationally cheap (Rokach & Maimon, 2015). On the other hand, DTs suffer from high variance (i.e. they tend to over fit the model to training data, performing poorly with new data) reducing its performance against more robust classifiers (James, Witten, Hastie, & Tibshirani, 2013). Nonetheless, DTs performance can be enhanced by constraining the tree parameters such as the depth of the tree or using combinations of trees to reduce variance. Using statistical methods like bagging a DTs forest can be grown and used as a single classifier/regressor. Such statistical methods and how to combine the forest into a single function are elsewhere documented.

3.3.3.2 Technical description DTs models are composed of branches and nodes. Branches connect each node in a directed way (i.e. from A to B). Except from the root node, all other nodes have an incoming branch from a previous node, whereas except from terminal nodes, each one has a pair of outgoing branches. Each node corresponds to a decision or split of the feature space. Nodes can be of three types: root, internal, or terminal (i.e. leaf node). The root is the starting node, and it performs the best dichotomic partition of the feature space between two given classes and connects to a pair of internal/terminal nodes. Internal

43

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

nodes correspond to intermediate steps where feature space is further split into more specific subspaces in accordance with some criterion. Terminal nodes correspond to a final decision on the analyzed point. It is worth mentioning that terminal nodes rather than a class allocation can be interpreted as the probability of each class. Moreover, DTs display an explicit hierarchy between features: the root node (the first variable to perform a split) is the most important feature for the problem, the internal nodes are the second most important variables, and so on. Thus, a binary classification DTs is a function 𝑓(𝒙) = 𝑦, which predicts the class or probability of any instance x by taking decisions following binary rules described by nodes. A binary tree is built as follows: 1. Identify the feature that performs the best separation between classes. a. Find the best split-point of the feature (the value in which the best separation is obtained). b. Divide the feature space into two distinct and non-overlapping regions 𝑅𝑖 𝑦 𝑅𝑗 . 2. If the maximum tree depth is reached or the stopping criterion is met, assign to every observation in the region 𝑅𝑗 the most common class. Else, identify a new feature and its splitpoint to separate region 𝑅𝑗 into two new sub-partitions. 3. Repeat step 2. As an example, a simple DT for detecting faults in a transmission line during a storm is shown in Figure 3-3. On the right side, the tree classifier is depicted; on the left, the partition performed in the feature space is shown. Sampled transmission lines under storm conditions are shown in the feature space (B part of the figure). Orange dots correspond to lines, which suffered a failure, whereas gray dots are the non-interrupted transmission lines. Features on this example are precipitation, which is the continuous variable, and thunderbolts, which is categorical one.

Figure 3-3: A simple DT model for detecting faults in a transmission line

On the A side, the tree constructed for precipitation and thunderbolts is shown; split-points for each feature are shown on edges, while labels under each terminal node corresponding to the region defined on the feature space. Further, on each node is also shown the frequency of faults/no-fault and the corresponding probability. On the B side, the feature space, which is divided into regions R1, R2, and R3, are presented. In this example, if precipitation is less than a 10 cm^3 threshold (R1), transmission lines will be classified as no-fail with a 91% probability, whereas failure during a storm with precipitation below the former threshold will have a very low probability (0.09). Furthermore, so far we have neglected some important concepts like the criterion for selecting partition features, how to select split-points, and how to determine a DTs depth. Thus, readers are referred to (James, Witten, Hastie, & Tibshirani, 2013; Hastie, Tibshirani, & Friedman, 2009; Rokach & Maimon, 2015) for more DTs details.

44

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

3.3.3.3 Application domains DTs applied to building energy consumption As was mentioned in the LR section, understanding and forecasting energy consumption patterns in a building allow for reducing CO2 emissions and manage energy load by regulating demand. Properly managed energy consumption also requires understanding how the electrical components impact the overall building consumption. For instance, building designers and architects require tools that can allow them to predict a new building’s energy usage patterns based on atmospheric data, building architecture and household characteristics, and energy sources (Yu, Haghighat, Fung, & Yoshino, 2010). Recently, DTs have been employed to forecast Energy Use Intensity levels (i.e. the ratio of annual total energy used between the building’s floor area) for buildings across Japan (Yu, Haghighat, Fung, & Yoshino, 2010). One of the most important contributions of this work is the analysis of rules obtained from the classification tree. By analyzing the hierarchy of features on the DT, they found that different sets of features impact a building’s consumption in accordance with district temperatures. In this regard, we could test other data sources like sun movement and clear-sky solar irradiance on a building’s surface. Thus, by re-training the DT model and analyzing where such variables are located in the hierarchy of the tree, we may conclude whether such variables are significant or not in characterizing a building’s energy consumption.

3.3.4 Artificial neural network 3.3.4.1 Brief definition Artificial neuronal networks (ANNs) are computational networks that try to simulate the decision process that occurs in biological networks of neurons in a central nervous system (Graupe, Sep 2013) (Kalogirou, Dec 2001) (Russell & Norvig, Dec 13, 1994). Similar to biological neurons, an ANN can be described as a massively parallel-distributed processor that stores knowledge and makes it available for use (Haykin, 1999). According to (Kalogirou, Dec 2001), ANNs “are good for tasks involving incomplete data sets, fuzzy or incomplete information and for highly complex and ill-defined problems, where humans usually decide on an intuitional basis. They can learn from examples and are able to deal with non-linear problems.” An ANN is a group of interconnected artificial neurons, interacting with one another in a concerted manner. In such a way, excitation is applied to the input of the network. It resembles the human brain in two respects: 1) Knowledge is acquired by the NN by means of a learning process. 2) Inter-neuron connection strengths known as synaptic weights are used to store the knowledge. They learn the relationship between the input parameters and the controlled and uncontrolled variables by studying previously recorded data, similar to the way a nonlinear regression might perform (Kalogirou, Dec 2001).

3.3.4.2 Technical description The network consists of three elements: 1) Input layer 2) Hidden layers 3) Output layer (see Figure 3-4 [22]).

45

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 3-4: ANN1 schematic diagram of a feed-forward NN

In its simple form, each neuron is connected to other neurons of a previous layer employing adaptable synaptic weights, and knowledge is stored as a set of connection weights. An ANN is composed of many nodes connected by links, where each link has a numeric weight (Russell & Norvig, Dec 13, 1994) (see Figure 3-5). Training is the process of modifying the connection weights in some order using a learning method that usually takes place by updating weights (Russell & Norvig, Dec 13, 1994). By means of this learning mode, an input is presented to the network along the desired output and weights are adjusted, so the network attempts to produce the desired output, value of weights is acquired after training (see Figure 3-5 [22]):

Figure 3-5: ANN2 information processing in ANN

The basic idea shown in Figure 3-5 is that each node or neuron makes a local computation based on inputs from its neighbors but without the need for any global control over the set of units as a whole (Russell & Norvig, Dec 13, 1994). The node receives weighted information from other nodes. First, they are added and then passed to the activation function. For each of the outgoing connections, this value is multiplied by the specific weight and transferred to the next node (Kalogirou, Dec 2001). In practice, ANNs are implemented in software like Matlab, SPSS, Weka, Rapidminer, and Java, among others. According to (Kalogirou, Dec 2001), for training, the ANN is a necessary training set or group of matched input and output patterns. Each produced output through the network is compared to the desired output. To reduce the error to the desired tolerance, the ANN often needs to run repeatedly by altering connections weights. When the training achieves a desirable level, the network locks the weights constant and uses this trained network to make decisions, identify patterns, or define associations in new input datasets.

46

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

The most popular and powerful algorithm of ANN is back propagation (BP). The train of all patterns of a dataset is called an epoch. BP tries to improve the performance of NN by reducing the total error by changing weights and its gradient. The error is expressed by the RMS (root-mean-square), a zero-error value indicates that all the computed output patterns match the expected values, and therefore network is well trained (Kalogirou, Dec 2001).

3.3.4.3 Application domains According to (Kalogirou, Dec 2001), ANNs are able to learn the key information patterns within a multidimensional information domain which is fault-tolerant, robust, and noise-immune (Rumelhart, Hinton, & Williams, 1986). Data from energy systems are noisy, making the data a good candidate to be analyzed with neural networks. ANN has been applied to predict and optimize energy use in commercial buildings—particularly in HVAC in commercial buildings—without sacrificing comfort (Kreider, Wang, Anderson, & Dow, December 1992) (Curtiss, Brandemuehl, & Kreider, January 1994). ANNs have been applied to the diagnosis of line faults of power systems and load forecasting in power systems. ANNs were used to model the combustion process of incineration plants with the purpose to optimize the reduction of toxic emissions (Muller & Keller, 1996). In (Milanic & Karba, 1996), ANNs were used for predictive control of a thermal plant, by using the steam flow as input and a simple network structure because on-line predictions of plants are faster. In (Mandal, Sinha, & Parthasarathy, 1995), ANNs were applied for short-term load forecasting in electric power systems. The output of the ANN was the next hour load, and no weather variables were considered. In (Khotanzad, Abaye, & Maratukulam, 1995), a recurrent neural network (RNN) load forecaster was used for hourly prediction of power system loads. In (Datta & Tassou, 1997), ANNs networks were used for prediction of the electrical load in supermarkets. ANNs are used in wind energy systems and can be grouped into three major categories: forecasting and prediction, prediction and control, and identification and evaluation (Keles, Scelle, Paraschiv, & Fichtner, 2016). Forecast methods for day-ahead electricity prices are essential for energy traders and supply companies. ANN has to be used to successfully forecast day-ahead electricity prices, providing even better results than ARIMA (Keles, Scelle, Paraschiv, & Fichtner, 2016). Finally, ANNs are used for the implementation of a wide variety of anomaly-detection systems, including intrusion detection systems (IDS) for network computers in the electric energy sector as well as advanced IDS for the smart grid in an ensemble with other algorithms (Aburomman & Reaz, March 2017).

3.3.5 Support vector machine (SVM) 3.3.5.1 Brief definition The support vector machine (SVM) was developed by Vapnik and others during the 1990s (Scholkopf & Smola, 2002). SVM was initially developed as a linear classifier, although it is somewhat famous for its capacities to handle noisy nonlinear data. SVM has also been extended to the problem of regression, probability estimation, clustering, and so on (Scholkopf & Smola, 2002). However, because SVM main features are shared among all distinct SVM applications, we limit the description of the algorithm to classification problem.

3.3.5.2 Technical description To introduce SVM, we first require introducing the empirical risk minimization (ERM) principle. It is the most used criterion to train any ML model: It only requires that the model achieves the lowest possible error on the training sample. Achieving the lowest error rate on a given sample only requires to model every possible case. However, such model will be so particular that will perform poorly on out-of-thesample points. On the other hand, SVM was designed based on the structural risk minimization (SRM) principle (Scholkopf & Smola, 2002). The former establishes a bound that relates generalization (i.e. how well the model explains unseen samples) to the simplicity of the model (i.e. if the model is too complex, it should perform poorly on unseen data). Thus, SVM uses the simplest family of functions and hyperplanes to approximate a given sample. Further, to constrain the number of valid hyperplanes and its complexity, a margin around the hyperplane is added.

47

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Such a margin guarantees that the problem is convex (i.e. it has an optimal solution(s)), improving on the computational burden of estimating the hyperplane. Another important feature of the SVM formulation is that the hyperplane is described using a reduced set of the sample points known as the support vectors (SVs). Thus, in a classification problem, learning the optimal hyperplane with a given sample is reduced to find the support vectors. Because only the SVs are required to build the hyperplane, all remaining training points are disregarded. An SVM toy example of two dimensions is shown in Figure 3-6.

Figure 3-6: A toy example of a linearly separable problem

SVM’s hyperplane is the area between the blue and gray dots. SVs circled in red lie exactly on the margin. Note that the hyperplane function is written as a line. Moreover, the margin is nothing more than two lines parallel to the hyperplane separated from it by 1/w, where w is the slope of the line and b is the bias parameter. The w vector was drawn not parallel to attribute 2 for illustrative purposes. In its original formulation, the margin is considered to be hard, meaning that all SVs must lie on the margin. Such a formulation is heavily restrictive because it only works with non-noisy linear data. Consequently, SVM was extended to model nonlinear relations by adding the kernel trick. Kernel functions map a point from the original space to the feature space. This mapping has several advantages: 1) The feature space has more dimensions. Thus, it is presumably easier to find a separating hyperplane. 2) The feature space is a nonlinear map. Thus, it can handle non-linear data. 3) Comparing two vectors in the feature space simply requires multiplying them and then computing the map, without the computational burden of performing first the mapping for each point. Later, SVM was extended to allow for noisy data. The soft-margin formulation permits SVs to violate the restrictions imposed by the margin (SVs can be found within or beyond the margin). Although SVM is formulated as an n-dimensional line, it is rather convenient to find it by solving its dual formulation. The further details of the optimization problem or kernel functions are presented (Scholkopf & Smola, 2002). Formally, given a data set of the form (xi , yi )1m ∈ RN × {±1} , the optimal hyperplane is found by solving the problem: m

1 m αi − ∑ α α y y k(x , x ) 2 i,j=1 i j i j i j i=1

maximize W(α) = ∑ subject to 0 ≤ αi ≤ m

C for all i = 1, … , m, m

and ∑ αi yi = 0. i

In this equation, αi corresponds to the weight of the ith sample point, yi is its class, and xi is its feature vector; C is a penalization parameter that controls the complexity of the model (i.e. larger C values correspond to a simpler model, whereas a zero value produces a very complex one). Once weights (the

48

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

support vectors αi ) are found, a new instance can be classified using the hyperplane equation (Scholkopf & Smola, 2002).

3.3.5.3 Potential application in smart grid SVM applied to transmission lines fault detection Transmission line faults entail 85% to 87% of power system faults (Singh, Panigrahi, & Maheshwari, 2011). Power systems and electrical grids require reliable transmission lines, detection tools designed to find early faults may decrease the time that a circuit is interrupted. Protective relays are in charge of detecting energy or hardware faults and ameliorate their impact. Initially, these protections were electromagnetic. However, nowadays they are digital and possibly transmitting its measurements to the internet. The main problem with the detection of any fault resides on the characterization of the current and voltage signals. Once a proper characterization is chosen (Singh, Panigrahi, & Maheshwari, 2011; Ray & Mishra, 2016), machine-learning techniques such as DT or SVM can be used to classify fault/nonfault signals. However, DTs can be heavily biased given that they tend to overfit training data, and such data is gathered from simulators rather than real measurements. On the other hand, SVM produces more robust models that are less sensitive to particular simulated conditions. Moreover, with the kernel trick, complex relations between faults and its characteristic signals can be captured. Even more, adding data gathered by the protective relay, more solid results may be expected from SVM than DTs.

3.3.6 K-means and clustering 3.3.6.1 Unsupervised learning This introduction is necessarily incomplete given the enormous range of topics under the rubric of “unsupervised learning.” For instance, the goal may be to discover groups of similar data points (clustering), to determine the distribution of data within the input space (probability density estimation), or to project the data from a high-dimensional space down to two or three dimensions for visualization purposes. This document focuses on the first objective. Clustering can be understood as gathering data into groups of similar individuals known as clusters. Using similarity/dissimilarity measurements, points are assigned to one or more clusters with other data that share common features. Further, groups may be ordered by hierarchy or be linked to other groups. Clustering algorithms are preferred for exploratory purposes, such as when there is no a priori knowledge about relations existing within data. The most iconic clustering algorithm is called K-means.

3.3.6.2 Brief definition and technical description K-means is a hard-clustering (Gan, Ma, & Wu, 2007; Wu, 2012), partitional (Kaufman & Rousseeuw, 1990) method. On one hand, it is hard because any point only belongs to one cluster. On the other, is partitional because it divides the feature space into non-overlapping regions. Further, K-means proposes a single point to represent each divided region of the feature space. The former are called centroids or means, and they geometrically correspond to the center (mean) of the cluster (Wu, 2012). These kmeans are refined iteratively by minimizing/maximizing some similarity/dissimilarity function among all the members of each cluster. K-means is fast, scalable, and has a linear computational cost in regards to the dataset size (Gan, Ma, & Wu, 2007; Wu, 2012). Nevertheless, K-means requires clusters to be convex (e.g. spherical) and tends to perform poorly on different-sized groups (Gan, Ma, & Wu, 2007; Wu, 2012). Thus, they are prone to be outliers and not well-suited for modelling skewed distributions or noisy overlapping groups. A K-means algorithm works as follows: First, select randomly k centroids and assign the remaining to the closest centroid. Second, using all the points assigned to the ith-cluster, recalculate its centroid. Third, if a centroid does not change or changes little or other stopping criteria are satisfied, the algorithm ends. Else, it reassigns points to the new centroids and returns to the second step. Formally, given a dataset X = {xi }, xi ∈ ℝN , i = 1, … , m, K-means assigns each xi to a particular cluster ck ∈ C, k = 1, … , K by minimizing some objective function. As originally formulated, each partition of the feature space is determined by minimizing the Euclidean distance among the members of each cluster, and its mean is μk ∈ ℝN . Given that the Euclidean norm for a ck cluster is defined as: 𝐷𝑒𝑢𝑐𝑙 (𝑐𝑘 ) = √∑

𝑥𝑖 ∈𝑐𝑘

49

(𝒙𝑖 − 𝝁𝑘 ) ,

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Then the K-means objective function is defined as: 𝐾 𝑚 𝑀𝑖𝑛𝑖𝑚𝑖𝑧𝑒 ∑ ∑ 𝑾𝐷𝑒𝑢𝑐𝑙 (𝒙𝑖 , 𝝁𝑘 ), 𝝁𝑘 , 1 ≤ 𝑘 ≤ 𝐾 𝑘=1 𝑖=1

where μk is the centroid of the K-cluster, Deucl is the Euclidean distance, and W is m × k matrix that satisfies: 1. 2.

𝑤𝑖,𝑘 ∈ {0,1} 𝑓𝑜𝑟 𝑖 = 1, 2, … , 𝑚, 𝑎𝑛𝑑 𝑘 = 1, 2, … , 𝐾 ∑𝐾 𝑘=1 𝑤𝑖,𝑘 = 1, 𝑓𝑜𝑟 𝑖 = 1, 2, … , 𝑚

K-means performance is determined by several parameters: centroids initialization, an adequate number of clusters, and the similarity/dissimilarity measure. First, given that initial centroids are designed randomly, only local convergence is guaranteed (Gan, Ma, & Wu, 2007). However, this shortcoming is easily overcome by repeating the clustering procedure several times and choosing the partition where k-means objective function achieves the smallest value. Second, the determination of the number clusters for a sample is key in the performance of K-means. Although literature in this endeavor is large and vast, the k parameter is typically defined by experts criterion (Wu, 2012). Lastly, the similarity/dissimilarity metric must take into consideration the features domain (e.g. numerical, categorical, strings, and so on) to ensure that a proper distance is measured for the data. Although typically the Euclidean distance is employed, further measures may be used instead. By doing the latter, k-means improves its effectiveness and its speed when applied to high-dimensional data (Wu, 2012). Figure 3-7 poses a hypothetical example of the usage of k-means. Data from faults on transmission lines during storms is gathered. Measured features are the number of thunderbolts during a storm, which is a discrete variable, and the precipitation is a continuous one. In this example, data is gathered from non-fault lines shown as gray dots, line-to-line (L2L) fault lines (a short circuit caused by two energized lines) are shown in blue, and single line-to-ground (SL2G) faults (a short circuit due to a line touching the ground or a neutral conductor) are shown in orange. However, to show k-means functionality labels of lines, fault statuses are omitted. Moreover, for this example, the number of centroids was fixed arbitrarily to 3.

Figure 3-7: A toy example of clustering transmission lines during storm using K-means

Dots represent different types of fault lines: Gray corresponds to no fault at all, blue represent L2L faults, and SL2G are shown in orange. Centroids are displayed in red: The initial centroids are shown in the lightest red, whereas final centroids are shown in vivid red. Dotted red lines display the partitions corresponding to each centroid. As can be observed, as the optimization procedure unfolds, centroids altogether with their partitions are tuned (lightest red displays the first iteration, whereas vivid red shows the final centroid/partition). The corresponding numbers for each iteration are shown on the left side of the figure.

3.3.6.3 Potential applications in smart grid K-means applied building energy consumption

50

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

As has been stressed, before understanding and predicting energy consumption patterns in a building, it is key to attack climate change and to better define energy management policies, utility planning, and so on. However, the categorization of buildings is a rather challenging task: They are multidimensional and heterogeneous. On one hand, the number of components and interactions of a building electrical system are vast. On the other, each building population is also heterogeneous and composed of many sub-groups in different locations, with distinct legislations, energy requirements, and so on. Thus, ways to group buildings into clusters of similar energy consumption patterns are highly valuable. For instance, given a dataset of buildings and energy consumption (characterized by active power, reactive power, voltage, and so on), the most trivial approach would consist in applying K-means to data for exploratory purposes. Because no relation between data is known beforehand, K-means allows us to explore possible relations among data. In this example, the Euclidean similarity measure is employed. However, readers must be aware that such distance measure requires continuous independent features. Afterwards, the number of clusters to be tested is defined, and the results are displayed. Although numeric performance measures of the clusters exist, visualization of the clusters may provide more explicit hints on the relations between building energy consumption patterns groups.

3.4 PROBABILISTIC NETWORKS Probabilistic networks are representations based on graph theory and probability theory, for modeling domains with uncertainty and for making inferences with uncertain or incomplete information. They are based on a domain model through a set of random variables and their dependency relationships represented using a graph. This structure allows representing the joint probability distribution by a set of local probabilities, which significantly reduces the computational complexity in space and time. Probabilistic networks include, among others: 

Bayesian networks



Bayesian classifiers



Decision networks

These types of models are suitable to represent problems involving uncertainty; applications include medical and industrial diagnostic systems, user and student modeling, tutor strategies, planning under uncertainty, voice and gestures recognition, prediction, image analysis, and robotics. Reference (Kang, S. B., Advances in Computer Vision and Pattern Recognition, 2015) has detailed discussions on the Bayesian networks, Bayesian classifiers, and Decision networks.

3.4.1 Bayesian networks 3.4.1.1

Brief definition

A Bayesian network (BN) takes consideration of a set of local parameters. These parameters are the conditional probabilities for each variable given its network structure in Erreur ! Source du renvoi introuvable.. Therefore, based on these local parameters, the conditional probabilities can be represented.Erreur ! Source du renvoi introuvable. Depicts it is an example of a simple BN; the structure of the graph implies a set of conditional independence assertions for this set of variables.

A B

D

C

E Figure 3-8: Example of a simple Bayesian network

51

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

3.4.1.2 Technical description Representation For example, the joint distribution of a set of n (discrete) variables, 𝑋1 , 𝑋2 , … , 𝑋𝑛 , can be represented by a Bayesian network (BN). In Figure 3-9, the node A, B, C, D, and E correspond to variables that associate with conditional probability table (CPT) as indicated 𝑃(𝐸|𝐵, 𝐶). The structure of the network implies a set of conditional independence assertions, which give power to this representation. The joint distribution of this set could be represented by giving the structure of the network with conditional independence assertions. This joint distribution is defined as 𝑃(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) = 𝑃(𝐴) 𝑃(𝐶) 𝑃(𝐵|𝐴) 𝑃(𝐷|𝐵) 𝑃(𝐸|𝐵, 𝐶)

A B

D

C

E

Figure 3-9: Examples of conditional probability tables

Inference Inference uses a Bayesian network to compute probabilities. Inference involves a general scenario to compute 𝑃(𝑋|𝐸 = 𝑒), where X is query variable, E=e is evidence (observed) variable; and the joint distribution 𝑃(𝑋, 𝐸, 𝑌) is known, where Y is unobserved variable. There are two types of the inference: single-query inference and conjunctive-query inference, which consists of the effects of observed variables in a Bayesian network to estimate its effect on the unknown variables. Pearl’s algorithm, inference elimination, conditioning, junction tree, and stochastic simulation are the algorithms used for Inference. Structure and parameter learning Learning problem in Bayesian networks includes structure learning and parameters learning. When the structure or topology of the BN is known, and sufficient data are available for all the variables, parameter learning is straightforward and could estimate the CPTs for the variables. If there is not sufficient data, the uncertainty of the parameters can be modeled and estimated by a secondorder probability distribution like a Beta distribution for this situation. There are two main types of methods for structure learning: search and score for global methods and conditional independence tests for local methods. The complex process of obtaining the topology of the BN for structure learning requires good estimation on the statistical measures. Techniques such as trees, polytrees, general DAG depending on the type of structure could be used.

3.4.1.3 Application domains Bayesian networks have advantages to express a compact representation of joint probability distribution of nodes and fit data, it is an efficient way to represent complex probabilistic systems. Bayesian networks modeling in several real-world application domains are listed such as system biology, gene regulatory networks, medicine, biomonitoring, document classification, information retrieval, semantic search, image processing, turbo code, and spam filter.

52

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

3.4.1.4 Potential applications in smart grid Probabilistic networks can be used in power systems for diagnosis of a fault in different equipment, detection of inconsistent values in databases or sensors, detection of causes of technical and nontechnical losses of electricity, models for the prediction of energy demand, and models for the prediction of energy generation.

3.4.2 Bayesian classifiers 3.4.2.1 Brief definition Bayesian classifiers are statistical classifiers based on Bayes Theorem. It could perform probabilistic prediction of a particular sample is a member of a particular class. The Bayesian classifiers could be supervised or unsupervised. Clustering is for the unknown classes in the unsupervised problem and priori is for known classes in the supervised problem. Naïve Bayes classifier, tree augmented Bayesian classifier (TAN), Bayesian network augmented Bayesian classifier (BAN), semi-naïve Bayesian classifier, multidimensional Bayesian classifier, and Bayesian chain classifier are among the major Bayesian classifiers.

3.4.2.2 Technical description The formulation of the Bayesian classifier is based on the Bayes theorem to estimate the probability of each class given the evidences. 𝑃(𝐶𝑙𝑎𝑠𝑠|𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒) =

𝑃(𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒|𝐶𝑙𝑎𝑠𝑠)𝑃(𝐶𝑙𝑎𝑠𝑠) 𝑃(𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒)

The evidence normally consists of a set of observations 𝐸 = (𝑒1 , 𝑒2 , … , 𝑒n ). If a single most likely class is selected, the maximized probability 𝑀𝑎𝑥[𝑃(𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒|𝐶𝑙𝑎𝑠𝑠)𝑃(𝐶𝑙𝑎𝑠𝑠)] needs to be estimated for the Bayesian classifier. In general, the hard part is estimating 𝑃(𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒|𝐶𝑙𝑎𝑠𝑠), and some assumptions have to be made for the estimation.

3.4.2.3 Application domains Bayesian classifiers have been used for person’s skin detection by obtaining an approximate classification of pixels in an image as skin or not skin based on the color attributes of each pixel. Another application can be found in health field such drugs selection for patients and optimization of treatment decisions.

3.4.2.4 Potential applications in smart grid Examples of Bayesian classifier applications in smart grids include among others: Diagnosis of fault in different equipment, detection of inconsistent values in databases or sensors, detection of causes of technical and non-technical losses of electricity, models for the prediction of energy demand, and models for the prediction of energy generation.

3.4.3

Decision networks

3.4.3.1 Brief definition A decision network is often called influence diagram has decision nodes that chosen by action nodes and utility nodes additional to Bayesian networks to enable the rational decision making. Decision networks have compact graphical and mathematical representation and efficient evaluation to help a decision-making situation. The decision models should help the decision-maker to select the optimal choice under uncertainty by maximizing the expected utility.

3.4.3.2 Technical description The influence diagrams as directed acyclic graphs, 𝐺 viewed as an extension of Bayesian networks, incorporate with decision and utility nodes. Random nodes (𝑋), decision nodes (𝐷), and utility nodes (𝑈)   

Random nodes (𝑋) are chance variables associated with CPT; they are represented as ovals. Decision nodes (𝐷) are variables that make decisions; they are represented as rectangles. Utility nodes (𝑈) are measures of possible outcomes, usually, decision-makers are trying to maximize the utility; they are represented as the diamond.

53

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

An influence is denoted as an arrow, which connects the nodes described above and also expresses relevant knowledge from a node to another. A decision tree is a graphical representation of a decision problem, which is also complementary of influence diagram. It consists of three types of nodes that represent decisions, uncertain events, and results. Usually, an influence diagram has a much more compact representation than a decision tree.

3.4.3.3 Application domains These types of models are adequate to represent problems in which decisions have to be made with uncertain information. Some applications are educational, medical, and industrial diagnostic systems, such as student and tutor models to select tutorial actions in intelligent tutors given the current and incomplete information of the context.

3.4.3.4 Potential applications in smart grid Decision networks can be used to model intelligent power grids. They can be seen as a complex and uncertain system, where decisions can be done (for example, intelligent assistants in operation and maintenance diagnostic systems). Another potential application is the energy market, supporting and permitting both the suppliers and the consumers to be more flexible and sophisticated in their operational strategies.

3.5 DEEP LEARNING 3.5.1

Brief definition

Deep learning has a long history and many aspirations in solving practical applications. Modern practices involving deep networks, consisting of all of the most successful methods (Goodfellow, Bengio, & Courville, Deep Learning, 2016). Usually, to find the parameters of a model that corresponds to some desired functions, these methods are used for training. With enough training data, this approach is very compelling. Modern deep learning provides an essential supporting structure for supervised learning. A deep network can represent functions of increasing complexity by adding more layers and more features within a layer. Given sufficiently large models and large datasets of labeled training examples, the mapping from features can be accomplished by deep learning. A shortcoming of the current state of the art for industrial applications is that our learning algorithms require large amounts of supervised data to achieve reasonable accuracy. That is also the reason that many active research projects try to solve the shortcoming by using unsupervised deep-learning algorithms. Many deep-learning algorithms are also designed to tackle unsupervised learning problems, but none has truly solved the problem in the same way that deep learning has primarily solved supervised learning problems for a wide variety of tasks (Goodfellow, Bengio, & Courville, Deep Learning, 2016). The high dimensionality of the random variables is the main problem for unsupervised learning. This brings two recognized challenges: a statistical challenge and a computational challenge.

3.5.2

Technical description

Deep learning is a branch of machine learning using multiple processing layers, composed of various linear and nonlinear transformations. In Figure 3-10, the shaded boxes indicate components that are able to learn from data. The depth of a model includes the shaded boxes, representing the difference from rule-based systems, classic machine learning, and simple representation learning (Goodfellow, Bengio, & Courville, Deep learning, 2016). The deep learning has additional layers of more abstract features. These additional paths could refine a sophisticated and accurate computing model. The shaded boxes (layers) could be implemented, but is not limited to, the following: 1. 2. 3. 4. 5. 6. 7.

Linear factor models Autoencoders Representation learning Structured probabilistic models for deep learning Monte Carlo methods Confronting the partition function Approximate inference

54

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

8. Deep generative models

Figure 3-10: Deep learning components

3.5.3

Potential applications in smart grid

Two recent deep learning applications on energy demand are described here. One implementation uses deep learning methods to increase the accuracy of the estimated building energy demands and user behavior. Reference (Mocanu, Nguyen, Gibescu, & Kling, June 2016) investigates two newly developed stochastic models for time series prediction of energy consumption, namely Conditional Restricted Boltzmann Machine (CRBM) and Factored Conditional Restricted Boltzmann Machine (FCRBM). By using layer-wise unsupervised learning, the ability to train deep architectures could help achieve an accurate energy prediction. Another application is energy disaggregation, which estimates appliance-by-appliance electricity consumption from a single meter measuring the whole home’s electricity demand. Reference (Kelly & Knottenbelt, November 4-5, 2015) adapts three deep neural network architectures to energy disaggregation: 1) A form of the recurrent neural network called “long short-term memory.” 2) Denoising autoencoders. 3) A network that regresses the start time, end time, and average power demand of each appliance activation. With the unsupervised pre-training, unlabeled data from each house can be disaggregated.

3.6 VISUAL ANALYTICS 3.6.1

Brief definition

According to (Thomas & Cook, 2005), visual analytics is the science of analytical reasoning supported by interactive visual interfaces. The emerging of data and visual analytics becomes necessary for data users to understand better the relevant information contained in the data and make effective decisions. The data-analytics techniques discussed in this section could be applied to the enormous amounts of data that are being recorded in the electrical transmission grid. The combination of human knowledge and data analytics is a key to achieving knowledge discovery and data mining in planning and operating power transmission systems. In addition, providing interactive visual interfaces could help analysts and system operators to get a better impression of possible symptoms and suspicious behavior and understand power system performance to increase the situational awareness. Humans may directly interact with the data analysis and be well informed by using advanced visual interfaces.

3.6.2

Related research areas and challenges

The list of research areas related to visual analytics includes information analytics, spatial-temporal data analytics, scientific analytics, statistical analytics, knowledge discovery, data management, knowledge representation, cognitive and perceptual science, and interaction. Visual analytics can be seen as an integrated approach combining visualization, human factors, and data analysis. Cognition and perception are the human factors that play an important role in the communication between the human and the computer, as well as in the decision-making process. Information visualization and computer graphics often relate to the areas of visualization. Data management and knowledge representation, as well as data mining are profited from methodologies developed in the fields of data-analysis techniques. To make use of visual analytics to understand the behavior of electrical transmission grids, visualization systems for power system analysis can be contributed to and reveal

55

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

future research directions in this emerging field. Reference (VISUAL-ANALYTICS.EU, n.d.) has more research-related topics and projects that have been done or are still going on. The visual-analytics research challenges could be categorized in the following areas: visualization data, users, design, and technology, which include the challenges of: 

Dealing and integrating with huge, heterogeneous, variable quality datasets.



Meeting the needs of the users.



Assisting designers of visual analytic systems.



Providing the necessary infrastructure technology.

3.6.3

The visual-analytics process

In order to gain knowledge from data, the combination of a computation and visual model tied with human interaction suggests passing through the following visual analytics process. Figure 3-11 shows an general overview of the different stages (represented by square blocks) and their transitions (arrows) in the visual-analytics process (Keim, Kohlhammer, Ellis, & Mansmann, 2010).

Figure 3-11: The visual analytics process

First, data needs to be preprocessed and transformed to derive different representations for further exploration. The preprocessing tasks may include data cleansing, normalization, grouping, or integration of heterogeneous data sources. In many application scenarios, heterogeneous data sources will need to be integrated before computational or visual model analysis methods can be applied. Once the data preprocessing task has been completed, the decision whether to apply computational or visual model analysis methods is made. If a computational model analysis is used first, data-mining methods are applied to generate models of the original data. Once a model is created, this model must be evaluated and refined by interacting with the data. The loop between computational and visual model methods through model visualization and model building could lead to continuous refinement of models and verification of preliminary results. Thus, model visualization can be used to evaluate the findings of the generated models. Interpreting and verifying results at an early stage leads to better results and higher confidence. If a visual data exploration is performed first, data mapping methods are applied to visual models of the data. The interaction with a computational model analysis should also involve modifying parameters or selecting other analysis algorithms. Findings in the visualizations can be used to direct model building in the computational model analysis. The interaction with the visualization is intended to reveal a deep understanding of information, for instance by considering data from different perspectives or zooming in and out on different data areas. In conclusion, in the visual analytics process, knowledge can be gained from visualization, computational analysis, as well as the interactions between visualizations, models, and human. The knowledge would help to increase the situational awareness in many aspects of operating the electrical transmission grid. In Section 4 of the applications of data analytics, more details of visualization applications are explained.

56

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

3.7 REFERENCES [1] EPRI, "Advanced Data Analytics Techniques: Analysis and Applications for Power System Operation and Planning Support," Power Delivery & Utilization - Transmission, Jan 28, 2016. [2] S. Chen, A. Onwuachumba, M. Musavi and P. Lerley, "A Quantification Index for Power Systems Transient Stability," Energies 2017, 10, 984. [3] iTesla (Innovative Tools for Electrical System Security within Large Area), "Deliverable D2.4 Data mining methods - Uncertainties modeling for offline and online security assessment," July 29, 2013. [4] G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning: with Applications in R., Springer, 2013. [5] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer, 2009. [6] C. Rao, H. Toutenburg, Shalabh and C. Heumann, Linear Models and Generalizations - Least Squares and Alternatives, Springer, 2008. [7] P. Ma and X. Sun, "Leveraging for big data regression," WIREs Comput Stat, vol. 7, no. 1, p. 70–76, 2015. [8] J. Ma and J. Cheng, "Estimation of the building energy use intensity in the urban scale by integrating GIS and big data technology," Applied Energy, vol. 183, pp. 182-192, 2016. [9] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014. [10] W. Chung, "Using the fuzzy linear regression method to benchmark the energy efficiency of commercial buildings," Applied Energy, vol. 95, pp. 45-49, 2012. [11] J. Lam, K. Wan, D. Liu and C. Tsang, "Multiple regression models for energy use in airconditioned office buildings in different climates," Energy Conversion and Management, vol. 51, pp. 2692-2697, 2010. [12] D. Hsu, "Identifying key variables and interactions in statistical models of building energy consumption using regularization," Energy, vol. 83, pp. 144-155, 2015. [13] S. Asadi, S. Shams and M. Mohammad, "On the development of multi-linear regression analysis to assess energy consumption in the early stages of building design," Energy and Buildings, vol. 85, pp. 246-255, 2014. [14] T. Walter and M. Sohn, "A regression-based approach to estimating retrofit savings using the Building Performance Database," Applied Energy, vol. 179, pp. 996-1005, 2016. [15] M. Braun, H. Altan and S. Beck, "Using regression analysis to predict the future energy consumption of a supermarket in the UK," Applied Energy, vol. 130, pp. 305-313, 2014. [16] G. Huebner, I. Hamilton, Z. Chalabi, D. Shipworth and T. Oreszczyn, "Explaining domestic energy consumption - The comparative contribution of building factors, socio-demographics, behaviours and attitudes," Applied Energy, vol. 159, pp. 589-600, 2015. [17] T. Hong, M. Gui, M. Baran and H. Lee, "Modeling and Forecasting Hourly Electric Load by Multiple Linear Regression with Interactions," in Power and Energy Society General Meeting, 2010. [18] K. Kandananond, "Forecasting Electricity Demand in Thailand with an Artificial Neural Network Approach," Energies, vol. 4, pp. 1246-1257, 2011. [19] G. Tso and K. Yau, "Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks," Energy, vol. 32, pp. 1761-1768, 2007.

57

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

[20] S. Halilcevic, A. Gubina, B. Strmcnik and F. Gubina, "Multiple regression models as identifiers of power system weak points," Generation Transmission and Distribution, vol. 153, no. 2, pp. 211216, 2006. [21] L. Rokach and O. Maimon, Data Mining with Decision Trees, Link, Singapore: World Scientific Publishing Co., 2015. [22] Z. Yu, F. Haghighat, B. Fung and H. Yoshino, "A decision tree method for building energy demand modeling," Energy and Buildings, vol. 42, pp. 1637-1646, 2010. [23] D. Graupe, Principles of Artificial Neural Network, World Scientific, Sep 2013. [24] S. A. Kalogirou, "Artificial neural networks in renewable energy systems applications: a review," Renewable and Sustainable Energy Reviews, vol. 5, no. 4, pp. 373-401, Dec 2001. [25] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, Dec 13, 1994. [26] S. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice Hall, 1999. [27] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning internal representations by error propagation," in Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, MIT Press, 1986, pp. 318-362. [28] J. F. Kreider, X. A. Wang, D. Anderson and J. Dow, "Expert systems, neural networks and artificial intelligence applications in commercial building HVAC operations," Automation in Construction, vol. 1, no. 3, pp. 225-238, December 1992. [29] P. S. Curtiss, M. J. Brandemuehl and J. F. Kreider, "Energy management in central HVAC plants using neural networks," ASHRAE Transactions, vol. 100, no. 1, pp. 476-493, January 1994. [30] B. Muller and H. Keller, "Neural networks for combustion process modelling," in Proc of the Int Conf EANN '96, London, UK, 1996. [31] S. Milanic and R. Karba, "Neural network models for predictive control of a thermal plant," in Proc of the Int Conf EANN '96, London, UK, 1996. [32] J. K. Mandal, A. K. Sinha and G. Parthasarathy, "Application of recurrent neural network for short term load forecasting in electric power system," in Proc of the IEEE Int Conf ICNN '95, Perth, Western Australia, 1995. [33] A. Khotanzad, A. Abaye and D. Maratukulam, "An adaptive and modular recurrent neural network based power system load forecaster," in Proc of the IEEE Int Conf ICNN '95, Perth, Western Australia, 1995. [34] D. Datta and S. A. Tassou, "Energy management in supermarkets through electrical load prediction," in Proc of the First Int Conf on Energy and Environment, Limassol Cyprus, 1997. [35] D. Keles, J. Scelle, F. Paraschiv and W. Fichtner, "Extended forecast methods for day-ahead electricity spot prices applying artificial neural networks," Applied Energy, vol. 162, pp. 218-230, 2016. [36] A. A. Aburomman and M. B. I. Reaz, "A survey of intrusion detection systems based on ensemble and hybrid classifiers," Computers & Security, vol. 65, pp. 135-152, March 2017. [37] B. Scholkopf and A. Smola, Learning with Kernels, The MIT Press, 2002. [38] M. Singh, B. Panigrahi and R. Maheshwari, "Transmission Line Fault Detection and Classification," in International Conference on Emerging Trends in Electrical and Computer Technology (ICETECT), 2011.

58

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

[39] P. Ray and D. Mishra, "Support vector machine based fault classification and location of a long transmission line," Engineering Science and Technology, an International Journal, vol. 19, pp. 1368-1380, 2016. [40] G. Gan, C. Ma and J. Wu, Data Clustering: Theory, Algorithms, and Applications, SIAM, 2007. [41] J. Wu, Advances in K-means Clustering, Springer Berlin Heidelberg, 2012. [42] L. Kaufman and P. Rousseeuw, Finding Groups in Data, John Wiley & Sons, Inc., 1990. [43] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016. [44] I. Goodfellow, Y. Bengio and A. Courville, Deep learning, MIT Press, 2016. [45] E. Mocanu, P. Nguyen, M. Gibescu and W. Kling, "Deep learning for estimating building energy consumption," Sustainable Energy, Grids and Networks, vol. 6, pp. 91-99, June 2016. [46] J. Kelly and W. Knottenbelt, "Neual NILM: Deep neural networks applied to energy disaggregation," in ACM BuildSys' 15, Seoul, November 4-5, 2015. [47] J. J. Thomas and K. A. Cook, Illuminating the Path: Research and Development Agenda for Visual Analytics, IEEE-Press, 2005. [48] "VISUAL-ANALYTICS.EU," [Online]. Available: http://www.visual-analytics.eu/related/] . [Accessed 31 August 2017]. [49] D. Keim, J. Kohlhammer, G. Ellis and F. Mansmann, Mastering the Information Age Solving Problems with Visual Analytics, Eurographics Association, 2010.

59

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

4. APPLICATIONS OF DATA ANALYTICS IN SYSTEM OPERATIONS 4.1 INTRODUCTION In general terms, it can be stated that adoption of data analytics to support power system operations has been getting a late start as compared with other industries, or even other business areas in the electricity industry. However, as digital innovations grow in the electricity sector, companies are beginning to adopt and adapt. Because the availability of large volumes of data from sensors and devices in the power grid is growing exponentially, big data analytics techniques that are already applied in other industries now find their way to power systems. Over the past few years, major companies have started their big data projects and are competing to bring a set of IT tools to the market that are largely new to the utility industry. This section describes applications in power systems of the various analytics methodologies described in Section 3, with the focus on tools and techniques that use various sources of data to improve situational awareness and provide operation decision support. The description includes not only fully mature technologies being used in production mode in control rooms but also technologies in early stages of development, in the following main areas: 

Visualization analytics and technologies



Tools for system events detection, faults identification, and analysis



Wide-area monitoring



Equipment-health monitoring



Trending and forecast



Operational decision support

4.2 DATA VISUALIZATIONS IN REAL-TIME SYSTEM OPERATION The way data and information have been displayed and exposed to operators has evolved to a great extent over the years, as the technology has evolved. Energy management systems (EMS) were created to manage the physical flow of electricity in the grid following the 1965 blackout. During 1960s and 1970s, system visualization was based on analog computers and hardwired systems. For this system, sources of visual information were local, and displays of visual information were typically quite static. Circuits had to be switched to present different system information. Starting from the 1980s, with the birth of affordable digital computers, visualization in control rooms became networked and software-driven. Within the network, sources of visualization information could be anywhere and shared elsewhere—only communication packets had to be switched to get a different set of measurements. However, the visual information displayed was still typically quite static. Presently, as discussed before, large volumes of data from smart grid devices and distributed generation in the power grid are growing exponentially, and the need to further advance situationalawareness tools is greater now than ever before. Old-fashioned static visualization tools (using logs and tables) could hardly harvest the fruit of this new technology trend to improve situational awareness of the operating system. Emerging platforms include geographic-based dynamic visualization with user-friendly interfaces and real-time measurements and analytical results from measurement-based and model-based tools that populate the system map. This subsection focuses on how data is typically visualized in control centers of grid operators. Given the vast amount of work in this area, a detailed description of all currently available visualization tools and platforms for system operations is beyond the scope of this work. Therefore, the intention is to present a brief overview of main visualization approaches used in control centers, as well as a description of new trends and emerging visualization technologies. The interested reader can find in reference [1] a comprehensive survey of the state of the art of visualization products offered by numerous vendors and developers, and assessment of the effectiveness of the various approaches with recommendations for developing a visualization strategy.

61

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

4.2.1 Visualization technologies in control centers The traditional control center has a graphical representation of the present state of the network and the generation that is directly connected to the transmission grid. Such representation usually includes some kind of general view available to all operators and more detailed views in each workstation, as shown in Figure 4-1.

Figure 4-1: Overview of a control center monitor display

The various displays are intended to expose real-time conditions of the grid, as well as trends of relevant system variables, to help power system operators maintain adequate situational awareness and respond to conditions potentially threatening the system stability in an expedited manner. The way in which system data is presented to the operator can support the strengths and reduce the effects of limitations of human perception and performance, thereby enhancing operator situational awareness. As explained in Section 3, there are several principles of display design that help to understand how humans detect, process, interpret, and act on information[1]. Some ground rules are:  

A display should look like the variable that it represents. Processing a large set of information can be facilitated by dividing this information across several resources (e.g. using both visual and auditory information) and minimizing the cost in time or effort to “move” selective attention from one display location to another to access information.

These principles lay the foundations for how to design a human-machine interface that satisfies the needs of human abilities to process information and prevents the negative consequences of cognitive biases. A description of the main components of the human information-processing system, and how they apply for display design in system operations, can be found in [1]. Generally, the key driver for the selection of the appropriate visualization display depends on the task at hand. For example, if one wants to understand the overall voltage variation across a region, then contours can be quite effective, but if the exact voltage to three decimal points is needed, then a numerical display is more appropriate. Many techniques have already been applied to the field of power system visualization, with some of them described in the following subsections.

4.2.1.1 Schematic network diagrams (one-line/single diagram) A schematic network diagram is a simplified notation for visualizing an electrical power system. Elements on the diagram do not represent the physical size or location of the electrical equipment. The display is optimized to provide the user a good overview of the network topology.

62

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-2: Examples of schematic network diagrams

Both types of network diagrams provide:   

A consistent visualization and interaction possibilities, such as for alarms, outage areas, zoom, pan, switching, and adding notes. The ability to toggle between schematic and geospatial views. The possibility to open a new network diagram from an existing one (example: open schematic view from geospatial view).

4.2.1.2 Contouring The use of color contouring on one-line diagrams is a common technique to highlight feature that attracts attention to a particular area within a display, thus reducing the size of the search space and for facilitating target detection. A wide variety of different color maps are possible, utilizing either a continuous or discrete scaling. Color contours take advantage of the fact that as humans, we perceive the world in patterns. Hence the speed in which color codes can be interpreted and compared is often faster than numeric processing. The use of discrete symbols for visualization can be quite helpful, provided that the number of individual symbols is relatively low (less than several hundred). However, as the number of values grows, the displays eventually become too cluttered, making it difficult to detect any underlying patterns. To avoid that problem, color mapping can be designed not to cover the entire data range but rather to highlight values within a particular range of interest. As an example, Figure 4-3 shows the use of contouring to just highlight voltages that are below 0.98 per unit for a case in the Northeast region of the U.S. Eastern Interconnection [3].

63

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-3: Contour showing voltage magnitudes with values below 0.98 per unit

It is important to carefully select the colors that are used to represent the different elements or conditions as to avoid potential covering or camouflaging other important information [2]. Transparency is also used in some cases for this purpose. Contour gradients are another variation of contouring used to represent and compare to classes of values. Thus, the operator can identify significant deviations in the network at a single glance, as well as their location and severity. In the example shown in Figure 4-4, generation infeed from renewables is visualized such that red and yellow spots represent a higher generation, while green spots indicate a lower generation compared with the current (in the cited case, today’s) schedule [28].

Figure 4-4: Examples of contour gradients for continuous values

2D bubbles A new visualization strategy focuses on highlighting significant deviations from normal states. In contrast to contours that melt into one another, the 2D view shows only single-colored bubbles with a fixed radius around a bus. By only coloring the specific area around the bus, misunderstandings can be avoided. Examples are illustrated in Figure 4-5. Another key principle is that bubbles are only shown when there is a deviation that needs the user’s attention—similar to putting the spotlight on. The colors indicate the type of deviation (e.g. high voltage = yellow, low voltage = orange). The severity is shown by the color density, with low density

64

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

indicating upcoming problems and high density representing severe problems. Violated limits are coded additionally by showing a small black ring. With increasing excess of the limit value, the bubble grows beyond the initial radius, creating a halo effect around the black ring. Feeders that are connected to a violated bus bar are highlighted in the same color as the bubble to indicate the impact of the deviation on the network. The operator receives a first-level indication of a potential problem on a feeder, even if the feeder itself does not have real-time measurements [27].

Figure 4-5: Situational awareness by 2D bubbles

4.2.1.3 Three-dimensional visualization The key advantage of 3D is its ability to show the relationships between multiple variables. Usually in 3D visualizations, the third dimension is used with some abstract objects, such as a cylinder, in which attributes of the objects such as size and coloring correspond to the value of an underlying variable. For example, to provide situational awareness related to voltage security, one is often interested in knowing both the location and magnitude of any low system voltages, and also the current reactive power output and the reactive reserves of the generators and capacitors in the cylinder form with different colors. Such a situation is illustrated as an example in Figure 4-6. As stated in reference [1], the potential advantages of 3D graphic displays over 2D numeric displays are significant. The added dimension and pictorial enhancements may provide, among others, the following benefits: increase the amount of information that can be presented on standard display screens, assist in navigation and search activities, and facilitate more accurate mental models of the systems being manipulated.

65

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-6: 3D display showing bus voltages and generator reserves

3 D cones This an extension of the 2D bubbles visualization presented above, where 3D transparent cones are used to display variables of interest with less obfuscation of other parameters, as illustrated in Figure 4-7Figure 4-8. The height of the cone indicates the severity of the violation, so non-critical deviations (limit not yet violated) stay flat for indicating potentially upcoming but not yet critical problems. In addition, the pointing direction is showing the type of violation. Hence, low voltage violations as cones pointing downwards with high voltage violations as cones pointing upwards. Each cone matches a bubble with a circle in the 2D view, supporting the user’s orientation in the system. The described principle can be applied to multiple other scenarios, for example for representing areas with very low versus very high demand, or outage indices such as customer average interruption duration index (CAIDI) [27].

Figure 4-7: Example of situation awareness by 3D cones

66

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-8: Example of situation awareness by 3D cones

4.2.1.4 Animation Some visualization tools provide the option to display animated vectors to visualize power system dynamics. Figure 4-9 shows an example of animation of power flows, where the direction of the transmission line animation corresponds to the direction of flow in the physical system [28]. In this figure, animated power flow arrows display profiles for active and reactive power flows. On demand, the user can turn the animation for a window off or on. Further, some tools offer the ability to define thresholds based on percent of thermal limits or other parameters, which can be set so that flows get animated only when they approach alarming levels. Animated arrows are used to visualize MW and MVAR flows to identify loop flows or other abnormal patterns. According to the results of human factor experiment conducted by D. A. Wiegmann and other researchers [4], animation in power system displays can be very effective to help operators interpret displays by directing their attention to the most important information for a particular task or situation. It also enhances an operator’s understanding of system behavior. If properly configured, it also assists an operator to better assess current system states and the causal factors that underline those states, decide on mitigation measures if a violation of system resources occurs, and provide immediate feedback regarding the effectiveness of implemented measures.

Figure 4-9: Example of animated power flow arrows in distribution feeders

4.2.1.5 Renewables and dispersed generation Most countries have seen mass introduction of renewables (wind and solar) in their systems, sometimes as big wind and solar parks directly connected to the transmission grid, but mostly as dispersed generation installed in the distribution networks. This widespread installation of dispersed generation has an impact of the operation of the transmission network. The challenge is threefold:  

How to gather the information from DSOs and from weather forecasts. How to predict the behavior of this kind of generation and its impact on the system.

67

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



How to present this information to the operators in a practical way.

Some examples of visualizations of renewables are given in Figure 4-10 and Figure 4-11.

Figure 4-10: Visualization of dispersed generation in operator workstation at RTE

E Figure 4-11: Visualization of dispersed generation in the general panel at Red Eléctrica del España

4.2.1.6 Geospatial representation One of the most important recent developments in the visualization field has been the integration of power system information with GIS-based displays, in which the power system is drawn superimposed over other geographic information. These systems may integrate various displays of data information such as satellite images, weather, road maps, and other infrastructures into a common display, to facilitate interpretation of the correlation between the different elements involved in the problem being assessed. Indeed, the coordinates of graphical elements have a relation to the real world. The user can view the geographic position of equipment, outages, and so on. Furthermore, the user can view streets, customers, land usage, or other images. Figure 4-12 shows an example of geospatial network diagrams.

68

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-12: Examples for geopatial network diagrams

They are quite useful for particular uses. One such example is fault location. In that case, the ability to couple the fault location, expressed as a distance from the terminals of the line, with road maps and/or satellite imagery can be quite helpful in dispatching repair crews [1]. Another example is when the GIS integration is used to show the relationship between different infrastructures such as is done with the Oak Ridge National Laboratory VERDE (Visualizing Energy Resources Dynamically on Earth) platform. VERDE is a software application that uses the Google Earth platform to provide real-time visualization of the electric power grid. Its capabilities include line descriptions and status of outage lines; geospatial-temporal information and impacts on population, transportation, and infrastructure; analysis and predictions results; and weather impacts and overlays [5]. The assessment of visualization technology presented in reference [1] concludes that even though GIS integrated tools produce impressive graphic representations, they are not necessarily the best alternative to help an operator understand the state of the electric power system, especially in control of a transmission system. The reason is that, for example, in wide-area power system operating condition visualizations, the locations of elements of greatest interest electrically, such as substations and power plants, usually have a very small geographic footprint as compared to the entire area. Hence, their representation in exact geographical coordinates may not be useful. Also, the urban areas that contain much of the electric infrastructure are likewise relatively small geographically. Transmission lines sharing a common corridor or even transposed on a single tower would likewise be difficult to differentiate in a pure GIS representation because they are essentially in the same location. Finally, some traditional one-line elements, such as aggregate loads, are often spread over a large geographic area and hence would be difficult to define precisely. Traditional control center map boards that focus on display topological representation of the network, with pseudo-geographical coordinates in some cases, seem to be the preferred option.

4.2.1.7 Integrated system view Dynamic icons assigned to specific incidents will be displayed at the network diagrams. The icon pops up at the topological diagram position, where the incident occurred. The user can access further details on the individual elements on demand. By clicking an icon, a small info box will pop up, showing the most relevant information as well as a link to further information such as the complete outage record. The operator can open multiple info boxes at the same time for comparing information. For example, for the task “reviewing outage details and location,” the operator needs to get a comprehensive overview on the network diagram regarding: 

The outage location (as determined by outage management prediction engine)



Trouble calls from customers



Crews and their availability



Critical customers such as hospitals and police stations

69

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Indicators for fault locations (derived by network analysis)



Planned maintenance work in the same area

Figure 4-13 shows three screenshots of a tool that produce integrated system view [28].

Figure 4-13: Integrated system view with Icons and Info boxes

4.2.1.8 Display profiles With display profiles, it is determined what type of information is an additional layer on the network diagram, as shown in Figure 4-14 and Figure 4-15 [28]. Thus, the system administrator configures display profiles for the specific operator workflows. An operator can switch between the display profiles on the fly. Display profiles could contain: 

One or multiple types of visualizations



Show/hide topological coloring



Level of background opacity

Selection of profiles

Outage Information

Network analysis

Figure 4-14: Distribution network visualization

Voltage violations Selection of profiles Figure 4-15: Distribution network visualization

70

Network analysis

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

4.2.2 Example of control room visualization at ISO – ERCOT case This section presents an example of modern visualization technology implemented at the control room of the Electric Reliability Council of Texas (ERCOT). ERCOT manages the flow of electric power to 24 million Texas customers, representing about 90 percent of the state’s electric load. As the independent system operator for the region, ERCOT schedules power on an electric grid that connects more than 46,500 miles of transmission lines and 570+ generation units. The ERCOT control room sees an enormous amount of data flow each day. In order to manage the data, extract useful information from it, and properly present it to the operators, ERCOT uses standardized display building software. This brings the following benefits [6]:   

Enhances the ability to customize displays that are dynamically updated by the underlying model. Adopts an industry standard display building process. Is more secure for the production environment while allowing easier access for users.

Display Principles To ensure that the data being presented to the operators is relevant to the issues of real-time operations, the information to be displayed must be carefully selected. 1. Indicators of system state/health: Good indicators of system performance need to be developed and critical functions identified. The displays should use these indicators and functions to summarize the state of the system, with the ability to show detailed information on-demand. 2. Alerts and alarms: Logical and consistent displays should be developed to show alerts and alarms for violations of metrics associated with indicators of system health. 3. Sources of data: Multiple systems provide data to the control room, and related data from these varied sources should be compiled together to ensure a holistic view of the system. 4. Division of responsibility: The ERCOT control room has eight “desks,” each administered by an operator. Each of the functions carried out by the eight desks requires its own set of displays leading to an enormous inflow of information to the control room. The information needs to be organized in terms of the function(s) it helps address. The desks are: a. Real-time:  Ensures that frequency within the ERCOT system remains within the tolerances specified by the protocols and NERC.  Monitors the health of the security-constrained economic dispatch (SCED) application and validates the reasonableness of the solution.  Verifies the quality of load forecast data and switches sources when necessary. b. Transmission and Security:  Analyzes base case and post-contingency constraints and takes actions to maintain system reliability.  Responsible for ensuring that the ERCOT system is operated so that instability, uncontrolled separation, or cascading outages will not occur.  Updates stability limits for all ERCOT generic transmission constraints (GTCs) every 10 minutes. c.

Resource Operations:  Monitors ancillary service levels and executes a supplementary ancillary services market (SASM) when necessary.  Deploys and recalls reserves as system requirements change.  Manages planned, maintenance, and forced outages for generation resources.

d. Shift Engineer:

71

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

 Works closely with the ERCOT control room system operators, providing round-the-clock support for analysis and system applications.  Develops and authors congestion management plans for mitigation of temporary and ongoing grid vulnerabilities.  Gathers relevant and accurate information about grid events and communicates that information in a timely manner to the shift supervisor and engineering support groups. e. Shift Supervisor:  Monitors the operation of all desks in the control room.  Continually reviews and analyzes system security.  Provides the primary point of communication with ERCOT management and market participants. f. DC Ties:  Schedules and monitors energy transactions into and out of the ERCOT control area across the asynchronous DC ties.  Coordinates the import of emergency energy across the DC ties into the ERCOT control area during emergency operations. g. Reliability Unit Commitment (RUC):  Oversees the weekly reliability unit commitment (WRUC), day-ahead reliability commitment (DRUC), and hourly reliability unit commitment (HRUC) processes.  Performs hourly studies to identify potential voltage problems on the ERCOT system.  Responds to inquiries about RUC commitments. h. Reliability Risk:  Coordinates with the RUC, real-time, transmission and security, resource, operations, shift supervisor, and other ERCOT operators as necessary to maintain grid reliability .  Responsible for the safe and efficient operation of all intermittent renewable resource (IRR) generation assets.  Responds to inquiries about intermittent generation dispatch, wind and solar forecast, operations, curtailments, and other related tasks. Figure 4-16 is an overview of the control room main visualization board and various displays. Figure 4-17 shows displays related to generation and load information. The graphic on the right displays in a graphical information non-spin and quick-units, to facilitate interpretation by system operators.

Figure 4-16: ERCOT control room - 2016

72

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-17: ERCOT control room - load and generation details display and quick start/non-spin graphs

Figure 4-18 is an overview of the visualization tool that is used to display wind generation details. It is intended to provide the operator with breakdown of wind generation by zone, allowing the operator to visualize current wind generation trends in one screen. Figure 4-19 is a snapshot of the real-time sequence monitor, whose main objective is to summarize results from real-time security assessment tools, such as state estimation and contingency analysis, with timer indicating last execution. It provides the shift engineers and operators with alarms when real-time applications have not successfully run. Figure 4-20 is the system voltage overview display. It gives an overview of voltage levels at some 345-kV and 138-kV busses around the ERCOT system. It also alerts operators when voltage levels are too high or too low and indicates what reactive devices can be put in service to help control voltage.

Figure 4-18: ERCOT control room – wind generation

73

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-19: ERCOT control room - real-time sequence monitor

Figure 4-20: ERCOT control room - system voltage overview display

4.2.3 Emerging trends in control room visualization 4.2.3.1 Space and time visualization In most cases, control centers visualizations represent a network topology and generation state (space). However, a system operator should rather analyze and prepare for the near future (time). A new paradigm of visualization is needed to integrate space and time in a practical way. Because of the increased variability and volatility of system operation conditions caused by the growth of renewable generation, increased power transfer through interconnections, demand changes, and other factors, operators need to be informed in advance of potential risks. They need to be guided on possible mitigation actions that can be taken and the time the actions should be executed to be effective. Hence, visualizations should to be integrated with the various data processing and analytics tools built into modern energy management systems (EMSs) to provide operators a comprehensive assessment of system vulnerability and control actions, considering the following aspects: 



Situational awareness on first sight: o

Workflows of a specific user group have to be mapped in the visualization.

o

Seamless interplay of applications need to reduce navigation effort during the workflow. Visualization tool should integrate results from other operation support tools and allow operator to easily navigate among them.

Projection of future status: o

Integrated study environment (analysis of possible future events through simulation grid models).

74

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



o

Fully integrated day-ahead and intraday congestion forecast.

o

Contingency analysis (N-n) to prevent thermal and voltage problems.

Recommendation for operation: o

Provide incident patterns based on analysis of previous incidents or via simulations with the grid model. The interaction of these incident patterns with the current grid image helps an operator to understand the potential risk on the current operating condition.

4.2.3.2 Visualization for inter-area coordination As the extension of power system interconnection increases, there is a significant need to improve coordination with the many utilities, system operators, and other agents involved in system operation and security. For example, in Europe several initiatives are underway to improve the coordination of system operation among TSOs and across countries. Two main Regional System Coordination Initiatives (RSCIs) exist today, namely: CORESO [7] and the Transmission System Operator Security Cooperation (TSC) [8]. They operate a control room that gathers information from each TSO participating in the initiative, and they carry security analysis for the whole area, 24 hours a day, 7 days a week. These kinds of initiatives are generating new visualization needs that go beyond the physical representation of the network. They need to aggregate a huge amount of data to present it in a practical way to operators. Figure 4-21 provides an overview of the CORESO control room, where the representation of the participant regions can be seen on the main wall map.

Figure 4-21: CORESO control room (www.coreso.eu)

In the United States, the situation is different. The North American system is split into three distinct grids: the Eastern, Western, and Texas Interconnections. These grids are operated independently and are only electrically tied together by several DC links. There is no single party responsible for coordinating the operation of all these areas.

4.2.3.3 Time-driven situational awareness Time-driven situational awareness is a new concept being developed by RTE in France. RTE calls this system Apogeé. The objective is to create an application that will provide the operator a single user interface based on the hyper-vision concept. The application is intended to help a system operator to focus on the actions that he must take by presenting at the right time the relevant information that he needs to make the right decisions. The system interfaces with the operator support tools available in the control room. It filters and processes the data from these tools to generate key information and schedule the time when that information will be presented to the operator through a dedicated graphical interface.

75

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

To illustrate this idea, this section describes a forecast security tool, which will be part of the hypervision system. On a rolling basis, the system gathers data from forecasting tools such as load, renewable generation, and market results forecasts, as well outage planning, and combines that data with the last SCADA snapshot to feed it into grid models that have been updated to represent foreseen system conditions in the near-term future (few minutes, to 24 hours). The system performs security analysis through data analytics and modelling tools. If a constraint or reliability issue is detected, it assesses the effectiveness of possible remedial actions that have been used in the past in similar situations, or that have been considered in grid studies to solve the type of problem being analyzed. If no solution is found in the remedial action library for the foreseen constraint, the system alerts the operator that a detailed evaluation needs to be performed to assess whether and when a preventive action is to be taken to mitigate the security risk. In that case, the operator performs studies to design a proper solution to the constraints in question and adds the solution to the remedial actions library for future use. The system will alert the operator in a timely manner, so that the mitigation actions can be effectively implemented. The hyper-vision user interface remains empty as long as no potential unsecure conditions are detected within the time horizon of the analysis. If a constraint is identified for any future operating condition considered, it is displayed in the upper timeline along with the proposed remedial actions, as shown in Figure 4-22. The operator has access to detailed information about the constraint and results of the analysis.

Figure 4-22: Example of control actions displayed in the main interface of Apogeé

To monitor the process, a time-based supervision display that synthesizes the results of the forecasted security analysis is also proposed. Figure 4-23 illustrates the concept (labels and names have been obfuscated to protect sensitive information). The first column is an expandable tree representation. The first level of the tree displays contingencies that result in constraints. For each of those contingencies, the field can be expanded to show the second level, which contains further description of the constraint and recommended remedial actions. The color code is as follows: Type Contingency

Color Green

Meaning Constraints are detected. There is at least one effective remedial action.

Red

Constraints are detected. No effective remedial action found.

Constraint

Black

The constraint is detected.

Remedial action

Green

The remedial action is effective.

76

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Red

The remedial action is not effective.

Figure 4-23: Example of the time-based constraint display in Apogeé

4.3 DATA ANALYTICS IN SYSTEM OPERATION SUPPORT PROCESSES The following use cases are among the most recognized ones in relation to energy data analytics to support system operation:             

Real-time situational awareness with PMU data System event detection (detection of equipment failure or malfunction) Fault location and root cause analysis Real-time stability monitoring Alarm processing and filtering Renewable energy generation forecasting and storage analytics Damage prediction (weather related or due to other causes) Outage restoration analytics Grid optimization and power quality analytics (including voltage control) Peak load management (via demand-side management analytics) Load research analytics and energy portfolio management analytics Non-technical loss analytics Physical and cyber security assessment analytics

In the following sections, we present a brief description of these use cases, with references where interested readers can find more details.

4.3.1 Real-time situational awareness with PMU data Outstanding characteristics of Synchrophasor data, namely high resolution and time synchronization, make it possible to monitor power system dynamic performance as well as grid stresses over a large geographical area. Synchrophasor applications for on-line or near real-time operations enhance situational awareness and help detect situations that can threaten reliability of the grid. On-line applications include, among others, system electromechanical oscillations detection and evaluation of associated damping, voltage, and angular stability assessment; voltage sensitivities with respect to real and reactive power changes; display and analysis of voltage angle difference over wide geographical areas; improved state estimation; islanding detection and monitoring; and event detection [12]. Actionable information from these applications is useful when there is sufficient time for an operator to take action to mitigate the threat. In cases where there would be not enough time for an operator action, automatic corrective control should be designed and implemented. Even though there is a

77

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

great potential to use Synchrophasor data for automatic control, not many applications have been successfully implemented. One of the main hurdles that prevents extended use of control applications is data quality and availability. Apart from those main applications, analytics that use Synchrophasor data have been developed to identify and diagnose a wide number of grid events, such as failing potential transformer, capacitor bank switching issues, open phases on breakers, negative sequence concerns, issues created by variable loads (such as arc furnaces), as well as generation and transmission equipment misoperations [13]. Because of the advantages of Synchrophasor technology to enhance monitoring and situational awareness of the grid, many electric systems have deployed a large number of PMUs across their footprints. In the USA, for example, the Smart Grid program led by the Department of Energy in the recent past resulted in the installation of PMUs at about 1000 substations and an extensive communication infrastructure to collect and archive the data. Other countries, in particular China and India, have also implemented plans for large deployment of PMUs and the corresponding communication and computation infrastructure. There is an abundant technical bibliography on Synchrophasor technology and applications. Reference [14] gives a thorough update of Synchrophasor projects in North America, with detailed explanation of the applications currently implemented. Reference [15] provides a methodology for identifying and estimating the benefits of using Synchrophasor technology to enhance grid operations and planning. Also, a wealth of technical information can be found at the North American Synchrophasor Initiative (www.naspi.org). Several vendors provide platforms and software solutions for various one-line applications of Synchrophasor s data [12][16]. The following visualizations examples represent standard applications of typical wide area monitoring systems based on Synchrophasor technology:

4.3.1.1 Power swing recognition (PSR) Wide area monitoring is getting more and more in focus in the control center visualizations. They will be integrated in the standard visualizations to monitor the whole roundtrip from monitoring until the automatic regulations. The “power swing recognition,” also called “oscillation monitoring system” (OMS), can recognize, evaluate, and display active power swings in the energy supply network. This ensures that power swings that can be dangerous for network operation are recognized and reported automatically. A power swing can be observed between two locations by evaluating the phase angle difference of the PMUs involved, or at a single location by evaluating the active power determined there. If a power swing measured in terms of phase-angle difference is present, the locations of both PMUs involved are circled and the associated connection line of the same color is inserted between them (see connection Paris – Rome in Figure 4-24 [29]). If a power swing measured in terms of active power at an individual PMU is present, then the PMU where the measurement was made is marked with a circle (see Copenhagen in the following figure). The assigned color represents the damping ratio and amplitude quantities, which are required for a meaningful estimate of the actual degree of danger. These quantities, coupled with the associated limiting values, give a degree of danger that forms the basis for assessing the potential consequences of the detected power-swing event.

78

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-24: Swings in the map

4.3.1.2 Power system stability curve (PSS) This curve displays the state of the complete power system. This kind of “fever curve” is calculated from all available measured values for which the limiting values are defined. The user can assign parameters for which measured values are to be included in the calculation. The curve is calculated from the weighted distances between the measured values and their limiting values. The curve can be displayed by defining the time range. It is divided into defined time steps (hours, for example). In the on-line mode, the right end of the diagram shows the current value, as shown in Figure 4-25 [29].

Figure 4-25: power system status, change time range

If any of the actual limits is violated by any measurement, the PSS curve changes the color to red. Trends to instability can be easily recognized by a rising level of the PSS curve. A customer can optimize the settings of the limiting values, such that the PSS curve shows the appropriate sensitivity of the power system.

4.3.1.3 Island state detection (ISD) The objective of island state detection is to use the measured values of the frequency (f) and the rate of change of frequency (df/dt) available in each PMU to determine whether separated networks have formed. If there is islanding between two or more substations, then the detected islands are displayed in the schematic display as colored areas. If only one substation is in the island, the area around the substation is displayed as a square. For several substations, the area is displayed as a polygon, with the substations as corners (see Figure 4-26 [29]). In this example, four detected islands have: 1. Island (orange): Copenhagen 2. Island (blue): Paris - Nuremberg - Rome - Munich

79

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

3. Island (green): Muelheim 4. Island (violet): Vienna

Figure 4-26: Schematic display with recognized islands

4.3.1.4 Visualization of angle differences The phase angle difference of the voltages between different PMUs can be displayed in graphical form. The locations form triangles, such as Copenhagen – Paris – Rome, which are shown in color (see Figure 4-27 [29]). In the figure below, Nuremberg is defined as the reference PMU and shown in a white frame. If a PMU has a positive phase angle difference in comparison to the reference PMU, the phase angle leads. If a PMU has a negative phase angle difference in comparison to the reference PMU, the phase angle lags. Color deviations indicate angle differences.

Figure 4-27: Phase angle difference of the voltages between different PMUs

4.3.2 Fault identification, location, and analysis In case of power system faults, protection and fault analysis engineers are often presented with event recordings coming from various substations and also from multiple devices. There are several challenges in processing these files efficiently in order to perform analysis and make decisions based on the event data. Once this event data has been classified and prioritized, fault location analytics will

80

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

allow engineers to correctly select the affected transmission line, determining fault type and performing fault location calculation. The use of data-analytics techniques and tools for identification and classification of power system events has been the subject of extensive research. It is probably one of the most studied areas for the use of data analytics to support system operations. Reference [9] provides a survey of dataanalytics applications to support system operations, with emphasis in fault and event detection and analysis. Techniques used for fault detection, classification, and analysis include: Artificial neural network (ANN), wavelet transform, support vector machine, k-Nearest Neighbor, decision tree, and association rules. See section 3 in this document for an explanation of these techniques. Hybrid methods that combine various techniques have also been developed. For example, reference [10] presents a new method based on combined wavelet transform-extreme learning machine (WT-ELM) technique to identify, classify, and locate a fault in a series-compensated transmission line. Fault-diagnosis methods based on association rules can take both spatial and temporal characteristics into account. The resulting set of rules could obtain a real-time model that helps to create a list of preventive actions to be taken. A second suggested advantage of this approach is that the data is provided by protection equipment like relays. Sequences of events like voltage sags can be analyzed using pattern sequence discovery algorithms that are an association rule-based method. The events that are measured in a measuring point have an associated time of occurrence, and temporal patterns that occur with a sufficient frequency are identified. The time spans between expected related events that are the result of this model can be used for prediction and prevention of successive events. In the same field of fault but a different specific objective, an application using advanced dataanalytics techniques has been proposed for fault-direction detection for protection purposes. It based on Multilayer Feedforward Neural Network (MFNN). It is claimed that the proposed discriminator is fast, robust, and accurate. And it is suitable for realizing an ultrafast directional comparison protection of transmission lines [11]. As concluded in the study presented in [9], only a few of the developed data-analytics algorithms have been implemented in production mode in electric utilities. One of the difficulties to fully implement them is the need for appropriate communication and data integration infrastructure. Certainly, a system intended to perform fault detection, classification, and location has to operate in on-line mode with minimal manual interventions in place. This requires a communication system to retrieve the appropriate data from relays, digital fault recorders, and any other necessary devices and securely send this data to a centralized location where the algorithms are run. If the methodology uses data from other sources as well (such as weather data), such data also needs to be timely available and properly integrated with the electrical data for the analysis. Example: Lightning correlation detection One such use case is the real-time outage and lightning correlation described in [21]. In the event of a feeder outage, the correlator process combines data on current network topology and geography, circuit breaker operation, and lightning data from a lightning location service to show the affected feeder area and the lightning strike that caused the outage of the feeder. This provides valuable information to the dispatcher to coordinate the field crew and accelerate remedial activities.

81

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-28: Correlation of a feeder outage and lightning strike

Input data on lightning activity is provided by a lightning detection system (LDS), such as Euclid in Europe or NLDN in the USA/Canada. Electrical grid data is provided from GIS (geographical information system), where powerlines and switching elements, such as circuit breakers, are modeled. The network topology processor (NTP) builds network topologies from GIS data. When it receives changes in states of switching elements from SCADA, it rebuilds the topology and calculates the supply areas for every switching element. It then processes whether any lightning events have occurred in the vicinity of the powerlines of the supply areas. If the time of the lightning event and change of switching state, taking into account the relay protection settings, correlate temporally and spatially, the specific lightning event is proclaimed as the cause of the failure [21]. Example: Smart Cable Guard DNV GL’s Smart Cable Guard [39] is an analytics-based advanced tool to locate faults in MV cables while providing information of developing weaknesses in MV cables. This is done by detecting partial discharges that are developing in these cables. Knowing the location of a defect within a 1% accuracy of the cable length enables a network owner to replace the weak spot before it results in a breakdown. This will help to reduce both the SAIDI and SAIFI and will enable network owners to plan repair work (at optimal costs), and in most cases, it also enables them to better/faster identify the root causes of defects. The system uses two time-synchronized sensors that capture and measure data from the MV cable.

82

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-29: Smart Cable Guard system and web interface, showing the location of increasing partial discharge activity over time

Example: OpenXDA platform An open-source platform that integrates several applications has been developed by Grid Protection Alliance (GPA) in the United States, under the auspice of several sponsors, including EPRI, electric utilities, government agencies, and other research organizations. One of the applications is the openXDA software, which is an extensible platform for processing events and trending records from disturbance-monitoring equipment such as digital fault recorders (DFRs), relays, power quality meters, and other power system intelligent electronic devices (IEDs). Open PQ Dashboard, another application of the suite, provides visual displays to quickly convey the status and location of power quality anomalies and other events throughout the electrical power system (see Figure 4-30 [22]). It is also used to display results from openXDA in the geo-referenced visualization panel. Summary displays start with the choice of a geospatial map-view or annunciator panel, both with visualizations for across-the-room viewing fit for operations support center [22].

Figure 4-30: Example of open PQ Dashboard display

Root cause identification of faults As described above, the classification of power system events has been the subject of extensive research, of which both the identification of presence and of the faulty phases are the main focus. A variety of methodologies and software tools have been developed and implemented to assist operators and protection engineers to locate the fault in an accurate and fast manner. However, methods to identify the possible root cause of a fault in on-line mode have not received the same level of attention; consequently, algorithms and tools are not readily available. Adding information about the underlying cause of a fault to the fault location process can be very beneficial to expedite repairs and restore the faulted line to service, as well as optimize preparation work of crews. It also can give an operator valuable information to decide and implement control actions. Thus far, there has not been much work in identifying the underlying causes of events, but this gap in research is expected to be filled by new research. References [23] proposes a methodology to automatically identify the underlying cause of a fault in transmission lines based on analysis of fault data recorded by different IEDs, as well as other nonelectrical data. Reference [24] proposes a similar approach that uses a machine learning approach to classify faults as they occur in the system into preselected fault cause groups.

83

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

4.3.3 Real-time stability assessment Voltage stability is always an operational reliability concern for modern power systems. Voltage monitoring and control have been traditionally based on SCADA and EMS. Due to the inherent limitations of these systems and applications, such as slow data sampling rate, slow data communication rates, time-consuming computation, and model inaccuracies, a complete assessment of system voltage stability condition may take several minutes to perform. Motivated by the advantage of technology of Synchrophasor s and their wide installation in the current power systems around world, PMU-based voltage stability monitoring applications are appearing now, which can improve the power system voltage stability and security.

Figure 4-31: Example of Synchrophasor -based frequency stability monitoring

Monitoring methodologies based on the use of high-resolution PMU data are proven to be effective to track dynamic performance of a power system in real time and provide understanding of the current state of the system, including potential operating margins under varying system conditions. However, these approaches cannot assess performance under contingency conditions or for changes in operating scenarios. As previously described, for improved situational awareness, system operators need a succinct view of current operating conditions, in addition to an assessment of potential risks associated with expected changes in load, topology, or generation, as well as unexpected events in the system (faults, changes in renewables output). Therefore, it is clear that a security assessment tool based only on PMU data is not sufficient. Various so-called hybrid approaches that combine traditional simulation methods with PMU analytics have been proposed. An R&D project performed under the auspice of the U.S. Department of Energy proposes an integrated platform that combines high-performance dynamic simulation analysis tools and Synchrophasor -based stability assessment algorithms. The project integrates the results to provide real-time situational awareness, including available operating margins against major stability problems [17][18]. Figure 4-32 depicts a high-level overview of the proposed framework for real-time dynamic security assessment. Reference [18] provides the application of the framework through selected illustrative examples.

84

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-32: Framework proposed in [17][18] for real-time dynamic security assessment combining PMU data analytics and high performance dynamic simulation

4.3.4 Alarm processing and filtering Many of the conventional alarm management systems as part of SCADA/EMS systems lack the ability to analyze complex events efficiently within a time constraint. As a result, operators are overloaded with alarms in the control rooms, which they will start ignoring. This has the risk that serious alarms that are hidden in this list are overlooked. Analytics-based intelligent alarm processing can solve this problem. Examples that have appeared are an advanced alarm processor that combines alarm processing techniques at both the substation automation system and the energy management system level. In addition, fuzzy-reasoning petri-nets diagnosis models can take advantage of both expert system and fuzzy logic [37][38].

4.3.5 Renewable energy generation forecasting and storage analytics With a growing installed capacity of renewable energy plants comes a growing number of remote monitoring solutions to track the performance of these plants. Enormous amounts of data are being generated by renewable energy plants, and it is becoming ever important to create valuable insights from this data. Big data analytics performed on the data collected from these plants enables owners and O&M crews to operate the renewable plants at the plants’ maximum potential. Among all the types of big data analytics that could be performed on the plant data, predictive analytics holds the most promising of providing insights by leveraging performance data to create correlations and outcomes. Regression models can be applied to study impact on weather on electricity demand. The results can be used to forecast demand. Variables like historic electricity demands, temperatures, humidity, GDP, and population growth numbers can be taken into account.

85

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-33: Wind forecasting and optimization tools

Monitoring and optimization tools are becoming available for all types of renewable energy sources from PV modules, PV inverters, wind farms/turbines, and the whole portfolio. This gives owners and operators uniform access to, and analysis of, their operational data. It facilitates efficiency and intelligent operational decisions and maximizes availability, efficiency, production, and financial return.

4.3.6 Damage prediction (weather related or due to other causes) The impact of storms and other extreme weather events on utility services can be devastating. Hurricane Sandy is a recent example of the enormous damages that storms can inflict on electrical infrastructure (and on society and the economy). Quick response to these emergencies represents a big challenge to electric power utilities. With the advent of the Smart Grid technology, utilities are incorporating automation and sensing technologies in their grids and operation systems. This greatly increases the amount of data collected during normal and storm conditions. These data, when complemented with data from weather stations, storm forecasting systems, and online social media, can be used in analysis in order to enhance storm preparedness for utilities. For example, a flash alarm service produces alarms on lightning activity approaching the area of a utility’s interest. The alarms are categorized into levels according to lightning activity distance to area of interest. Areas of interest can be of any shape to suit different critical electrical assets, such as substations, power line corridors, and communication facilities. The alarms are also used to warn the crewmen working on the powerlines or in the substations of incoming danger of lightning strikes. An illustrative implementation of a weather forecasting system was developed in the U.S. by Ameren Missouri and Saint Louis University (SLU). The system called Quantum Weather®, which became fully operational in 2008, is a storm-prediction system designed to improve the ability of an electric utility to anticipate and respond to weather-related damage. It harnesses the data in Ameren Missouri’s service territory from more than 100 strategically located weather stations and integrates the data stations with data from other sensors and measurement devices to run a numerical weatherprediction model in a high-performance computing platform [19]. The system has proven to be effective in forewarning emergency response planners, prepare staff, and dispatch resources to where they will be needed most. Another example of damage prediction system is the model implemented by San Diego Gas & Electric (SDG&E), which forecasts when strong winds will occur and determines how severe they will be and how much risk they will pose. The main risk associated with those strong winds—the so-called Santa Anas winds—is fire. Those winds that blow from the desert to the coast are dry and hot and may convert any type of ignition source into wildfires. The prediction system, which uses data from 170 weather stations on SDG&E’s transmission and distribution systems, helps the utility to forecast where the winds would be the strongest and to subsequently warn customers and position people in the field, as well as determine the staffing levels that it would need for the duration of the winds. They also use the real-time data during the windy condition to determine which circuits had lines that were at greatest risk so that it could shut them off [20]. Damage prediction systems are not only applied to weather-related events. A tool has been developed to predict and asses the risk related to damages to cables as a result of digging by construction

86

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

companies. Digging damage comprises a large part of low-voltage and medium-voltage power failures in any distribution network, so prevention of cable digging damage is important. Based on various data sources (location, soil, cable types, subcontractor track record, etc.), a predictive model is a very useful tool to management the risk related to digging damage.

Figure 4-34: Digging damage prediction model

4.3.7 Outage restoration analytics Outage restoration analytics helps utility distribution managers, outage managers, community liaisons, and regulatory affairs managers in applying analytics to predict, prevent, detect, assess, and respond to outages. It provides real-time situational awareness of unfolding outages, operational deployments, and restoration progress during major events. Self-service means users get up-to-the-minute information without distracting outage management operators from their tasks. Outage restoration analytics tools will offer insight into historical performance, trends, and possible root causes, helping companies to proactively reduce the number of outage events. It also enables monitoring of KPIs to identify emerging issues before they become problems and simplifies reporting on industry standard indices such as system reliability, IEEE reports, outage analysis, and crew history reports. For example, if a smart meter sends a “last gasp” to say it has lost power, the utility can determine the meter-to-transformer relationship and assess the scale of the problem. Information can be passed to the control center for visual display and creation of outage alerts that can be turned automatically into work orders for field crews.

87

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 4-35: Smart meter based outage management

4.3.8 Power quality analytics (including voltage control) Good power quality and uninterrupted power are extremely important goals at all utilities. Compromised power quality can cause damage to costly electrical equipment, reduce productivity, and—if severe enough—disrupt daily operations. Variations in power quality can result from voltage spikes, swells, and sags; harmonic disturbances; and short and long interruptions of power lasting from a few milliseconds to over two seconds. And any of these events can occur at any time. Power quality data analytics is about collecting waveform-based power system data, extracting information from it, and applying the findings to solve a wide variety of power system problems in areas such as power quality, power system protection, equipment condition monitoring, and network performance enhancement. Power quality data analytics tools will combine electric power system recordings with data from SCADA, GIS, and network models to provide estimated fault location and to send an alarm to the operations personnel, reducing time to locate faults by hours.

Figure 4-36: Power quality analytics tool

88

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

4.3.9 Peak load management (via demand-side management analytics) Peak load management, also known as demand side management, is the process of balancing the supply of electricity on the network with the electrical load by adjusting or controlling the load rather than the power station output. This can be achieved by direct intervention of the utility in real time, by the use of frequency sensitive relays triggering the circuit breakers (ripple control), by time clocks, or by using special tariffs to influence consumer behavior. Peak load management enables utilities to reduce demand for electricity during peak usage times (“peak shaving”), which can, in turn, reduce costs by eliminating the need for peaking power plants. Analytics can help in identifying demand patterns and determining the factors that drive energy load.

4.3.10 Load research analytics and energy portfolio management analytics Load research enables utilities to study the ways their customers use electricity, either in total or by individual end uses. Load research analytics tools allow for aggregation of profiles created by domain analysis to get an overall load shape of the territory and to calculate cost allocator statistics such as peak demands and their dates.

Figure 4-37: Analyzing system load

4.3.11 Non-technical loss analytics Non-technical losses (NTLs) include electricity theft, faulty meters, or billing errors. It can cause significant harm to the economy. Some countries may range up to 40% of the total electricity distributed. To detect NTLs, inspections of customers are carried out based on predictions. Traditionally, these predications are based on calculations of the energy balance requiring topological information of the network. This does not always work accurately. As network topology undergoes continuous changes, analytics tools enable analysis of customer profiles, their data, and known irregular behavior in order to trigger a possible inspection of a customer.

4.3.12 Physical and cyber security assessment analytics While the electric power industry has developed mandatory reliability standards that help provide a basis for grid reliability and resilience (e.g. NERC-CIP), grid modernization is introducing new technologies that do not have well-defined standards. Advanced information and communication technologies are being developed and deployed at a rapid pace to enable new system capabilities and to support the integration of variable and distributed energy resources. Technologies and capabilities to assess the “state of security” for the grid will be needed as cyber and physical threats evolve. Cyber-physical models, analytical tools, and performance metrics can help enable this capability to increase the security posture. Moving to real-time analytics and the ability to co-simulate cyber and physical systems can help perform non-traditional contingency planning, such as managing grid impacts of interruption to heating oil and propane deliveries. While the energy sector has a well-established capability to plan for and survive physical contingencies, it should also be able to survive physical contingencies that result from cyber incidents.

89

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

4.3.13 Dynamic assessment of transmission line capacity (dynamic line rating) Transmission systems are constrained by the capacities of their transmission lines. Normally, utilities use a static rating, which may vary by season or shorter periods, as the transmission line thermal capacity limit. This rating is determined based on assumed weather conditions, which in most cases are conservative. If instead the thermal rating of a transmission line is determined based on the environmental parameters that the power line is operating in at any moment, additional capacity can be available as compared to the static capacity. This is the concept of dynamic line rating (DLR) of transmission lines. Applying DLR can enhance reliability, security, and economical operation by enabling less constrained operation and timely mitigation action to avoid dangerous system security conditions. The additional capacity allowed by DLR can be very useful to transfer sudden and short lived increases in power flows from distributed energy resources, such as power from wind farms or solar PV plants, thus improving integration or renewable generation. Dynamic ratings are often, but not always, greater than static ratings. Demonstration projects conducted in the U.S. under the auspices of the U.S. Department of Energy confirmed the presence of real-time capacity above the static rating, in most instances with up to 25% additional usable capacity made available for system operations [30]. DLR is very a well understood concept. Indeed, the science and technology of DLR has been in development and deployment for over 35 years, and today several different DLR technologies are commercially available [31]. The technology has evolved significantly since the first development in the late 1970s. A new generation of DLR provides effective solutions for the most important shortcomings of early generations, one of the most important ones being the ability to forecast line rating in various timeframes. Certainly, additional transmission capacity from existing assets can provide major benefits when such capacity is known in advance, not in real time. The demonstration projects mentioned earlier revealed opportunities to enhance future DLR deployments by ensuring the reliability of DLR data, addressing cybersecurity concerns, integrating dynamic ratings into system operations, and verifying the financial benefits of DLR systems. A wealth of reference material about DLR technology, case studies, and practical applications is available in the literature, including several Cigre reports [32] [33]. Therefore, the intention of the following subsections is not to duplicate readily available information but rather provide a brief overview of some of the existing DLR technologies, with main emphasizes on the data analytics and visualization aspects. References are provided for those seeking additional details.

4.3.13.1 SUMO system An example of such a system is SUMO, a system for dynamic assessment of powerline capacity, utilized at Slovenian TSO ELES [26].

Figure 4-38: Dynamic powerline capacity assessment

The SUMO system combines different subsystems into a meaningful and helpful power grid operating tool. It comprises the following functions (Figure 4-39):

90

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

      

Measurements: currents from SCADA, measured data from weather stations, gridded weather data applied to micro locations using weather model and terrain data. Reliability analyses: N-1 analyses, line outage distribution factors (LODF) for power flow calculations. Forecasts: short-term load flow forecasts and short-term weather forecasts for corridors of power lines. Dynamic thermal ratings (DTR): calculations based on current weather and forecasted weather (t0 ... t0+3h). Exceptional weather events. Visualization. Integration platform and data exchange: SUMO BUS.

ODIN VIS – Visualization Platform

SUMO BUS data structures keep measured and calculated data from different SUMO subsystems

ODIN server

SUMO BUS

SCADA

SUMO DB

Commercialy available DTR subsystems Weather assesment and forecast

ZM ZM DTR subsystems

ONAP

LF

LODF

DTR

OIAP

NOV

Load Flow Calculations

Forecast of Loads in Network Nodes

Physical Conductor Data and Power Line Spatial (GIS) Data, System Configuration Data

Load Flow for N-1 state

Exceptional Weather Data Notification

Figure 4-39: SUMO architecture

Dynamic thermal ratings (DTR) module The upper limit of a line capacity to transmit power flow is set by the weakest link in its path from a source to destination node. In the case of calculating DLR, the power line is split into sections. Each section has its own weather data, its own conductor physical properties, and its own geographical orientation. So, for each section, the thermal rating needs to be calculated. For the entire power line, the minimum thermal rating obtained is declared as the power line’s thermal rating. Exceptional weather events In the case of weather events, that could potentially lead to power line outages directly (thunderstorms, high wind speeds) or indirectly (high air temperatures and consecutively low ratings), the operator in charge is presented with warnings (Figure 4-40) on the live weather situation of the power grid. Through this the operator is warned of certain weather situations that can cause a line outage or impact its capacity. In case of a local thunderstorm, the operator can focus on the line in question and re-assess its outage in detail also through other tools (e.g. SCADA, other load-flow tools), thus confirming the outage’s influence on the rest of the system.

91

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Symbol

Meaning Thunderstorm – lightning activity High wind speeds (gale or storm) High air temperatures Low air temperatures Extreme rainfall

Figure 4-40: Exceptional weather events

Figure 4-41: Thunderstorm – lightning activity and rainfall event notification

Visualization The visualization provides the means to aggregate the vast amount of data in a convenient and easyto-understand manner. The results are presented in real time to dispatchers in the network control center (NCC) via advanced visualization platform ODIN-VIS (Figure 4-42).

Figure 4-42: Visualization platform ODIN-VIS screenshot

On the center of the screen, a part of the transmission grid is shown. The power lines are colored according to ratio of the actual current to the actual rating. On the right side of the screen is the SUMO panel that for each power line shows the following: 

 

“Four quadrant” view of the relative line load: o Upper left: actual line current versus actual line rating for actual network topology. o Upper right: forecasted line current versus forecasted line rating for actual network topology. o Lower left: actual line current versus actual line rating for N-1 network topology. o Lower right: forecasted line current versus forecasted line rating for N-1 network topology. Exceptional weather events. N-1 power line – the power line in the transmission grid, when tripped, that causes the largest rise of load on the power line of interest.

The quadrants are colored green if the ratio of the line current versus the rating is less than 90%. If the ratio is between 90 and 100%, the quadrants are colored orange. If the ratio is 100% or more, the quadrants are colored red and additionally show the safe remaining operating time.

92

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

4.3.13.2 Other dynamic line rating technologies Lindsey Manufacturing Co. offers a transmission line dynamic rating and forecasting system using real-time measured conductor data combined with reliability-based methods [34]. The ratings are developed by actively learning how the conductor behaves with regard to conductor temperature, weather, current, and the conductor’s exact clearance-to-ground. The system uses measurement data continuously collected by self-powered, line-mounted monitors that can measure critical data directly. This can include conductor current, conductor temperature, ground temperature, conductor vibration, and the actual conductor-to-ground distance measurement via built-in LiDAR. The latter eliminates the need for sag estimations. Genscape’s LineVision Transmission Line Monitoring and Dynamic Line Rating system uses electromagnetic field (EMF) to measure the line critical variables, including conductor clearance/sag, line loading (current), conductor temperature, thermal rating, VARs, voltage excursions, and conductor horizontal displacement (blowout). The main advantage is that the sensors are not installed in the conductor but on the ground underneath the line. Therefore, they can be easily deployed under critical lines without the need for line outages or installation crews [35]. Forecasting capabilities are not yet available with this technology. A different approach for DLR has been developed by LineAmps Systems. The system does not use direct measurements from sensor to calculate the rating of a line; rather, it uses an expert system to estimate line ampacity during steady state, dynamic state and transient conditions. It was developed by the application of artificial intelligence using object-oriented knowledge base design of the power line environment. The expert system provides hourly values of line ampacity up to seven days in advance. The main advantage of this system is that it does not require the conductor temperature sensors, meteorological sensors, or telecommunication system. It can be used for monitoring overhead line conductors, underground cables, and substation equipment for over temperatures [36].

4.3.14 Cable thermal monitoring This monitoring system shown in Figure 4-43 provides the ability to see up-to-the-minute calculations of the capacity of circuits on the grid, identification of hotspots, and an indication of the time available to resolve potential overloading. The circuit thermal monitor performs continuous calculations; should an unexpected failure occur, the system automatically updates the control room with new capacity and time estimates.)

Figure 4-43: National Grid (U.K.) cable thermal monitor

93

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

4.4 SUMMARY OF INDUSTRY SURVEY A survey was prepared and distributed among both Cigre members and non-members across a diverse geographical area with the intention to collect information about existing practices, research and development efforts, and respondent opinions on the use of innovative data-analytics methodologies and software tools for improving system operator decision support. The survey was structured in five sections. Section 1 - Basic Information: requests respondent and system information. Section 2 - Opinion on the use of data-analytics techniques for transmission operation improvement. Section 3 – Description of data-intensive applications for transmission operations. Section 4 – Description on the use of asset/equipment health and condition information in the control center to facilitate decision-making. Section 5 – Description of visualization to support system operation. Even though the questionnaire was sent out to a significantly large number of prospective responders, only a small number of responses were received. Because of that, responses to the survey do not allow for robust, generalized statistical assessment of the collected information. There is, however, valuable information that can be extracted from the responses. Because of the way the survey was set up, only Section 1 contains statistical-type information. Responses on the rest of the questionnaire are descriptive and hence not intended for comparison or for statistical analysis. In this section, we present insights gained from questions of Section 1 and a summary of responses to other sections. In Section 1, responders were asked for their opinions and visions regarding the use of data analytics for system operations support. Two questions were presented for that purpose, which are reproduced below for clarity.

Question 1: Responders were asked to indicate the extent to which they agree or disagree with each of the following statements, by using the following 5-point rating scale: Strongly Agree – Agree – Neutral – Disagree – Strongly Disagree.

Q1.1

Analytics practice in electric utilities is lagging other industries such as transportation, healthcare and financial services, in terms of actual implementation

Q1.2

Data analytics technologies that use multiple data sources can play a significant role in improving situational awareness tools

Q1.3

The value and accuracy of data analytics solutions that integrate various data sources is not well understood, and that affect implementation and adoption of the data analytics technology

Q1.4

There is a need to develop standardized data structures and data models for effective deployment of enterprise’s data analytics capability

Q1.5

My organization is planning to introduce new data-analytics techniques and tools for transmission operation improvement within the next 2 to 4 years

Q1.6

Data quality issues is a major barrier for wide spread use of Synchrophasor data to improve system operations

The responses to these questions are shown in Figure 4-44. It can be observed in this figure that for the first two statements, responses are equally divided between those who agree or strongly agree with the statement and those who don’t have a strong opinion about it (neutral). None of the responders seems to disagree. Responses for Q1.4 indicate that there is strong consensus among the responders about the need for standardized data structures and data models for effective deployment

94

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

of enterprise’s data analytics capability. Surprisingly, there is quite divided opinion about the importance of data quality for widespread use of Synchrophasor s for improving system operations.

Figure 4-44: Responses to survey – Section 1, Question 1

Question 2: Responders were asked to prioritize a provided list of data analytics use cases, by using the following scale: Very Important – Important – Neutral – Not Important – Not at all Important

Results are presented in Figure 4-45. In can be seen that most of the data analytics use cases rank high in terms of importance, system event detection being the one that seems to be most relevant for responders.

Figure 4-45: Responses to survey – Section 1, Question 2

95

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Data analytics applications In what follows, we summarize information provided by the responders related to data analytics tools implemented in their systems.

Terna SpA Italy: Terna’s generation system has installed capacity of 114 GW, with 27 GW of renewable variable generation nameplate capacity. The peak load is around 59 GW. Terna’s transmission system is interconnected with five other systems and has about 800 substations. Terna has developed in-house a system to access and analyze a wide-area monitoring database that enables users to promptly identify faults, oscillations, and other perturbations in the system. Measurement data is provided by PMUs. The system has graphic representation of status of frequency, voltage, oscillations, and other parameters. Terna has also developed and implemented in operation a real-time dynamic stability assessment (DSA) tool. It includes functions for voltage and angle stability and assessment of N-k dynamic evaluation of possible effects in case of islanding of part of the grid. Regarding equipment heath monitoring, Terna has developed an application called MBI, which is intended to inform the maintenance department about possible asset risk of failure. Within the trending and forecast category, they have developed an advance dispatching tool for forecasting load close to real time. The tool takes into account renewables but does not forecast them explicitly. It forecasts the total load and the net load (the load seen from 400- and 220-kV grid), and thus “forecasts” renewables, because all of them are connected on a 150-kV grid or below. Finally, according to the expert from Terna, the new visualization tools for control rooms should be a “smart approach” that includes representation of measurement trending, cockpit customization, and alarm customization.

Dominion Virginia Power – USA Dominion Virginia Power is an electric utility in the Eastern Interconnection of the U.S. The main characteristics of Dominion’s power system are as follows: -

System peak load: 21,651 MW Installed generation capacity: 24,300MW Voltage levels: 500, 230, 138 115, 69 Number of interconnections with other systems, control areas, regions: 30 transmission level Number of measuring points: Roughly 130k + SCADA measurements

Dominion has implemented tools for on-line dynamic security assessment, on-line voltage stability analysis, and reactive power and voltage control. These tools are used by control room operators as well as study engineers. They reside in the EMS and use field measurement and simulation based on system models. For geographical visualization of transmission assets, Dominion uses Alstom/GE EMS applications provided by the eTerra Suite of productions. The system provides tabular display of various information provided by different systems and applications. One-line diagrams of complete systems are available via static tile map-board and also digitally at an operator console. The upgrade plan includes the implementation of a digital version that will be displayed on a video wall at new control centers. As part of the visualization features, a weather map for the state of Virginia is shown on a large wall board for all operators. Regarding the availability of visual information for other parts of the business besides control, the responder indicated that system operators have access to all enterprise level information. Also, they utilize substation security cameras when needed. The responder described the following as the main challenges in visualization: acceptance of operations personnel, constraints imposed by cyber security rules, and maturity of visualization tools available through the EMS platform. Regarding data analytics, he indicated that standardized data structures and data models are only one part of the picture, and that there is a strong need for production-ready, reliable analytics.

96

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

REN - Portuguese TSO System characteristics: -

Installed Generation Capacity: as of 2015 (including wind, solar, and CHP) – 18,533 MW Installed VG Capacity (include wind and other variable generation resources): total wind, solar, and CHP - 5868 MW Voltage levels: 400, 220, and 150 kV Number of interconnections with other systems, control areas, regions: Interconnections with Spain - 9 (6 x 400 kV and 3 x 220 kV) Total line length per voltage level: 400 kV - 2632 km, 220 kV - 3611 km, 150 kV - 2562 km

REN is implementing an analytic system for fault locations in OHL and incident postmortem analysis. It is intended to be used by control room operators and operation engineers. The system uses primarily data from digital protection relays. It has been tested in the lab, and it was in prototype phase at the time when this survey was conducted. Regarding forecasting and trending tools, REN buys four wind forecasts from vendors and combines and upscale them as needed by using forecasting tools developed in-house. REN uses asset health information defined dynamic limits of operations in real time and for outage scheduling. The responder indicated that asset health that influences operations limits should be considered in system operations, at OHL, power transformers, and breakers (minimum set). He also indicated that the consequence of not considering such information is reduced efficiency, because worst-case scenarios are to be used to make deterministic decisions.

TELETRANS / Transelectrica – Romania System characteristics: -

System peak load: 9,479 MW Installed generation capacity: 24617 MW Installed VG capacity (include wind and other variable generation resources): 4331 MW Voltage levels: 400, 220 kV Number of interconnections with other systems, control areas, regions: 8 Number of measuring points: about 8,000 Number of substations: 82

TELETRANS/Transelectrica has implemented a synchronized phasor measurement system (SPMS) developed by Schweitzer Engineering Laboratories. The SPMS covers 14 s/s, 400 kV upper voltage level where the phasor measurement units (PMUs) are installed. The PMUs are connected to protection secondary cores of the CTs and VTs. Additionally, in the s/s with cross border lines there are installed local phasor data concentrators (PDCs) to archive data for longer time frames. The main features of the application interface are: 

  

Expanded power system observability. Power system online data visualization offered by the SPMS is a facility to improve power system monitoring by operators in the shift. Further, by using the on-line accurate measurements of voltage magnitude and angle, of the active power flow on certain transmission lines it is possible to estimate the steady-state stability of the power system or a section of it more accurately and allow for prevention actions. Line parameter calculations and model validation. The ability of synchronized voltages and currents gathering of two bus-bars of adjacent substations connected by a transmission line enables line parameter calculation, thus improving the power system model validation. Detection of power system oscillations. By use of the SPMS, inter-areas oscillations can be detected and properly analyzed in terms of damping capabilities. Accurate post-event analysis based on voltage, current, and active and reactive power flow data obtained from the SPMS.

The system is used by control room operators, operational engineers, and protection engineers. It was deployed in 2009. The company plans to extend the SPMS in the short term to collect data from more substations and, in the medium-term time frame, to integrate it into the EMS-SCADA system at the NDC level.

97

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Regarding visualization, the company uses classical GIS system for OHL only and schematic diagrams in the specific SCADA/EMS. They share system data and visualization with the EAS (European Answers System), which is dedicated to the interconnection security.

Tohoku Electric Power Co., Inc. – Japan System characteristics: -

System peak load: about 14,000 MW Installed generation capacity: 17,810 MW Installed VG capacity (include wind and other variable generation resources): 3,230 MW Voltage levels: max 500 kV Number of interconnections with other systems, control areas, regions: 2 Number of substations: 624

The company has implemented a tool for assessing and monitoring reliability in on-line mode. It has several functions, the dynamic stability assessment module being the most important, which runs every 30 minutes. For trending and forecasting analysis, they have developed and implemented a system called Photovoltaics Output Estimating and Forecasting System, which is intended for control room operators and operation engineers. It estimates the amount of solar radiation from numerical weather forecasts and calculates photovoltaics output based on that. For visualization in the control room, measurement data, such as transmission line power flow, generator output, and bus voltages, are displayed on the system diagram every 10 seconds. Results from security analysis tools are also displayed in the monitor screen using a color code to highlight if a reliability violation occurs. Information about weather conditions and forecasts are also displayed. Relative to distributed generation, total current wind power output and photovoltaics output in the entire area are displayed on a dedicated screen, along with generation forecasts for a short- and medium-term period selected by the user. For the question about the biggest challenges and needs in visualization, the responder indicated that it is critical to improve accuracy of renewable energy and output prediction and use those results to estimate voltage variations and display them in a monitor screen alongside critical visualizations.

Kyushu Electric Power Company – Japan System characteristics: -

System peak load: about 15,082 MW Installed generation capacity: 28,036 MW Installed VG capacity (include wind and other variable generation resources): 5,793 MW Voltage levels: 500, 220, 110, 66, 22, 6.6kV Number of interconnections with other systems, control areas, regions: 1 Number of substations: 596

For wide-area monitoring, Kyushu Electric has implemented a system to evaluate current state of the power system as well as expected conditions in the near future (30 to 60 minutes). The application assesses power system reliability in terms steady and dynamic security, including frequency variations, overload, unscheduled power flows, voltage deviations, and dynamic and voltage stability. A separate tool is used for voltage and reactive power control. The tool determines optimal control actions in response to predicted demand and system operating conditions, with the objective to prevent voltage at critical buses to deviate from operating margins. The company has a renewable energy forecast system that forecasts output of solar photovoltaic every 30 minutes. The system uses radiation forecast data purchased from a weather information provider. Another application is used for short-term demand forecasts. Future demand is estimated based on accumulated historical demand, historical weather data, and weather forecast data purchased also from a weather service company. Visualizations to support system operation include:

98

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Weather and weather forecast: Weather information (cloud radar images, forecast information, etc.) and images from weather video cameras are displayed on a system panel. Temperature, other weather information, and lightning strike status are displayed on the operator monitors.

Distributed generation: Battery charge/discharge status and area renewable output and status are shown graphically on a system panel and operator monitors.

Anticipated state of the network: Results from power system reliability assessment tools are displayed on operator monitors.

Other types of visualizations specific to that control center: Dam level information is displayed on

operator monitors, and earthquake occurrence information is presented on the overall system panel. Visual information for other sectors of the company: Some power system information and demand/supply information are made available throughout the company. In addition, power generation forecast and lightning strike information are available in the company website. In accordance with other responses, the responder indicated that the biggest challenge in visualization, and other related analytics tools, is the needed for better renewable energy power output forecast.

National Grid - UK System characteristics: -

System peak load: about 60,000 MW Installed generation capacity: 120,000 MW Installed VG capacity (include wind and other variable generation resources): 20,000 MW Voltage levels: 400/275 kV Number of interconnections with other systems, control areas, regions: 4 Number of substations: 340 Number of measurement points: 1,000,000

National Grid has a tool called VISOR for wide-area monitoring developed by Psymetric and the University of Manchester. The tool performs real-time monitoring and alarming of sub-synchronous oscillation. Related to equipment health monitoring in the control room, they have developed and implemented a tool for monitoring underground cables, which uses data from temperature sensors around cables to evaluate cable conditions in terms of loading and capacity. For trending and forecast analysis, the company has developed an energy forecast system to predict national demand based on weather forecast data plus the contribution of both solar and wind generation to the generation mix. Visualization to support system operation includes:

Weather and weather forecast: TBD Distributed generation: Battery charge/discharge status and area renewable output and status are shown graphically on a system panel and operator monitors.

Anticipated state of the network: Results from power system reliability assessment tools are displayed on operator monitors.

Other types of visualizations specific to that control center: Dam level information is displayed on

operator monitors, and earthquake occurrence information is presented on the overall system panel. Visual information for other sectors of the company: Some power system information and demand/supply information are made available throughout the company. In addition, power generation forecast and lightning strike information are available in the company website.

4.5 REFERENCES [1]. Technology Assessment of Power System Visualization. EPRI, Palo Alto, CA: 2009. 1017795.

99

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

[2]. T. J. Overbye, D. A. Wiegmann, A. M. Rich, and Y. Sun, “Human factors aspects of power system voltage contour visualizations,” IEEE Transactions on Power Systems, pp. 76-82, February 2003 [3]. T. J Overbye, D. Wiegmann and R. J. Thomas, “Visualization of Power Systems”, PSERC Final Project Report, Publication 02-36, Nov. 2002. [4]. D. A. Wiegmann, G. R. Essenberg, T. J. Overbye, Y. Sun, “Human Factor Aspects of Power System Flow Animation,” IEEE Trans. on Power Systems, vol. 20, August 2005, pp. 1233-1240. [5]. ORNL VERDE: Visualizing Energy Resources Dynamically on Earth http://techportal.eere.energy.gov/technology.do/techID=17 [6]. Woody Rickerson, “A Control Room View of the ERCOT Grid”, ERCOT Public April 19, 2016 http://www.ercot.com/content/wcm/key_documents_lists/81724/5_A_Control_Room_View_of_the _ERCOT_Grid.pdf [7]. https://www.coreso.eu/mission/ [8]. http://www.tscnet.eu/ [9]. Advanced Data Analytics Techniques: Analysis and Applications for Power System Operation and Planning Support. EPRI, Palo Alto, CA: 2015. 3002007076. [10]. V. Malathi, N. S. Marimuthu, S. Baskar, and K. Ramar, “Application of extreme learning machine for series compensated transmission line protection,” Engineering Applications of Artificial Intelligence, vol. 24, no. 5, pp. 880-887, 2011. [11]. T. S. Sidhu, H. Singh, and M. S. Sachdev, “Design, implementation and testing of an artificial neural network based fault direction discriminator for protecting transmission lines,” IEEE Trans. Power Delivery, vol. 10, no. 2, pp. 697-706, 1995. [12].

Review of Synchrophasor Applications, EPRI, Palo Alto, CA: 2014. 3002002870

[13]. Alison Silverstein, Kyle Thomas, and Jim Kleitsch, “Using Synchrophasor Data to Diagnose Equipment Mis-operations and Health”, NASPI Work Group Meeting October 22, 2014 [14]. U.S. Department of Energy, “Advancement of Synchrophasor Technology in ARRA Projects”, March 2016 –https://www.smartgrid.gov/files/20160320_Synchrophasor _Report.pdf [15]. THE VALUE PROPOSITION FOR SYNCHROPHASOR TECHNOLOGY, North American Synchrophasor Initiative NASPI Technical Report, October 2015. https://www.naspi.org/sites/default/files/reference_documents/5.pdf?fileID=1571 [16]. Catalog of Data-Intensive Applications for Transmission Systems. EPRI, Palo Alto, CA: 2015. 3002005231. [17]. High-Performance Hybrid Simulation/Measurement-Based Tools for Proactive Operator Decision-Support – U.S. Department of Energy (DOE) DE-OE0000628 - Final Report, September 2014. [18]. E. Farantatos, A. Del Rosso, N. Bhatt, K. Sun, Y. Liu, L. Min, C. Jing, J. Ning, M. Parashar, “A Hybrid Framework for Online Dynamic Security Assessment Combining High Performance Computing and Synchrophasor Measurements”, 2015 IEEE PES General Meeting [19]. Case Study: Demonstration of the Quantum Weather Storm-Prediction Model and Application, EPRI, Palo Alto, CA: 2016. 3002004268 [20]. Situational Awareness – Opportunities for the Electric Power Industry, EPRI, Palo Alto, CA: 2016. 3002007606 [21]. Djurica, V., Milev, G., An Application to Display Lightning Data Using SCALAR Information System,(2014), 23rd International Lightning Detection Conference, Tucson, USA [22].

https://www.gridprotectionalliance.org/products.asp#XDA

[23]. U. Minnaar, “The Characterisation and Automatic Classification of Transmission Line Faults”, Ph.D. Thesis, University of Cape Town, September 2013.

100

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

[24]. Mahfuz Ali Shuvra and Alberto Del Rosso, “Root Cause Identification of Power System Faults using Waveform Analytics”, accepted for the CIGRE US National Committee 2017 Grid of the Future Symposium [25]. Lakota, G., et al., “Real-Time and Short-Term Forecast Assessment Of Power Grid Operating Limits – SUMO”, 5th International Scientific and Technical Conference - CIGRE B5, Sochi, Russia, 2015. [26]. Djurica, V., dr. Kosmač, J., Milev, G., “A Multiple Power Line Corridor and Lightning ErrorEllipse Spatial Processor for Real-Time Correlator”, (2008) 20th International Lightning Detection Conference, Tucson, USA. [27]. CIRED Paper / 0406 / June 2015:2D AND 3D VISUALIZATION STRATEGIES FOR DISTRIBUTION MANAGEMENT, Siemens AG: Sonja Sander / Siemens AG: Dr. Roland Eichler [28].

User Interface: Spectrum PowerTM 7 / Siemens AG

[29].

User Interface: SIGUARD PDP / Siemens AG

[30]. U.S. Department of Energy – Electricity Delivery & Energy Reliability, “Dynamic Line Rating Systems for Transmission Lines”, Topical Report, Smart Grid Demonstration Program, April 25, 2014 (available online at www.smartgrid.gov) [31]. Integrating Dynamic Thermal Circuit Rating into System Operations: Utility Experiences and Technology Roadmap. EPRI, Palo Alto, CA: 2011. 1021751. [32]. Increased Power Flow: Overhead Transmission Line Rating Research Advancements. EPRI, Palo Alto, CA: 2015. 3002005709. [33]. Cigre Working Group B2.36 Technical Brochure, “Guide for Application of Direct Real-Time Monitoring Systems”, June 2012. [34].

http://lindsey-usa.com/dynamic-line-rating/

[35].

http://info.genscape.com/physical-grid-monitoring

[36].

http://www.lineamps.net/about.shtml

[37]. Alarm Grouping and Event Root Cause Analysis for Transmission Control Centers. EPRI, Palo Alto, CA: 2016. 3002008275. [38]. Alarm Management Philosophy for Transmission Operations Control Centers, EPRI, Palo Alto, CA: 2016. 3002008274. [39].

DNV GL Smart Cable Guard www.dnvgl.com.

101

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

5. DATA INTEGRATION AND MODELING Success of advanced data management and analytics greatly relies on the accessibility, flexibility, scalability, comprehensiveness, and efficiency of data modeling for system operations as in the old saying: Data is only as good as the way it is packaged. This section examines typical data modeling processes in some utility companies and regional transmissions organizations to explain how data are assembled in the power industry for secure and reliable real-time grid operations in EMS, including introduction of the model information and its usage and model update procedure and lifecycle. In order to exchange the operational data between control centers and throughout the industry, common data exchange format and protocol need to be in place. The most important international industry standards, such as IEC 61850, CIM (Common Information Model), and COSEM, are presented in the second part of this section. The intentions of the standards are briefly discussed, followed by specific introduction of each standard and the harmonization efforts to unify them. With the information and technology explosion in the modern era, the power industry is also being profoundly affected, and the utility companies have been making substantial efforts to adapt. This section touches upon the impact of new technologies and new data sources on the data modeling approaches in this changing technological and regulatory macroscopic environment. The new technologies and new data sources include Synchrophasor , renewable energy, and equipment health condition monitoring on operations data, and their impacts on operations data modeling will be individually explored. Based on the extensive analysis of the real-time operational data modeling, an example of an actual data integration project is presented.

5.1 DATA MODELING PROCESSES FOR SYSTEM OPERATIONS In order to ensure reliable and economic operation of the electric transmission network, the real-time monitoring and control system and energy management system (EMS) have to establish operation schedule and remote control of voltage, power flow, and power system equipment. EMS provides the information about the past states of the network, and it is capable to export snapshots of its network data models. Thus, accurate and up-to-date models used by EMS are very important for a reliable system operation. The following two aspects of data modeling processes for system operations are discussed in details: 

Model data information and its usage



Model update procedure and lifecycle

Depending on the interconnection of the regional transmission operator (RTO) or the transmission system operator (TSO), the EMS models might be described in different layers such as the proprietary detailed footprint and less detailed neighboring RTOs or TSOs. Modeling a large system and keeping the model up-to-date could be very complex, therefore requiring cooperation between control centers and member companies, as well as neighboring utilities, RTOs, TSOs, etc.

5.1.1 Model information and its usage The operating entities are required to create and maintain an accurate model of their electric systems. The computer representation of the power system facilities model requires the input data from various sources such as generator owner (GO), transmission owner (TO), load serving entities, and other reliability coordinators need to be timely and accurate, because it may impact the reliable operation of the system. In general, telemetry data are required to include, but not limited to, the following: 

Voltages at interest location above certain level (e.g. 69 kV).



MW and MVAR values for all generating units, transmission facilities, and injections at certain voltage level and above have greater than certain level of power flow (e.g. 1 MW).



MVAR values for synchronous condensers and static VAR compensators.



Transformer phase angle regulator (PAR) and load tap changer (LTC or TCUL) tap positions for modeled and controlled transformers.



Circuit breaker status for each modeled facility at certain voltage level and above (e.g. 69 kV).



Frequencies at selected stations.

103

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

TOs, GOs, and other electricity utility companies are responsible for providing the information and data for an accurate modeling of their electrical system. Usually, the data and information need to include, but not limited to, the following (PJM Operation Support Division, August 25, 2016): 

Substation topology (including generator substations), facility connectivity, and physical location upon request (state and global positioning satellites (GPS) coordinates) 




Equipment names or designations 




Facility physical characteristics including impedances, transformer taps, transformer tap range, transformer nominal voltages, etc.



Facility limits and ratings



Voltage control information and recommended set-points



Recommended contingencies to be studied



Protective device clearing times, as appropriate, to support real-time transient stability analysis



Buses, breakers, switches, and injections or shunts such as loads, capacitors, SVCs, etc.



Lines and series devices (reactors or series capacitors) 




Transformers and phase shifters



Generator auxiliary, station service, or common service loads (MW & MVAR) 




Generator step-ups to be modeled for Bulk Electric System (BES) generators 




Generator “D” curve limits 




Real-time analog and equipment status telemetry for transmission elements, including, but not limited to: o

o o

Breaker, switch, or other equipment status required to determine connectivity to real (MW) and reactive (MVAR) power flow for lines, transformers (high or low-side), and phase shifters Real (MW) and reactive (MVAR) for loads and/or other injections as appropriate Reactive (MVAR) power flow for capacitors and SVCs

Figure 5-1 shows typical data involved in EMS modeling for power system operations. The EMS modeling includes telemetry data, connectivity data, and electrical parameter data. Note that the basic connectivity information is necessary to include external system models. In order to collect the EMS data, communication, construction design, transmission and planning, operations modeling engineers, and RTO need to be involved in this cooperative process as the figure demonstrates.

104

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-1: Dominion Virginia Power EMS modeling data

In this example diagram, RTO receives the input data. Based on the input data and real-time telemetry data, the real-time model (EMS), steady-state model, and real-time transient stability model are created and maintained. The EMS model together with real-time locational marginal price (LMP) and security constrained economic dispatch (SCED) provide secure and economic operating points. By using a state estimator (SE), the EMS model calculates the real-time state of the electric system. The established system operating limits could be assessed by EMS model as well. The steady-state model is usually maintained in seasonal builds; thus, the updated line impedances and connectivity information are very necessary. The steady-state state estimation requires voltage telemetry information, branch flow, and breaker status. The real-time transient stability analysis (TSA) is an optional tool for some control centers. TSA depends on the EMS data and model, and it also depends on the SE solution as the initial condition to perform transient stability analysis. It should be made clear that Figure 5-1 is an example to illustrate the common utility EMS data modeling processes, and that there may be reasonable deviations from the processes shown in the figure in a utility company to serve a specific company organizational structure or due to historical maintenance practices. There are many other types of data and information that could be utilized to improve the accuracy of the EMS model. For instance, the dynamic line rating (DLR) for EMS could maximize the use of the transmission system while ensuring reliable and efficient market operations. An accurate real-time ampacity monitoring system will allow the operators to exploit the full capabilities of existing lines. CIGRE/IEEE dynamic thermal model (IEEE Standard for Calculating the Current-Temperature Relationship of Bare Overhead Conductors, 2013) requires following weather-related data for ampacity and sag calculation: 

Ambient temperature



Solar intensity



Wind speed



Wind direction



Rain rate

Cybersecurity concerns, integration of DLR into system operations, and verification of the financial benefits [3] are very important issues to address before applying DLR in EMS. DLR could provide a better knowledge of the actual line rating that that is provided by static ratings. PMU’s will be playing a more important role in the next-generation EMS [4]. Some of the PMU data are used for model validation on generator model and line impedance. Other advanced PMU applications

105

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

[5] include angular separation, dynamics oscillations monitoring, disturbance location identification, and islanding and resynchronization.

5.1.2 Model update procedure and lifecycle This EMS model requires significant coordination between system operators and stakeholders. Summer and winter builds are two updates commonly known as the regularly-scheduled builds in North America, and the other regions of the world may have similar model update processes. A new build usually includes two essential types of changes: 

Topology changes



Parameter changes

TSOs and power utilities are responsible for providing data about all construction projects that will impact the RTO model. They are typically required to notify the RTO from six months to one year in advance of system topology changes. The EMS network model updates accordingly a few times (e.g. four times each year) to reflect the topology changes. Thus, to ensure that the EMS update includes a facility addition, revision, or deletion, all model information must be submitted to the RTO or the TSO accurately and timely. An example of an EMS model build lifecycle is shown in Figure 5-2.

Jun-Sept 25% Jan-Jun 38%

Dec 6%

Jun-Sept

Jul-Sept

Jul-Sept 19% Oct-Nov 12% Oct-Nov

Dec

Jan-Jun

Figure 5-2: EMS winter build lifecycle



June–September, TOs are required to submit data.



July–Sept, RTOs package the submitted data.



October–November, RTOs test new model.



December, RTOs implement the model build.



December–January, TOs check implementation changes.



January–June, cut-ins, outage request, telemetry, and ratings are required in model build.

The summer build has a similar lifecycle but shifts six months starting in December. Interim builds between summer and winter are often implemented if the topology and parameter changes greatly required in the system [1]. Figure 5-3 shows a common utility EMS modeling update process diagram. It shows a typical lifecycle in details of an EMS modeling update. The TSO or power utility not only has the obligation to fulfill the modeling date update deadlines that the RTO/TSO requires, but it also needs to cooperate the conduction of its own energization target three to six months prior to energization target date, with the update request.

106

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-3: Common utility EMS modeling update process

The TSO/electrical utility company construction has the network change information updated, and RTO/TSO tests and implements the updated data model for its EMS. Typically one to two weeks prior to the energization target data, the TSO or power utility will check the implementation with the outage scheduling.

5.2 DATA MODELS AND OPEN STANDARDS 5.2.1 Why do we need a common data model? Evolutions of electrical grids induced by smart grids accelerate changes in transmission and distribution. Data exchanges are increasing, market deregulation has led to a proliferation of actors, and applications become more complex. These are all reasons why actors of energy markets decide to use international standards. Due to the growing need of smart grid stakeholders to deploy solutions offering a semantic level of interoperability, data modeling appears to be the key element and the foundation of the smart grid framework. Furthermore, data modeling seems much more stable than communication technologies, which makes this foundation even more important.

5.2.2 IEC standardized data models Currently, the IEC framework relies on three main standards for the field of data modeling, represented in Figure 5-4.

107

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-4: Data modeling on smart grid architecture model framework

  

The CIM (IEC 61 970, IEC 61 968, IEC 62 325) provides the information model containing equipment and functions and their properties for power system management, analysis, and related use cases (generation, market, and grid). IEC 61850 provides the information model containing equipment and functions and their properties for power utility automation use cases. The COSEM Companion Specification for Energy Metering provides the information model containing equipment and functions and their properties for metering and related use cases.

Figure 5-4 also shows the three ongoing harmonization efforts in progress (i.e. the definition of unified shared semantic sub-areas or formal transformation rules), which are needed to allow an easy bridging of these semantic domains:   

Harmonization between CIM and IEC 61 850, mostly to seamlessly connect the field to operation and enterprise level (cf. § 5.2.5). Harmonization between CIM and COSEM, mostly to seamlessly interconnect electricity supply and grid operation. Harmonization between COSEM and IEC 61 850, where smart metering may co-habit with power utility automation systems.

The following subsections discuss the three main standardized data models in more detail.

5.2.2.1 CIM The Common Information Model (CIM), developed by the IEC (International Electrotechnical Commission), is an abstract information model that can be used to model an electrical grid and the variety of equipment used on the grid. By using a common model, utilities and vendors can reduce their integration costs, which should allow more resources to be applied toward increased functionality for managing and optimizing the electrical system [6]. The model covers all the necessary data in the study and operation of electrical systems, including market transactions between companies or between producers and consumers. Operations include the grid control, its constitution, its maintenance, its evolution, and for all the business of the energy sector from production to distribution, and consumer to marketing. Standard series arises from this work: 

IEC 61 968 series, which defines the interfaces for the main elements of a distribution management system (DMS). A DMS consists of various distributed application components for the utility to manage electrical distribution networks. These capabilities include monitoring and control of equipment for power delivery, management processes to ensure system reliability, voltage management, demand-side management, outage management, work management, automated mapping, and facilities management. All these applications put

108

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

together constitute the Interface Reference Model (IRM). Communication between application components of the IRM requires compatibility on two levels: o Message formats and protocols. o 



Message contents must be mutually understood, including application-level issues of message layout and semantics.

IEC 61970 series, which provides among others a set of general guidelines and infrastructure capacity lines necessary for the implementation of EMS-API interface standards (Energy Management System - Application Program Interface). ENTSO-E in Europe, for example, uses the Common Grid Model Exchange Standard (CGMES), which is a superset of the IEC CIM standard. It was developed to meet necessary requirements for TSO data exchanges in the areas of system development and system operation. IEC 62325 series, which specifies the CIM for communications for deregulated energy markets. The IEC developed these standards as a framework for energy market communications encompassing two market styles: European style and North American style markets.

The foundation of the IEC 62325 series is: 

IEC 62325-301 “CIM extensions for markets” standard, which is an abstract model that caters for the introduction of the objects required for the operation of electricity markets.



IEC 62325-450 “Profile and context modeling rules,” the international standard for the generation of profiles.

For each standard, there are degrees of freedom that must be defined. The CIM standard must be adapted within the energy companies according to their needs. For example, the energy company EDF described the M-SITE model. This is a UML model derived from the CIM model for network domain requirements. It defines CIM UML classes as well as specific M-SITE additions to describe networks and extensions used to support a number of study functions. It is the reference (data dictionary) for defining classes, associations, and UML attributes used to construct exchange interfaces based on the MSITE model. Industrials must appropriate and adapt the standards to his needs while respecting the basic rules in order to remain CIM compliant.

5.2.2.2 IEC 61850 IEC 61850 is a standard established by the TC 57 of the IEC. This standard defines common communication architecture for systems inside the substation (process level, cubical level, and station level). Historically, IEC 61850 is based on IEC 60870 and IEEE UCA. The information exchange mechanisms rely primarily on well-defined information models. These information models and the modeling methods are at the core of the IEC 61 850 series. The IEC 61850 series uses the approach to model the common information found in real devices as depicted in Figure 5-5 [7]. All information made available to be exchanged with other devices is defined in the standard. The model provides for systems for power utility automation an image of the analog world (power system processes, switchgear).

109

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-5: IEC 61 850 modeling approach

Implementations to reach interoperability have to be based on a common understanding of definitions. The approach of the standard is to decompose the application functions into the smallest entities, which are used to exchange information. This is described in the IEC 61850-5. The granularity is given by a reasonable distributed allocation of these entities to dedicated IEDs. These entities are called logical nodes (LN). A logical node corresponds to a functionality of the electrical system (for example overcurrent protection). IEC 61850-7-4 describes the structure of 128 logical nodes, ranking them into 19 groups, such as a virtual representation of a circuit breaker class, with the standardized class name XCBR. The logical nodes are modeled and defined from the conceptual application view in IEC 61850-5. In the standard, implementation of the logical nodes in the substation structure is not mandated. It depends more on the implementation of the features in the substation. Logical nodes are themselves composed of dataObject; some are mandatory. DataObjects have a particular type. The different types of the standard are described in IEC 61850-7-3. This type decomposes dataObejcts in dataAttributes. This is the lowest level description of the standard.

5.2.2.3 COSEM DLMS/COSEM is standardized internationally via the IEC and CENELEC technical committees TC13. IEC TC13’s working group 14 on meter data exchange defines the standards issued under the IEC 62 056x series: standardization framework (which requires profiles to be created based on existing lower layers, mainly from ITU, IEC, and IETF), OBIS codes, interface classes, COSEM application layer, and other lower layers (see Figure 5-6 [8]).

110

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-6: Sources and actors

DLMS/COSEM is the protocol of choice for communication between multi-energy smart meters, gateways, and backhaul systems and the guarantee of interoperable systems. The DLMS User Association is responsible for editing the DLMS/COSEM specifications and the transmission via a D liaison to the standardization organizations (IEC, CENELEC), edition of the Conformity Test Tool (CTT), control of product conformity, member support, training, and promotion. The association currently has more than 300 members from more than 50 countries, and over 700 products from more than 100 manufacturers have been certified to date. COSEM uses an object modelling technique to represent all functions of the meter, without making any assumptions about which functions need to be supported, how those functions are implemented, and how the data are transported. DLMS/COSEM has been designed for separation between the COSEM object model and the DLMS communication protocol. The formal specification of COSEM interface classes forms a major part of COSEM. The COSEM object model represents the product’s interface description. The definition of OBIS, the Object Identification System, is another essential part of COSEM that organizes data and methods according to the object model. The communication protocol allows transmission of coded messages between a server (the product) and a client (a gateway or more generally a remote IT system) through a scope of services. The various physical media transport layers are out of scope of the specification but are supported via a set of profiles. The standardized COSEM interface classes form an extensible library. Manufacturers use elements of this library to design their products. Objects are made up of attributes and methods. Similar objects are grouped into interface classes. Some major categories are storage, access control and management, time and event management, prepayment, and communication configuration. The COSEM data model is independent of the underlying media communications layer, and all profiles use the DLMS/COSEM application layer. The connection manager is independent of the media, such as TCP or CIASE S-FSK. An adaptation or convergence layer is introduced whenever required. Profiles are developed according to market requirements, such as CENELEC A band OFDM G3-PLC and S-FSK PLC. The security approach is end-to-end multilevel, protecting both COSEM payload and xDLMS messaging. Authentication is supported on both server and client side with configurable security policy, as well as a fine-grained access control depending on the client role. Specific alarms and alerts for security are also supported. DLMS/COSEM specifies high-level security algorithms based on NSA Suite B, NIST, and FIPS, with cryptography based on Diffie-Hellman elliptic curves (ECDSA/ECDH) and variable symmetric or asymmetric key sizes. Finally, optional compression can be implemented according to ITU V.44.

111

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

5.2.3 Example of harmonization between CIM and IEC 61850 Every interface between systems that are not covered by the same standard requires a mapping or transformation from one standard’s format to another in addition to the mapping from proprietary formats to standard formats when a system interface does not support any standard at all. This verdict has prompted the development of a common semantic model for the IEC 61968 standards (CIM) and the IEC 61850 substation automation standards. Goals of such work are to: o

Enable the entry and update of substation configuration data once.

o

Enable access to real-time data from IEC 61 850 devices to directly feed SCADA and back office systems on the CIM standards.

Without the harmonization of these standards, the development and implementation of systems and applications will result in a significant amount of engineering and design that applies to only one implementation. The harmonization can be done by mixing equipment topological approach of CIM and functionality approach of substation configuration description language (SCL). SCL is the language and representation format specified by IEC 61850 for the configuration of electrical substation devices. IEC TC57 is involved in the harmonization CIM & SCL.

5.3 IMPACT OF NEW TECHNOLOGIES AND NEW DATA SOURCES ON DATA MODELING In the past few decades, innovative technologies have been burgeoning on an unprecedented scale thanks to both the pull from fast-growing electricity demand, especially the demand for green energy, and the push on the other side from the quickened advancements in supporting industries such as IT, telecommunication, data processing, and so on. Among the major technology advancements in the power industry, renewable (solar, energy storage), Synchrophasor , and equipment health condition monitoring are the ones that are changing the paradigm of real-time grid operations and are requiring significant process improvements in the operations data modeling.

5.3.1 Impact of Synchrophasors on operations data modeling PMU is widely accepted as the most important measuring device in the future of the power system that will revolutionize the way power systems are monitored and controlled, and it is anticipated that the migration towards full PMU implementation for power systems is underway with accelerated momentum in almost all major utilities and RTOs. With the exponential growth of Synchrophasor data in control centers of a wide range of utility companies, there is an ever-increasing emphasis to successfully integrate existing model-based EMS applications with the PMU measurement-based applications. This is necessary to gain EMS operators’ confidence, who will continue to use existing systems and procedures they are accustomed to while embracing these new measurement-based tools and techniques. For example, when grid oscillations or sudden rate of change events are detected by the fast measurement-based applications, these notifications are presented in the traditional EMS alarms display and will also trigger traditional EMS tasks to allow the operator to drill down using EMS displays to discover specific details of these events, as well as launching “what if” analyses to determine the severity of the event. One of the “what-if” analyses is to study potential contingencies and to simulate transmission stress to determine the most limiting operational limit (OL) for a particular transmission corridor. This OL can then be used in the faster measurement-based analytic to quickly alert the operator when an OL is being reached [9]. The growth of new Synchrophasor applications for reliable and secure operations of power systems is already making progress in the electric utility industry. A good number of new Synchrophasor applications for reliable and secure operations of power systems have moved beyond the conceptual and development stages to the PoC stage. A few such new applications include: 

Situational awareness, visualization, and alarming



Abnormal angles alarm



Dynamics oscillations (small signal oscillation) monitoring



Line overloads monitoring with considerations for per phase analysis



Abnormal voltages alarm

112

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Enhancements of alarming, cognitive task analysis



Voltage stability indicators



Enhanced EMS state estimation (SE)



Down-sampling the Synchrophasor measurements and adding the data stream to existing SE measurements



Observing dynamic state changes of the grid during disturbances



Linear state estimation



Islanding, resynchronization, and blackstart (IRB) simulator



High-resolution data driven equipment condition/health assessment



Short-term equipment failure precursor



Long-term equipment cumulative age calculation



On-line apparatus electric parameter validation and PT/CT calibration



GIS integrated enhanced fault location

The new application runs using PMU data from a phasor data concentrator (PDC) and can conceivably run for every PMU sample set (i.e. 30 to 120 times/second). If this complete data is not available in the EMS, which is likely because down-sampling to 1 sample/second is typically used, the natural choice for the application is to reside on the same system as PDC. In such scenario, there may be a need for some EMS information to be periodically available to the application and the results of the application transferred to EMS. This will necessitate a reliable, secure data transfer mechanism, hopefully using a web service [10]. With the integration of fast Synchrophasor measurements (at rates of 50 to 60 measurements per second) into the control center data model, the EMS now has real-time visibility of the dynamics of the power system. This complements the visibility of the steady-state behavior of the grid with traditional SCADA measurements. Many of the new Synchrophasor analytics complement and corroborate traditional EMS analytics and can therefore be used together to jointly validate and fine-tune the analytics for improved precision and accuracy. For example, the oscillation monitoring analytic using a network dynamic model can be “married” with its counterpart measurement-based analytic to compare results and to gradually improve the network dynamic model parameters.

5.3.2 Impact of renewable energy on operations data modeling In many parts of the world, the government-mandated Renewable Portfolio Standards (RPS) requires electricity suppliers to obtain a minimum percentage of their power from renewable energy resources by a certain date in response to the recent emphasis on environmental issues and concerns for global warming. There have been a wide variety of financial incentives that are being put in place by governments around the globe to the boost economy and employment and to mitigate the impacts of the looming climate crisis. These incentives are expected to spur investments and growth in wind and solar industries. All those factors are causing wind and solar energy to expand at an ever-quickening pace, leading to high levels of penetration in a relatively short time. Utilities and power system operators must prepare to integrate and manage more of these variable renewable electricity sources on a much larger scale [11]. Apart from the many benefits owing to the ever-increasing amount of variable resources, most renewable resources required in the RPS are variable resources characterized by their high level of variability and uncertainty, and the variability with these resources remains a major concern for utilities in terms of grid operations. First of all, the task of controlling the power system and balancing supply and demand becomes more of a challenge for the grid operators. In addition to the inherent variability and unpredictability associated with these resources, the fast ramping associated with wind and solar photovoltaic resources will further challenge the utility companies. The task of balancing and controlling the power system is further complicated by the fact that, in current practice, in most balancing areas, renewable resources are treated as “must take” resources, requiring the grid operators to look for additional fast responding resources to compensate for the

113

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

variability, uncertainty, and the fast ramping of variable resources. In order to accommodate the increasing penetration levels of variable resources, balancing areas will need to adopt strategies and implement new tools to provide better visibility into variable resource operations, to better forecast their expected generation levels on a short-term basis, and to dispatch and control these resources. The operator of these resources, on the other hand, will require tools with adequate datasets and advanced data models to interface with the balancing area operators, and to facilitate and automate the participation of variable resources in various energy and ancillary services markets. Integrating data from large utility-scale variable generation presents unique challenges. These challenges call into question the long-standing set of assumptions that determined how utilities operated the power systems for decades. Power systems are designed to handle significant amounts of load variations and other uncertainties. Thus, managing risks is not new for grid operators. The expected increase in wind and solar generation, however, introduces new operational paradigms: how to ensure system controllability and observability and how to manage new kinds of variability and uncertainty. Operational integration deals with how operating characteristics of wind and solar plants are combined with existing operating policies (e.g. system balancing, ancillary services, ramping resources up/down) and decision-support tools deployed to support the utility control-room operators who run the power grid. Operating policies include different heuristics that are used to ensure balance between load and generation. With increased variable generation, policies on how this balance is maintained can be expected to change. Wind and solar energy generation are intermittent resources and, as such, can make it difficult to operate the power grids to which they are connected. The primary requirement for integrating these variable generations with utility operations is having access to forecast information about the quantity and availability of the power output from wind or solar plants. Thus, reliable forecasting systems are necessary to achieving increased wind and solar energy penetration. The use of forecasting in control rooms is the key to managing variability and reducing uncertainty, operational impacts, and costs. Forecasting allows operators to anticipate generation levels from wind and solar plants and adjust the remaining generation units accordingly. Accurate short-term wind production forecasts enable grid operators to make better day-ahead operational decisions, including scheduling the mix of generation resources to be dispatched. What constitutes a challenge is how to integrate wind forecast data with existing tools used in control centers. The goal of the data modeling and integration must be to enhance operators’ local and global situational awareness in light of increased variability and uncertainty. Toward this end, existing EMS, GMS (generations management system) and MMS (market management system) applications must be enhanced by incorporating wind forecast information and by making changes to different applications such as unit commitment, automatic generation control, and special protection schemes. Below is the high-level information flow and datasets of the variable renewable energy integration for grid operations.

114

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Generation/Load Balance

State Estimation

Look Ahead Analysis

Parameter Configuration

Applications Interface Wind Areas

Turbine Types

Telemetry

Facility Owner

S.E. Setup

Enterprise Data Bus

Generation Schedules (Conv, Wind, Solar, etc.)

Wind/Solar Forecast

Day Ahead Congestion Forecast

Figure 5-7: RES data integration/modeling diagram

Wind and solar forecasts are developed using weather models that contain abundant geographically distributed data. This data provides information needed to support decisions, as well as input for other forecasting tools such as load and transmission line thermal limit forecasting. Thus, the way this data is presented to operators will affect stability assessment (SA). Advanced forecasting systems can be used to develop early-warning systems that alert grid operators of the likelihood of extreme weather events so that the operator can take necessary actions. Equipped with the appropriate data, information, and tools that are fully integrated with renewable energy forecasts, operators will achieve a higher level of situational awareness and become more confident about managing variable resources. As a result, they are more likely to run the grid less conservatively, allowing a greater percentage of the renewable energy to actually be dispatched. The term “duck curve” was coined by the electric power industry to refer to its system’s load net of renewable generation resources (i.e. wind and solar) with the belly of the duck being primarily the effect of penetration of utility-scale solar. The net load required to be supplied by an electric system from dispatchable resources, including imports (i.e. system load minus load served by utility-scale variable generation—wind, solar PV, and solar thermal) has gotten lower and lower, and some of the regions in the U.S. and around the world have the “duck belly” more than 50% of the total load during the peak hours of the renewable energy output, much more quickly than originally projected.

5.3.3 Impact of equipment health condition monitoring on operations data modeling In recent years, transmission operators have actively pursued having power grid asset/equipment information available in control centers to facilitate their decision-making. There are several types of assets or equipment data currently received by the operators in control rooms and some specific equipment information, whether direct or derived, desired by operators to support their decisionmaking to maintain a more reliable and efficient power grid. The common business drivers or project justifications shared within utility space are discussed in the following: Equipment health diagnosis technologies, also known as sensor technologies, have made tremendous progress in recent years, and it makes practical sense to leverage the emerging technologies for the benefit of grid operations where practicable and cost effective. Situational awareness has become more and more important in control centers, and critical asset information will further improve the situational awareness of the grid operators. Asset condition or health information can provide the operator with “look ahead” capability to proactively plan ahead for potential security or emergency situations. Operator awareness of the present condition of equipment has great potential to avoid catastrophic equipment failures, which is of great benefit to the power system and public through improved overall system reliability.

115

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Asset/equipment information facilitates operator’s decision-making and operational risk management abilities. For example, if a piece of equipment is shown to be in poor or cautionary condition, the grid operator can weigh in this information, along with other relevant information (e.g., approaching storms), to decide whether to take the equipment out of service or reduce its loading to maintain system reliability. The desired asset condition information can also be integrated into the real-time contingency analysis to develop mitigation strategies. Thus, the asset/equipment information empowers the grid operator to proactively manage potential reliability risk and enhance overall system reliability. On the power grid asset management side, the benefits of the asset information being available for grid operators also include improved system reliability through reducing the risk of equipment failures, improved efficiency through dynamic equipment ratings, and potentially prolonging equipment life by avoiding costly failures. Finally, asset information provides the grid operator with another tool to comply with mandatory reliability standards in an increasingly complex and challenging operating environment that involves wider geographic areas, integration of renewables, and other novel supply- and demand-side technologies, as well as growing cyber security concerns [12]. It has been widely witnessed in the power industry that tremendous progress has been made in developing and demonstrating technologies that can diagnose the health of power transformers, circuit breakers, and other power system equipment. The useful information that these technologies can yield to improve operational awareness has been identified as: a) current equipment condition; b) sudden change in condition, if any; c) life expectancy, or at the least, likelihood of failure in the short term (i.e. days); d) loading margin (MW, Mvar); and e) prediction of system operational risk. Caveats should be given that the grid operators need to resist information overload by receiving volumes of asset/equipment data. Rather, the asset-related data needs to be filtered, processed, and analyzed before being sent to the grid operators so that the actionable information derived from such data can effectively support their decision-making. In a nutshell, grid operators need to see succinct, actionable information displayed clearly on EMS dashboards or other often-used EMS visualizations, which will minimize the need for additional training and operating procedures [13]. Figure 5-8 shows the aggregated response to an EPRI-conduced survey question, “What specific data/information do you currently receive in the control center regarding equipment health condition?” which summarizes the most relevant data the grid operators are interested in obtaining for their improved operation performance.

116

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-8: Equipment data/information currently received in control centers

For the purpose of concisely presenting the most relevant information to the critical mission that grid operators execute, it has also been an initiative to use probabilistic reliability assessment (PRA) to filter and analyze the asset condition information and make it more concise and ready for the operators. PRA methodology provides a technical approach to assess risk posed by undesirable system events during contingencies. Specifically, PRA combines probabilistic measure of the likelihood of undesirable events with potential consequences of these events to arrive at a reliability index—probabilistic risk index (PRI). There has been substantial research and development in the area of calculating, improving, and implementing the PRI algorithms and processes. Among those efforts, EPRI developed the PRA program as a tool for system planners and operators to perform riskbased reliability assessment as diagrammed in Figure 5-9, which shows an integrated methodology condition index calculation integrated into the calculation of the probability of the outage situation. The PRA starts with the equipment condition indices that are comprised of normal degradation and abnormal degradation condition components. When these indices are summarized and used to rank a transformer fleet, they will be referred to as the condition ranking indices (CRI) with a CRI-derived value used as a simple modifier to the unavailability. In the traditional, deterministic power system study, the contingencies are ranked according to the severity of their consequences (number of violations, sum of violations, average violation, margin to voltage collapse, etc.). This approach does not take into account the likelihood of the system to experience operation limit violations in contingency situations. The probabilistic approach weights the severity by a probability to yield the PRIs. Ranking the contingencies according to their probabilities and consequences gives an account of the risk posed by the contingencies to power system reliability. The PRA program uses the contingency analysis results as well as the equipment outage information as the inputs to compute an overall risk, PRI. The PRA program performs a detailed assessment of undesirable consequences of a contingency such as thermal and voltage violations, voltage collapse, and load loss. The PRA program can help system planners and operators to identify the most critical potential grid contingencies and compare their adverse impacts to other contingencies [12].

117

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-9: Proposed concept to incorporate equipment condition information indices into PRA calculations

The grid operators used to receive the asset/equipment information through asset management or field personnel, and this may still be the case in many utilities where operators sometimes have to scramble to gather adequate information about the equipment under duress and to perform quick assessments or detailed analyses, such as the real-time contingency analysis, to arrive at mitigation 1-2 measures. This communication method inherently introduces some delay and possibility of miscommunication. Now that most of the real-time equipment health information is available in the control center, the firsthand knowledge will help the grid operators in assessing the situation and the associated risk. However, this constitutes a widely recognized challenge of organizing/integrating the asset health related data into the current grid operation data models and merging the information into real-time SCADA data stream and connected network model. In the past decade, there have been some recommendations within the industry for the network architecture, communication protocol, and information model needed to integrate and transmit this equipment health data to grid operators efficiently. It presents an initial effort to identify functional specifications for an integrated "equipment health information system for grid operators," including conceptual visual displays. The use of CIM for asset health information sharing addresses how the CIM can be leveraged in defining both the shared semantic data model and the actual data exchanges required by the integration layers of the framework. There have been some preliminary results proposed by various research institutes and consulting firms on this forefront endeavor. Figure 5-10 illustrates a proposal from EPRI regarding integrating asset health information into the CIM-based EMS data structure for the gird operators and reliability engineers to consume.

118

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

119

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-10: Overview CIM class model for breaker health integration environment

Power transformers and circuit breakers are two of the most important transmission components due to their high costs and large impacts on system operations if a failure occurs. The diagram below is presenting rating the CIM extensions for circuit breaker data modeling for grid operations in the EPRI proposal to include asset health information in the overall network model standard formed by the IEC 61970 and IEC 61968. Figure 5-11 shows the extensions in relation to the framework set up in CIM for congregation and contextualizing the data needed for power system operations [14]. With the asset health information being in the same data structure alongside with the SCADA real-time data, the parameters, and connections of the major grid components, the operators will be better equipped with the information you need to operate the grid in a more effective, efficient, secure, and reliable way.

120

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Figure 5-11: Location of UML diagrams and modifications for the breaker health integration

5.4 ADVANCED DATA INTEGRATION MODELING CASE STUDY The big data challenges in power grids discussed above, high level and detailed alike, are demanding a comprehensive and state-of-art data modeling and integration strategy to meet the challenges and offer solutions to the issues. Striving to meet the challenges identified previously, Dominion Virginia Power, one of the largest energy producers and transporters in the U.S., with an asset portfolio of 28,000 megawatts of power generation and 6,500 miles (10,400 km) of electric transmission lines, is in the process of establishing a data integration, modelling, and analytics platform that will integrate the operational data with asset information to operate its grid more reliably and better manage its assets across electric power networks. Dominion has acquired an on-the-shelf commercial data historian, in conjunction with a network model management (NMM) application, as the integrated solution to enable such objectives. The proposed data platform being implemented in DVP for grid and field operations includes all major datasets that can improve the situational awareness of the operators, engineers, and technicians for better decision making to operate the grid with greater reliability, stability, and efficiency. Time-series data from a wide range of sensors and sources are integrated and brought into the data collector (the new Dominion data historian) by various interfaces and adaptors, including real-time dynamic data such as SCADA, PMU, and on-line monitors, off-line data such as field testing and lab testing results, filebased data such as oscillography COMTRADE files, and static data such as equipment ratings. Nontemporal data such as SAP that stores work orders, asset information, and maintenance records and geographical information such as ArcGIS are also being accounted for in the data platform. It should be mentioned that integration and modeling of other data sources will also be considered if deemed instrumental to the overall operational standard. Once all the data are consolidated in to new data historian, there central data repository can function as a strong engine to drive a great deal of applications related to enhance data utilization and situational awareness of the system. The data model can be derived from the integrated data; analytics can be developed to improve grid reliability and operational economy; event detection and notification can be

121

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

configured; visualization can be set up; dynamic asset management can be optimized by data driven algorithms; and even switching work order can be automatically generated for review if abnormality is found somewhere in the grid. Key to the success of data modelling and NMM implementation in this platform is the power network model management and the integration of asset-related data into the contextualized business intelligence and situational awareness data hierarchy for grid operations. The goal is to correlate asset information with the connectivity nodes in the network models for both planning and operations. Consequently, Dominion is looking to implement some initial use cases to leverage network model and connectivity information in business intelligence data structure such as dynamic equipment health assessment for strategic asset management and optimal VAr advisory for the optimization of reactive power flow. Traditionally, utilities gather asset health data from online and office line capabilities, as well as historical SCADA information. The advent of centralized network model management (NMM) capability has allowed utilities to not only streamline their network model usage across operation, planning, and engineering but also to make the network connectivity information available for asset management purposes. Given the business challenges that Dominion and more broadly the industry faces today and some of the technology investments it has committed to, the asset and network model integration solution architecture was developed to meet the business challenges. This architecture reflects the industry best practices around the technology, standards, and requirements of utility asset and network model management. The following diagram provides an overview of the architecture. It is believed that, implemented correctly, it will provide the right foundation for Dominion’s short- and long-term asset management and operational applications needs as business requirements grow and change over time.

Figure 5-12: Asset and network model integrated solution architecture

Specifically, this data flow diagram and solution architecture in Figure 5-12 propose the following key concepts: 

The data infrastructure being implemented in Dominion will be the main repository for operational, offline, and field test data with histories. This would allow Dominion to have access to a variety of operational data from a single source. Sources for this include but are not limited to: PMU, DFR, OLCM, field tests, and SCADA.



The integrated data model will provide the asset hierarchies so that users will be able to navigate to the desired operational data through a visual and searchable asset structure. This asset hierarchy model will be developed using Dominion’s asset data structure and be supplemented by IEC CIM asset-related model elements.



The data historian visualization suite: This is where asset management related analytics can be developed and used by end users, operators, engineers, technicians, and managers alike. Self data service business model will be the theme behind this concept.

122

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Network Model Management: This is a standalone application that provides a centralized environment for network model management and maintenance, leveraging the IEC CIM connectivity model structure. Vendor applications in this area can import and export a variety of model file formats including the IEC CIM-based connectivity model. Applications to interact with this environment will be those of EMS/DMS, planning, and protection.



Asset Repository: The main purpose of this asset repository (AR) is to store the historical information about the electrical connectivity information and their relationship to assets. This information is critical to support the desired use cases for situational awareness and operational excellence within and beyond the control rooms. There are several options to implement this asset repository. It can be implemented in the new business intelligence database server environment as a set of relational tables. It can be implemented as a standalone database. Or it can be implemented as part of the NMM application server environment. It also depends on Dominion’s desire for the long-term use of this asset repository. Asset data from SAP and GIS will be integrated into AR and then made available to the business intelligence data structure.



BI Tools: For analytics that go beyond the data in the data historian and the asset repository, business intelligence (BI) tools can be deployed to support reporting, query, and analysis of real-time and asset data to establish a picture of the prevailing system operating conditions.

Mixed data from multiple sources, validated network model with mapped asset data, and across-thesystem data semantics: all of those complex data properties are what the current electric utility data systems typically do not possess to conduct advanced analytics for grid situational awareness. Within this future-oriented data platform, however, the data integration process addresses those issues and makes those features available for the advanced analytics, which opens the gate for a great many leading-edge analytics for a more adaptable, more secure, and more reliable power grid, among which are 1) advanced/predictive restoration systems, 2) adaptive topology planning, 3) system dynamics and transients modeling validation, and 4) wide area profiling and system management, to mention but a few. All of those aforementioned easy wins and advanced grid analytics are all great examples of opportunities of this new data platform.

5.5 CONCLUSIONS Currently, the primary sources of data in utilities for grid operations are offered by SCADA gathered from various sensing devices, phasor measurement units (PMUs) distributed over transmission and distribution networks, consumption data collected by smart meters, which are deployed at customer premises, and intelligent electronic device data, which represent the data collected from individual grid components. In addition to the data directly obtained from the electricity infrastructure, utilities may collect data from other resources to facilitate system studies such as weather data, geographic information system (GIS) data, manufacturer data, electricity market data, and others. Due to the complexity of the power grids, the high volume, large variety, and fast velocity characteristics of the utility space big data, and lack of strategic methodology and plan to alleviate or eliminate the issues arising from it, the electric utilities are consistently confronted with day-to-day difficulties list below in the operations of their grid and the equipment within it. 1) Data silos 2) No semantics layer on top of the data 3) Lack of cross system integration 4) Not all relevant data is shared 5) Difficult to share data and models 6) Excessive time used to validate data/models, not running studies 7) Data accuracy and inconsistency 8) Common data not in sync and up to date 9) Impossible to propagate data change to all pertinent data destinations To cope with these challenges, it is first and foremost important for the electricity utility companies to properly integrate and model the data from the variety of data sources that we are relying on to carry out the daily operations of the grid in a stable, dependable, reliable, and more efficient fashion. As we described in Section 5.2, utilities are moving toward data description from international standards such as IEC 61850, CIM, or COSEM. Standards are likely to have evolved and others may emerge over the next few years, but the important thing is to continue the efforts. Indeed, the normative effort is crucial

123

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

because it makes it possible to improve the opportunities for interoperability of electrical systems, which are increasingly interconnected both between countries and between upstream and downstream with the arrival of new uses.

5.6 REFERENCES [1]. [2].

[3]. [4]. [5].

[6]. [7]. [8]. [9].

[10].

[11]. [12]. [13].

[14].

PJM Operation Support Division, "PJM Manual 3A: Energy Management System (EMS) Model Updates and Quality Assurance (QA)," August 25, 2016. "IEEE Standard for Calculating the Current-Temperature Relationship of Bare Overhead Conductors," IEEE Std 738-2012 (Revision of IEEE Std 738-2006 - Incorporates IEEE Std 7382012 Cor 1-2013), vol. no, pp. 1-72, 23 Dec 2013. U.S. Department of Energy, Electricity Delivery & Energy Reliability, "Dynamic Line Rating Systems for Transmission Lines," Smart Grid Demonstration Program, April 25, 2014. Power Systems Engineering Research Center, "The Next Generation Energy Management System Design," Sept 2013. J. Giri, M. Parashar, J. Trehern and V. Madani, "The Situation Room: Control Center Analytics for Enhanced Situational Awareness," IEEE Power and Energy Magazine, vol. 10, no. 5, pp. 2439, Sept.-Oct. 2012. EPRI CIM Primer 3rd edition Technical Report – 2015 IEC 61850-7-1 : Basic communication structure – Principles and models – 2011 DLMS WEBSITE : http://dlms.com/index2.php V. Madani, et al., "Advanced EMS Applications Using Synchrophasor Systems for Grid Operation," T&D Conference and Exposition, 2014 IEEE PES, Pages: 1 – 5 DOL 10.1109/TDC.2014.6863246. Jampala, et al., "Practical Challenges of Integrating Synchrophasor Applications into an EMS," 2013 IEEE PES Innovative Smart Grid Technologies Conference (ISGT), Pages: 1 - 6, DOI: 10.1109/ISGT.2013.6497847. F. Albuyeh, "Integrating Variable Renewable Generation in Utility Operations," Power and Energy Society General Meeting, 2010 IEEE, DOL10.1109/PES.2010.5590118. Integration of Asset Information into Control Centers: Prioritization of Asset Information and Concept Development. EPRI, Palo Alto, CA: 2012. 1024257. Integration of Equipment Condition Information into Control Center Operations: Survey on Equipment Condition Information for Transmission Operators. EPRI, Palo Alto, CA: 2014. 3002004614. “Standard Based Integration Specification, Common Information Model Framework for Asset Health Data Exchange”, EPRI 2014 Technical Update

124

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

6. DATA QUALITY AND VALIDATION 6.1 INTRODUCTION In the modern digitized world, most advanced industrial operations depend on information systems for control and analysis. Data is increasingly being considered a valuable asset, of equal worth to physical assets, and considerable costs are involved in collecting, storing, and acting upon the data. As with physical assets, the quality of the data is a prerequisite for ensuring reliable operations. Additionally, high data quality must be ensured to enable reuse of data and to enable analytics on historical data. Information and analytics-driven organizations, with no traditional physical operational commitment, rely solely on high-quality data to stay competitive. Data value chains are common in industry, production, and business operations. Data is born, follows a value chain, and is then refined and prepared for several different tasks. Thus, the user or system utilizing the data does not necessarily have knowledge of the data origin, quality level, weaknesses, legal or contractual obligations, semantics, changes in the system capturing data, and the context in which the data was born. In order to ensure both reliable operations and valid analytics, it is important that the data quality is assessed and continuously monitored for all critical systems and services. Organizations should define data quality policies, and processes should be in place to support these policies. The requirements for data quality and dataset definitions must be clearly stated, and measurement points should be implemented in order to verify compliance with requirements. Ideally, the measures should be in effect across the entire organization to ensure optimization and to avoid data quality assessment being performed in silos. The analytics applications and associated visualizations described in previous sections will provide reliable and useful actionable information to system operators as long as the input data they are fed with maintain high-level of quality. Hence, it is critical that data quality of all types of data used in operator support applications is assessed and continuously monitored. With the exceptional growth of data from sensors, intelligent electronic devices, and other sources in these analytics applications data quality has become a prevalent aspect. Indeed, issues such as empty value, redundancy, inconsistency, and inaccuracy have been increasingly detected in data from those sources. These data problems hinder successful implementation and deployment of situational awareness applications, as they have a direct impact on the accuracy and validity of the analysis. There is not a universal definition of data quality as applied to power system applications. One commonly used data quality concept derived from the ISO 9000:2015 standard states that data quality measures the degree to which a set of characteristics of data fulfils requirements. Examples of characteristics are: completeness, validity, accuracy, consistency, availability, and timeliness. Requirements are defined as the need or expectation that is stated, generally implied, or obligatory. Therefore, per this broad concept, essentially any aspect of data that bears on its ability to satisfy a given purpose falls under the umbrella of data quality [1, 2]. Establishing good practices to maintain data quality is an institutional issue. Organizations need to define and put in place data quality policies and processes to ensure data quality requirements are compiled at various levels. Ideally, the measures should be in effect across the entire organization to ensure optimization and to avoid data quality assessment being performed in silos. Different frameworks, methodologies, and approaches for data quality assessment and improvement have been developed. This report presents a general discussion on data quality issues in power system operations and describes methodologies for data quality assessment and improvement methods. Consistent with the overall approach of this technical brochure, references for detailed information are provided for the interested readers.

6.2 DATA QUALITY PROBLEMS Power data has the characteristics of large amount and many types. They are mainly derived from the control system, the production system, and the management system, such as data monitoring information, smart meter collection, device maintenance information, SCADA, Internet of Things (IoT), and energy management systems.

125

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

This data is used for supporting different kinds of application topics, such as sensing status, load forecasting, and user behavior analysis. Besides, many enterprise management systems (ERPs) are established to record and produce enterprise data such as financial and human resources data. However, in all the data mentioned above, there is a variety of data quality problems that may impact data analysis. The main manifestations are data incompleteness, data inaccuracy, data nonnormative, data inconsistency, and other aspects [5-7]. For example, the following are some common data quality problems found in smart meters: 

Incompleteness: The smart meter needs to collect multi-point data every day, including positive and negative active data, reactive power data, three-phase voltage data, etc. However, some data collection points are missing.



Inaccuracy: The equipment operation time is not accurate. In particular, the operation time information is not updated after the transmission line is replaced or broken. The customer contact information (telephone number and address) is not accurate. The information cannot be updated immediately after some changes.



Non-normative: Equipment manufacturer data is not standardized; multiple names may exist for the same manufacturer.



Inconsistency: The inherent connection association from transformer substation to line, to transformer, to substation area, to key users is not consistent by data level.



Non-uniqueness: The equipment master data is maintained by multiple sources in the infrastructure, materials, production, and dispatch systems. For example, one equipment may have different names and encodings.

Another example of data quality problems is related to a material system. This mainly focuses on the non-standard data entry, incompleteness, and duplicate data entry: 

Incompleteness: Kinds of fields such as material number, start time, production time, etc. are empty.



Inaccuracy: The contract amount is less than 0, or more than 10 billion.



Non-normative: The encoding method does not conform to the specification. The data type is not standardized.



Inconsistency: The same information such as line loss rates may be different in the statistics of the financial, planning, and operational system.



Non-uniqueness: The same information may be entered in multiple systems.

As a result of power systems being interconnected and networked, data inconsistency problems will be there. Due to personnel negligence, database failure, communication interruption, and other reasons, data association may be missed or mismatched, and data quality problems, such as data loss and data error, will be caused. Enterprise information level differences, data model differences, and other reasons will also cause a data interoperability problem. These data quality problems will have direct impacts on the results of subsequent data-analytics applications in a production operation. Therefore, it is necessary to focus on improving power data quality.

6.3 DATA QUALITY ASSESSMENT Data quality assessments are typically performed both bottom-up and top-down. The bottom-up approach utilizes profiling tools and schema inspections to perform a generic and usage-agnostic assessment. The bottom-up approach will reveal indicators of potential areas of data inconsistency. However, due to the generic nature of the method, this approach is also prone to detect false positives. The top-down approach will involve domain experts and actual usage scenarios to detect inconsistencies. Although this method does not typically result in false positives, it will not lend itself easily to automation and hence might not be as conclusive as the bottom-up approach. Normally, the bottom-up assessment provides valuable input to the top-down assessment, and hence both are required to perform an exhaustive evaluation. Performing an initial iteration is recommended in order to validate input data flow, map data paths and all transformations, identify enhancements and refinements on data, collect and use metadata

126

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

and schemas involved, and document this as part of the data quality assessment. The next iteration will detect any relationships between data feeds. These could include: (i) a sensor measuring power supply to a pump can be correlated to sensor data measuring performance of a pump, or (ii) starting an engine, resulting in a rise in engine oil temperature being detected by sensors. Subsequent iterations can assess whether the data quality is sufficient for the algorithms using them. If simulations and/or digital twins are used, these models should also be quality assured. For this purpose a digital twin can be viewed as a dataset in the context of the data quality method. The first assessment provides a baseline for measurements and the improvement cycle. ISO 8000-8 defines three categories for data quality measurements: syntactic, semantic, and pragmatic. Information and data quality are defined and measured according to these categories. For organizations with well-defined requirements, the assessment will tend towards that of the assessment model for ISO 8000-8. In this case, the data quality dimensions are categorized according to ISO 8000-8, and the appropriate methods are employed:   

Automatic syntax and integrity checks for syntactic quality. Correlation with reference models and sampling techniques for semantic quality. Algorithm sensibility for data quality issues, user feedback, and focus groups for pragmatic quality.

As described above, data quality can generally be understood as “the extent to which a set of intrinsic properties of data meets the requirements.” The intrinsic properties can be decomposed into five dimensions: 

Timeliness: the extent to which data generation and transfer meet the requirements of management and usage.



Integrality: the extent to which the data has or maintains its intrinsic information.



Compliance: the extent to which the type, format, dimension, and accuracy of the data meet the normative design.



Accuracy: the degree to which the data truly reflects the actual information.



Consistency: the data association obtained by different approach.

Different methods are applied to assess data quality problems in the various levels of the logic hierarchy model depicted in Figure 6-1. For example, information matching method is usually adopted to discover data quality problems in control layer; data analysis method and rule checking method can be used for the operation layer, management layer and analysis layer. Some of these methods are briefly described below.

6.3.1 DATA INTERPOLATION In power transmission and distribution systems, measurement of electrical magnitudes is generally redundant due to the coexistence of various monitoring systems, such as SCADA system, PMU systems, power quality monitoring systems, energy metering systems, and so on. The SCADA system is a system with relatively complete monitoring points and relatively high measurement frequency, while PMU based systems have relatively small number of monitoring points but a high measurement accuracy. When the same magnitude (e.g. voltage) is measured in both SCADA and PMU at the same time, the SCADA voltage data (mainly voltage amplitude) can be compared to voltage data measured by PMU. When data from the two systems do not match, data interpolation can be used to match the SCADA data and PMU data. For measuring and checking voltage data, the following two methods can be used: During the operation of power system, if the load does not change abrupt, for the sampling interval of SCADA system, the voltage amplitude at a certain time is usually related to its adjacent last time or next time. Therefore, the voltage amplitude of a certain time can be detected by comparing the amplitude of the voltage at the adjacent time. The voltage amplitude of a node can be obtained by calculating the measurement information and line parameters of multiple nodes with topological relevance (i.e. different locations or different space). It

127

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

can be used to detect and repair the corresponding voltage data of SCADA system if the calculation of the voltage amplitude is accurate.

6.3.2 DATA PROFILING Data profiling is the systematic analysis process of data structure, data content, and data relationship [11-13]. Through this method, it is possible to empirically examine the potential problems of data. Figure 6-1 shows the implementation of data analysis. table profile data structure profile

table description minimum length

column profile maximum length

data profiling

data content profile

column profile

data value distribution

Integrality

data pattern distribution

Effectiveness

placeholder format

Consistency

foreign key analysis cross-table profile correlation analysis

data relationship profile table profile

dependency conflicts

Figure 6-1: Implementation of data analysis

Through data profiling methods, the abnormal value that can be identified as follows: 

High frequency value: its frequency is greater than the expected value.



Rare value: its frequency is lower than the expected value.



Complete value: the empty value is higher than the expected number or percentage.



Frequent pattern: its frequency is larger than expected pattern.



Rare pattern: its frequency is lower than the expected pattern.



Value cardinality problem: the number of different values in columns is higher or lower than expected.



Accident value: value that does not conform to the defined range constraint.



Defaults: high frequency value or the empty value as the default value.



Orphan record: a record that has a foreign key but does not match the main key.



Mapping problems: the consistency of the values between columns in a single table or cross table does not conform to expectations.



Duplicate records.



Association relation: association relation does not follow the defined mapping expectation (for example, a primary key record is mapped to more than one foreign key record, but association relation requires one to one mapping).

128

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

Based on data profiling, validation rules can be set up while locating data problems, and the quality assessment model can be used to conduct comprehensive quality diagnosis for the analysis dataset.

6.3.3 DATA QUALITY ASSESSMENT FRAMEWORK Except data profiling results, data quality must need electrical business logic knowledge (such as charge of electricity requirement, relation between meter and measuring point) to determine whether the results are correct. So, based on data profiling results and business logic knowledge, we can analyze and evaluate the data quality by tools or program codes to provide better data quality analyzing basis. A data quality assessment framework can be developed from timeliness, integrality, compliance, accuracy, and consistency dimensions [6, 13, 15]. Each dimension can set up several rules to be described, which is shown in the figure below. Some rules are given as follows in detail. Data Timeliness Rule Timeliness

Record Integrity Rule Non-Blank Rule Primary Key Rule

Integrality Foreign Key Rule Type Rule Data Quality Assessment Framework

Format Rule Compliance

Dimensional Rule Precision Rule Range Rule Equivalent Function Dependence Rule

Accuracy

Logic Function Dependence Rule Code Rule

Equivalent Consistency Rule

Consistency

Logical Consistency Rule Figure 6-2: Framework of Data Quality Assessment



Data timeliness rule: the generation and circulation of data should meet (or meet certain conditions) timeliness requirements of management and use.



Record integrity rule: the number of centralized records of the test data should (or meet certain conditions) meet the business expectations.



Non-Blank Input rule: the tested data of the dataset should (or meet certain conditions) be not null.



Primary key rule: when a field of the tested dataset is the primary key, the value of the data should uniquely identify a record.



Foreign key rule: when a field of the tested dataset is the foreign key, that field should (or meet certain conditions) reference the primary key of another data table.

129

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Type rule: the data type of the tested dataset should (or meet certain conditions) meet the field type requirements predefined by the business system.



Format rule: the format of the tested data in the dataset should (or meet certain conditions) meet the field format requirements predefined by the business system.



Dimension rule: the dimension of data should (or meet certain conditions) meet the dimensional requirements predefined by the business system.



Precision rule: the precision of numerical data should (or meet certain conditions) meet the precision requirements predefined by the business system.



Value-domain rule: data values of the tested dataset should (or meet certain conditions) occur within a certain range, and the range can be determined by one or more means such as the data dictionary, business knowledge, distribution, and variation of historical data.



Equivalent function rule: in the same data table, one data should (or meet certain conditions) be calculated from another one or more data, and such equivalence calculation relationship must to be in line with the business characteristics.



Logical function dependence rule: in the same data table, one data should (or meet certain conditions) meet some kind of logical relationship (greater than, less than, earlier than, later than, etc.) with another one or more data, and this logical relationship must be in line with the business characteristics.



Code rule: the value of the data from the tested dataset should (or meet certain conditions) conform to the constraints of the source business system’s design.



Equivalency consistency dependence rule: in different data tables, one data should (or meet certain conditions) be calculated from one or more data from other data tables, and such equivalence calculation relationship must to be in line with the business characteristics.



Logical consistency dependence rule: in different data tables, one data (or meet certain conditions) should satisfies some logical relationship (greater than, less than, earlier than, later than, etc.). With one or more data from other data tables, this logical relationship needs to be in line with the business characteristics.

6.4 DATA QUALITY PROBLEM CORRECTION In data analysis and inspection process, many data quality problems would be revealed. Having identified the data quality problems that may adversely impact the reliability and validity of analytics applications, the next step is to design and implement a measure to solve the issues found [16, 17]. Figure 6-3 shows the main steps to correct data quality problems. The task involved in each of these steps are described next.

Impact assessment

Correction and cleaning

Monitoring and Prevention

Scavenging of essential causes

Figure 6-3: Concept to correct data quality problems

130

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

6.4.1 IMPACT ASSESSMENT At first, consider the following characteristics of the data quality problem [15, 18]: 

Scope of the influence. It identifies the extent to which business processes is impaired by the problem



Feasibility of correction. That is, the possibility to correct the data quality problem in question.



Feasibility of prevention. That is, the possibility of eliminating the root cause of the problem or identifying problems through continuous monitoring.

6.4.2 CORRECTION AND CLEANING The correct-and-clean process almost exists in all stages of data collection and storage, integration, analysis, and application. Figure 6-4 shows the entire flow of power system data from collection to application. In different stages of the data correction and clean, the methods and emphases are focused differently [6, 15].

Figure 6-4: Data flow and problem correction control points

a.

Data collection process

It is usually based on multi-source information from different sampling time and different monitoring sources with related topology nodes, as well as business common sense to do the abnormality judgment and deviation correction [18]. b.

ETL process

Data is not perfect. There is a gap between the raw data and the final result. It usually needs to be cleaned, converted, and sorted by the ETL (extract the transformation load). ETL includes three main links. The first one is data extraction. It implies reading data from original business systems, which could be different operating platforms or different databases. The second is data conversion. This process entails converting data under pre-defined set of rules, including the operation of fields merge and split, sorting, default value assignment, data aggregation, and so on. The third link is data load, during which the conversion data is loaded into the data warehouse [15, 19]. In the data conversion process, operations for resolving data quality problems include: 

Data integrality check and incomplete data filling



Incorrect data check and repair



Duplicate data inspection and handling

131

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Inconsistent data conversion



Data granularity transformation



Calculation based on business rules



Data desensitization

The data quality problem correction operation is an iterative process. Whether to perform the data extraction from the source business system to the ODS or the data warehouse, as well as whether to perform the repair operation on the data that does not meet the data quality rule, needs the managers of original business system and the data center to confirm. c.

Data analysis and application

In the ETL process, the data has been cleaned, converted, and collated. Then, they will be stored in the data warehouse. However, the ETL operations are not enough for the data quality requirement in the following data analysis topics or BI (business intelligence) analysis topics. There may still be some data quality problems in the different application topics, such as data mismatching and logical error. These problems should be corrected to satisfy the power business requirements. Thus, data quality problems should be differently considered during the different phases of power data lifecycle, which is shown in Figure 6-5 [6, 20-22].

Figure 6-5: Data cleaning in data analysis and application phase

Except the ETL process, the cleaning modes of data analysis/BI analysis in the above figure include but are not limited to the following: 

Replace standard content



Uniform field format and content



Null field assignment



Special character substitution



Multi-column logic operation or splicing



Duplicate records remove



Regular information extraction

132

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Rich data information



Replacement with regular expression



Parse special format data



Address standardization



Model-based continuous numerical value filling

When solving data quality problems in data warehouse and big data platforms, many challenges will be faced, like the amount of data is huge and the data types are diverse. It is recommended that run the batch, schedulable and rapid cleaning algorithms on distributed computing platform.

6.4.3 SCAVENGING OF ESSENTIAL CAUSES In order to substantially solve the quality problem, it is necessary to analyze where the problem comes from, and the best place to repair and eliminate the root causes. If the sources and the best place can be identified, it is possible to assess and correct the process to eliminate the essential causes of the quality problem. The possible essential causes include: 

Poor tunnel conditions



Arguments setting errors



System running failures



Extraction and conversion errors



Terminal failures



Human factors



Software systems lack verifications

Because the sources of data are diverse, models of data are large, and the conversion processes are complex. To identify data sources and the best place, the metadata-management-based data source tracking technology can actually improve the performance. Assessing and eliminating the quality problem causes can be considered from the following points: 

Assess the workload of every candidate program.



Choose one as the repair program.



Determine the repair time.



Design the development plan.



Design the test plan.

6.4.4 MONITORING AND PREVENTION If the workload of eliminating the above root causes goes beyond the organization’s capabilities, resources, or requirements, then monitoring procedures should be established for known data quality issues. When an error occurs to the monitoring procedures, the appropriate person can be notified to take appropriate action to delay or terminate the error until data processing continues normally.

6.5 CONCLUSIONS Data quality hasn’t been fully considered in the initial design phase of system function development of power control systems and enterprise management systems. In the process of data integration, analysis, and application, a series of problems such as data inconsistency, inaccuracy, and incompleteness must be faced and handled. However, these problems are extremely important for the system to achieve the desired effect in the applications of data analytics in system operations. For data quality measurement, kinds of methods including multi-source information matching, data analyzing, and rule testing can be used. For data quality correction, a set of technologies—including

133

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

issue influence analysis, correction, and cleaning; scavenging of essential causes; monitoring and prevention—is proven feasible to improve data quality, which can satisfy the needs of data analysis and application. DNV.GL proposed that the data quality assessment and improvement process include defining the scope, data exploration and profiling, data quality assessment, organizational maturity assessment, data quality risk assessment, and risk-based data quality improvement [6]. These measures have a positive effect on improving the enterprise’s data quality, maturity, and risk management.

6.6 REFERENCES [1]. [2]. [3]. [4]. [5]. [6].

ISO 8000-8 Information and data quality: Concepts and measuring ISO 9000 Quality management Data quality assessment framework, DNVGL-RP-0497, Jan. 2017 Yang et al. Journey to data quality, MIT Press 2006 X. Chen, et al., Integration of IoT with smart grid, IET ICCTA 2011, Beijing, China, 2011. T. Zhao, et al. Data quality assessment and improvement techniques for power system, SGCC Technical Report, 2017. [7]. G. Liu, et al., Evolving graph based power system EMS real time analysis framework", IEEE ISCAS 2018, Italy, 2018 [8]. H. Hang, Q.-L. Zhu, Development analysis and prospect of data quality control in smart grid, Science & Technology Information, pp. 92-93, Jul. 2012 [9]. K.-Y. Liu, et al, Detection and evaluation of SCADA voltage data quality in distribution network based on multi temporal and spatial information of multi data sources, Power System Technology, pp. 3169-3175, Nov. 2015 [10]. NASPI, PMU Data Quality: A framework for the attributes of PMU data quality and a methodology for examining data quality impacts to Synchrophasor applications, Mar. 2017 [11]. DAMA United Kingdom, The six primary dimensions for data quality assessment, Report, Oct. 2013 [12]. Sadiq, Shazia, Handbook of data quality: research and practice, Springer, 2013 [13]. C. Batini, and M. Scannapieco Data and information quality: Dimensions, principles and techniques, Springer, 2016 [14]. S. Keller, et al., The evolution of data quality: understanding the transdisciplinary origins of data quality concepts and approaches, Annual Review of Statistics and Its Application, vol. 4, pp 85108, 2017 [15]. Q/GDW 11570-2016, The common criteria of data quality evaluation based on power grid operation data, Enterprise Standard of SGCC, 2016 [16]. D. Loshin, The practitioner's guide to data quality improvement, 2011. [17]. H. Liu, et al, Research on the advanced computing method for supporting large data quality assessment and improvement, Advances in Computer Science Research, Jan. 2017 [18]. Y.-W. Cheah, and B. Plale, Provenance quality assessment methodology and framework, Journal of Data and Information Quality, vol. 5 (3), Feb. 2015 [19]. X. Chen, N. Li, F. Wu, and X. Li, Research on hierarchical information aggregation technology in the smart grid Internet of Things, Telecommunications for Electric Power System, vol.32 (230), pp.73-77, Dec. 2011 [20]. ISO 8000-110-2009-Data quality - part 110: Master data: exchange of characteristic data: syntax, semantic encoding, and conformance to data specification. [21]. K. Xing, et al, Mutual privacy-preserving regression modeling in participatory sensing, IEEE INFCOM 2013, Turin, Italy, Apr. 2013 [22]. W.H. Inmon Dan Linstedt, Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault, Morgan Kaufmann, Nov. 2014

134

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS

7. CONCLUSION With increasing complexity and interconnectivity of the grid, the scope and complexity of maintaining and increasing situational awareness have grown. As a consequence, there is the need to furnish system operators and operation engineers with better tools and visualizations for assessing system conditions, and for providing effective and timely decision-making and remedial reactions to an incident. It is not enough to just understand the current state. Situational awareness implies also the ability to foresee and anticipate system changes and their impact on system security. The large variety of internal and external data sources that are available today to electric utilities make it possible the implementation of advanced data analytics and visualization technologies to improve the way the system is operated and controlled. Analytics algorithms capable of synthesizing actionable information from the raw data can be used to provide tools that use real-time data streams to support fast, accurate, and adaptable decisions for solving critical problems in the right moment, as well as plan in advance mitigation actions to anticipated system security issues. Even though the use of data analytics for power system operation support is not new, its widespread use remains low. Hence, there is need to examine how advanced data analytics technologies can be further used to solve the emerging critical challenges in electric systems operations. This technical brochure provides an insight into how advanced data-analytics techniques and tools that integrate various data sources can be used to improve situational awareness of power system operators and support various operation functions. The content of this technical brochure is broken down into the major areas comprising the development and implementation of data analytics tools, which are: data and information sources, data analytic techniques to interpret these data, applications of these analytics in system operations, data integration and modelling to integrate data into operations, and data quality and validation. Some relevant observations and takeaway from this work are as follows: 

Utilities have started to realize the value and benefits of data analytics tools that integrate data from various data sources. Several software tools have been developed to serve a variety of functions to support system operation, including tools for system events detection, faults identification and analysis, wide-area monitoring, equipment health monitoring and analysis, trending and forecast of load, renewables and system conditions, and recommendation for operation. There is a recognized growing need to improve situational awareness for system operators, as it is also revealed in our industry survey. Advanced data management and analytics can help fill this need. One of the challenges for successful implementation of such tools is the difficulty to integrate data that is collected and resided in different enterprise systems. Hence, effective implementation of advanced analytics tools greatly depends on operational datamanagement policies and technologies. Proper data models allow the definitions and characteristics of the data to be clearly understood. Even though significant advances have been made to improve data interoperability, more effective and accurate data models and procedures are needed for ensuring data integrity and availability of the right data in the right format.



Another aspect that may hinder the implementation of data analytics solutions in system operation is the lack of understanding of the value and accuracy of these technologies. Traditionally, most of the tools used in control centers and operation engineering are based on system models and simulations. Engineers understand the capabilities and limitations of those tools, as well as the considerations that need to observed to develop and validate the simulation models. Data-analytics techniques that attempt to recognize and validate data patterns and trends and draw conclusions therefrom may be less understood. Advantages of both approaches—modelbased and data-based methods—can be combined in hybrid methodologies to developed superior technical approaches and software tools for use in system control rooms and to support various operation functions. Those tools will combine conventional analytics techniques based on physical models with heuristic data analytics and decision-making methodologies. For instance, simulations engines would perform contingency analysis across a number of scenarios, which in turns will be built with the help of data collected and integrated from a variety of sources. Data-analytics techniques will then be used to extract relevant patterns from the simulation results, assess vulnerability/risk, and classified critical conditions based on given risk criteria.

135

ADVANCED UTILITY DATA MANAGEMENT AND ANALYTICS FOR IMPROVED OPERATION SITUATIONAL AWARENESS OF EPU OPERATIONS



Effective use of these data sources in operation support tools relies upon the real-time exchange of this data through a high-performance, reliable, secure, and scalable communication network infrastructure. The new trend of integrated network architecture that is happening with the expanding smart grids investments will enable effective and reliable use of analytics tools that integrate various real-time data streams.



It’s widely recognized that visual analytics is key is to improve operator ability to understand the system situation and make effective decisions. Visualization technologies and techniques have advanced significantly since first developments in the early 1980s. Best practice from within the visualization industry and learning from other data intensive areas should be used. Newer visualization platforms include many advanced futures such as geographic-based dynamic visualization with user-friendly interfaces and real time measurements and analytical results from measurement-based and model-based tools that populate the system map. Visual aids strategies such as color contouring, 2D and 3D bubbles and cones, animation, geospatial representation, display profiles, and integrated system views are widely used in newer visualization tools. The most significant trend in new visualization is the integrated space-time concept, which is intended to help operators to assess current situations in a static fashion and to understand and visualize the conditions the system is evolving into, to get better prepared to implement effective control actions.



There is a fundamental need to understand the challenges and requirements that system operators are experiencing in terms of the main goals that drive their actions. Operators may need to reconcile multiple objectives and answer questions that may sound conflicting, such as: o

Is my goal purely economic?

o

Is my goal purely focused on maintaining system security at any cost?

It is not possible to utilize and deliver effective data-analysis tools and associated visualization without challenging these two key drivers. Also, another paradigm shift in terms of the approach for presenting data and information to the operator is required. The common current approach is “show the user lots of data,” but the rationale behind such simplistic strategy has never been clear. Perhaps the concept is that by providing the user all the data available there is less chance to filter out important information, or possibly because analytics tools used so far to process raw data have not provided successful results. Regardless of the cause, it is clear that the current approach will not improve situational awareness of system operators. This shift in thinking needs to be mainly based on timescales:



o

What can we tell from historic data? (have we been here before?)

o

Now and near now

o

Future (plus how long into the future do you need to understand)

Real time operations need to move away from a simplistic deterministic approach towards a decision-making process based on probabilistic scenario/contingency models, and by harnessing the insights provided by effective data analytics combined with advanced visualization. Currently, there is no single data-analytics solution for supporting operator decision-making that will fit all possible scenarios and requirements of modern power systems, but there is a potential for significant improvement as the data-analytics technologies continue to evolve. We expect that this technical brochure will be a positive step towards enabling readers to understand such a potential and the complexities involved in development and implementation process.

136

Related Documents


More Documents from "Max Rosas"

Design Engineer
March 2021 0