Rdbms Course Material

  • Uploaded by: Gopalan Ramakrishnan
  • January 2021
Relational Database Management System

Education & Research Department © Infosys Ltd.

COPYRIGHT NOTICE © 2009-2011 Infosys Limited, Bangalore, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document. Education and Research Department Infosys Limited Electronic City Hosur Road Bangalore - 561 229, India. Tel: 91 80 852 0261-270 Fax: 91 80 852 0362 www.infosys.com mailto:[email protected]

COMPANY CONFIDENTIAL


Table of Contents

COPYRIGHT NOTICE
PURPOSE

1. INTRODUCTION TO DBMS
    1.1. WHAT IS A DATABASE?
    1.2. WHAT IS A DATABASE MANAGEMENT SYSTEM?
    1.3. FILE SYSTEM INTERFACE VERSUS DBMS INTERFACE
    1.4. MASTER AND TRANSACTION FILES
    1.5. TRADITIONAL APPROACH TO INFORMATION PROCESSING
        1.5.1. Disadvantages of the Traditional Approach to Information Processing
    1.6. WHY DBMS?
    1.7. TYPES OF DATABASES
    1.8. THREE LEVEL ARCHITECTURE FOR A DBMS
    1.9. DBMS USERS
    1.10. DATA MODELS
        1.10.1. Object Based Logical Model
        1.10.2. Record Based Logical Model
    1.11. RDBMS
    1.12. SOME POPULAR RDBMS PACKAGES
    1.13. APPLICATION AREAS OF RDBMS
    1.14. KEYS
    1.15. SUMMARY

2. ENTITY-RELATIONSHIP (E-R) MODELING
    2.1. INTRODUCTION
    2.2. ENTITY AND RELATIONSHIP
    2.3. CARDINALITY OF A RELATIONSHIP
        2.3.1. One to One Relationship
        2.3.2. One to Many Relationship
        2.3.3. Many to One Relationship
        2.3.4. Many to Many Relationship
    2.4. E-R DIAGRAM NOTATIONS
    2.5. MODELING USING E-R DIAGRAMS
        2.5.1. Steps in E-R Modeling
    2.6. CASE STUDY 1: PROBLEM STATEMENT
        2.6.1. Case Study 1: Solution
    2.7. CASE STUDY 2: PROBLEM STATEMENT
        2.7.1. Case Study 2: Solution
    2.8. CASE STUDY 3: PROBLEM STATEMENT
        2.8.1. Case Study 3: Solution
    2.9. TRANSFORMING AN E-R MODEL INTO PHYSICAL DATABASE DESIGN
    2.10. MERITS AND DEMERITS OF E-R MODELING
        2.10.1. Merits of E-R Modeling
        2.10.2. Demerits of E-R Modeling
    2.11. SUMMARY

3. NORMALIZATION
    3.1. INTRODUCTION
    3.2. THE NEED FOR NORMALIZATION
    3.3. PROCESS OF NORMALIZATION
        3.3.1. Determinant
        3.3.2. Functional Dependency
        3.3.3. Full Functional Dependency
        3.3.4. Partial Dependency
        3.3.5. Transitive Dependency
        3.3.6. Key attributes
        3.3.7. Non key attributes
    3.4. TYPES OF NORMAL FORMS
        3.4.1. First Normal Form (1 NF)
        3.4.2. Second Normal Form (2 NF)
        3.4.3. Third Normal Form (3 NF)
    3.5. MERITS AND DEMERITS OF NORMALIZATION
        3.5.1. Merits
        3.5.2. Demerits
    3.6. SUMMARY
    3.7. CASE STUDY

4. STRUCTURED QUERY LANGUAGE (SQL)
    4.1. THE PURPOSE OF SQL
    4.2. A BRIEF HISTORY OF SQL
    4.3. DATA TYPES
    4.4. STATEMENT TYPES
    4.5. DATA DEFINITION LANGUAGE (DDL) STATEMENTS
        4.5.1. CREATE TABLE Statement
        4.5.2. ALTER TABLE statement
        4.5.3. DROP TABLE statement
        4.5.4. TRUNCATE TABLE statement
        4.5.5. CREATE INDEX statement
    4.6. DATA MANIPULATION LANGUAGE (DML) STATEMENTS
        4.6.1. INSERT Statement
        4.6.2. DELETE Statement
        4.6.3. UPDATE Statement
        4.6.4. SELECT Statement
        4.6.5. Sub-Queries
        4.6.6. JOINS
        4.6.7. Queries using EXISTS / NOT EXISTS
        4.6.8. The Order of Execution of a SELECT statement
    4.7. VIEWS
        4.7.1. Horizontal View
        4.7.2. Vertical View
        4.7.3. DROP VIEW Statement
        4.7.4. Joined Views
        4.7.5. VIEW Updates
        4.7.6. Checking View Updates (CHECK OPTION)
        4.7.7. Advantages of Views
        4.7.8. Disadvantages of Views
    4.8. DATA CONTROL LANGUAGE (DCL)
        4.8.1. Granting Privileges
        4.8.2. Revoking Privileges (REVOKE)
    4.9. BEST PRACTICES
    4.10. SUMMARY

5. ON-LINE TRANSACTION PROCESSING (OLTP)
    5.1. PURPOSE
    5.2. TRANSACTION
    5.3. TRANSACTION SYSTEMS
        5.3.1. Batch Transaction Processing System
        5.3.2. On-line Transaction Processing System (OLTP)
        5.3.3. Real time Transaction Processing System
    5.4. TRANSACTION PROPERTIES
    5.5. REQUIREMENTS FOR AN OLTP SYSTEM
        5.5.1. Integrity
        5.5.2. Concurrency
    5.6. LOCKS
        5.6.1. Shared Lock (S)
        5.6.2. Exclusive Lock (X)
    5.7. GRANULARITY OF LOCKING
    5.8. INTENT LOCKING
        5.8.1. Intent Share (IS)
        5.8.2. Intent Exclusive (IX)
        5.8.3. Shared Intent Exclusive (SIX)
        5.8.4. Case study for Intent Locks
    5.9. DEADLOCK
    5.10. SECURITY
    5.11. RECOVERY
    5.12. TRANSACTION LOG
        5.12.1. Deferred update
        5.12.2. Immediate Update
        5.12.3. Check-Points
    5.13. SUMMARY

6. INTRODUCTION TO PL/SQL
    6.1. NEED FOR PL/SQL
    6.2. PL/SQL ARCHITECTURE
    6.3. PL/SQL BLOCK STRUCTURE
    6.4. COMMENTS IN PL/SQL
    6.5. ANONYMOUS PL/SQL BLOCKS
        6.5.1. Declaration section
        6.5.2. Executable section
        6.5.3. Exception section
    6.6. PL/SQL BLOCK EXECUTION
        6.6.1. How a PL/SQL block can be executed?
        6.6.2. Another way of executing the PL/SQL block
    6.7. NAMED PL/SQL BLOCKS
    6.8. VARIABLES AND DATATYPES
        6.8.1. Scalar datatype - Character
        6.8.2. Scalar datatype - PLS_INTEGER
        6.8.3. Scalar datatype - NUMBER
        6.8.4. Scalar datatype - Boolean
        6.8.5. Scalar Datatype - Date
        6.8.6. Scalar Datatype - Timestamp
    6.9. DBMS_OUTPUT PACKAGE
        6.9.1. DBMS_OUTPUT procedures
        6.9.2. DBMS_OUTPUT procedures usages

7. PL/SQL BASICS AND CONSTRUCTS
    7.1. %TYPE ANCHORED DECLARATIONS
    7.2. BIND VARIABLES
    7.3. SUBSTITUTION VARIABLES
    7.4. ACCEPTING INPUT IN PL/SQL
    7.5. SET VERIFY ON/OFF
    7.6. OPERATORS AND EXPRESSIONS
        7.6.1. Concatenation operator
        7.6.2. Arithmetic operator - Addition
        7.6.3. Arithmetic operator - Exponentiation
        7.6.4. Usage of Arithmetic operators with DATE variables
    7.7. NESTED PL/SQL BLOCKS
        7.7.1. Scope of variables
        7.7.2. Qualifying identifiers
    7.8. PL/SQL CONDITIONAL CONSTRUCTS
        7.8.1. IF THEN - END IF syntax
        7.8.2. IF THEN ELSE - END IF syntax
        7.8.3. Usage of inequality operator ( != or <> )
        7.8.4. IF THEN ELSIF - END IF syntax
        7.8.5. LOOP.. END LOOP
        7.8.6. Numeric FOR Loop
        7.8.7. Numeric FOR Loop - with REVERSE option
        7.8.8. WHILE Loop
    7.9. USING SQL STATEMENTS IN PL/SQL
        7.9.1. Using SELECT statements in PL/SQL
    7.10. COMPOSITE DATATYPE
        7.10.1. %ROWTYPE
        7.10.2. Using INSERT statements in PL/SQL
        7.10.3. Using UPDATE statements in PL/SQL
        7.10.4. Using DELETE statements in PL/SQL

8. PL/SQL EXCEPTIONS
    8.1. INTRODUCTION
    8.2. HOW TO HANDLE EXCEPTION?
    8.3. EXCEPTION SYNTAX
    8.4. EXCEPTION TYPES
        8.4.1. Raising exceptions
    8.5. PREDEFINED ORACLE SERVER EXCEPTION
        8.5.1. NO_DATA_FOUND predefined exception
        8.5.2. TOO_MANY_ROWS predefined exception
        8.5.3. DUP_VAL_ON_INDEX predefined exception
        8.5.4. VALUE_ERROR predefined exception
        8.5.5. INVALID_NUMBER predefined exception
    8.6. NON-PREDEFINED ORACLE SERVER EXCEPTION
    8.7. USER-DEFINED EXCEPTION
    8.8. WHEN OTHERS EXCEPTION HANDLER
    8.9. USING SQLCODE AND SQLERRM
    8.10. RAISE_APPLICATION_ERROR BUILT IN PROCEDURE
    8.11. EXCEPTION PROPAGATION
        8.11.1. Exception raised in the declaration section
        8.11.2. Exception raised in the executable section
        8.11.3. Exception raised in the exception section

9. PL/SQL CURSORS
    9.1. CURSORS
    9.2. IMPLICIT CURSORS
    9.3. IMPLICIT CURSORS ATTRIBUTES
    9.4. IMPLICIT CURSOR EXAMPLE
    9.5. EXPLICIT CURSORS
    9.6. OPERATIONS ON EXPLICIT CURSOR
        9.6.1. Declaring the cursor
        9.6.2. Opening the cursor
        9.6.3. Fetching records from the cursor
        9.6.4. Closing the cursor
    9.7. EXPLICIT CURSOR - SIMPLE LOOP
    9.8. EXPLICIT CURSOR - WITH GROUP BY CLAUSE
    9.9. EXPLICIT CURSOR ATTRIBUTES
    9.10. USING RECORD VARIABLES WITH EXPLICIT CURSORS
    9.11. NAVIGATING CURSORS WITH WHILE LOOP
    9.12. CURSOR FOR LOOP
    9.13. IMPLICIT CURSOR FOR LOOP
    9.14. CURSOR RELATED PREDEFINED ORACLE SERVER EXCEPTIONS
        9.14.1. INVALID_CURSOR exception
        9.14.2. CURSOR_ALREADY_OPEN exception
    9.15. PARAMETERIZED CURSORS
    9.16. EXPLICIT CURSOR - FOR UPDATE
    9.17. FOR UPDATE CURSOR DECLARATION
    9.18. WHERE CURRENT OF CLAUSE

10. TRANSACTION PROCESSING IN PL/SQL
    10.1. USING COMMIT STATEMENT IN PL/SQL
    10.2. USING ROLLBACK STATEMENT IN PL/SQL
    10.3. USING SAVEPOINT IN PL/SQL
    10.4. CONCURRENCY CONTROL

11. ON LINE ANALYTICAL PROCESSING (OLAP)
    11.1. DIFFERENCE BETWEEN OLTP AND OLAP
    11.2. DATA WAREHOUSE
        11.2.1. Why data warehouse is needed?
        11.2.2. Characteristics of Data Warehouse
        11.2.3. Data Warehousing Terminology
        11.2.4. Data Collection for Data Warehouse Applications
        11.2.5. Storing of data in Data warehouse
        11.2.6. Reporting of a Data warehouse application
        11.2.7. Difference between Data Warehouse and Data Mart
        11.2.8. Popular tools available for data warehousing
    11.3. SUMMARY

APPENDIX-A
    BOYCE CODD NORMAL FORM (BCNF)
    EMBEDDED SQL
        Purpose
        Why Embedded SQL?
    TIMESTAMPING

GLOSSARY
INDEX


PURPOSE

All business activities deal with a lot of data. Examples:

  • Schools, colleges and universities store data about students, courses, trainers, etc.
  • Banks store data about their customers, transactions¹ (deposits, withdrawals), loans, etc.

A Database Management System (DBMS) provides an efficient mechanism for storing and managing data. Most real-life software projects use databases to store large volumes of data, so it is extremely important for a software professional to understand the concepts of DBMS. Knowledge of DBMS enables a software engineer to:

  • Store data
  • Access data
  • Modify data
  • Delete data
  • Share data among different users
  • Ensure the security of the data

In short, DBMS concepts and techniques help in the efficient management of data.

¹ Transaction: One or more processing steps that are treated as one activity to achieve a desired result. Such a collection of operations, which forms a single, atomic logical unit of work, is called a transaction. A database system ensures proper execution of transactions despite failures: either the whole transaction executes, or none of it does.
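The all-or-nothing behaviour described in the transaction footnote can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the `account` table, its CHECK constraint and the `transfer` helper are hypothetical and are not part of the course material.

```python
import sqlite3

# Hypothetical one-table schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_no INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(1020, 500.0), (2348, 100.0)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between two accounts as one atomic unit of work."""
    try:
        # `with conn` commits if the block succeeds and rolls back
        # every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_no = ?",
                (amount, target))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, source))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit
        # made just before it has been undone as well.
        return False


def balances(conn):
    """Return {account_no: balance} for all accounts."""
    return dict(conn.execute("SELECT account_no, balance FROM account"))


ok = transfer(conn, 1020, 2348, 200.0)       # both updates are applied
failed = transfer(conn, 1020, 2348, 1000.0)  # neither update is applied
```

In the second call the credit to the target account runs first and succeeds, but because the debit fails, the whole unit of work is rolled back and both balances stay as they were after the first transfer.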

1|Page

Infosys Foundation Program


1. Introduction to DBMS 1.1.

What is a Database?

A database can be defined as an organized collection of interrelated data.

Example: Consider a bank database. The bank stores data about its customers in a file known as Customer_Details. The Customer_Details file has the following fields:
• Cust_ID: The customer’s identification number
• Cust_Last_Name: The customer’s last name
• Cust_Mid_Name: The customer’s middle name or initials
• Cust_First_Name: The customer’s first name
• Account_No: The customer’s account number
• Account_Type: The type of account that the customer has in the bank (Savings, Checking, etc.)
• Bank_Branch: The name of the bank branch
• Cust_Email: The customer’s email ID

The bank also stores data about the loan(s) taken by its customers. The loan details are stored in the file Customer_Loan. The Customer_Loan file has the following fields:
• Cust_ID: The customer’s identification number
• Loan_No: The loan number to identify the loan
• Amount_in_Dollars: The amount loaned by the bank to the customer

One customer can take multiple loans from the bank. The data stored in the two files, Customer_Details and Customer_Loan, constitute interrelated data. Refer to Figure 1-1.

Customer_Detail records from Customer_Details file

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Account_No  Account_Type  Bank_Branch  Cust_Email
101      Smith           A.             Mike             1020        Savings       Downtown     [email protected]
102      Smith           S.             Graham           2348        Checking      Bridgewater  [email protected]
103      Langer          G.             Justin           3421        Savings       Plainsboro   [email protected]
104      Quails          D.             Jack             2367        Checking      Downtown     [email protected]
105      Jones           E.             Simon            2389        Checking      Brighton     [email protected]

Customer_Loan records from Customer_Loan file

Cust_ID  Loan_No  Amount_in_Dollars
101      1011     8755.00
103      2010     2555.00
104      2056     3050.00
103      2015     2000.00

Bank Database: Customer_Details file, Customer_Loan file

Figure 1-1: Example of a Bank Database

The data in the database is integrated, which means that the database is a collection of several distinct[2] files. These distinct files may have some duplicate data, but the duplication is kept to a minimum.

Example: Figure 1-1 shows two files, Customer_Details and Customer_Loan. The two files are distinct in the sense that the Customer_Details file contains details about the bank’s customers, while the Customer_Loan file contains details about all the loans taken by those customers. Both files have the Cust_ID field. In order to sanction a loan to a customer, the bank requires the account number (Account_No) of the customer. The account number need not be stored again in the Customer_Loan file, because it can always be found by looking up the customer’s Cust_ID in the Customer_Details file.

The data present in the database can be shared. Sharing means that individual pieces of data in the database can be shared among different users. Each of those users can have access to the same portion of data, and they can use the data for different purposes. Refer to Figure 1-2.

[2] Distinct: Not identical.

User in the Bank’s Loan Department

Customer_Detail records from Customer_Details file

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Account_No  Account_Type  Bank_Branch  Cust_Email
101      Smith           A.             Mike             1020        Savings       Downtown     [email protected]
102      Smith           S.             Graham           2348        Checking      Bridgewater  [email protected]
103      Langer          G.             Justin           3421        Savings       Plainsboro   [email protected]
104      Quails          D.             Jack             2367        Checking      Downtown     [email protected]
105      Jones           E.             Simon            2389        Checking      Brighton     [email protected]

User in the Bank’s Fixed Deposit Department

Figure 1-2: Sharing the Account_No from the Customer_Details file

As depicted in Figure 1-2, the Account_No from the Customer_Details file is being accessed by the bank’s Fixed Deposit Department and the bank’s Loan Department. The information would typically be used for different purposes by the two classes of users. The users can even access the database concurrently. Concurrent access implies that different users can access the same piece of data at the same time.

Points to Remember:
• A database is defined as an organized collection of interrelated data
• Data in the database:
  o Is integrated
  o Can be shared
  o Can be concurrently accessed
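
The way the two files of Figure 1-1 are interrelated through Cust_ID can be sketched in a few lines of code. The following is only an illustrative sketch: it assumes Python with its built-in sqlite3 module (neither is used in this course material), loads a reduced set of the Figure 1-1 columns into an in-memory database, and looks up the Account_No for every loan without storing it in Customer_Loan.

```python
import sqlite3

# In-memory database used purely for illustration (an assumption, not the bank's system).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer_Details (
    Cust_ID        INTEGER PRIMARY KEY,
    Cust_Last_Name TEXT,
    Account_No     INTEGER
);
CREATE TABLE Customer_Loan (
    Cust_ID           INTEGER,
    Loan_No           INTEGER,
    Amount_in_Dollars REAL
);
INSERT INTO Customer_Details VALUES
    (101, 'Smith', 1020), (102, 'Smith', 2348), (103, 'Langer', 3421),
    (104, 'Quails', 2367), (105, 'Jones', 2389);
INSERT INTO Customer_Loan VALUES
    (101, 1011, 8755.00), (103, 2010, 2555.00),
    (104, 2056, 3050.00), (103, 2015, 2000.00);
""")

# Account_No is not repeated in Customer_Loan; it is found through Cust_ID.
loans_with_accounts = conn.execute("""
    SELECT l.Loan_No, l.Cust_ID, d.Account_No
    FROM Customer_Loan AS l
    JOIN Customer_Details AS d ON d.Cust_ID = l.Cust_ID
    ORDER BY l.Loan_No
""").fetchall()

for loan_no, cust_id, account_no in loans_with_accounts:
    print(loan_no, cust_id, account_no)
```

Customer 103 appears twice in the result because one customer can hold several loans, while the account number is still stored only once.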

1.2. What is a Database Management System?

A Database Management System (DBMS) is a collection of interrelated files and a set of programs that allow users to access and modify these files. The primary goal of a DBMS is to provide a convenient and efficient way to store, retrieve and modify information. Figure 1-3 shows an end user[3] working with data from the Customer_Details file, maintained in the bank database.

[Diagram: an end user working with data (Cust_ID, Account_No, Account_Type, Bank_Branch, Cust_Last_Name) from the Bank Database]

Figure 1-3: Basic picture of a database system

Database systems are designed to:
• Define structures for the storage of data
• Provide mechanisms for the manipulation[4] of data
• Ensure the safety and security of the stored data, even in cases of system crashes or attempts at unauthorized access
• Share data among the different users

In short, database systems are designed to manage large volumes of data.

[3] End User: The person who will use the system or for whom a system is developed. Example: a bank teller is an end user of a bank system.
[4] Manipulation: Data manipulation is the addition of new data to the database or the modification of existing data in the database.

1.3. File System Interface versus DBMS Interface

In the traditional file approach, data is stored in flat files[5], which are maintained by the file system under the operating system’s control.

Refer to Figure 1-4. The end users use the application programs to perform specific tasks. For example, personnel in the bank’s Loan Department make use of the Loan_Processing system to process the loan(s) of customer(s). These flat files are accessed through application programs.

[Diagram: the application programs Loan_Processing, Fixed_Deposit_Processing and Transaction_Processing access, through the File System, the files Customer_Details.dat, Customer_Loan.dat, Customer_Fixed_Deposit.dat and Customer_Transaction.dat]

Figure 1-4: Conventional method of Data Storage

In the DBMS approach, all requests to use the data stored in the database are managed by the DBMS. The end user can make use of either the application programs or the standard SQL[6] interface to access the data. Refer to Figure 1-5.

[5] Flat files: A flat file is a file containing records that have no structured interrelationship. Files used in structured programming (SP) projects were essentially flat files.
[6] SQL: SQL stands for Structured Query Language. It is a language used by relational databases to fetch, update and manage data. Relational Database is explained in Section 1.10.2.3.

The application programs are written in some programming language (COBOL, PL/I, C++, etc.) or in some higher-level fourth generation language[7]. The standard SQL interface is provided as an integral part of the database system software to access the database.

[Diagram: the application programs Loan_Processing, Fixed_Deposit_Processing and Transaction_Processing send their requests to the DBMS, which sits above the File System and manages the Bank Database (Customer_Details, Customer_Loan, Customer_Fixed_Deposit, Customer_Transaction)]

Figure 1-5: DBMS handles all requests for access to the database

The DBMS acts as a layer of abstraction[8] over the file system.

[7] Fourth Generation Language (4GL): A 4GL is typically non-procedural and designed so that end users can specify what they want without having to know how the computer will process their requirement.
[8] Abstraction: A simplified representation of something complex. It is not always necessary to know everything in detail; often it is enough to know only the essentials.

Example: As depicted in Figure 1-6, in the file system interface, the end user uses an application program written in a high-level language such as COBOL to access the data from the Customer_Details file. The files are maintained by the file system under the operating system’s control. In the DBMS interface, the end user uses an SQL interface to place a request to the DBMS to retrieve data from the Customer_Details table[9].

File System Interface:
  End User
  Application Programs
  Interface through high level language
    Ex: READ CUSTOMER_DETAILS-FILE AT END STOP RUN
  Operating System (Disk Manager, File Manager)
  File System (Disk Storage): Customer_Details file, Customer_Loan file

DBMS Interface:
  End User
  Application Programs
  Interface through Query (SQL)
    Ex: SELECT Cust_ID, Account_No FROM Customer_Details;
  DBMS
  Operating System (Disk Manager, File Manager)
  Database (Disk Storage): Customer_Details table, Customer_Loan table

Figure 1-6: File System Interface versus DBMS Interface
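
The contrast shown in Figure 1-6 can be sketched side by side. The example below is an illustration under assumptions, not part of the original course code: it uses Python instead of COBOL, an in-memory text buffer in place of a real Customer_Details file, a comma-separated record layout that is made up for the sketch, and the sqlite3 module to stand in for the DBMS.

```python
import csv
import io
import sqlite3

# File system interface: the program itself must know the record layout
# (here a hypothetical comma-separated layout: Cust_ID, Cust_Last_Name, Account_No).
flat_file = io.StringIO("101,Smith,1020\n102,Smith,2348\n")
from_file = [(int(record[0]), int(record[2])) for record in csv.reader(flat_file)]

# DBMS interface: the program only states which columns it wants.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Details "
    "(Cust_ID INTEGER, Cust_Last_Name TEXT, Account_No INTEGER)"
)
conn.executemany(
    "INSERT INTO Customer_Details VALUES (?, ?, ?)",
    [(101, "Smith", 1020), (102, "Smith", 2348)],
)
from_dbms = conn.execute(
    "SELECT Cust_ID, Account_No FROM Customer_Details ORDER BY Cust_ID"
).fetchall()

print(from_file)
print(from_dbms)
```

Both paths return the same values, but only the first one hard-codes field positions inside the application program.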

1.4. Master and Transaction Files

A master file is used to store relatively static data about some entity[10]. A transaction file contains relatively transient data about a particular data processing task.

Example: Consider the banking system consisting of two files, the Customer_Details file and the Customer_Transaction file.

[9] Table: A table is a two-dimensional structure which can have rows and columns. Rows stored in a table are equivalent to records in flat files.
[10] Entity: An entity can be defined as a “thing” or “object” in the real world which can be differentiated from other objects. Example: each person is an entity, and bank accounts can also be considered entities.

In Figure 1-7, Customer_Details is the master file containing all the information about the bank’s customers. In Figure 1-8, Customer_Transaction is the transaction file containing information about all the transactions that a customer makes with the bank. The Customer_Details file is modified rarely, for example when a new account is created or when the existing details of a customer change. However, the Customer_Transaction file is updated for every deposit or withdrawal made by the customer(s).

Customer_Detail records from Customer_Details file

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Account_No  Account_Type  Bank_Branch  Cust_Email
101      Smith           A.             Mike             1020        Savings       Downtown     [email protected]
102      Smith           S.             Graham           2348        Checking      Bridgewater  [email protected]
103      Langer          G.             Justin           3421        Savings       Plainsboro   [email protected]
104      Quails          D.             Jack             2367        Checking      Downtown     [email protected]
105      Jones           E.             Simon            2389        Checking      Brighton     [email protected]

Figure 1-7: Example of master file - Customer_Details

Customer_Transaction records from Customer_Transaction file

Account_No  Transaction_Date  Transaction_Type  Transaction_Amount_in_Dollars  Total_Available_Balance_in_Dollars
1020        12-Jan-2005       Deposit           5000.00                        10000.00
2348        14-Jan-2005       Withdrawal        2500.00                        13500.00
3421        14-Jan-2005       Deposit           2000.00                        27234.00
2367        16-Jan-2005       Withdrawal        1200.00                        12456.00
1020        17-Jan-2005       Withdrawal        1500.00                        8500.00

Figure 1-8: Example of transaction file - Customer_Transaction

Points to Remember:
A master file
• Stores relatively static data about an entity
• Changes rarely
A transaction file
• Stores relatively transient data about a particular data processing task
• Changes frequently, because transactions happen often and in large numbers
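
The different rates of change of the two files can be illustrated with a small sketch. It assumes Python with the sqlite3 module, a reduced set of columns, a hypothetical helper named record_transaction, and an opening balance of 5000.00 for account 1020 (implied by, but not stated in, Figure 1-8).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer_Details (Account_No INTEGER PRIMARY KEY, Cust_Last_Name TEXT);
CREATE TABLE Customer_Transaction (
    Account_No                         INTEGER,
    Transaction_Date                   TEXT,
    Transaction_Type                   TEXT,
    Transaction_Amount_in_Dollars      REAL,
    Total_Available_Balance_in_Dollars REAL
);
-- The master record is written once, when the account is created.
INSERT INTO Customer_Details VALUES (1020, 'Smith');
""")


def record_transaction(account_no, date, kind, amount, previous_balance):
    """Append one row to the transaction file and return the new balance."""
    change = amount if kind == "Deposit" else -amount
    new_balance = previous_balance + change
    conn.execute(
        "INSERT INTO Customer_Transaction VALUES (?, ?, ?, ?, ?)",
        (account_no, date, kind, amount, new_balance),
    )
    return new_balance


# Assumed opening balance of 5000.00; the two transactions follow Figure 1-8.
balance = record_transaction(1020, "12-Jan-2005", "Deposit", 5000.00, 5000.00)
balance = record_transaction(1020, "17-Jan-2005", "Withdrawal", 1500.00, balance)

master_rows = conn.execute("SELECT COUNT(*) FROM Customer_Details").fetchone()[0]
transaction_rows = conn.execute("SELECT COUNT(*) FROM Customer_Transaction").fetchone()[0]
print(balance, master_rows, transaction_rows)
```

The master table still holds a single unchanged row, while the transaction table has grown by one row per deposit or withdrawal.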

1.5. Traditional Approach to Information Processing

In the traditional file approach each application maintains its own master file and generally has its own set of transaction files. Files are custom-designed for each application and there is little sharing of data among the various applications. Application programs are data-dependent. It is impossible to change the physical representation (how the data is physically represented in storage) or the access technique (how it is physically accessed) without affecting the application. Refer to Figure 1-9. Although the traditional, file-oriented approach is still widely used, it has some serious disadvantages. The next section deals with the drawbacks of the traditional approach to information processing.

[Diagram: the end user uses the application programs Loan_Processing, Fixed_Deposit_Processing and Transaction_Processing; the application programs use the transaction files (Customer_Transaction, Customer_Loan, Customer_Fixed_Deposit) and the master file (Customer_Details)]

Figure 1-9: The traditional approach to information processing

1.5.1. Disadvantages of the Traditional Approach to Information Processing

The disadvantages of the traditional approach to information processing are discussed below:

• Data Security: The data as maintained in the flat file(s) is easily accessible and therefore not secure.

Example: Consider the banking system. The Customer_Transaction file has details about the total available balance of all customers. A customer wants information about his or her account balance, but in a file system it is difficult to give the customer access to only his or her data in the file. This illustrates that it is difficult to enforce security constraints[11] for only certain data items in a file.

• Data Redundancy: Often the same information is duplicated in two or more files. Refer to Figure 1-10.

Customer_Detail records from Customer_Details file

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Account_No  Account_Type  Bank_Branch  Cust_Email
101      Smith           A.             Mike             1020        Savings       Downtown     [email protected]
102      Smith           S.             Graham           2348        Checking      Bridgewater  [email protected]
103      Langer          G.             Justin           3421        Savings       Plainsboro   [email protected]
104      Quails          D.             Jack             2367        Checking      Downtown     [email protected]
105      Jones           E.             Simon            2389        Checking      Brighton     [email protected]

Customer_Fixed_Deposit records from Customer_Fixed_Deposit file
(Cust_Last_Name, Cust_Mid_Name, Cust_First_Name and Cust_Email are the redundant data)

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Cust_Email         Fixed_Deposit_No  Amount_in_Dollars  Rate_of_Interest_in_Percent
101      Smith           A.             Mike             [email protected]  2011              8055.00            6.5
103      Langer          G.             Justin           [email protected]  2015              2060.00            6.5
104      Quails          D.             Jack             [email protected]  3010              3050.00            6.5

Figure 1-10: Data Redundancy in files

This duplication of data (redundancy) leads to higher storage and access cost. In addition, it may lead to data inconsistency[12].

Example: Assume that the same data is repeated in two or more files. If a change is made to the data in one file, the same change must be made to the data in the other file as well. If this is not done, it will lead to an error during access to the data.

[11] Constraints: Restrictions, limitations.
[12] Inconsistency: Lacking uniformity or agreement.

Example: As depicted in Figure 1-10, customer details such as Cust_Last_Name, Cust_Mid_Name, Cust_First_Name and Cust_Email are stored both in the Customer_Details file and in the Customer_Fixed_Deposit file. If the email address of one customer, for example Langer G. Justin, changes from [email protected] to [email protected], the Cust_Email must be updated in both files; otherwise the data becomes inconsistent.

Although one can design file systems with minimal redundancy, data redundancy is sometimes preferred.

Example: Assume that customer details such as Cust_Last_Name, Cust_Mid_Name, Cust_First_Name and Cust_Email are not stored in the Customer_Fixed_Deposit file. If this customer information is required along with the fixed deposit details, two different files would need to be accessed, which increases the overhead. It is thus preferred to store the information in the Customer_Fixed_Deposit file itself.
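
The inconsistency described above can be reproduced with a short sketch. It assumes Python with the sqlite3 module and reduced tables; the addresses old@example.com and new@example.com are hypothetical placeholders, not the values from Figure 1-10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer_Details (Cust_ID INTEGER, Cust_Email TEXT);
CREATE TABLE Customer_Fixed_Deposit (
    Cust_ID INTEGER, Cust_Email TEXT, Fixed_Deposit_No INTEGER
);
-- The same email is stored twice (redundant data); placeholder value.
INSERT INTO Customer_Details VALUES (103, 'old@example.com');
INSERT INTO Customer_Fixed_Deposit VALUES (103, 'old@example.com', 2015);
""")

# The change is applied to only one of the two copies.
conn.execute(
    "UPDATE Customer_Details SET Cust_Email = 'new@example.com' WHERE Cust_ID = 103"
)

# Any customer whose two copies disagree is now inconsistent.
mismatches = conn.execute("""
    SELECT d.Cust_ID, d.Cust_Email, f.Cust_Email
    FROM Customer_Details AS d
    JOIN Customer_Fixed_Deposit AS f ON f.Cust_ID = d.Cust_ID
    WHERE d.Cust_Email <> f.Cust_Email
""").fetchall()
print(mismatches)
```

The query reports customer 103 with two different email addresses, which is exactly the situation that arises when only one file is updated.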

• Data Isolation: Data isolation means that all the related data is not available in one file. Generally, the data is scattered across various files, and the files may be in different formats; therefore writing new application programs to retrieve the appropriate data is difficult.

• Program/Data Dependence: Under the traditional file approach, application programs are data-dependent. It is impossible to change the physical representation (how the data is physically represented in storage) or the access technique (how it is physically accessed) without affecting the application. Changes in the physical format of the master file(s), such as the addition of a data field, require that the change be made in all the application programs that access the master file(s). Consequently, for each of the application programs that a programmer writes or maintains, the programmer must also focus on data management issues. There can be no centralized[13] execution of the data management functions. Data management is scattered among all the application programs.

Example: Consider the banking system. The master file, Customer_Fixed_Deposit, contains details about the customers’ fixed deposit accounts. Refer to Figure 1-11. A customer’s fixed deposit record is described by:
• Cust_ID
• Cust_Last_Name
• Cust_Mid_Name
• Cust_First_Name
• Cust_Email
• Fixed_Deposit_No
• Amount_in_Dollars
• Rate_of_Interest_in_Percent

[13] Centralized: Systems where decision making, flow of data, or the beginning of activities are initiated at the same central point and disseminated to remote points in the organization.

An application program is available to display all the details about the fixed deposit accounts of all the customers. Assume that a new data field, Fixed_Deposit_Maturity_Date, is added to the master file. The application program must also be altered because it depends on the master file. Similarly, if the physical format of the master/transaction file, such as the field delimiter or the record delimiter, is changed, the application program which depends on it must also be altered.

Customer_Fixed_Deposit records from Customer_Fixed_Deposit file

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Cust_Email         Fixed_Deposit_No  Amount_in_Dollars  Rate_of_Interest_in_Percent
101      Smith           A.             Mike             [email protected]  2011              8055.00            6.5
103      Langer          G.             Justin           [email protected]  2015              2060.00            6.5
104      Quails          D.             Jack             [email protected]  3010              3050.00            6.5

Figure 1-11: Master file - Customer_Fixed_Deposit
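
The dependence of a program on the physical record format can be made concrete with a sketch. Everything here is assumed for illustration: Python as the language, a comma-delimited record layout with only four fields, the position at which the new Fixed_Deposit_Maturity_Date field is inserted, the sample date, and the helper name amount_by_position. The second half uses the sqlite3 module to show, for contrast, access by column name.

```python
import sqlite3

# Hypothetical flat-file layout: Cust_ID, Fixed_Deposit_No, Amount_in_Dollars, Rate
old_record = "101,2011,8055.00,6.5"
# Same record after a maturity date field is inserted (position is an assumption).
new_record = "101,2011,12-Jan-2010,8055.00,6.5"


def amount_by_position(record):
    """The application program hard-codes the position of the amount field."""
    return record.split(",")[2]


print(amount_by_position(old_record))  # the amount
print(amount_by_position(new_record))  # now the date: the program must be altered

# Contrast: a query that names its column is unaffected by the new field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Fixed_Deposit "
    "(Cust_ID INTEGER, Fixed_Deposit_No INTEGER, Amount_in_Dollars REAL)"
)
conn.execute("INSERT INTO Customer_Fixed_Deposit VALUES (101, 2011, 8055.00)")
query = "SELECT Amount_in_Dollars FROM Customer_Fixed_Deposit WHERE Cust_ID = 101"
before = conn.execute(query).fetchone()[0]
conn.execute("ALTER TABLE Customer_Fixed_Deposit ADD COLUMN Fixed_Deposit_Maturity_Date TEXT")
after = conn.execute(query).fetchone()[0]
print(before, after)
```

The positional reader silently returns the wrong field after the change, whereas the named query returns the same amount before and after the column is added.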

• Lack of Flexibility: The traditional systems are able to retrieve information only for predetermined requests for data. If the management needs unanticipated data, the information can perhaps be provided if it is in the files of the system. However, extensive programming is required, which may result in a delay. By the time the information is made available, it may no longer be required or useful.

Example: Consider the banking system. An application program is available to generate a list of customer names in a particular area of the city. However, the bank manager requires a list of those customers who have an account balance greater than $10,000.00 and reside in a particular area of the city. An application program for this purpose does not exist. The bank manager has two choices:
  o Print the list of customer names in a particular area of the city and then manually find those with an account balance greater than $10,000.00
  o Hire an application programmer to write an application program

Both solutions are cumbersome.
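
For comparison, the bank manager’s unanticipated request can be expressed as a single ad hoc query once the data is held in a database. The sketch below assumes Python with the sqlite3 module and a hypothetical table Customer_Summary whose Area column simply reuses the Bank_Branch values; the balances are taken from Figure 1-8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Customer_Summary is a hypothetical table invented for this illustration.
conn.execute(
    "CREATE TABLE Customer_Summary "
    "(Cust_Last_Name TEXT, Area TEXT, Balance_in_Dollars REAL)"
)
conn.executemany(
    "INSERT INTO Customer_Summary VALUES (?, ?, ?)",
    [
        ("Smith", "Downtown", 8500.00),
        ("Smith", "Bridgewater", 13500.00),
        ("Langer", "Plainsboro", 27234.00),
        ("Quails", "Downtown", 12456.00),
    ],
)

# The unanticipated request becomes one query instead of a new program.
wealthy_downtown = conn.execute(
    "SELECT Cust_Last_Name FROM Customer_Summary "
    "WHERE Area = ? AND Balance_in_Dollars > ?",
    ("Downtown", 10000.00),
).fetchall()
print(wealthy_downtown)
```

Changing the area or the threshold only changes the query parameters; no new application program has to be written.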

• Concurrent Access Anomalies: Many traditional systems allow multiple users to access and update the same piece of data simultaneously. But the interaction of concurrent updates may result in inconsistent data. To guard against this possibility, the system must maintain some form of supervision, but supervision is difficult because data may be accessed by many different application programs, and these application programs may not have been coordinated previously.
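
How uncoordinated updates produce inconsistent data can be shown with a deterministic simulation. This is a simplified sketch in Python, not a model of any particular file system: the two "programs" are just two interleaved read and write steps, and the balance and amounts are hypothetical.

```python
# Balance of account 1020 as stored in the shared file (hypothetical value).
balance_on_file = {"1020": 10000.00}

# Two uncoordinated application programs each read the balance
# before either of them writes its result back.
read_by_program_a = balance_on_file["1020"]
read_by_program_b = balance_on_file["1020"]

# Program A records a withdrawal of 1500.00.
balance_on_file["1020"] = read_by_program_a - 1500.00
# Program B records a deposit of 2000.00 and overwrites A's result.
balance_on_file["1020"] = read_by_program_b + 2000.00

final_balance = balance_on_file["1020"]
expected_if_serial = 10000.00 - 1500.00 + 2000.00

print(final_balance, expected_if_serial)
```

The withdrawal is lost: the stored balance differs from the one obtained when the two updates run one after the other.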

Example: Consider the banking system. Assume that the bank manager is analyzing all the transactions made by the customers. At about the same time, a customer accesses his or her account to make a withdrawal. The account is both read by the bank manager and updated by the customer at the same time. This is called concurrent access. Because the customer’s account is being updated at the same time, there is a possibility of the bank manager reading an incorrect balance.

These difficulties prompted the development of database systems.

Points to Remember:
Disadvantages of the traditional file approach:
• Data Security – Data is easily accessible by all and therefore not secure
• Data Redundancy – The same data is duplicated in two or more files, which may lead to update anomalies
• Data Isolation – All the related data is not available in one file. Thus writing a new application program is difficult
• Program / Data Dependence – Application programs are data-dependent. It is impossible to change the physical representation (how the data is physically represented in storage) or the access technique (how it is physically accessed) without affecting the application
• Lack of Flexibility – Only pre-determined requests for information can be met. It is not flexible enough to satisfy unanticipated queries
• Concurrent Access Anomalies – The same piece of data is allowed to be updated simultaneously, which leads to inconsistencies

1.6. Why DBMS?

A DBMS ensures the following:
• Application programs and queries[14] are data-independent. They do not depend on any one particular physical representation of data in secondary storage or on any one access technique

[14] Queries: A query is a request that a user makes to the database.

• Allows for sharing of data among different users. Users are also able to access the database concurrently without facing the issues of inconsistent data
• Controls redundancy and inconsistency
• Provides secure access to the database
• Enforces integrity constraints[15] (also known as business rules) by preventing the entry of invalid information into the database
• Enables backup and recovery from system crashes
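
One of the points above, the enforcement of integrity constraints, lends itself to a brief sketch. The example assumes Python with the sqlite3 module, and the business rule itself (a loan amount must be greater than zero) is a hypothetical rule chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer_Loan (
    Cust_ID           INTEGER NOT NULL,
    Loan_No           INTEGER PRIMARY KEY,
    -- Hypothetical business rule: a loan amount must be positive.
    Amount_in_Dollars REAL CHECK (Amount_in_Dollars > 0)
)
""")

conn.execute("INSERT INTO Customer_Loan VALUES (101, 1011, 8755.00)")

try:
    # Invalid information: a negative loan amount.
    conn.execute("INSERT INTO Customer_Loan VALUES (103, 2010, -50.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Customer_Loan").fetchone()[0]
print(rejected, row_count)
```

The rule is stated once, in the table definition, and the database refuses the invalid row regardless of which application program attempts the insert.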

1.7. Types of Databases

There are two generic database architectures: centralized and distributed. The fundamental differences between the two architectures are:

Centralized (refer to Figure 1-12):
• The entire data is located at a single site
• Allows for greater control over accessing and updating data
• Vulnerable to failure, because it depends on the availability of resources only at the central site

Example: Consider the banking system where a customer is withdrawing money from an ATM. The bank has account information for every customer, which needs to be made available at every ATM. The bank can choose to keep all this information at a central place and share it through the network, instead of keeping it at every ATM.

Distributed (refer to Figure 1-13):
• The database is stored on several computers – personal computers or mainframe systems
• Computers in a distributed system can communicate with each other via various communication media, e.g. high speed networks or telephone lines
• Distributed databases are geographically separated and managed
• Distributed databases are separately administered
• Distributed databases have a slower interconnection

Example: Consider a multinational banking system. The head office of the bank is located at Chicago and the branches are at Melbourne and Tokyo. The bank database is distributed across these branches. The branch offices are connected through a network.

[15] Integrity Constraints: A set of restrictions for the correctness and accuracy of data.

[Diagram: four ATMs connected to a single Database Server through a Telecom Line, LAN or Direct Line]

Figure 1-12: Centralized Database

[Diagram: database servers with workstations at Tokyo, Chicago and Melbourne, connected to one another through a Network]

Figure 1-13: Distributed Databases

Distributed databases can be classified as homogeneous[16] or heterogeneous[17]. In a distributed system, it is easy to differentiate between local and global transactions. A transaction is said to be local if it accesses data only from the single site at which the transaction was initiated. A global transaction, on the other hand, accesses data from sites different from the one at which the transaction was initiated.

Example: Consider the multinational banking system where the bank’s head office is located at Chicago and the branch offices are located at Melbourne and Tokyo. The branch offices are connected through a network. Each branch office has its own computer and a database consisting of all the accounts maintained at that branch. Refer to Figure 1-14. The head office maintains information about all the branches of the bank. Consider a transaction to add $50 to account number 1020 located at the Downtown bank branch in Tokyo. If the transaction was initiated at the Downtown bank branch in Tokyo, it is considered a local transaction; otherwise, it is considered a global transaction. A transaction to transfer $50 from account 1020 to account 2389, which is located at the Brighton bank branch in Melbourne, is a global transaction, since accounts present at two different sites are accessed as a result of its execution.

Thus in a distributed database system:
• The various sites are aware of one another
• Each site provides an environment to execute both local and global transactions
• If each site runs the same distributed database management software, the system is called a homogeneous distributed database system
• If different sites run different database management software, it is difficult to manage global transactions. Such systems are called multi-database systems or heterogeneous distributed database systems

[16] Homogeneous: All the same, uniform, harmonized.
[17] Heterogeneous: Varied, mixed, diverse.
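
The distinction between local and global transactions can be written down as a small rule. The following Python sketch is only an illustration: the mapping ACCOUNT_SITE and the function classify_transaction are hypothetical names, and the site of each account follows the example above (account 1020 in Tokyo, account 2389 in Melbourne).

```python
# Hypothetical mapping from account number to the site that stores it.
ACCOUNT_SITE = {1020: "Tokyo", 2389: "Melbourne"}


def classify_transaction(initiating_site, accounts):
    """Return 'local' if every accessed account is stored at the
    initiating site, otherwise 'global'."""
    sites_accessed = {ACCOUNT_SITE[account] for account in accounts}
    return "local" if sites_accessed == {initiating_site} else "global"


# Add $50 to account 1020, initiated at the Tokyo branch.
print(classify_transaction("Tokyo", [1020]))
# The same update, initiated at the Chicago head office.
print(classify_transaction("Chicago", [1020]))
# Transfer $50 from account 1020 to account 2389.
print(classify_transaction("Tokyo", [1020, 2389]))
```

Only the first transaction is local; the other two touch data held at a site other than the one where they were initiated.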

Customer_Detail records from Customer_Details file

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Account_No  Account_Type  Bank_Branch  Cust_Email
101      Smith           A.             Mike             1020        Savings       Downtown     [email protected]
102      Smith           S.             Graham           2348        Checking      Bridgewater  [email protected]
103      Langer          G.             Justin           3421        Savings       Plainsboro   [email protected]
104      Quails          D.             Jack             2367        Checking      Downtown     [email protected]
105      Jones           E.             Simon            2389        Checking      Brighton     [email protected]

Figure 1-14: Customer_Details file

1.8. Three level architecture for a DBMS

Most commercial databases are based on a three-level architecture model called the ANSI/SPARC model (American National Standards Institute / Standards Planning and Requirements Committee). Refer to Figure 1-15.

External / View level (individual user views): External Schema A, External Schema B, ... External Schema Z
Conceptual level (common user view): Conceptual Schema
Internal level (storage view): Internal Schema

Figure 1-15: The three levels of the architecture

The overall design of the database is called the database schema. Schemas are not changed frequently. In general, database systems support one internal schema, one conceptual schema and several external schemas.

External / View level: Many users of the database system are not concerned with all the information in the database. Instead, they need to access only a portion of the database. The external level of abstraction simplifies the end user’s interaction with the system. The system may provide many views for the same database.

Conceptual / Logical level: The conceptual level describes the data stored in the database and the relationships among the stored data. This level is used by the Database Administrator(s)[18] (DBA), who in turn decide what information must be kept in the database.

Internal / Physical level: The internal level is the lowest level of abstraction and describes the data storage and access methods. Database Administrator(s) may be aware of certain details of the physical organization of the data.

Example: Consider a banking system. It uses a bank database containing:
• Customer_Details table
• Customer_Transaction table

[18] Database Administrator: The DBA administers the DBMS and is in charge of creating, maintaining and modifying all the three levels of the DBMS. The DBA also controls the allocation of system resources, grants/revokes privileges to/from users and ensures the consistency of the database.

At the internal level, a Customer_Details record or a Customer_Transaction record is stored as a block of consecutive storage locations (for example, words or bytes). Just as a language compiler hides this level of detail from programmers, the database system hides the lowest-level storage details from database programmers.

At the conceptual level, the table definition (the attribute[19] data type and width definition) and the relationships among the data are described.

Finally, at the external level, several views[20] of the database are defined, and database end users are able to see these views. The views hide the conceptual level details. They also prevent users from accessing other parts of the database and hence provide a security mechanism. For example, tellers in the bank will be able to work with only that part of the database that has data on customer accounts; they cannot see information such as the salaries of bank employees.

Detailed system architecture (Figure 1-16): The database management system (DBMS) is the software that handles all access to the database. Conceptually, what happens is the following:
• A user issues an access request for data (typically using SQL)
• The DBMS receives that request and analyzes it
• The DBMS checks the external schema, the external/conceptual mapping, the conceptual schema, the conceptual/internal mapping and the storage structure definition
• The DBMS executes the required operations on the stored database
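
The role of a view at the external level can be sketched as follows. The example is an assumption-laden illustration: it uses Python with the sqlite3 module, a reduced Customer_Details table, a hypothetical view name Teller_Account_View and a placeholder email value. SQLite has no per-user privileges, so the sketch shows only how a view narrows what is visible, not how access to the underlying table is denied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: the full table definition.
CREATE TABLE Customer_Details (
    Cust_ID        INTEGER,
    Cust_Last_Name TEXT,
    Account_No     INTEGER,
    Account_Type   TEXT,
    Cust_Email     TEXT
);
INSERT INTO Customer_Details
VALUES (101, 'Smith', 1020, 'Savings', 'hidden@example.com');

-- External level: a hypothetical view exposing only the account columns.
CREATE VIEW Teller_Account_View AS
    SELECT Cust_ID, Account_No, Account_Type FROM Customer_Details;
""")

cursor = conn.execute("SELECT * FROM Teller_Account_View")
visible_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(visible_columns)
print(rows)
```

A user who works only through the view sees three columns; the name and email columns of the underlying table do not appear in the result.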

[19] Attribute: The literal meaning is quality, characteristic, trait or feature. Entities are described in a database by a set of attributes. For example, in the bank system, Cust_ID, Cust_Email, etc. describe the Customer_Details entity set.
[20] View: A view is a virtual table in the database defined by a query. For more details on views see Chapter 4.

[Diagram: the users Mike, Graham, Jack and Justin work through External View A and External View B (external schemas); the DBMS maps these through the external/conceptual mapping to the Conceptual view (conceptual schema), and through the conceptual/internal mapping to the storage structure definition (internal schema) and the Database (internal view); the schemas and mappings are built and maintained by the Database Administrator (DBA)]

Figure 1-16: Detailed System Architecture

External:
  Customer_Loan
    Cust_ID : 101
    Loan_No : 1011
    Amount_in_Dollars : 8755.00

Conceptual:
  CREATE TABLE Customer_Loan (
    Cust_ID NUMBER(4),
    Loan_No NUMBER(4),
    Amount_in_Dollars NUMBER(7,2))

Internal:
  Cust_ID            TYPE = BYTE (4), OFFSET = 0
  Loan_No            TYPE = BYTE (4), OFFSET = 4
  Amount_in_Dollars  TYPE = BYTE (7), OFFSET = 8

Figure 1-17: An example of the three levels

The above figure depicts the three levels of DBMS architecture. The external view is how the customer, Mike A Smith views it. The conceptual view is how the DBA views it. The internal view is how the data is actually stored.

1.9. DBMS Users

Depending on their level of interaction with the system, DBMS users fall into one of three categories (refer to Figure 1-18).
• End User: End users deal only with the highest level of abstraction. They may not be concerned with, or even aware of, the details of the DBMS. Typically, an end user updates or queries the database.
• Application Programmer: The application programmer writes database application programs in a programming language such as COBOL, PL/I or C++, or in a higher-level fourth-generation language. The application programs access the database by issuing the appropriate requests to the DBMS.
• Database Administrator (DBA): The DBA can be a single person or a team. The functions of the DBA include the following:
  o Definition of the conceptual schema: It is the DBA's job to decide exactly what information is to be held in the database. The DBA identifies the entities and the information to be recorded about those entities. This process is usually referred to as logical database design. Once the content of the database has been decided at an abstract level, the DBA creates the corresponding conceptual schema.
  o Definition of the internal schema: The DBA must also decide how the data is to be represented in the database. This process is usually referred to as physical database design. Having done the physical design, the DBA creates the corresponding storage structure definition and defines the associated conceptual/internal mapping.
  o Liaising with users: The DBA liaises with users to ensure that the data they need is available, and writes the necessary external schemas. The DBA must also define the associated external/conceptual mapping.
  o Granting of authorization for data access: By granting different types of authorization (read, write, etc.), the DBA regulates which parts of the database the various users can access.
  o Defining integrity constraints: The data values stored in the database must satisfy certain consistency constraints, which the DBA defines.


[Figure 1-18: Users of DBMS. The end user works at the external level, the highest level of abstraction, and deals with updates and queries. Application programmers write application programs. The Database Administrator (DBA) defines the conceptual, internal and external schemas, controls users' access privileges and ensures the consistency of the database.]

1.10. Data Models

A data model is a conceptual tool to describe data, the relationships among data, the semantics of data and the consistency constraints on the data.


Two widely used categories of data models are discussed in the next sections.

1.10.1. Object Based Logical Model

The Entity-Relationship Model (E-R Model) is a widely known object based logical model. Object based logical models are used to describe data at the conceptual and the view level. The E-R Model is based on a perception of the real world as a collection of basic objects, called entities, and of relationships among these entities. The E-R Model is covered in detail in Chapter 2.

[Figure 1-19: Entity Relationship Model. Entity set Customer_Details with the attributes Cust_ID, Cust_First_Name, Cust_Mid_Name, Cust_Last_Name, Cust_Email, Account_No, Account_Type and Bank_Branch; entity set Customer_Loan with the attributes Loan_No, Cust_ID and Amount_in_Dollars. Legend: entity set; attribute; line that connects attributes to entity sets and entity sets to one another.]

1.10.2. Record Based Logical Model

Record based logical models are used to describe data at the conceptual and the view level. They are used both to specify the overall logical structure of the database and to provide a higher-level description of the implementation. There are three widely accepted record based logical models.

1.10.2.1. Hierarchical Data Model

The hierarchical data model organizes data in a tree-like structure, also called a parent-child hierarchy. This structure implies that a record can have repeating information (generally in the child data segments).


Data is represented by a collection of records (record types). A record type corresponds to a table in the relational model, and the individual records correspond to the rows of the table. A parent-child relationship is used to create links between these record types. In a hierarchical database the parent-child relationship is one-to-many, so a child segment can have only one parent segment. IBM's Information Management System (IMS) is a popular example of a DBMS based on the hierarchical model. Databases based on the hierarchical model were popular from the 1960s to the 1970s.

Example: Consider the banking system. Figure 1-20 shows the hierarchical representation of Customer_Details and Customer_Loan records from the Customer_Details and Customer_Loan files respectively.

Note: The loan (Loan_No: 1011) is shown as taken jointly by Mike A. Smith and Graham S. Smith to explain the difference between the hierarchical and the network model.

ROOT
  101 Smith A. Mike | 1020 Savings Downtown | [email protected]
    1011 8755.00
  102 Smith S. Graham | 2348 Checking Bridgewater | [email protected]
    1011 8755.00
  103 Langer G. Justin | 3421 Savings Plainsboro | [email protected]
    2010 2555.00
    2015 2000.00

Figure 1-20: Hierarchical Model

1.10.2.2. Network Data Model

The network model permits many-to-many relationships in data. The Conference on Data Systems Languages (CODASYL) formally defined the network model in 1971. Data in the network model is represented by a collection of records, and the relationships among data are represented by links (pointers). The records in the database are organized as collections of graphs. IDMS is an example of a DBMS based on the network model.


Example: Refer to Figure 1-21. Assume that the loan (Loan_No: 1011) is taken jointly by two customers (Mike A. Smith and Graham S. Smith). In the hierarchical model (Figure 1-20), the loan information has to be repeated for each customer individually, because the parent-child relationship is one-to-many and a many-to-many relationship is not permitted. The network model allows many-to-many relationships, so the loan information is stored only once and both customers refer to it.

Customer records:
  101 Smith A. Mike | 1020 Savings Downtown | [email protected]
  102 Smith S. Graham | 2348 Checking Bridgewater | [email protected]
  103 Langer G. Justin | 3421 Savings Plainsboro | [email protected]
  104 Quails D. Jack | 2367 Checking Downtown | [email protected]
  105 Jones E. Simon | 2389 Checking Brighton | [email protected]

Loan records (connected to the customer records by links):
  1011 8755.00 (linked to both customer 101 and customer 102)
  2010 2555.00
  2015 2000.00
  2056 3050.00

Figure 1-21: Network Model
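The difference between Figures 1-20 and 1-21 can be sketched with plain Python data structures. This is only an illustration, not how IMS or IDMS store data: nested dictionaries play the role of parent and child segments, and a shared object plays the role of a linked record. The update to 8000.00 is a hypothetical value used to show the effect of duplication.

```python
# Hierarchical model: each parent (customer) owns its own copy of the child record.
hierarchical = {
    101: {"name": "Mike A. Smith", "loans": [{"Loan_No": 1011, "Amount": 8755.00}]},
    102: {"name": "Graham S. Smith", "loans": [{"Loan_No": 1011, "Amount": 8755.00}]},
}

# Network model: a single loan record, reached through links from both customers.
loan_1011 = {"Loan_No": 1011, "Amount": 8755.00}
network = {
    101: {"name": "Mike A. Smith", "loans": [loan_1011]},
    102: {"name": "Graham S. Smith", "loans": [loan_1011]},
}

# Hypothetical update made through customer 101 only.
hierarchical[101]["loans"][0]["Amount"] = 8000.00
network[101]["loans"][0]["Amount"] = 8000.00

# The two hierarchical copies now disagree; the shared network record cannot.
hier_amounts = [customer["loans"][0]["Amount"] for customer in hierarchical.values()]
net_amounts = [customer["loans"][0]["Amount"] for customer in network.values()]
print(hier_amounts)
print(net_amounts)
```

The repeated copy in the hierarchical structure is the redundancy the text refers to: every copy has to be updated separately to keep the data consistent.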

1.10.2.3. Relational Data Model

The relational model uses a set of tables (relations), each of which is assigned a unique name, to represent both data and the relationships among those data. A table has a specified number of columns but can have any number of rows. Rows stored in a table resemble records from flat files.

A row in a table represents a relationship among a set of values. In Figure 1-22, a row in the Customer_Loan table gives the details of a loan taken by a customer. Example: Customer (Cust_ID: 101) has taken a loan (Loan_No: 1011) of amount (Amount_in_Dollars: 8755.00).

Since a table is a collection of such relationships, there is a close correspondence between the concept of a table and the mathematical concept of a relation, from which the relational data model takes its name.


Customer_Loan records from the Customer_Loan table
(columns are also called attributes or fields; rows are also called records or tuples)

Cust_ID   Loan_No   Amount_in_Dollars
101       1011      8755.00
103       2010      2555.00
104       2056      3050.00
103       2015      2000.00

Number of records / rows / tuples: cardinality of the relation
Number of attributes / columns / fields: degree of the relation

Customer_Details records from the Customer_Details table

Cust_ID   Cust_Last_Name   Cust_Mid_Name   Cust_First_Name   Account_No   Account_Type   Bank_Branch   Cust_Email
101       Smith            A.              Mike              1020         Savings        Downtown      [email protected]
102       Smith            S.              Graham            2348         Checking       Bridgewater   [email protected]
103       Langer           G.              Justin            3421         Savings        Plainsboro    [email protected]
104       Quails           D.              Jack              2367         Checking       Downtown      [email protected]
105       Jones            E.              Simon             2389         Checking       Brighton      [email protected]

Figure 1-22: Relations - Customer_Loan and Customer_Details
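The cardinality and degree of the Customer_Loan relation in Figure 1-22 can be checked with a short sketch. It assumes Python's built-in sqlite3 module in place of the RDBMS used in the course; the data is the data shown in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Loan ("
    " Cust_ID NUMERIC(4), Loan_No NUMERIC(4), Amount_in_Dollars NUMERIC(7,2))"
)
conn.executemany(
    "INSERT INTO Customer_Loan VALUES (?, ?, ?)",
    [(101, 1011, 8755.00), (103, 2010, 2555.00),
     (104, 2056, 3050.00), (103, 2015, 2000.00)],
)

# Cardinality: the number of rows (tuples) in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM Customer_Loan").fetchone()[0]

# Degree: the number of columns (attributes) in the relation.
cursor = conn.execute("SELECT * FROM Customer_Loan")
degree = len(cursor.description)

print(cardinality, degree)
```

Inserting or deleting rows changes the cardinality, while the degree changes only if the table definition itself is altered.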

1.10.2.4. Structural Terminology

Formal Relational Term       Informal Equivalence
Relation                     Table
Tuple                        Row or Record
Cardinality of a Relation    Number of rows
Attribute                    Column or Field
Degree of a Relation         Number of columns
Primary Key                  Unique identifier
Domain                       A pool of values from which the values of specific attributes of specific relations are taken

1.11. RDBMS

Relational Database: Any database in which the data is logically organized according to the relational model.
RDBMS: A DBMS that manages a relational database.


An RDBMS is a category of DBMS that stores data in related tables.

1.12. Some Popular RDBMS packages

RDBMS Package   Company / Corporation
Oracle          Oracle Corp.
Sybase          Sybase Inc.
Informix        Informix Software Inc.
MySQL           Open source
DB2             IBM
Ingres II       Computer Associates International Inc.
SQL Server      Microsoft

1.13. Application Areas of RDBMS

Databases are widely used in real life applications such as:
1. Airlines: For reservations and schedule information.
2. Banking: For customer information, accounts, loans and banking transactions.
3. Universities: For student information, course registrations and grades.
4. Telecommunications: For keeping track of calls made by customers, generating the customers' monthly bills and storing information about the communication networks.
5. Sales: For customer, product and purchase information in any industry.

1.14. Keys

Candidate Key: A candidate key of a table is a minimal set of one or more attributes of the table that uniquely identifies a row in the table. Example: Consider the Customer_Details table shown in Figure 1-23.

Customer_Details records from the Customer_Details table

Cust_ID   Cust_Last_Name   Cust_Mid_Name   Cust_First_Name   Account_No   Account_Type   Bank_Branch   Cust_Email
101       Smith            A.              Mike              1020         Savings        Downtown      [email protected]
102       Smith            S.              Graham            2348         Checking       Bridgewater   [email protected]
103       Langer           G.              Justin            3421         Savings        Plainsboro    [email protected]
104       Quails           D.              Jack              2367         Checking       Downtown      [email protected]
105       Jones            E.              Simon             2389         Checking       Brighton      [email protected]

Figure 1-23: Customer_Details Table

Assumptions:
• One customer can have only one account. Example: As depicted in Figure 1-23, customer Mike A. Smith has a Savings account with Account_No 1020. Similarly, customer Justin G. Langer has a Savings account with Account_No 3421.
• An account can belong to only one customer. Example: Account_No 2367 belongs to Jack D. Quails.
• No two rows have the same combination of values in the Cust_Last_Name and Cust_First_Name attributes. If two rows have the same value for Cust_Last_Name, they differ in their values for Cust_First_Name.

In the Customer_Details table, there are four candidate keys. Three of them are simple candidate keys and one is a composite candidate key.

Simple Candidate Key: A candidate key comprising one attribute only. Examples:
• Account_No
• Cust_ID
• Cust_Email

Composite Candidate Key: A candidate key comprising two or more attributes. Example: {Cust_Last_Name, Cust_First_Name}. Cust_Last_Name alone is not sufficient to distinguish between rows in the table, but together with Cust_First_Name it is. Their combination constitutes a candidate key.

Invalid Candidate Key: A candidate key must be a set of attributes that uniquely identifies a row, and no proper subset of those attributes may have the same unique identification property. Example: {Account_No, Account_Type} is an invalid candidate key. Although the attributes Account_No and Account_Type together can distinguish rows, their combination does not form a candidate key, since the attribute Account_No alone is a candidate key.

Candidate keys are identified during the design of the database.

Primary Key: During the creation of the table (the implementation phase), the database designer chooses one of the available candidate keys to uniquely identify rows in the table. The candidate key so chosen is called the primary key. Example: The database designer chooses Account_No to differentiate between rows in the Customer_Details table. Account_No is the primary key of the Customer_Details table. Refer to Figure 1-24.

Entity integrity constraint: The primary key of a table is always unique and not null. The attributes that constitute the primary key cannot have duplicate values across the rows of the table, and a value must be provided for every primary key attribute. This constraint is referred to as the entity integrity constraint. A null value represents an unknown value. It is not a blank character or a zero value.

The primary key is chosen from among the candidate keys. Preference is given to a candidate key that consists of a minimal number of attributes. Example: It is preferable to select the candidate key {Account_No} as the primary key rather than the candidate key {Cust_Last_Name, Cust_First_Name}.

Points to Remember:
• A candidate key is used to uniquely identify a row in a given table. A candidate key can be a set of one or more attributes
• A table can have more than one candidate key
• Candidate keys are identified during the design phase
• One of the candidate keys is chosen as the primary key by the database designer while creating the table
• It is preferable to select a candidate key with a minimal number of attributes as the primary key

Guidelines to select a primary key:
• Give preference to numeric column(s). Searching is generally faster when the primary key is numeric
• Give preference to a single attribute. Searching on a single-attribute primary key is generally faster than on a composite primary key
• Give preference to the minimal composite key. A composite key is a collection of two or more attributes. Example: if the candidate keys are {x1,x2,x3} and {y1,y2}, the composite key {y1,y2} is the minimal composite key and will therefore be chosen as the primary key
• Primary keys are chosen according to business convenience

Customer_Details records from the Customer_Details table (Account_No is the primary key of the table)

Cust_ID   Cust_Last_Name   Cust_Mid_Name   Cust_First_Name   Account_No   Account_Type   Bank_Branch   Cust_Email
101       Smith            A.              Mike              1020         Savings        Downtown      [email protected]
102       Smith            S.              Graham            2348         Checking       Bridgewater   [email protected]
103       Langer           G.              Justin            3421         Savings        Plainsboro    [email protected]
104       Quails           D.              Jack              2367         Checking       Downtown      [email protected]
105       Jones            E.              Simon             2389         Checking       Brighton      [email protected]

Figure 1-24: Customer_Details Table with Account_No as Primary Key
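The candidate keys and the entity integrity constraint can be sketched as table constraints. The example assumes Python's built-in sqlite3 module rather than the course's own RDBMS, keeps only the key-related columns, and uses hypothetical e-mail addresses (the real ones are not available). SQLite treats a column declared INTEGER PRIMARY KEY as an auto-generated row id and does not imply NOT NULL for other primary keys, so the sketch declares Account_No as INT NOT NULL PRIMARY KEY to get the behaviour described in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Primary key: Account_No. The other candidate keys are declared UNIQUE.
conn.execute("""
    CREATE TABLE Customer_Details (
        Cust_ID         INT  NOT NULL UNIQUE,
        Cust_Last_Name  TEXT NOT NULL,
        Cust_First_Name TEXT NOT NULL,
        Account_No      INT  NOT NULL PRIMARY KEY,
        Cust_Email      TEXT NOT NULL UNIQUE,
        UNIQUE (Cust_Last_Name, Cust_First_Name)
    )
""")
conn.execute(
    "INSERT INTO Customer_Details VALUES (101, 'Smith', 'Mike', 1020, 'mike@example.com')"
)

def try_insert(row):
    """Return 'ok' if the row is accepted, otherwise the constraint error text."""
    try:
        conn.execute("INSERT INTO Customer_Details VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as err:
        return str(err)

# Same last name but a different first name: allowed by the composite key.
graham = try_insert((102, "Smith", "Graham", 2348, "graham@example.com"))
# Duplicate primary key value: rejected.
duplicate = try_insert((103, "Langer", "Justin", 1020, "justin@example.com"))
# Missing (null) primary key value: rejected.
missing = try_insert((104, "Quails", "Jack", None, "jack@example.com"))
print(graham, duplicate, missing, sep="\n")
```

Only one candidate key becomes the primary key; declaring the remaining candidate keys UNIQUE keeps their uniqueness enforced as well.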

Foreign Key: A foreign key is a set of one or more attributes in a table whose values are required to match the values of a candidate key in the same or another table. The foreign key attribute(s) can have duplicate or null values.

Example: Consider the banking system, where the details of the customers of the bank are stored in the Customer_Details table. Whenever a customer makes a transaction, for example a deposit or a withdrawal of funds, the transaction is recorded in the Customer_Transaction table. A transaction is allowed only if the customer has an account in the bank. The account number is stored in the Customer_Details table and is referred to for every transaction. If the account number does not exist, the transaction is not allowed.

Customer_Details records from the Customer_Details table (Account_No is a candidate key of the table)

Cust_ID   Cust_Last_Name   Cust_Mid_Name   Cust_First_Name   Account_No   Account_Type   Bank_Branch   Cust_Email
101       Smith            A.              Mike              1020         Savings        Downtown      [email protected]
102       Smith            S.              Graham            2348         Checking       Bridgewater   [email protected]
103       Langer           G.              Justin            3421         Savings        Plainsboro    [email protected]
104       Quails           D.              Jack              2367         Checking       Downtown      [email protected]
105       Jones            E.              Simon             2389         Checking       Brighton      [email protected]

Customer_Transaction records from the Customer_Transaction table (Account_No is the foreign key referring to Account_No in the Customer_Details table)

Account_No   Transaction_Date   Transaction_Type   Transaction_Amount_in_Dollars   Total_Available_Balance_in_Dollars
1020         12-Jan-2005        Deposit            5000.00                         10000.00
2348         14-Jan-2005        Withdrawal         2500.00                         13500.00
3421         14-Jan-2005        Deposit            2000.00                         27234.00
2367         16-Jan-2005        Withdrawal         1200.00                         12456.00
1020         17-Jan-2005        Withdrawal         1500.00                         8500.00

Figure 1-25: Demonstration of Referential Integrity
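The rule shown in Figure 1-25 can be tried out with a minimal sketch, again assuming Python's built-in sqlite3 module in place of the course's RDBMS. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued. The helper name record and the transaction dated 18-Jan-2005 are hypothetical, added only to show a rejected insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent (referenced) table, reduced to its key column for brevity.
conn.execute("CREATE TABLE Customer_Details (Account_No INT NOT NULL PRIMARY KEY)")
# Child (referencing) table: Account_No is the foreign key.
conn.execute("""
    CREATE TABLE Customer_Transaction (
        Account_No                    INT REFERENCES Customer_Details (Account_No),
        Transaction_Date              TEXT,
        Transaction_Type              TEXT,
        Transaction_Amount_in_Dollars NUMERIC(7,2)
    )
""")
conn.executemany("INSERT INTO Customer_Details VALUES (?)", [(1020,), (2348,)])

def record(account_no, date, kind, amount):
    """Try to store a transaction; return False if referential integrity fails."""
    try:
        conn.execute(
            "INSERT INTO Customer_Transaction VALUES (?, ?, ?, ?)",
            (account_no, date, kind, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first = record(1020, "12-Jan-2005", "Deposit", 5000.00)
second = record(1020, "17-Jan-2005", "Withdrawal", 1500.00)  # repeated value is allowed
unknown = record(1050, "18-Jan-2005", "Deposit", 100.00)     # 1050 has no parent row
print(first, second, unknown)
```

The same account number may appear in any number of transactions, but an account number that is absent from the parent table is rejected.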

In the example above, the Account_No attribute of the Customer_Transaction table is the foreign key referring to Account_No of the Customer_Details table. The foreign key attributes can have duplicate values: Account_No 1020 occurs in two rows of the Customer_Transaction table. The foreign key attributes can also have NULL values.

As per the referential integrity constraint, every non-null value of a foreign key must match a value of the corresponding candidate key. The relation that contains the foreign key is the referencing relation (child table), and the relation that contains the corresponding candidate key is the referenced relation (parent table).

Invalid foreign key value: A value of 1050 in the Account_No attribute of the Customer_Transaction table is invalid, because the value 1050 is not present in the Account_No attribute of the Customer_Details table.

Self Referencing: A table might include a foreign key whose values are required to match the values of a candidate key in the same table. This is known as self referencing.

Records from the Employee_Manager table (Employee_ID is the candidate key of the table; Manager_ID is the foreign key referencing Employee_ID)

Employee_ID   Employee_Last_Name   Employee_Mid_Name   Employee_First_Name   Employee_Email      Department   Grade   Manager_ID
2345          Atherton             S.                  Cindy                 [email protected]   HR           1       NULL
3556          George               A.                  Henry                 [email protected]   Finance      1       NULL
3620          Jackson              G.                  Matt                  [email protected]   Design       1       NULL
22789         Stevenson            S.                  Crystal               [email protected]   HR           2       2345
23456         Smith                A.                  Luther                [email protected]   Finance      2       3556
30456         Langer               C.                  Christiana            [email protected]   HR           3       2345
31234         Frost                J.                  Robert                [email protected]   Finance      3       3556
32345         Austen               L.                  Jane                  [email protected]   Design       2       3620

Figure 1-26: An example of Self Referencing

As can be seen from Figure 1-26, each employee belongs to a department and each department has a manager. For example, Cindy S. Atherton, Henry A. George and Matt G. Jackson are the managers of the HR, Finance and Design departments respectively. A NULL value is an unknown or inapplicable value. It does not mean a blank value or zero. All employees, including the managers, have a unique Employee_ID. The Manager_ID attribute of the Employee_Manager table can hold only NULL or a value that already exists in the Employee_ID attribute. Manager_ID is therefore the foreign key referencing the candidate key, Employee_ID.

Example: Assume that the employees of the organization have to undertake courses. Details of the courses are available in the Course_Description table shown in Figure 1-27. Some courses have a prerequisite; for example, an employee must go through the Computer Hardware and System Software Concepts course before taking up the Programming Fundamentals course. Here, the attribute Prerequisite is the foreign key referencing the candidate key, Course_ID.

Records from the Course_Description table (Course_ID is the candidate key of the table; Prerequisite is the foreign key referencing Course_ID)

Course_ID   Course_Title                                    Duration_in_Days   Prerequisite
121         Computer Hardware & System Software Concepts    4                  NULL
122         Programming Fundamentals                        7                  121
123         Relational Database Management System           7                  122
124         User Interface Design                           1                  122
125         Object Oriented Concepts                        1                  122

Figure 1-27: Another Example of Self Referencing Table
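The self-referencing table of Figure 1-27 can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for the course's RDBMS. Only the first three courses are loaded; course 126 with prerequisite 999 is a hypothetical row used to show a rejected insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

# Prerequisite refers back to Course_ID of the same table (self referencing).
conn.execute("""
    CREATE TABLE Course_Description (
        Course_ID        INT  NOT NULL PRIMARY KEY,
        Course_Title     TEXT NOT NULL,
        Duration_in_Days INT,
        Prerequisite     INT REFERENCES Course_Description (Course_ID)
    )
""")
conn.executemany(
    "INSERT INTO Course_Description VALUES (?, ?, ?, ?)",
    [(121, "Computer Hardware & System Software Concepts", 4, None),
     (122, "Programming Fundamentals", 7, 121),
     (123, "Relational Database Management System", 7, 122)],
)

# Self join: pair each course with the title of its prerequisite, if it has one.
pairs = conn.execute("""
    SELECT c.Course_Title, p.Course_Title
    FROM Course_Description AS c
    LEFT JOIN Course_Description AS p ON c.Prerequisite = p.Course_ID
    ORDER BY c.Course_ID
""").fetchall()

# Course 999 does not exist, so it cannot be named as a prerequisite.
try:
    conn.execute(
        "INSERT INTO Course_Description VALUES (126, 'Hypothetical Course', 2, 999)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(pairs)
print(rejected)
```

A NULL prerequisite is accepted because a foreign key may be null; it simply means the course has no prerequisite.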


Super Key: Any superset of a candidate key is a super key. Example: Consider the following sets of attributes from the Customer_Details table.
• {Account_No}
• {Account_No, Account_Type}
• {Account_No, Account_Type, Bank_Branch}

{Account_No} is a candidate key for the Customer_Details table. {Account_No, Account_Type} is a superset of {Account_No} and is therefore a super key for the Customer_Details table. The same holds for the set {Account_No, Account_Type, Bank_Branch}. A super key may contain unnecessary attributes. Although {Account_No, Account_Type} is a super key, Account_Type is an unnecessary attribute, as {Account_No} alone is sufficient to distinguish between rows in the Customer_Details table.

Non-Key Attributes: The attributes of a table/relation that are not part of any candidate key are called non-key attributes. Example: Cust_Mid_Name, Account_Type and Bank_Branch are non-key attributes in the Customer_Details table. (Cust_Last_Name and Cust_First_Name belong to the composite candidate key and are therefore not non-key attributes.)

Points to Remember:
• A foreign key is a set of attributes of a table whose values are required to match the values of some candidate key in the same or another table
• A referential constraint is the restriction that the values of a given foreign key must match the values of the corresponding candidate key
• A table with a foreign key that refers to its own candidate key is known as a self-referencing table
• The foreign key attribute(s) can have duplicate or null values
• Any superset of a candidate key is a super key
• The attributes of a table or relation that are not part of any candidate key are called non-key attributes
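The definitions of super key and non-key attribute can be expressed as simple set operations. The sketch below is only an illustration: the function names are hypothetical, and the candidate keys are the four listed earlier for Customer_Details. The last function applies the definition literally, so attributes that belong to the composite candidate key are not counted as non-key attributes.

```python
# Candidate keys of Customer_Details as listed earlier in this section.
CANDIDATE_KEYS = [
    frozenset({"Account_No"}),
    frozenset({"Cust_ID"}),
    frozenset({"Cust_Email"}),
    frozenset({"Cust_Last_Name", "Cust_First_Name"}),
]
ALL_ATTRIBUTES = {
    "Cust_ID", "Cust_Last_Name", "Cust_Mid_Name", "Cust_First_Name",
    "Account_No", "Account_Type", "Bank_Branch", "Cust_Email",
}

def is_super_key(attributes):
    """True if the attribute set contains at least one whole candidate key."""
    return any(key <= set(attributes) for key in CANDIDATE_KEYS)

def is_candidate_key(attributes):
    """True only if the attribute set is exactly one of the candidate keys."""
    return frozenset(attributes) in CANDIDATE_KEYS

def non_key_attributes():
    """Attributes that are not part of any candidate key."""
    key_attributes = set().union(*CANDIDATE_KEYS)
    return ALL_ATTRIBUTES - key_attributes

print(is_super_key({"Account_No", "Account_Type"}))      # super key
print(is_candidate_key({"Account_No", "Account_Type"}))  # but not a candidate key
print(sorted(non_key_attributes()))
```

Every candidate key is itself a super key, but a super key is a candidate key only when no attribute can be removed from it without losing uniqueness.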


1.15. Summary

• A database is an organized collection of interrelated data
• Data in the database:
  o Is integrated
  o Can be shared
  o Can be concurrently accessed
• Database systems are designed to:
  o Define structures for the storage of data
  o Provide mechanisms for the manipulation of data
  o Ensure the security of the stored data even in case of a system crash or attempts at unauthorized access
  o Share data among different users
• A master file:
  o Stores relatively static data about an entity
  o Changes rarely
• A transaction file:
  o Stores relatively temporary data about a particular data processing task
  o Changes frequently, as transactions happen periodically and in large numbers
• Disadvantages of the traditional file approach:
  o Data security
  o Data redundancy
  o Data isolation
  o Program / data dependence
  o Lack of flexibility
  o Concurrent access anomalies
• A DBMS ensures the following:
  o Data independence
  o Sharing of data among different users
  o Concurrent access to the database
  o Control of redundancy and inconsistency
  o Secure access to the database
  o Enforcement of integrity constraints by preventing the entry of invalid information into the database
  o Backup and recovery from system crashes
• Centralized Database: All data is located at a single site
• Distributed Database: The database is stored on several computers
• Three level architecture for a DBMS:
  o External/View level: Enables users to view/access only a part of the database
  o Conceptual/Logical level: Describes what data is stored in the database and what relationships exist among those data
  o Internal/Physical level: Describes the data storage and access methods
• DBMS Users:
  o End User: Works at the external level and generally updates or queries the database
  o Application Programmer: Writes application programs
  o Database Administrator: Defines the conceptual, internal and external schemas, controls users' access privileges and ensures the consistency of the database
• Data Models: A data model is a conceptual tool that can be used to describe data, data relationships, data semantics and consistency constraints
  o Object Based Logical Model: E-R Model
  o Record Based Logical Model:
    ▪ Hierarchical Data Model: IMS
    ▪ Network Model: IDMS
    ▪ Relational Data Model: Uses a collection of tables (relations) to represent data and the relationships among those data. Examples: Oracle, Sybase
• Relational Database: Any database in which the data is organized according to the relational data model
• RDBMS: A DBMS that manages a relational database
• Keys:
  o A candidate key is a set of one or more attributes of a table that uniquely identifies a row in the table
  o A table can have more than one candidate key
  o Candidate keys are identified during the design phase
  o While creating tables, the database designer chooses one of the available candidate keys to serve as the primary key
  o Any one of the candidate keys can be selected as the primary key. Preference should be given to the candidate key with the minimal number of attributes
  o A foreign key is a set of attribute(s) of a table whose values are required to match the values of a candidate key in the same table or in a different table
  o The constraint that the values of a given foreign key must match the values of the corresponding candidate key is known as the referential integrity constraint
  o A table with a foreign key that refers to its own candidate key is called a self-referencing table
  o Any superset of a candidate key is a super key
  o The attributes of a table/relation that are not part of any candidate key are called non-key attributes


2. Entity-Relationship (E-R) Modeling

2.1. Introduction

Business scenarios are generally complex. A software application designer[21] who is not an expert in a particular business domain may fail to capture the exact business requirements for a software application. This is one of the prominent causes of software project failure. Reviewing the requirement specification[22] document with the business users[23] (who are experts in their respective business domains) will not yield the expected results, for the following reasons:
• The application designer usually writes the requirement specification documents using software technology jargon[24], which business users find difficult to understand
• These documents are usually quite lengthy, and the business users cannot devote enough time to read and review the complete document

It is better to represent all the business rules[25] in a pictorial format so that the business users can understand and review them easily and correctly. One such technique, commonly used for designing databases, is Entity-Relationship Modeling (E-R Modeling). The diagram used in this technique is called an Entity Relationship Diagram (ERD). In Infosys, 60% to 70% of projects use this technique to capture the requirement specification for database application design and development.

The Entity Relationship Diagram (ERD) was first defined in 1976 by Peter Chen. Since then, James Martin and Charles Bachman have added some small refinements to the basic ERD principles. Due to its simplicity and ease of use, this technique attracted considerable attention during the early 1990s in both industry and the research community.

[21] Software application designer: The person who designs software applications.
[22] Requirement specification: A document which contains the requirements for a specific application.
[23] Business users: The users who own the application.
[24] Jargon: The specialized or technical language of a trade, profession, or similar group.
[25] Business rules: The rules/policies which govern the functioning of the application.


2.2. Entity and Relationship

Before learning the E-R diagram technique in detail, let us understand the terms entity and relationship.

Entity: An entity is anything, real or abstract[26], about which we want to store data. Entity types fall into five categories: roles, events, locations, tangible[27] things and concepts.

Some examples of entities are employee, hockey match, campus, book and department. Department is an entity, and Education & Research, HR, Finance, etc. are instances[28] of the department entity. Similarly, Henry, Luther, Crystal, Jane, etc. are instances of the employee entity.

Attribute: An attribute describes a property of an entity. An entity can have multiple attributes. Example: For the entity car, the attributes would be the color, model number, number of doors, etc.

Relationship: A relationship is a natural association that exists among one or more entities. Example: Employee borrows books from the library.

2.3. Cardinality of a relationship

The cardinality of a relationship defines the type of relationship between two participating entities. Example: One employee can take many books from the library, and one book can be taken by only one employee. The cardinality of the relationship between employee and book is "one to many". One person can sit on only one chair at any point of time, and one chair can accommodate only one person at a given point of time. This relationship has "one to one" cardinality.

[26] Abstract: Conceptual/theoretical object.
[27] Tangible: Physical object.
[28] Instance: Occurrence.


Points to Remember: The cardinality of a relationship is different from the cardinality of a relation. The cardinality of a relation, discussed in Chapter 1, refers to the number of rows in a given relation.

There are four types of relationship cardinality.

2.3.1. One to One Relationship

In this relationship, one instance of an entity is related to at most one instance of another entity, and vice versa. Example 1: One person (P1, P2, P3, P4) can sit on only one chair at any point of time, and one chair (C1, C2, C3, C4) can accommodate a maximum of one person at any given time. The two participating entities therefore have a one-to-one relationship.

[Figure 2-1: One to One Relationship. Persons P1, P2, P3 and P4, each linked to one of the chairs C1, C2, C3 and C4.]

Example2: One country can have only one citizen as its president and one citizen can become president of only one country.

2.3.2. One to Many Relationship

One instance of an entity is related to multiple instances of another entity. Example 1: One organization can have many employees, but one employee can work for only one organization.

[Figure 2-2: One to Many Relationship. Organizations O1, O2 and O3 linked to employees E1 to E5.]


Example 2: One warehouse can be used to store many parts, but one part can be stored in only one warehouse. In this example one instance of warehouse accommodates many parts. Hence the relationship between warehouse and part is one-to-many.

[Figure 2-3: One to Many Relationship. Warehouses W1, W2 and W3 linked to parts P1 to P5.]

2.3.3. Many to One Relationship

This is the reverse of the one to many relationship. Example: Each employee can work for only one department, but one department can have many employees. The relationship between employee and department is many to one.

[Diagram: employees E1, E2, E3, E4, E5 linked to departments D1, D2, D3; each employee is linked to only one department.]

Figure 2-4 : Many to One Relationship

2.3.4. Many to Many Relationship

In a many to many relationship, multiple instances of one entity are related to multiple instances of another entity. Example1: One student can enroll for many courses and one course can have many students enrolled.


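[Diagram: students S1, S2, S3, S4 linked to courses C1, C2, C3, C4; a student may be linked to several courses and a course to several students.]

Figure 2-5: Many to Many Relationship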

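Example2: One student is trained by many instructors and one instructor trains many students.

[Diagram: students S1, S2, S3, S4 linked to instructors I1, I2, I3, I4; a student may be linked to several instructors and an instructor to several students.]

Figure 2-6 : Example of Many to Many Relationship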

The many to many relationship is a superset of all the relationships mentioned above; all the other relationships are special cases of the many to many relationship.
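The four cardinalities above can be illustrated with a short Python sketch that classifies a relationship from a list of instance pairs. The function name `classify_cardinality` and the sample pairs are illustrative assumptions, not taken from the figures. Note also that in practice cardinality is a business rule; a sample of data can only suggest it.

```python
from collections import defaultdict

def classify_cardinality(pairs):
    """Classify a relationship from (left, right) instance pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # True if some right-hand instance is linked to several left-hand instances
    left_is_many = any(len(v) > 1 for v in lefts_per_right.values())
    # True if some left-hand instance is linked to several right-hand instances
    right_is_many = any(len(v) > 1 for v in rights_per_left.values())
    if left_is_many and right_is_many:
        return "many to many"
    if right_is_many:
        return "one to many"
    if left_is_many:
        return "many to one"
    return "one to one"

# Hypothetical instance pairs in the spirit of the examples above
sits_on = [("P1", "C1"), ("P2", "C2"), ("P3", "C3")]    # person, chair
employs = [("O1", "E1"), ("O1", "E2"), ("O2", "E3")]    # organization, employee
works_in = [("E1", "D1"), ("E2", "D1"), ("E3", "D2")]   # employee, department
enrolls = [("S1", "C1"), ("S1", "C2"), ("S2", "C1")]    # student, course

for name, pairs in [("sits_on", sits_on), ("employs", employs),
                    ("works_in", works_in), ("enrolls", enrolls)]:
    print(name, "->", classify_cardinality(pairs))
```

The classification falls through from the most general case (many to many) to the most restricted one (one to one), which mirrors the statement that the other three are special cases of many to many.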

2.4. E-R Diagram Notations

Entity: An entity is an object or concept about which business users store information.

Weak entity: A weak entity is an entity which is dependent on another entity for its existence. For example, the bank branch entity depends upon the bank entity for its existence. Without the bank entity it is impossible to identify a bank branch uniquely.


Attributes: Attributes are the properties or characteristics of an entity.

Key attribute: A key attribute of an entity is the unique, distinguishing characteristic of the entity. For example, the employee number of an employee entity might be the employee's key attribute.

Multivalued attribute: A multivalued attribute can take more than one value. For example, the skill set attribute of an employee entity can have multiple values.

Derived attribute: A derived attribute is an attribute whose value is based on the value of another attribute. For example, the value of the monthly salary attribute is based on the values of the employee's basic salary attribute and house rent allowance attribute.

Relationships: Relationships demonstrate how two entities share information in the database structure.

Cardinality: The cardinality of a relationship specifies how many instances of an entity are related to one instance of another entity. M and N both represent ‘MANY’ and 1 represents ‘ONE’ cardinality.


Recursive relationship: In some cases, an entity can have a relationship with itself. For example, employees can supervise other employees.

Figure 2-7 : E-R Diagram Notations


2.5. Modeling using E-R Diagrams

A model 29 is an abstract form of any system or process that hides the unnecessary details while highlighting the details important to the application. We have all seen scale models of huge campuses or buildings, which help to visualize the structure before it is built. On similar lines, we can model our software applications before they are developed. This helps the business users to visualize the application before it is developed and to propose changes if it does not meet their requirements.

[Picture: Model of a City]

Modeling databases using E-R diagrams is called E-R Modeling. This technique is also known as the Top-Down approach, because one need not identify all the attributes in order to build a model of the system using this technique.

2.5.1. Steps in E-R Modeling

Usually the following six steps are followed to generate E-R models.
a) Find the entities: Look for general nouns in the requirement specification document which are of business interest to business users
b) Find the relationships: Identify the natural relationships between the entities and their cardinalities
c) Find the key attributes for every entity: Identify the attribute or set of attributes which can identify an instance of the entity uniquely
d) Identify other relevant attributes: Identify other attributes which are of interest to business users and which they want to store in the database
e) Complete the E-R diagram: Draw the E-R diagram along with all attributes, including key attributes
f) Review your results with your business users: Look at the list of attributes you associated with each entity to see if anything is missing
Note that this is an iterative30 approach and one cannot arrive at a final E-R model in a single step. It requires a great deal of patience and numerous revisions before the model is created.

2.6. Case Study 1: Problem Statement

Let us apply the above methodology to model a university database application.
• A university has many departments
• Each department has multiple instructors; one among them is the head of the department
• An instructor belongs to only one department
• Each department offers multiple courses, each of which is taught by a single instructor
• A student may enroll for many courses offered by different departments

29 Model: A representation or a scaled down structure of an object.
30 Iterative: Process of repeating the same task.

2.6.1. Case Study 1: Solution

Step 1: Identify the Entities
Generally the entities will have multiple instances in a given business scenario. As per this guideline, we can identify the following entities:
1. DEPARTMENT
2. COURSE
3. INSTRUCTOR
4. STUDENT
“Head of the department” is NOT an entity. It is a relationship between the instructor and department entities.
Note: One may be tempted to identify “university” as an entity, but it is not an entity because it has only one instance.

Step 2: Find relationships
We can derive the following relationships:
1. The department offers multiple courses and each course belongs to only one department. So the cardinality between department and course is one to many.

Department (1) --- Offers --- (N) Course

2. One course is enrolled by multiple students and also one student enrolls for multiple courses. So the relationship is many to many.

Course (M) --- Enrolled by --- (N) Student

3. One department has multiple instructors and also one instructor belongs to one and only one department. So the relationship is one to many.

Department (1) --- Has --- (N) Instructor

4. Each department has one “Head of Department” and one instructor is “Head of Department” for only one department, hence the relationship is one to one.

Department (1) --- Headed by --- (1) Instructor

5. One course is taught by only one instructor but one instructor teaches many courses, hence the relationship between course and instructor is many to one.

Course (N) --- Is taught by --- (1) Instructor

The relationship between instructor and student need NOT be defined in the diagram. The reasons are as follows:
1. There is no business significance of this relationship.
2. We can always derive this relationship indirectly through course and instructor, and course and students.

Step 3: Identify the key attributes
1. DName (Department Name), which identifies the department uniquely, will be the key attribute for the “DEPARTMENT” entity.
2. STUDENT# (Student Number), which identifies the student uniquely, will be the key attribute for the “STUDENT” entity.
3. IName (Instructor Name) is the key attribute for the “INSTRUCTOR” entity.
4. COURSE# (Course Number) is the key attribute for the “COURSE” entity.

Step 4: Identify other relevant attributes
1. For the “Department” entity, the relevant attribute other than “Department Name” is “Location”.
2. For the “Course” entity, the relevant attributes other than “Course Number” are “Course Name”, “Duration” and “Pre Requisite”.
3. For the “Instructor” entity, the relevant attributes other than “Instructor Name” are “Room Number” and “Telephone Number”.
4. For the “Student” entity, the relevant attributes other than “Student Number” are “Student Name” and “Date of Birth”.

Step 5: Complete E-R diagram
After considering all the above mentioned guidelines one can generate the E-R Model for the university database as shown in Figure 2-8.


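[Diagram: the entities Department (Department Name, Location), Course (Course#, Course Name, Duration, Pre Requisite), Instructor (Instructor Name, Room#, Telephone#) and Student (Student#, Student Name, Date of Birth), connected by the relationships Offers (Department 1 : N Course), Has (Department 1 : N Instructor), Headed by (Department 1 : 1 Instructor), Is taught by (Course N : 1 Instructor) and Enrolled by (Course M : N Student).]

Figure 2-8 : E-R Diagram for University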

2.7. Case Study 2: Problem Statement

Let us consider a university library scenario for developing the E-R model. Assume that in a university
• There are multiple libraries and each library has multiple student members
• Students can become members of multiple libraries by paying the appropriate membership fee
• Each library has its own set of books. Within the library these books are identified by a unique number
• Students can borrow multiple books from a subscribed library
• Students can order books using inter-library loan. This can be useful if a student wishes to borrow books from a library where s/he is not a member. The student orders the books through a library where s/he is a member

2.7.1. Case Study 2: Solution

Step 1: Identify the entities
Generally the entities will have multiple instances in a given business scenario. As per this guideline, we can identify the following entities:
1. LIBRARY
2. BOOK
3. STUDENT
In this business scenario BOOK is a weak entity because, without knowing the library details, a book cannot be identified independently. A book is always associated with its library.

Step 2: Find Relationships
We can derive the following relationships:
1. One library has many member students and each student can become a member of many libraries, hence the cardinality between library and student is many to many.
2. One book belongs to only one library and one library can have multiple books, hence the cardinality between library and book is one to many.
3. One library can loan multiple books and each book can be loaned to only one library, hence the cardinality between library and book is one to many.
4. One student can borrow multiple books and one book can be borrowed by only one student, hence the cardinality between student and book is one to many.

Step 3: Identify the key attributes
1. Library# (library ID) is the key attribute for the entity “Library”, as it identifies the library uniquely.
2. Book# (book ID) and Library# are together the key attributes for the “Book” entity.
3. Student# (student number) is the key attribute for the “Student” entity.

Step 4: Identify other relevant attributes
1. For the “Library” entity, the relevant attributes other than “Library#” would be “Library Name” and “Location”.
2. For the “Book” entity, the relevant attributes other than “Book#” would be “Title” and “ISBN”.
3. For the “Student” entity, the relevant attributes other than “Student#” would be “Student Name” and “Date of Birth”.

Step 5: Complete E-R diagram


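After considering all the above mentioned guidelines, one can generate the E-R Model for the above mentioned university library business scenario as shown in Figure 2-9.

[Diagram: the entities Library (Library#, Library Name, Location), Student (Student#, Student Name, Date of Birth) and Book (Book#, ISBN, Title), connected by the relationships Subscribed by (Library and Student, many to many), Has (Library and Book, one to many), Loans (Library and Book, one to many) and Borrows (Student and Book, one to many).]

Figure 2-9: E-R Diagram for University Library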

2.8. Case Study 3: Problem Statement

Let us consider a banking business scenario for developing the E-R model. Assume
• There are many banks in the city and each bank has many branches. Each branch of the bank has multiple customers
• Customers have opened various types of accounts
• Some customers have also taken different types of loans from these branches of the bank
• One customer can have many accounts and loans

2.8.1. Case Study 3: Solution

Step 1: Identify the entities
Generally the entities will have multiple instances in a given business scenario. As per this guideline, we can identify the following entities:
1. BANK
2. BRANCH
3. CUSTOMER
4. ACCOUNT
5. LOAN

BRANCH is considered a weak entity because, without knowing the BANK, we cannot define the BRANCH independently. A BRANCH is always associated with its BANK name. Example: Citibank branch, ICICI Bank branch or State Bank of India branch.
Note: One may be tempted to identify “City” as an entity, but it is not an entity because it has only one instance.

Step 2: Find Relationships
We can derive the following relationships:
1. One bank has multiple branches and each branch belongs to only one bank, so the cardinality of the relationship between bank and branch is one to many.
2. One branch gives many loans and each loan is associated with one branch, so the cardinality of the relationship between branch and loan is one to many.
3. One branch maintains multiple accounts and each account is associated with one and only one branch, so the cardinality of the relationship between branch and account is one to many.
4. One loan can be availed by multiple customers, and each customer can avail multiple loans, so the cardinality of the relationship between loan and customer is many to many.
5. One customer can hold multiple accounts, and each account can be held by multiple customers, so the cardinality of the relationship between customer and account is many to many.

Step 3: Identify the key attributes
1. BankCode (Bank Code) is the key attribute for the entity “Bank”, as it identifies the bank uniquely.
2. Branch# (Branch Number) and BankCode (Bank Code) are together the key attributes for the “Branch” entity.
3. Customer# (Customer Number) is the key attribute for the “Customer” entity.
4. Loan# (Loan Number) is the key attribute for the “Loan” entity.
5. Account# (Account Number) is the key attribute for the “Account” entity.

Step 4: Identify other relevant attributes
1. For the “Bank” entity, the relevant attributes other than “BankCode” would be “Name” and “Address”.
2. For the “Branch” entity, the relevant attributes other than “Branch#” would be “Name” and “Address”.
3. For the “Loan” entity, the relevant attribute other than “Loan#” would be “Loan Type”.
4. For the “Account” entity, the relevant attribute other than “Account#” would be “Account Type”.
5. For the “Customer” entity, the relevant attributes other than “Customer#” would be “Name”, “Phone” and “Address”.

Step 5: Complete E-R diagram
After considering all the above mentioned guidelines, one can generate the E-R Model for the above mentioned banking business scenario as shown in Figure 2-10.


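[Diagram: the entities Bank (Bank Code, Name, Address), Branch (Branch#, Name, Address), Loan (Loan#, Loan Type), Account (Account No, Account Type) and Customer (Customer#, Name, Address, Telephone#), connected by the relationships has (Bank 1 : N Branch), Offers (Branch 1 : N Loan), Maintains (Branch 1 : N Account), Availed By (Loan M : N Customer) and Held By (Account M : N Customer).]

Figure 2-10 : E-R Diagram for Bank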


2.9. Transforming an E-R Model into Physical Database Design

The E-R model helps mainly in capturing and analyzing the requirements. It can also be used during the design of the physical database. The following is a set of guidelines for converting an E-R model into a physical database design.
1. Each entity represented in the E-R model can be defined as a table in the relational schema. All attributes of the entity become columns of the table. As per this guideline we can translate the BANK, BRANCH, LOAN, ACCOUNT and CUSTOMER entities into the following tables. Additional columns can be added to these tables as per the business requirements at a later stage.

BANK:     BankCode | Name | Address
BRANCH:   BankCode | Branch# | Name | Address
LOAN:     Loan# | LoanType
ACCOUNT:  Account# | AccountType
CUSTOMER: Customer# | Name | Telephone# | Address

Figure 2-11 : Entity based tables

Weak entity types are converted into a table of their own, with the primary key of the strong entity acting as a foreign key in the table. This foreign key along with the key of the weak entity form the composite primary key of this table. As per this guideline, a “Branch” table is created with the above mentioned structure, with BankCode and Branch# together as composite primary key.
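The weak-entity guideline above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the lower-case column names (`bank_code`, `branch_no`), the helper `try_insert` and the sample rows are illustrative and not part of the course material, and SQLite is used only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE bank (
    bank_code TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    address   TEXT
);
-- Weak entity: the owner's key is a foreign key and part of the composite primary key
CREATE TABLE branch (
    bank_code TEXT    NOT NULL REFERENCES bank(bank_code),
    branch_no INTEGER NOT NULL,
    name      TEXT,
    address   TEXT,
    PRIMARY KEY (bank_code, branch_no)
);
""")

# Hypothetical sample data: the same branch number may occur under different banks
conn.execute("INSERT INTO bank VALUES ('B1', 'First Bank', 'City Centre')")
conn.execute("INSERT INTO bank VALUES ('B2', 'Second Bank', 'Main Road')")
conn.execute("INSERT INTO branch VALUES ('B1', 1, 'Central', 'City Centre')")
conn.execute("INSERT INTO branch VALUES ('B2', 1, 'Central', 'Main Road')")

def try_insert(sql):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A branch of an unknown bank violates the foreign key
orphan_ok = try_insert("INSERT INTO branch VALUES ('B9', 1, 'Ghost', NULL)")
# A second branch number 1 under bank B1 violates the composite primary key
duplicate_ok = try_insert("INSERT INTO branch VALUES ('B1', 1, 'Copy', NULL)")
branch_count = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]
print(orphan_ok, duplicate_ok, branch_count)
```

Both rejected inserts show why the branch number alone is not enough: a branch is identified only together with the bank that owns it.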


2. Each relationship can be defined as a separate table in the relational schema. The key attributes of the participating entities 31 become the key attributes of the relationship table. As per this guideline we can define a LOAN_OFFERING_DETAILS table between the BRANCH and LOAN entities, a BRANCH_ACCOUNT_DETAILS table between the BRANCH and ACCOUNT entities, a LOAN_DETAILS table between the LOAN and CUSTOMER entities, and an ACCOUNT_DETAILS table between the ACCOUNT and CUSTOMER entities. A relationship table for BANK and BRANCH is not defined because this information is already captured in the BRANCH table. Usually relationship based tables have their own attributes in addition to the prime attributes of the participating entities. For example, the LOAN_DETAILS table contains the prime attributes from the LOAN and CUSTOMER tables (which together act as the composite primary key) in addition to other attributes such as DateofSanction, IntRate, LoanAmount, Duration etc.

LOAN_OFFERING_DETAILS:  BankCode | Branch# | Loan#
BRANCH_ACCOUNT_DETAILS: BankCode | Branch# | Account#
LOAN_DETAILS:           Loan# | Customer# | DateofSanction | IntRate | LoanAmount | Duration
ACCOUNT_DETAILS:        Account# | Customer# | DateofOpen

Figure 2-12 : Relationship based tables
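As a hedged illustration of a relationship based table, the sketch below builds LOAN, CUSTOMER and LOAN_DETAILS with Python's `sqlite3` module and queries the many to many relationship in both directions. Column names in lower case, the omission of the Duration column, and all sample rows (loan types, customer names, amounts) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loan     (loan_no INTEGER PRIMARY KEY, loan_type TEXT);
CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT);
-- Relationship table: keys of both participating entities form the composite key
CREATE TABLE loan_details (
    loan_no          INTEGER REFERENCES loan(loan_no),
    customer_no      INTEGER REFERENCES customer(customer_no),
    date_of_sanction TEXT,
    int_rate         REAL,
    loan_amount      REAL,
    PRIMARY KEY (loan_no, customer_no)
);
-- Hypothetical sample rows
INSERT INTO loan VALUES (1, 'Home'), (2, 'Car');
INSERT INTO customer VALUES (10, 'Asha'), (11, 'Ravi');
INSERT INTO loan_details VALUES
    (1, 10, '2010-01-15', 8.5, 500000),
    (1, 11, '2010-01-15', 8.5, 500000),
    (2, 10, '2011-06-01', 10.0, 200000);
""")

# One customer can avail many loans ...
loans_per_customer = dict(conn.execute(
    "SELECT c.name, COUNT(*) FROM loan_details d "
    "JOIN customer c ON c.customer_no = d.customer_no GROUP BY c.name"))
# ... and one loan can be availed by many customers
customers_per_loan = dict(conn.execute(
    "SELECT l.loan_type, COUNT(*) FROM loan_details d "
    "JOIN loan l ON l.loan_no = d.loan_no GROUP BY l.loan_type"))
print(loans_per_customer, customers_per_loan)
```

Attributes such as the sanction date belong to neither LOAN nor CUSTOMER alone, which is why they sit in the relationship table.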

31 Participating entities: The entities which are joined by the relation.


Note: At this stage entities and relationships have only been converted to tables, hence the tables do not have any data. Actual database table designs are driven by business requirements. The principles of database design from an ERD might be subject to small changes depending on the requirements. We will be exposed to these details once we get real-life project experience. Some of the aspects we might have to keep in mind are:
• In one to one and one to many cases, we may not always have separate tables for the participating entities and their relationship. One combined table for both the participating entities and the related attributes of the relationship may be sufficient
• In a many to many relationship, it is mandatory to create separate tables for the entities participating in the relationship as well as for the relationship. For example, the CUSTOMER and LOAN entities shown in Figure 2-11 have a many to many relationship. Hence one should create three separate tables: two for the CUSTOMER and LOAN entities and one, LOAN_DETAILS, for the relationship.

2.10. Merits and Demerits of E-R Modeling

The following sections discuss the merits and demerits of E-R modeling.

2.10.1. Merits of E-R Modeling

1. Easy to understand. Represented in the business users' language and can be understood by non-technical specialists.
2. Intuitive32 and helps in physical database creation.
3. Can be generalized and specialized depending on needs.
4. Can help in database design.
5. Gives a higher level abstraction of the system.

2.10.2. Demerits of E-R Modeling

1. A physical design derived from an E-R model may contain some amount of ambiguity or inconsistency.
2. Sometimes diagrams may lead to misinterpretations.

32 Intuitive: Natural.


Example:

Student (1) --- Borrows --- (M) Books

In a real situation, there could be several types of borrowing, for example long term, normal and short term. It is not immediately clear whether the above diagram represents all or some of these only. If this aspect is not clarified, then people could come to a wrong conclusion. Giving proper description of the relationship is extremely important for ensuring better understanding. The following set of figures describes relationship clearly to overcome misinterpretation.

Student (1) --- Borrows (Long Term) --- (M) Books

Student (1) --- Borrows (Short Term) --- (M) Books

2.11. SUMMARY
• Most application errors arise from miscommunication between the user of the application and the designer of the application, and between the designer and the developer of the application
• This miscommunication can be handled by pictorially representing the business findings
• An E-R diagram is one of the many ways in which the business findings are pictorially represented
• The four types of cardinality of relationships are
  a. one to one
  b. one to many
  c. many to one
  d. many to many
• ER modeling helps in database design

3. Normalization

3.1. Introduction

Usually in the software industry, E-R modeling is used by designers as a requirement analysis tool; database design using an E-R diagram is a by-product. A database designed from the E-R model may contain some amount of inconsistency, ambiguity33 and redundancy. To resolve these issues some refinement is required. This refinement process of database design is referred to as normalization. As normalization involves building structures (such as tables) starting from the stage of identifying the columns (attributes) of the table, it is also called the “Bottom-Up” approach. The normalization technique is based on a strong mathematical foundation. Basically, normalization eliminates duplicate data and makes insert, update and delete operations more efficient in terms of performance and of the space required to store the data. In Infosys, almost all database designs are initially based on E-R modeling and later refined using normalization techniques before they are physically created.

3.2. The need for Normalization

Consider a university scenario in which the data associated with the students, courses and their results is maintained in a table called “Student_Course_Result”.

Student_Course_Result Table

Student_Details        | Course_Details                                | Result_Details
101, Davis, 11/4/1986  | M4, Applied Mathematics, Basic Mathematics, 7 | 11/11/2004, 82, A
102, Daniel, 11/6/1987 | M4, Applied Mathematics, Basic Mathematics, 7 | 11/11/2004, 62, C
101, Davis, 11/4/1986  | H6, American History, 4                       | 11/22/2004, 79, B
103, Sandra, 10/2/1988 | C3, Bio Chemistry, Basic Chemistry, 11        | 11/16/2004, 65, B
104, Evelyn, 2/22/1986 | B3, Botany, 8                                 | 11/26/2004, 77, B
102, Daniel, 11/6/1987 | P3, Nuclear Physics, Basic Physics, 13        | 11/12/2004, 68, B
105, Susan, 8/31/1985  | P3, Nuclear Physics, Basic Physics, 13        | 11/12/2004, 89, A
103, Sandra, 10/2/1988 | B4, Zoology, 5                                | 11/27/2004, 54, D
105, Susan, 8/31/1985  | H6, American History, 4                       | 11/22/2004, 87, A
104, Evelyn, 2/22/1986 | M4, Applied Mathematics, Basic Mathematics, 7 | 11/11/2004, 65, B

Figure 3-1: Data file in table format

33 Ambiguity: Uncertainty.

If we observe the table shown in Figure 3-1 closely, we find that the table has many anomalies34. They are:

Insert Anomaly: In some cases, insertion of new data is difficult. Example: We cannot insert a prospective course which does not have any registered student, nor can we insert the details of a student who is yet to register for any course.

Update Anomaly: In some cases, updating existing data is difficult. Example: If we want to update the name of course M4, we need to do this operation three times. Similarly we may have to update student 103’s name twice if it changes.

Delete Anomaly: In some cases, existing data cannot be deleted without losing other data. Example: If we want to delete course M4, then in addition to the M4 course details, other critical student details stored in the same rows will also be deleted. This kind of deletion is harmful to the business. Moreover, M4 appears thrice in the above table and needs to be deleted thrice.

Duplicate Data: The table has a lot of duplicate data. Example: Course M4’s data is stored thrice and student 102’s data is stored twice. This redundancy will increase as the number of course offerings and students increases.

Hence we need to refine our design so that the database is efficient in terms of storage space and of insert, update and delete operations. This refining technique is called normalization.
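The anomalies above can be counted directly on the data. The following Python sketch keeps only four of the columns of Figure 3-1 (student number, student name, course number, course name); the variable names are illustrative.

```python
# Rows of Figure 3-1, reduced to (Student#, StudentName, Course#, CourseName)
rows = [
    (101, "Davis",  "M4", "Applied Mathematics"),
    (102, "Daniel", "M4", "Applied Mathematics"),
    (101, "Davis",  "H6", "American History"),
    (103, "Sandra", "C3", "Bio Chemistry"),
    (104, "Evelyn", "B3", "Botany"),
    (102, "Daniel", "P3", "Nuclear Physics"),
    (105, "Susan",  "P3", "Nuclear Physics"),
    (103, "Sandra", "B4", "Zoology"),
    (105, "Susan",  "H6", "American History"),
    (104, "Evelyn", "M4", "Applied Mathematics"),
]

# Update anomaly: renaming course M4 touches every row that mentions it
rows_to_update = sum(1 for r in rows if r[2] == "M4")

# Duplicate data: a handful of students and courses are spread over all the rows
students = {r[0] for r in rows}
courses = {r[2] for r in rows}

# Delete anomaly: if courses C3 and B4 are removed, every trace of the
# student who took only those two courses disappears as well
remaining = [r for r in rows if r[2] not in {"C3", "B4"}]
lost_students = students - {r[0] for r in remaining}

print(rows_to_update, len(students), len(courses), lost_students)
```

The delete example uses courses C3 and B4 rather than M4 because, in this particular data set, they are the only courses taken by student 103, so the loss of student information is visible at a glance.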

3.3. Process of Normalization

As mentioned previously, the normalization technique is based on a strong mathematical foundation. In the software industry, four normal forms are commonly used to design databases. Before getting to know the normalization techniques in detail, let us define a few building blocks which are used to define the normal forms.

34 Anomalies: Irregularities.


3.3.1. Determinant

Attribute X can be defined as a determinant if it uniquely determines the value of attribute Y in a given relationship or entity. To qualify as a determinant, an attribute need NOT be a key attribute. Usually the dependency of an attribute is represented as X → Y, which means attribute X decides attribute Y. Example: In the RESULT relation, the Marks attribute may decide the Grade attribute. This is represented as Marks → Grade and read as Marks decides Grade.

Marks → Grade

Figure 3-2: Determinant

In the RESULT relation, Marks attribute is not a key attribute. Hence it can be concluded that key attributes are determinants but not all the determinants are key attributes.

3.3.2. Functional Dependency

Consider the following REPORT relation:
REPORT (Student#, Course#, CourseName, IName, Room#, Marks, Grade)
Where:
• Student# - A unique number associated with each student, called Student Number
• Course# - A unique number associated with each course, called Course Number
• CourseName - Name of the course
• IName - Name of the instructor who delivered the course
• Room# - Room number assigned to the respective instructor
• Marks - Marks obtained in a particular course by a particular student
• Grade - Grade obtained by a particular student in a particular course
Student# and Course# together (called a composite attribute) determine EXACTLY ONE value of Marks. This can be symbolically represented as
Student# Course# → Marks
This type of dependency is called functional dependency. In the above example Marks is functionally dependent on Student# Course#. Other functional dependencies in the above example are:
• Course# → CourseName
• Course# → IName (if we assume that one course is offered by one and only one instructor)
• IName → Room# (if we assume that each instructor has his/her own, non-shared room)
• Marks → Grade

Formally we can define functional dependency as: In a given relation R, X and Y are attribute sets. Attribute set Y is functionally dependent on attribute set X if each value of X determines EXACTLY ONE value of Y. It is represented as: X → Y
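The formal definition can be turned into a small check on sample data. In the Python sketch below, the function name `holds`, the instructor names and the room numbers are assumptions made for the example. A sample of rows can only refute a functional dependency; whether it holds in general is a business rule.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical REPORT rows (instructor names and rooms are invented)
report = [
    {"Student#": 101, "Course#": "M4", "CourseName": "Applied Mathematics",
     "IName": "Rao", "Room#": "R1", "Marks": 82, "Grade": "A"},
    {"Student#": 102, "Course#": "M4", "CourseName": "Applied Mathematics",
     "IName": "Rao", "Room#": "R1", "Marks": 62, "Grade": "C"},
    {"Student#": 101, "Course#": "H6", "CourseName": "American History",
     "IName": "Lee", "Room#": "R2", "Marks": 79, "Grade": "B"},
    {"Student#": 105, "Course#": "H6", "CourseName": "American History",
     "IName": "Lee", "Room#": "R2", "Marks": 87, "Grade": "A"},
]

print(holds(report, ["Student#", "Course#"], ["Marks"]))  # composite determinant
print(holds(report, ["Course#"], ["CourseName"]))
print(holds(report, ["Student#"], ["Marks"]))             # student 101 has two marks
```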

3.3.3. Full Functional Dependency

In the above example Marks is fully functionally dependent on Student# Course# and not on any subset of Student# Course#. This means that you cannot determine the Marks obtained by a student in a course if you know only the Student# OR only the Course#. It can be determined only by using Student# AND Course# together. CourseName, on the other hand, is not fully functionally dependent on Student# Course#, because one of the subsets, Course#, is enough to determine the CourseName; Student# is not required.

Student# Course# → Marks

Figure 3-3: Full Functional Dependency

The formal definition of full functional dependency is: In a given relation R, X and Y are attribute sets. Y is fully functionally dependent on X only if Y is functionally dependent on X and not on any proper subset of X. X may be composite in nature.

3.3.4. Partial Dependency

In the above relation CourseName, IName and Room# are partially dependent on the attributes Student# Course#, because Course# alone is enough to determine CourseName, IName and Room#.

[Diagram: within the composite attribute Student# Course#, Course# alone (and not Student#) determines CourseName, IName and Room#.]

Figure 3-4: Partial Dependency

The formal definition of partial dependency is: In a given relation R, X and Y are attribute sets. Attribute set Y is partially dependent on the attribute set X only if it is dependent on a proper subset of attribute set X.
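Full and partial dependency differ only in whether a proper subset of the determinant is already sufficient. The Python sketch below tests this on sample rows; the function names `holds` and `fully_dependent` and the sample data are illustrative assumptions.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False  # a subset is enough: the dependency is only partial
    return True

# Hypothetical sample rows
report = [
    {"Student#": 101, "Course#": "M4", "CourseName": "Applied Mathematics", "Marks": 82},
    {"Student#": 102, "Course#": "M4", "CourseName": "Applied Mathematics", "Marks": 62},
    {"Student#": 101, "Course#": "H6", "CourseName": "American History", "Marks": 79},
    {"Student#": 105, "Course#": "H6", "CourseName": "American History", "Marks": 87},
]

key = ["Student#", "Course#"]
print(fully_dependent(report, key, ["Marks"]))       # needs both attributes
print(fully_dependent(report, key, ["CourseName"]))  # Course# alone is enough
```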

3.3.5. Transitive Dependency

In the above example, Room# depends on IName and in turn IName depends on Course#. Hence Room# transitively depends on Course#.

Course# → IName → Room#

Figure 3-5: Transitive Dependency

Similarly Grade depends on Marks, in turn Marks depends on Student# Course# hence Grade fully transitively35 depends on Student# Course#.
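Transitive dependencies can be derived mechanically by repeatedly applying the known functional dependencies, a procedure usually called computing the attribute closure. The sketch below uses the dependencies listed for the REPORT relation; the function name `closure` and the tuple encoding of the dependencies are illustrative choices.

```python
def closure(attrs, fds):
    """Return every attribute determined, directly or transitively, by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever the whole left-hand side is already known
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Functional dependencies of the REPORT relation, as (determinant, dependent) pairs
FDS = [
    (("Student#", "Course#"), ("Marks",)),
    (("Course#",), ("CourseName",)),
    (("Course#",), ("IName",)),
    (("IName",), ("Room#",)),
    (("Marks",), ("Grade",)),
]

print(sorted(closure({"Course#"}, FDS)))              # includes Room# via IName
print(sorted(closure({"Student#", "Course#"}, FDS)))  # includes Grade via Marks
```

Room# appears in the closure of Course# even though no dependency states Course# → Room# directly, which is exactly the transitive dependency described above.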

3.3.6. Key attributes

In a given relation R, if the attribute X uniquely determines all other attributes, then the attribute X is a key attribute, which is nothing but the candidate key defined in Chapter One. Example1: Student# and Course# together form a composite key which uniquely determines all attributes in the relation REPORT (Student#, Course#, CourseName, IName, Room#, Marks, Grade). Hence Student# and Course# are key attributes. Example2: Student# and EMailID can each be considered a candidate key for the entity STUDENT (Student#, StudentName, DateofBirth, EMailID), because either Student# or EMailID uniquely determines all other attributes of the student entity.
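As a rough illustration of Example2, the Python sketch below searches sample STUDENT rows for minimal attribute sets whose values are unique in every row. The function names, the e-mail addresses and the second student named Davis are invented for the example. Uniqueness in a sample only suggests a candidate key; the final decision rests on the business rules.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))

def candidate_keys(rows, attributes):
    """Return the minimal attribute combinations that are unique in rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical STUDENT rows; two students share a name and a date of birth
students = [
    {"Student#": 101, "StudentName": "Davis", "DateofBirth": "1986-11-04",
     "EMailID": "davis@example.edu"},
    {"Student#": 102, "StudentName": "Daniel", "DateofBirth": "1987-11-06",
     "EMailID": "daniel@example.edu"},
    {"Student#": 106, "StudentName": "Davis", "DateofBirth": "1986-11-04",
     "EMailID": "davis2@example.edu"},
]

attributes = ["Student#", "StudentName", "DateofBirth", "EMailID"]
keys = candidate_keys(students, attributes)
print(keys)
```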

35 Transitive: In-direct.


3.3.7. Non key attributes

The attributes other than the candidate key attributes in a table/relation are called non-key attributes. In other words, they are the attributes which do not participate in any candidate key. Example1: Student# and EMailID are the candidate keys of the entity STUDENT (Student#, StudentName, DateofBirth, EMailID), so StudentName and DateofBirth are the non-key attributes.

3.4. Types of Normal Forms

3.4.1. First Normal Form (1 NF)

A relation R is said to be in the first normal form (1NF) if and only if all the attributes of the relation R are atomic36 in nature. Consider the Student_Course_Result table, which is reproduced from Section 3.2 The need for Normalization.

Student_Course_Result Table

Student_Details        | Course_Details                                | Results
101, Davis, 11/4/1986  | M4, Applied Mathematics, Basic Mathematics, 7 | 11/11/2004, 82, A
102, Daniel, 11/6/1987 | M4, Applied Mathematics, Basic Mathematics, 7 | 11/11/2004, 62, C
101, Davis, 11/4/1986  | H6, American History, 4                       | 11/22/2004, 79, B
103, Sandra, 10/2/1988 | C3, Bio Chemistry, Basic Chemistry, 11        | 11/16/2004, 65, B
104, Evelyn, 2/22/1986 | B3, Botany, 8                                 | 11/26/2004, 77, B
102, Daniel, 11/6/1987 | P3, Nuclear Physics, Basic Physics, 13        | 11/12/2004, 68, B
105, Susan, 8/31/1985  | P3, Nuclear Physics, Basic Physics, 13        | 11/12/2004, 89, A
103, Sandra, 10/2/1988 | B4, Zoology, 5                                | 11/27/2004, 54, D
105, Susan, 8/31/1985  | H6, American History, 4                       | 11/22/2004, 87, A
104, Evelyn, 2/22/1986 | M4, Applied Mathematics, Basic Mathematics, 7 | 11/11/2004, 65, B

Figure 3-6: Data file in table format

In the table shown in Figure 3-6, the Student_Details, Course_Details and Results attributes can be further divided. The Student_Details attribute is divided into Student# (Student Number), StudentName (Student Name) and DateofBirth (Date of Birth). The Course_Details attribute is divided into Course# (Course Number), CourseName, Prerequisites and Duration. Similarly the Results attribute is divided into DateofExam, Marks and Grade. To make the above table 1NF compliant, it is re-designed as shown below.

Student_Course_Result Table

Student# | StudentName | DateofBirth | Course# | CourseName          | PreRequisite      | DurationInDays | DateOfExam | Marks | Grade
101      | Davis       | 4-Nov-86    | M4      | Applied Mathematics | Basic Mathematics | 7              | 11-Nov-04  | 82    | A
102      | Daniel      | 6-Nov-87    | M4      | Applied Mathematics | Basic Mathematics | 7              | 11-Nov-04  | 62    | C
101      | Davis       | 4-Nov-86    | H6      | American History    |                   | 4              | 22-Nov-04  | 79    | B
103      | Sandra      | 2-Oct-88    | C3      | Bio Chemistry       | Basic Chemistry   | 11             | 16-Nov-04  | 65    | B
104      | Evelyn      | 22-Feb-86   | B3      | Botany              |                   | 8              | 26-Nov-04  | 77    | B
102      | Daniel      | 6-Nov-87    | P3      | Nuclear Physics     | Basic Physics     | 13             | 12-Nov-04  | 68    | B
105      | Susan       | 31-Aug-85   | P3      | Nuclear Physics     | Basic Physics     | 13             | 12-Nov-04  | 89    | A
103      | Sandra      | 2-Oct-88    | B4      | Zoology             |                   | 5              | 27-Nov-04  | 54    | D
105      | Susan       | 31-Aug-85   | H6      | American History    |                   | 4              | 22-Nov-04  | 87    | A
104      | Evelyn      | 22-Feb-86   | M4      | Applied Mathematics | Basic Mathematics | 7              | 11-Nov-04  | 65    | B

Figure 3-7: First Normal Form

36 Atomic: The smallest levels to which data may be broken down and remain meaningful.

In the new form, all the attributes are atomic, meaning they are not further decomposable [37]. You cannot divide Student#, StudentName, etc. into smaller attributes. Hence this table is in 1NF.

Let us revisit the issues we had with the un-normalized table. Even at this stage, it is difficult to add prospective course or student information, and it is still difficult to update or delete either course or student information. Hence the anomalies in inserts, updates and deletes are still to be resolved: the first normal form has all the problems that we faced in the un-normalized table.

3.4.2. Second Normal Form (2NF)

A relation is said to be in Second Normal Form (2NF) if and only if:
• It is in the First Normal Form, and
• No partial dependency exists between non-key attributes and key attributes.

[37] Decomposable: Can be further split or reduced.

Let us revisit the 1NF table structure.
• Student# is the key attribute for the Student relation or table.
• Course# is the key attribute for the Course relation or table.
• Student# and Course# together form the composite key for the Result relation or table.
• Other attributes like StudentName, DateofBirth, CourseName, DurationInDays, PreRequisite, DateofExam, Marks and Grade are non-key attributes.

To make this table 2NF compliant, we have to remove all the partial dependencies.
• StudentName and DateofBirth depend on Student# only.
• CourseName, PreRequisite and DurationInDays depend on Course# only.
• DateofExam depends on Course# only.

To remove these partial dependencies we need to split the Student_Course_Result table into four separate tables, STUDENT, COURSE, RESULT and EXAM_DATE, as shown in Figure 3-8.

STUDENT TABLE

Student#  StudentName  DateofBirth
101       Davis        4-Nov-86
102       Daniel       6-Nov-87
103       Sandra       2-Oct-88
104       Evelyn       22-Feb-86
105       Susan        31-Aug-85
106       Mike         4-Feb-87
107       Juliet       9-Nov-86
108       Tom          7-Oct-86
109       Catherine    6-Jun-84

COURSE TABLE

Course#  CourseName           PreRequisite  DurationInDays
M1       Basic Mathematics    -             11
M4       Applied Mathematics  M1            7
H6       American History     -             4
C1       Basic Chemistry      -             5
C3       Bio Chemistry        C1            11
B3       Botany               -             8
P1       Basic Physics        -             8
P3       Nuclear Physics      P1            13
B4       Zoology              -             5

RESULT TABLE

Student#  Course#  Marks  Grade
101       M4       82     A
102       M4       62     C
101       H6       79     B
103       C3       65     B
104       B3       77     B
102       P3       68     B
105       P3       89     A
103       B4       54     D
105       H6       87     A
104       M4       65     B

EXAM_DATE TABLE

Course#  DateOfExam
M4       11-Nov-04
H6       22-Nov-04
C3       16-Nov-04
B3       26-Nov-04
P3       12-Nov-04
B4       27-Nov-04

Figure 3-8: Second Normal Form
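The partial dependencies that motivate this decomposition can be tested on sample rows. The sketch below is illustrative only: the helper fd_holds is a hypothetical name, and it checks whether a functional dependency holds in the given data, which is weaker than proving it for the schema. The rows are taken from the 1NF table in Figure 3-7.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one combination of rhs values."""
    mapping = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand side seen for this left-hand side
        if mapping.setdefault(left, right) != right:
            return False
    return True


# Three rows of the 1NF Student_Course_Result table (a subset of its columns).
rows = [
    {"Student#": 101, "StudentName": "Davis", "Course#": "M4",
     "CourseName": "Applied Mathematics", "Marks": 82},
    {"Student#": 102, "StudentName": "Daniel", "Course#": "M4",
     "CourseName": "Applied Mathematics", "Marks": 62},
    {"Student#": 101, "StudentName": "Davis", "Course#": "H6",
     "CourseName": "American History", "Marks": 79},
]

# Partial dependencies: a non-key attribute depends on only part of the composite key.
name_on_student = fd_holds(rows, ["Student#"], ["StudentName"])
course_name_on_course = fd_holds(rows, ["Course#"], ["CourseName"])

# Marks needs the whole key (Student#, Course#).
marks_on_student = fd_holds(rows, ["Student#"], ["Marks"])
marks_on_course = fd_holds(rows, ["Course#"], ["Marks"])
marks_on_full_key = fd_holds(rows, ["Student#", "Course#"], ["Marks"])
```

StudentName and CourseName each depend on one part of the key, so they move to the STUDENT and COURSE tables; Marks depends on the full key, so it stays in RESULT.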

• In the first table (STUDENT), Student# is the key attribute and all other non-key attributes, StudentName and DateofBirth, are fully functionally dependent on the key attribute.
• In the second table (COURSE), Course# is the key attribute and all other non-key attributes, CourseName, PreRequisite and DurationInDays, are fully functionally dependent on the key attribute.
• In the third table (RESULT), Student# and Course# together are the key attributes and all other non-key attributes, Marks and Grade, are fully functionally dependent on the key attributes.
• In the fourth table (EXAM_DATE), Course# is the key attribute and the non-key attribute, DateOfExam, is fully functionally dependent on the key attribute.
• These four tables are also compliant with the First Normal Form definition.
• So the above four tables are said to be in Second Normal Form (2NF).

At first look it appears that all our anomalies are taken away! Now we are storing the Student 103 and Course M4 records only once. We can insert prospective students and courses at will. We update only once if we need to change any data in the STUDENT or COURSE tables. We can get rid of any course or student details by deleting just one row. Let us analyze the following table.

Student#  Course#  Marks  Grade
101       M4       82     A
102       M4       62     C
101       H6       79     B
103       C3       65     B
104       B3       77     B
102       P3       68     B
105       P3       89     A
103       B4       54     D
105       H6       87     A
104       M4       65     B

Figure 3-9: RESULT Table

We already concluded that:
• All the attributes are atomic in nature.
• No partial dependency exists between the key attributes and non-key attributes.
• The RESULT table is in Second Normal Form (2NF).

Assume that, at present, as per the university evaluation policy:
• Students who score 80 marks or more are awarded an “A” grade.
• Students who score from 65 to 79 get a “B” grade.
• Students who score from 50 to 64 get a “C” grade.
• Students who score less than 50 get a “D” grade.

The university management, which is committed to improving the quality of education, wants to change the existing grading system to a new grading system as given below.
• “A+” grade for 95 and above
• “A” grade for 85 to 94
• “B” grade for 70 to 84
• “B-” grade for 65 to 69
• “C” grade for 55 to 64
• “D” grade for 45 to 54
• “E” grade for less than 45

In the present RESULT table structure:
• We do not have an option to introduce new grades like A+, B- and E.
• We need to do multiple updates on the existing records to bring them to the new grading definition.
• We will not be able to take away the “D” grade if we want to.
• 2NF does not take care of all the anomalies and inconsistencies.

3.4.3. Third Normal Form (3NF)

A relation R is said to be in the Third Normal Form (3NF) if and only if:
• It is in 2NF, and
• No transitive dependency exists between non-key attributes and key attributes through another non-key attribute.

In the above RESULT table, Student# and Course# are the key attributes. All other attributes except Grade are non-partially and non-transitively dependent on the key attributes. The Grade attribute is dependent on Marks, and Marks in turn is dependent on Student# and Course#. To bring this table to third normal form we need to remove this transitive dependency. After removing it, we arrive at the following table structures, which are in 3NF.

RESULT TABLE

Student#  Course#  Marks
101       M4       82
102       M4       62
101       H6       79
103       C3       65
104       B3       77
102       P3       68
105       P3       89
103       B4       54
105       H6       87
104       M4       65

MARKSGRADE TABLE

UpperBound  LowerBound  Grade
100         95          A+
94          85          A
84          70          B
69          65          B-
64          55          C
54          45          D
44          0           E

Figure 3-10: Third Normal Form
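With Grade moved out of RESULT, the grade for a row is derived by matching Marks against the bounds in MARKSGRADE. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than Oracle; the lower-case table and column names are adaptations made for this sketch, and only four RESULT rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result (
        student_no INTEGER,
        course_no  TEXT,
        marks      INTEGER,
        PRIMARY KEY (student_no, course_no)
    );
    CREATE TABLE marksgrade (upper_bound INTEGER, lower_bound INTEGER, grade TEXT);

    INSERT INTO result VALUES (101, 'M4', 82), (102, 'M4', 62),
                              (103, 'B4', 54), (105, 'P3', 89);
    -- Bounds copied from the MARKSGRADE table in Figure 3-10.
    INSERT INTO marksgrade VALUES (100, 95, 'A+'), (94, 85, 'A'), (84, 70, 'B'),
                                  (69, 65, 'B-'), (64, 55, 'C'), (54, 45, 'D'),
                                  (44, 0, 'E');
""")

# The grade is looked up at query time instead of being stored in every RESULT row.
graded = conn.execute("""
    SELECT r.student_no, r.course_no, r.marks, g.grade
    FROM result r
    JOIN marksgrade g ON r.marks BETWEEN g.lower_bound AND g.upper_bound
    ORDER BY r.student_no, r.course_no
""").fetchall()
```

Under the new grading scale, 82 marks maps to B and 89 to A. A change in grading policy now touches only the rows of MARKSGRADE, not the RESULT rows.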

After normalizing the tables to Third Normal Form (3NF), we have removed the anomalies and inconsistencies discussed above. Now we can add new grades, update the existing ones and delete the unwanted ones. For this reason 3NF is generally regarded as sufficient in practice, and most databases that require efficient INSERT, UPDATE and DELETE operations are designed in this normal form.

3.5. Merits and Demerits of Normalization

The following sections discuss the merits and demerits of normalization.

3.5.1. Merits

1) Normalization is based on a mathematical foundation.
2) It removes redundancy to a great extent. After 3NF, data redundancy is reduced to the extent of foreign keys.
3) It removes the anomalies present in inserts, updates and deletes.

3.5.2. Demerits

1) The performance of data retrieval (Select) operations can be adversely affected.

Example: Let us assume that the university management wants a report of student performance in the following format.

UNIVERSITY REPORT

Student Name  Course Name          Date Of Exam  Grade
Daniel        Applied Mathematics  11-Nov-04     C
Daniel        Nuclear Physics      12-Nov-04     B
Davis         Applied Mathematics  11-Nov-04     A
Davis         American History     22-Nov-04     B
Evelyn        Botany               26-Nov-04     B
Evelyn        Applied Mathematics  11-Nov-04     B
Sandra        Bio Chemistry        16-Nov-04     B
Sandra        Zoology              27-Nov-04     D
Susan         Nuclear Physics      12-Nov-04     A
Susan         American History     22-Nov-04     A

Figure 3-11: Proposed University Report

After applying the 3NF normalization technique to the database design, no single table contains all the information desired by the university management. We need to select Student Name from the STUDENT table, Course Name from the COURSE table, Date of Examination from the EXAM_DATE table and Grade from the MARKSGRADE table, linking them through the RESULT table. In an un-normalized format we would have retrieved all these columns from just one table. Hence normalization tends to slow down Select operations, because the tables have to be joined.

It is better to restrict the normalization process to 2NF if the application has more data retrieval operations than insert, update or delete operations. If the application is used mainly for querying a database, it is called a “Reporting System”. Take the example of a railway enquiry system: it is used to enquire about reservation availability and not to book tickets. On the other hand, a railway reservation system is called an “On-line application” because it is used for booking tickets (inserts), changing travel plans (updates) and canceling tickets (deletes). Hence one may normalize only up to 2NF for a Reporting System and up to 3NF for On-line applications.

2) Normalization may not always correspond to real-world scenarios.

Full normalization may not always be desirable, and the database designer may use his or her knowledge of the real world and choose not to normalize in some particular instance.

Example: Consider the relation CUSTOMER (Name, Street, City, Postcode). Strictly speaking, the attribute Postcode uniquely identifies City, hence a transitive dependency exists in this relation:

Postcode -> City

Thus the CUSTOMER table is not in 3NF. However, in practice the attributes City and Postcode are always used together as a unit, and decomposing the relation would not be advisable in this case.

Note: Sometimes, to increase the performance of select operations for a reporting application, the database design is taken back from a higher normal form to a lower normal form (for example, 3NF to 2NF). This process is called de-normalization or Second Level Design (SLD).
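The cost described in demerit 1 shows up in the shape of the query: the report needs one join per normalized table. The following is a rough sketch using Python's built-in sqlite3 module (not Oracle), with lower-case names adapted for this sketch and only a few rows loaded. Grades come from the MARKSGRADE bounds of Figure 3-10, so they follow the new grading scale rather than the grades printed in Figure 3-11.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student   (student_no INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE course    (course_no TEXT PRIMARY KEY, course_name TEXT);
    CREATE TABLE exam_date (course_no TEXT PRIMARY KEY, date_of_exam TEXT);
    CREATE TABLE result    (student_no INTEGER, course_no TEXT, marks INTEGER,
                            PRIMARY KEY (student_no, course_no));
    CREATE TABLE marksgrade (upper_bound INTEGER, lower_bound INTEGER, grade TEXT);

    INSERT INTO student   VALUES (102, 'Daniel'), (105, 'Susan');
    INSERT INTO course    VALUES ('M4', 'Applied Mathematics'), ('P3', 'Nuclear Physics');
    INSERT INTO exam_date VALUES ('M4', '11-Nov-04'), ('P3', '12-Nov-04');
    INSERT INTO result    VALUES (102, 'M4', 62), (102, 'P3', 68), (105, 'P3', 89);
    INSERT INTO marksgrade VALUES (100, 95, 'A+'), (94, 85, 'A'), (84, 70, 'B'),
                                  (69, 65, 'B-'), (64, 55, 'C'), (54, 45, 'D'),
                                  (44, 0, 'E');
""")

# Four joins are needed to rebuild one line of the university report.
report = conn.execute("""
    SELECT s.student_name, c.course_name, e.date_of_exam, g.grade
    FROM result r
    JOIN student    s ON s.student_no = r.student_no
    JOIN course     c ON c.course_no  = r.course_no
    JOIN exam_date  e ON e.course_no  = r.course_no
    JOIN marksgrade g ON r.marks BETWEEN g.lower_bound AND g.upper_bound
    ORDER BY s.student_name, c.course_name
""").fetchall()
```

In the un-normalized design the same report would be a single-table SELECT, which is the trade-off that de-normalization accepts for reporting systems.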

3.6. Summary

• Normalization is a refinement process that helps in removing anomalies in insert, update and delete operations.
• Normalization is also called a “Bottom-up approach”, because this technique requires full knowledge of every participating attribute and its dependencies on the key attributes. If you try to add new attributes after normalization is done, it may change the normal form of the database design itself.
• Three normal forms are commonly used.
• 1NF makes sure that all the attributes of the relation are atomic in nature.
• 2NF removes partial dependencies.
• 3NF removes transitive dependencies.
• Excessive normalization adversely affects select or retrieval operations on the database.
• It is generally better to normalize up to 3NF for insert, update and delete intensive (online transaction) systems.
• It is generally better to restrict normalization to 2NF for select intensive (reporting) systems.

While normalizing a database, use common sense and do not treat the normal forms as absolute measures.

Points to Remember:

1NF
  Test: Attributes of every relation should be atomic. An attribute is atomic if the domain of the attribute includes only atomic (simple, indivisible) values.
  Remedy (Normalization): Form new relations for each non-atomic attribute.

2NF
  Test: For relations where the candidate key contains multiple attributes (composite candidate key), a non-key attribute should not be functionally dependent on a part of the candidate key.
  Remedy (Normalization): Decompose to form a new relation for each partial key with its dependent attribute(s). Also retain the relation with the original candidate key and any attributes that are fully functionally dependent on it.

3NF
  Test: A relation should not have any non-key attribute functionally determined by any other non-key attribute. In other words, there should be no transitive dependency of a non-key attribute on the candidate key through another non-key attribute.
  Remedy (Normalization): Decompose to form a relation that includes the non-key attribute(s) that functionally determine(s) other non-key attribute(s).

3.7. Case study

Given below is the data in an un-normalized table. Normalize it to 1NF. Identify the problems encountered when the table is in 1NF but not in 2NF. Subsequently normalize to 2NF and 3NF, explaining the problems faced and the solution to them.

Proj_No  Proj_Name              Emp_No  Emp_Name   Rate_Category  Hourly_Rate_in_dollars
2023     Amsterdam travel site  101     Vincent R  A              60
                                102     Pauline J  B              50
                                103     Charles C  C              40
2056     Real Estate Agency     101     Vincent R  A              60
                                107     David R    B              50

Solution: Table (1NF)

Proj_No  Proj_Name              Emp_No  Emp_Name   Rate_Category  Hourly_Rate_in_dollars
2023     Amsterdam travel site  101     Vincent R  A              60
2023     Amsterdam travel site  102     Pauline J  B              50
2023     Amsterdam travel site  103     Charles C  C              40
2056     Real Estate Agency     101     Vincent R  A              60
2056     Real Estate Agency     107     David R    B              50

Problems encountered when the table is in 1NF but not in 2NF:
i. Wastage of space: The information that code 2023 refers to the Amsterdam travel site appears three (3) times.
ii. Update Anomaly: If the project name has to be changed, it has to be changed in all the rows in which the project name appears. If even one row is missed, this may lead to inconsistency problems.
iii. Insert Anomaly: The information about a new employee cannot be inserted into the table unless the employee is assigned to a project.
iv. Delete Anomaly: If there is only one employee working on a project, it is not possible to delete information about the employee without losing information about the project. In other words, it is not possible to delete a subset of a record.

Solution: Normalize to 2NF
i. Take out the duplication.
ii. Look for partial dependencies, i.e. fields that are dependent on a part of a key and not on the entire key.

In the above table, the key is (Proj_No, Emp_No). The functional dependencies are as follows:

Proj_No -> Proj_Name
Emp_No -> Emp_Name, Rate_Category, Hourly_Rate_in_Dollars
Rate_Category -> Hourly_Rate_in_Dollars

The above table should be decomposed as follows:

Employee_Project Table

Proj_No  Emp_No
2023     101
2023     102
2023     103
2056     101
2056     107

Employee Table

Emp_No  Emp_Name   Rate_Category  Hourly_Rate_in_Dollars
101     Vincent R  A              60
102     Pauline J  B              50
103     Charles C  C              40
107     David R    B              50

Project Table

Proj_No  Proj_Name
2023     Amsterdam Travel site
2056     Real Estate Agency

Problems faced with the table in 2NF:
i. Stores data redundantly: The Hourly_Rate_in_Dollars and Rate_Category are stored in their entirety for each employee.
ii. Update Anomaly: If the hourly rate in dollars has to be changed for a particular rate category, it has to be changed in all the rows in which the rate category appears. If even one row is missed, this may lead to inconsistency problems.
iii. Insert Anomaly: It is not possible to insert information about a new rate category and the corresponding hourly rate in dollars unless there is an employee in that rate category.
iv. Delete Anomaly: If there is only one employee in a particular rate category, it is not possible to delete information about the employee without losing information about that rate category and the corresponding hourly rate in dollars.

Solution: Normalize to 3NF
i. Move this excess data into its own table.
ii. Look for transitive relationships, i.e. relationships where a non-key attribute is dependent on another non-key attribute.

In the above Employee table, Hourly_Rate_in_Dollars is actually dependent on Rate_Category according to the functional dependency

Rate_Category -> Hourly_Rate_in_Dollars

The Employee table should be decomposed as follows:

Employee Table

Emp_No  Emp_Name   Rate_Category
101     Vincent R  A
102     Pauline J  B
103     Charles C  C
107     David R    B

Rate Table

Rate_Category  Hourly_Rate_in_Dollars
A              60
B              50
C              40
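One way to gain confidence in the case-study decomposition is to confirm that joining the four 3NF tables gives back exactly the 1NF rows, i.e. that no information was lost or invented. The sketch below does this in plain Python; the variable names are illustrative and the data is copied from the case-study tables.

```python
# The 1NF rows: (Proj_No, Proj_Name, Emp_No, Emp_Name, Rate_Category, Hourly_Rate).
original = [
    (2023, "Amsterdam travel site", 101, "Vincent R", "A", 60),
    (2023, "Amsterdam travel site", 102, "Pauline J", "B", 50),
    (2023, "Amsterdam travel site", 103, "Charles C", "C", 40),
    (2056, "Real Estate Agency", 101, "Vincent R", "A", 60),
    (2056, "Real Estate Agency", 107, "David R", "B", 50),
]

# Project each functional dependency into its own table.
employee_project = set()   # key: (Proj_No, Emp_No)
project = {}               # Proj_No -> Proj_Name
employee = {}              # Emp_No -> (Emp_Name, Rate_Category)
rate = {}                  # Rate_Category -> Hourly_Rate
for proj_no, proj_name, emp_no, emp_name, category, hourly_rate in original:
    employee_project.add((proj_no, emp_no))
    project[proj_no] = proj_name
    employee[emp_no] = (emp_name, category)
    rate[category] = hourly_rate

# Rejoin the four tables by following the keys.
rejoined = sorted(
    (proj_no, project[proj_no], emp_no,
     employee[emp_no][0], employee[emp_no][1], rate[employee[emp_no][1]])
    for proj_no, emp_no in employee_project
)

lossless = rejoined == sorted(original)
```

The rejoined rows match the original five rows, while each project name, employee name and hourly rate is now stored only once.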

4. Structured Query Language (SQL)

SQL is used to interact with a database to manage and retrieve data.

4.1. The Purpose of SQL

SQL is used to retrieve data from the database. The DBMS processes the SQL request, retrieves the requested data from the database, and returns it. This process of requesting data from the database and receiving back the results is called a database query, hence the name Structured Query Language. Refer to Figure 4-1.

[Figure: a computer system sends an SQL request to the DBMS, which reads the database and returns the data]

Figure 4-1: Using SQL for database access

SQL is used to control all the functions that a DBMS provides for its users, including:
• Data Definition: SQL allows a user to define the structure and the organization of the data to be stored and the relationships among the stored data items.
• Data Retrieval: SQL allows a user or an application program to retrieve the stored data from the database.
• Data Manipulation: SQL lets a user or an application program update the database by adding new data, deleting existing data, and modifying existing data.
• Access Control: SQL can be used to restrict a user’s ability to retrieve, add, and modify data, thus protecting the stored data against unauthorized access.

4.2. A Brief History of SQL

Date  Event
1970  Edgar Codd first wrote about the concept of relational databases in his paper ‘A relational model of data for large shared data banks’. The relational model devised by Codd was explored during the 1970s, and commercial relational database products began to emerge in the 1980s, originally for mainframe systems and later for microcomputers.
1979  Oracle Corporation introduced the first commercial RDBMS
1982  ANSI (American National Standards Institute) formed the SQL Standards Committee
1983  IBM (International Business Machines) announced DB2 (a database)
1986  ANSI SQL1 standard is approved
1987  ISO (International Organization for Standardization) SQL1 standard is approved
1992  ANSI SQL2 standard is approved
2000  Microsoft Corporation introduces SQL Server 2000, aimed at enterprise applications
2002  Research firm Gartner ranked IBM as #1 database vendor over Oracle
2004  SQL:2003 standard is published

4.3. Data Types

Data types are used to specify the type of data that will be stored in each column of a table. The following table lists the typical data types [38] used in Oracle 8i and Oracle 9i:

NUMBER(P, S)
  Oracle 8i: The maximum precision is 38 digits.
  Oracle 9i: The maximum precision is 38 digits.
  Explanation: P is the precision and S is the scale. Example: NUMBER(7, 2) is a number that has 5 digits before the decimal and 2 digits after the decimal.

CHAR(SIZE)
  Oracle 8i: Up to 2000 bytes.
  Oracle 9i: Up to 2000 bytes.
  Explanation: SIZE is the number of characters to store. Fixed-length strings, space padded. Example: if the width of a character variable is 10 and the string stored in it is ‘RDBMS’, it will be stored as ‘RDBMS     ’.

VARCHAR2(SIZE)
  Oracle 8i: Up to 4000 bytes.
  Oracle 9i: Up to 4000 bytes.
  Explanation: SIZE is the number of characters to store. Variable-length strings. Example: if the width of a character variable is 10 and the string stored in it is ‘RDBMS’, it will be stored as ‘RDBMS’.

LONG
  Oracle 8i: Up to 2 gigabytes.
  Oracle 9i: Up to 2 gigabytes.
  Explanation: Variable-length strings (backward compatible [39]).

DATE
  Oracle 8i: A date between Jan 1, 4712 BC and Dec 31, 9999 AD.
  Oracle 9i: A date between Jan 1, 4712 BC and Dec 31, 9999 AD.
  Explanation: Example: ‘25-JAN-2005’.

[38] Data Types: The description of the kinds of data stored, passed and used.
[39] Backward Compatible: A design that continues to work with earlier versions of a language, program, etc.

4.4. Statement types

The following table lists the three types of SQL statements:

Type of SQL statement             SQL keywords                              Function
Data Definition Language (DDL)    CREATE, ALTER, DROP                       Used to define, change and drop the structure of a table
                                  TRUNCATE                                  Used to remove all rows from a table
Data Manipulation Language (DML)  SELECT, INSERT INTO, UPDATE, DELETE FROM  Used to enter, modify, delete and retrieve data from a table
Data Control Language (DCL)       GRANT, REVOKE                             Used to provide control over the data in a database
                                  COMMIT, ROLLBACK                          Used to define the end of a transaction

Note: All keywords must be entered as described; otherwise users get syntax errors.
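To make the role of COMMIT and ROLLBACK concrete, here is a small sketch using Python's built-in sqlite3 module instead of Oracle. The account table and its values are hypothetical, and the connection's commit() and rollback() methods stand in for the COMMIT and ROLLBACK keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a (hypothetical) table.
conn.execute("CREATE TABLE account (account_no INTEGER PRIMARY KEY, balance INTEGER)")

# DML: enter data, then end the transaction by making it permanent.
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

# A change that is rolled back never becomes visible afterwards.
conn.execute("UPDATE account SET balance = 0 WHERE account_no = 1")
conn.rollback()
balance_after_rollback = conn.execute(
    "SELECT balance FROM account WHERE account_no = 1").fetchone()[0]

# The same kind of change followed by a commit is kept.
conn.execute("UPDATE account SET balance = 250 WHERE account_no = 1")
conn.commit()
balance_after_commit = conn.execute(
    "SELECT balance FROM account WHERE account_no = 1").fetchone()[0]
```

The balance is still 100 after the rollback and 250 after the commit, which is what "defining the end of a transaction" means in practice.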

4.5. Data Definition Language (DDL) Statements

DDL statements help us define the table structure. They are used to:
• Define and create a new table
• Remove a table that is no longer needed
• Change the definition of an existing table
• Define a virtual table (view) of data (covered in Section 4.7)
• Build an index [40] to access a table faster (covered in Section 4.5.5)

[40] Index: Indices are created on an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. Users cannot see the indexes; they are just used to speed up queries. More on indexes is covered in Section 4.5.5.

CONSTRAINTS

Data types help us specify the nature or kind of data that can be stored in a table, but a data type specification alone is not enough. For example, a column that stores a product price should accept only positive values, and there is no data type which accepts only positive numbers. Another requirement could be to specify constraints on column data. For example, the product number column in the product table should contain unique values for identifying product information. SQL allows the definition of constraints on columns and tables. A user cannot store data in a column if it violates a constraint specified on that column; such an attempt raises an error.

Types of Constraints:
• Column Constraint: A constraint specified at the column level, as part of the column definition, and applied only to that specific column.
• Table Constraint: A constraint specified at the table level after all the column definitions. It is used when we want to specify a constraint which involves more than one column in a table.

4.5.1. CREATE TABLE Statement

The CREATE TABLE statement can:
• Create a table
• Define column constraints
• Define table constraints

Refer to Figure 4-2.

CREATE TABLE table-name ( Column-Definitions [ , Table-Constraint-Definitions ] )

Column-Definition:
    column-name data-type [ DEFAULT value ]

Table-Constraint-Definition:
    CONSTRAINT constraint-name
        primary-key-constraint | foreign-key-constraint | uniqueness-constraint | check-constraint

Primary-Key-Constraint:
    PRIMARY KEY ( column-name )

Foreign-Key-Constraint:
    FOREIGN KEY ( column-name ) REFERENCES table-name [ ( column-name ) ]

Uniqueness-Constraint:
    UNIQUE ( column-name )

Check-Constraint:
    CHECK ( search-condition )

Figure 4-2: CREATE TABLE syntax

Note: Anything enclosed between [ ] is optional.

Example:

1. Create a table Customer_Details with the following specifications

Column Name      Data Type and Width  Constraint
Cust_ID          NUMBER(5)            NOT NULL
Cust_Last_Name   VARCHAR2(20)         NOT NULL
Cust_Mid_Name    VARCHAR2(4)
Cust_First_Name  VARCHAR2(20)
Account_No       NUMBER(5)            PRIMARY KEY
Account_Type     VARCHAR2(10)         NOT NULL
Bank_Branch      VARCHAR2(25)         NOT NULL
Cust_Email       VARCHAR2(30)

Syntax:
CREATE TABLE Customer_Details(
    Cust_ID         NUMBER(5)    CONSTRAINT nn_cust_custid NOT NULL,
    Cust_Last_Name  VARCHAR2(20) CONSTRAINT nn_cust_lastname NOT NULL,
    Cust_Mid_Name   VARCHAR2(4),
    Cust_First_Name VARCHAR2(20),
    Account_No      NUMBER(5)    CONSTRAINT pk_cust PRIMARY KEY,
    Account_Type    VARCHAR2(10) CONSTRAINT nn_cust_accounttype NOT NULL,
    Bank_Branch     VARCHAR2(25) CONSTRAINT nn_cust_bankbranch NOT NULL,
    Cust_Email      VARCHAR2(30));

2. Create a table Employee_Manager with the following specifications

Column Name      Data Type and Width  Constraint
Emp_ID           NUMBER(6)            PRIMARY KEY
Emp_Last_Name    VARCHAR2(25)
Emp_Middle_Name  VARCHAR2(5)
Emp_First_Name   VARCHAR2(25)
Emp_Email        VARCHAR2(45)
Department       VARCHAR2(10)
Grade            NUMBER(2)
Manager_ID       NUMBER(6)            Foreign Key Referencing Emp_ID

Syntax:
CREATE TABLE Employee_Manager(
    Emp_ID          NUMBER(6)    CONSTRAINT pk_emp PRIMARY KEY,
    Emp_Last_Name   VARCHAR2(25),
    Emp_Middle_Name VARCHAR2(5),
    Emp_First_Name  VARCHAR2(25),
    Emp_Email       VARCHAR2(45),
    Department      VARCHAR2(10),
    Grade           NUMBER(2),
    Manager_ID      NUMBER(6)    CONSTRAINT fk_emp_managerid REFERENCES Employee_Manager(Emp_ID));

A column level constraint follows a column definition, whereas a table level constraint follows all the column definitions of the table. A table level constraint generally involves two or more columns.

3. Applying primary key as a column constraint

Syntax:
CREATE TABLE Customer_Details(
    Cust_ID         NUMBER(5)    CONSTRAINT nn_cust_custid NOT NULL,
    Cust_Last_Name  VARCHAR2(20) CONSTRAINT nn_cust_lastname NOT NULL,
    Cust_Mid_Name   VARCHAR2(4),
    Cust_First_Name VARCHAR2(20),
    Account_No      NUMBER(5)    CONSTRAINT pk_cust_accountno PRIMARY KEY,
    Account_Type    VARCHAR2(10) CONSTRAINT nn_cust_accounttype NOT NULL,
    Bank_Branch     VARCHAR2(25) CONSTRAINT nn_cust_branch NOT NULL,
    Cust_Email      VARCHAR2(30));

The primary key definition in the above example follows the column (Account_No) definition. A column definition includes the name of the column, the data type and the length or size of the column.

4. A primary key as a table constraint

Syntax:
CREATE TABLE Customer_Details(
    Cust_ID         NUMBER(5)    CONSTRAINT nn_cust_custid NOT NULL,
    Cust_Last_Name  VARCHAR2(20) CONSTRAINT nn_cust_lastname NOT NULL,
    Cust_Mid_Name   VARCHAR2(4),
    Cust_First_Name VARCHAR2(20),
    Account_No      NUMBER(5),
    Account_Type    VARCHAR2(10) CONSTRAINT nn_cust_accounttype NOT NULL,
    Bank_Branch     VARCHAR2(25) CONSTRAINT nn_cust_bankbranch NOT NULL,
    Cust_Email      VARCHAR2(30),
    CONSTRAINT pk_cust_email PRIMARY KEY(Cust_ID, Account_No));

The primary key definition in the above example follows the table definition, i.e. the primary key definition occurs after all the columns have been defined in the table with their data type and width.

5. How to create a new table from another existing table?

Syntax:
CREATE TABLE Cust_Details AS
    SELECT Cust_ID, Account_No, Account_Type, Bank_Branch, Cust_Email
    FROM Customer_Details;

In the above example, the Cust_Details table is created from the Customer_Details table with the attributes Cust_ID, Account_No, Account_Type, Bank_Branch and Cust_Email. If the new Cust_Details table should have the same structure as the existing Customer_Details table, the syntax would be as follows:

CREATE TABLE Cust_Details AS
    SELECT * FROM Customer_Details;

In the above example, not only the structure but also the data is copied. To copy only the structure and not the data:

CREATE TABLE Cust_Details AS
    SELECT * FROM Customer_Details WHERE 1=2;

Note: When a table is created from another table, only the NOT NULL constraints are copied. All the other constraints are not copied.

6. Domain integrity constraint (check constraint, column constraint)

CREATE TABLE Customer_Details(
    Cust_ID         NUMBER(5)    CONSTRAINT nn_cust_custid NOT NULL
                                 CONSTRAINT cc_cust_custid CHECK(Cust_ID BETWEEN 101 AND 105),
    Cust_Last_Name  VARCHAR2(20) CONSTRAINT nn_cust_lastname NOT NULL,
    Cust_Mid_Name   VARCHAR2(4),
    Cust_First_Name VARCHAR2(20),
    Account_No      NUMBER(5)    CONSTRAINT pk_cust_accountno PRIMARY KEY,
    Account_Type    VARCHAR2(10) CONSTRAINT nn_cust_accounttype NOT NULL,
    Bank_Branch     VARCHAR2(25) CONSTRAINT nn_cust_branch NOT NULL,
    Cust_Email      VARCHAR2(30));

83 | P a g e

Infosys Foundation Program

Relational Database Management System

7. Domain integrity constraint ( check constraint – table constraint) CREATE TABLE Customer_Details( Cust_ID NUMBER(5) CONSTRAINT nn_cust_custid NOT NULL, Cust_Last_Name VARCHAR2(20) CONSTRAINT nn_cust_lastname NOT NULL, Cust_Mid_Name VARCHAR2(4), Cust_First_Name VARCHAR2(20), Account_No NUMBER(5) CONSTRAINT pk_cust_accountno PRIMARY KEY, Account_Type VARCHAR2(10) CONSTRAINT nn_cust_accounttype NOT NULL, Bank_Branch VARCHAR2(25) CONSTRAINT nn_cust_bankbranch NOT NULL, Cust_Email VARCHAR2(30), CONSTRAINT cc_cust_email CHECK(Cust_ID BETWEEN 101 AND 105 AND ACCOUNT_TYPE in (‘Savings’, ‘Checkings’)) ); Note: Although giving a name to a constraint is optional, it is a good programming practice to give every meaningful constraint name which is unique and can’t be applied to any other constraint of any table. The name of the constraint is required when the constraint has to be dropped. A NOT NULL constraint on a column(s) implies that value has to be provided for that column(s) compulsorily. A UNIQUE constraint on a column(s) implies that the values in the column(s) should be distinct. A column with a UNIQUE constraint can have NULL values. Mostly in DBMS, a PRIMARY KEY constraint implicitly imposes a NOT NULL and UNIQUE constraint. If the table has a composite primary key, each of the attribute constituting the primary key is NOT NULL. In other words, column involved in the composite primary key cannot have NULL value. However the combination of attributes constituting the primary key should offer a unique value. A FOREIGN KEY constraint on a set of attribute(s) does not prevent them from having duplicate or NULL values. Note: Users can use the DESCRIBE or DESC statement to see the structure of the table.

Example:
DESCRIBE Customer_Details;
Or
DESC Customer_Details;
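The behaviour of the NOT NULL, UNIQUE, PRIMARY KEY and CHECK constraints described above can be tried out with a small script. The following is only a sketch: it assumes Python's built-in sqlite3 module as a stand-in for the Oracle database used in this course, uses a cut-down version of Customer_Details, and the e-mail addresses and the try_insert helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer_Details (
        Cust_ID      INTEGER     NOT NULL
                     CONSTRAINT cc_cust_custid CHECK (Cust_ID BETWEEN 101 AND 105),
        Account_No   INTEGER     CONSTRAINT pk_cust_accountno PRIMARY KEY,
        Account_Type VARCHAR(10) NOT NULL,
        Cust_Email   VARCHAR(30) CONSTRAINT uq_cust_email UNIQUE
    )
""")


def try_insert(row):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Customer_Details VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid row": try_insert((101, 1020, "Savings", None)),
    # A UNIQUE column may hold NULL in more than one row.
    "second NULL email": try_insert((102, 2348, "Checking", None)),
    # Violates the CHECK constraint on Cust_ID.
    "Cust_ID out of range": try_insert((106, 3350, "Savings", "k@example.com")),
    # Violates the NOT NULL constraint on Account_Type.
    "missing Account_Type": try_insert((103, 3421, None, "g@example.com")),
    # Violates the PRIMARY KEY (Account_No 1020 already exists).
    "duplicate Account_No": try_insert((104, 1020, "Savings", "d@example.com")),
}

for description, accepted in results.items():
    print(description, "->", "accepted" if accepted else "rejected")
```

Only the first two rows are stored; each of the other three is rejected by the constraint named in its comment.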

4.5.2. ALTER TABLE statement

The ALTER TABLE statement can be used for the following purposes:
• To add a new column definition to an existing table
• To drop a column from an existing table
• To add or drop a primary key to / from an existing table
• To add or drop a foreign key to / from an existing table
• To add or drop a unique constraint to / from an existing table
• To add or drop a check constraint to / from an existing table

ALTER TABLE table-name
    ADD column-definition
    DROP column-name
    ADD primary-key-definition | foreign-key-definition | unique-constraint | check-constraint
    DROP CONSTRAINT constraint-name

Figure 4-3: ALTER TABLE statement syntax

Note: The check constraint enforces the domain integrity constraint. It permits only values allowed by the constraint into the column(s). The domain integrity constraint will be covered in detail in chapter 5.

Example:
1. Adding a new column
Add a phone number to the Customer_Details table
ALTER TABLE Customer_Details ADD Contact_Phone CHAR(10);
2. Modifying an existing column definition
Modify the size of the Contact_Phone column


ALTER TABLE Customer_Details MODIFY Contact_Phone CHAR(12);

3. Adding a NOT NULL Constraint
Add the NOT NULL constraint on the Contact_Phone column
ALTER TABLE Customer_Details MODIFY Contact_Phone CHAR(12) CONSTRAINT nn_cust_phone NOT NULL;
4. Adding a UNIQUE Constraint
Add the UNIQUE constraint on the Contact_Phone column
ALTER TABLE Customer_Details ADD CONSTRAINT uq_cust_phone UNIQUE (Contact_Phone);
5. Dropping a constraint
Drop the NOT NULL constraint on the Contact_Phone column
ALTER TABLE Customer_Details DROP CONSTRAINT nn_cust_phone;
6. Dropping a column
Drop the Contact_Phone column from the Customer_Details table
ALTER TABLE Customer_Details DROP (Contact_Phone);
7. Adding a simple PRIMARY KEY
Make the Account_No column the primary key
ALTER TABLE Customer_Details ADD CONSTRAINT pk_cust_accountno PRIMARY KEY (Account_No);
8. Table level constraint - Adding a composite PRIMARY KEY to a table
Make the Account_No and Cust_ID columns the primary key
ALTER TABLE Customer_Details ADD CONSTRAINT pk_cust_accountno_custid PRIMARY KEY (Account_No, Cust_ID);
9. Adding a FOREIGN KEY
Make the Account_No column in the Customer_Transaction table a foreign key referencing the Account_No column of Customer_Details


ALTER TABLE Customer_Transaction ADD CONSTRAINT fk_cust_trans_accountno FOREIGN KEY (Account_No) REFERENCES Customer_Details (Account_No);
10. Adding a CHECK constraint
ALTER TABLE Customer_Details ADD CONSTRAINT cc_cust_custid CHECK (Cust_ID BETWEEN 101 AND 105);

11. Dropping a simple or composite PRIMARY KEY constraint
Drop the primary key constraint
ALTER TABLE Customer_Details DROP PRIMARY KEY;
Or
ALTER TABLE Customer_Details DROP CONSTRAINT Pkey1;
Note: The syntax for dropping a simple or composite primary key constraint is one and the same.

While creating a table, one can have only one primary key but any number of foreign keys. If a table already has a primary key, adding another primary key to the same table using the ALTER TABLE statement results in an error. The RDBMS will not allow a PRIMARY KEY constraint on column(s) that contain NULL or duplicate values.

Note: The forms of the ALTER TABLE command shown above do not change the name of a column or the name of a table. However, they can change the datatype or length of a column. If the table has only one column, the ALTER TABLE statement cannot be used to drop that column because that would render the table definition invalid.
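As a quick illustration of adding a column to a table that already holds data, the sketch below again assumes Python's sqlite3 module in place of Oracle. SQLite spells the clause ADD COLUMN and supports only a subset of the ALTER TABLE forms listed above, so only the first example is reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Oracle (assumption)
conn.execute(
    "CREATE TABLE Customer_Details (Cust_ID INTEGER NOT NULL, Account_No INTEGER PRIMARY KEY)"
)
conn.execute("INSERT INTO Customer_Details VALUES (101, 1020)")

# Add a phone number column to the existing table.
conn.execute("ALTER TABLE Customer_Details ADD COLUMN Contact_Phone CHAR(10)")

# PRAGMA table_info plays the role of DESCRIBE here; field 1 of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Customer_Details)")]

# Rows that existed before the ALTER get NULL (None in Python) in the new column.
existing_row = conn.execute(
    "SELECT Cust_ID, Account_No, Contact_Phone FROM Customer_Details"
).fetchone()

print(columns)
print(existing_row)
```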

4.5.3. DROP TABLE statement

The DROP TABLE statement is used to drop or remove a table permanently from the database.

DROP TABLE table-name
Figure 4-4: DROP TABLE statement syntax


Both the schema/structure of the table and all of its contents are lost when DROP table command is used. There is no way to recover the data.

Note: Most RDBMS will restrict the dropping of a table if it has attribute(s) being referred to by attribute(s) of another table. This is called the referential integrity constraint.

Example: Discard the Customer_Details table
DROP TABLE Customer_Details;

4.5.4. TRUNCATE TABLE statement

The TRUNCATE TABLE statement is used to remove/delete all rows from a table.

TRUNCATE TABLE table-name
Figure 4-5: TRUNCATE TABLE command syntax

When the TRUNCATE TABLE statement is used, all the contents of the specified table are lost but its definition remains intact. There is no way to recover the data. It releases the secondary memory occupied by the contents of the specified table.

Example: Delete all rows from the Customer_Details table
TRUNCATE TABLE Customer_Details;

4.5.5. CREATE INDEX statement

An index is a structure which provides quick access to the rows of a table, based on the values of one or more columns. The index stores the data values and pointers (physical address information) to the rows where those data values occur. In the index, the data values are arranged in either ascending or descending order, so that the RDBMS can quickly look up the index to find a particular value. It then follows the pointer to locate the row containing the value. In Figure 4-6, the index is created on the Account_No column, and each index entry points to the corresponding row in the table.


Note: The presence of an index or its absence is unknown to the SQL user, who accesses the table.

Customer_Detail records from Customer_Details file

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Account_No  Account_Type  Bank_Branch  Cust_Email
101      Smith           A.             Mike             1020        Savings       Downtown     [email protected]
102      Smith           S.             Graham           2348        Checking      Bridgewater  [email protected]
103      Langer          G.             Justin           3421        Savings       Plainsboro   [email protected]
104      Quails          D.             Jack             2367        Checking      Downtown     [email protected]
105      Jones           E.             Simon            2389        Checking      Brighton     [email protected]

INDEX (Account_No values in sorted order, each pointing to its row)
1020  2348  2367  2389  3421

Figure 4-6: An index on Account_No column of Customer_Details table

Advantages of having an INDEX:
• Referring to indexed column(s) in search conditions speeds up the execution of SQL statements.
• It is most appropriate when retrieval of data from tables is more frequent than inserts and updates.

Disadvantages of having an INDEX:
• It consumes additional disk space.
• The INDEX must be updated whenever a row is added to the table or whenever an indexed column is updated in an existing row. This imposes additional overhead on INSERT and UPDATE statements for the table.

Note:
• Most RDBMS products automatically create an index for the primary key column of a table because they anticipate these columns to be most frequently accessed.
• Most RDBMS products also automatically create an index on any column (or column combination) defined with a unique constraint. The RDBMS must check the value of such a column every time a new row is inserted, or an existing row is updated, to make certain that the value does not duplicate a value already contained in the table. Without the index on the column(s), the RDBMS would have to sequentially search through every row of the table to check the constraint. With an index, the RDBMS can simply use the index to find a row (if it exists) with the value in question, which is a much faster operation than a sequential search.
• When the primary key of the table or the unique constraint on column(s) is dropped, the index which was built on them is also dropped automatically.

CREATE [UNIQUE] INDEX index-name ON table-name (column-name)
Figure 4-7: CREATE INDEX statement syntax

DROP INDEX index-name
Figure 4-8: DROP INDEX statement syntax

Example:
1. Create a simple index for the Customer_Details table on Cust_ID
CREATE UNIQUE INDEX Cust_Idx ON Customer_Details (Cust_ID);
2. Create a composite index for the Customer_Details table on Cust_ID and Account_No
CREATE UNIQUE INDEX ID_AccountNo_Idx ON Customer_Details (Cust_ID, Account_No);
3. Drop the index created earlier
DROP INDEX ID_AccountNo_Idx;

Note: The keyword UNIQUE in the CREATE INDEX statement is optional. If the keyword UNIQUE is omitted, the index may contain duplicate entries.


Points to Remember:
• The CREATE TABLE statement creates a table with column definitions, PRIMARY KEY, FOREIGN KEY(s) and other constraints like UNIQUE and NOT NULL
• The DROP TABLE statement removes an existing table from the database
• The ALTER TABLE statement can be used to add a new column to an existing table, modify an existing column definition, and add/drop a PRIMARY KEY, FOREIGN KEY and other constraints like UNIQUE and NOT NULL
• The CREATE INDEX statement can be used to define indexes, which speed up database queries but add overhead to database updates

4.6. Data Manipulation Language (DML) Statements

The DML statements are used to:
• Insert data into the table
• Delete data from the table
• Retrieve/Fetch data from the table
• Modify/update data in the table

4.6.1. INSERT Statement

Single-row insert: A single-row INSERT statement adds a single record (new row) of data to the table. Refer to Figure 4-10.

The Single-Row INSERT statement
INSERT INTO table-name [ (column-name(s)) ]
VALUES ( constant(s) | NULL )

Figure 4-9: Single-row insert statement syntax


INSERT INTO Customer_Details (Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name, Account_No, Account_Type, Bank_Branch, Cust_Email)
VALUES (106, 'Costner', 'A.', 'Kevin', 3350, 'Savings', 'Brighton', '[email protected]');

New row:
106      Costner         A.             Kevin            3350        Savings       Brighton     [email protected]

Customer_Details table
Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Account_No  Account_Type  Bank_Branch  Cust_Email
101      Smith           A.             Mike             1020        Savings       Downtown     [email protected]
102      Smith           S.             Graham           2348        Checking      Bridgewater  [email protected]
103      Langer          G.             Justin           3421        Savings       Plainsboro   [email protected]
104      Quails          D.             Jack             2367        Checking      Downtown     [email protected]
105      Jones           E.             Simon            2389        Checking      Brighton     [email protected]

Figure 4-10: Inserting a single row

Note: The column list specified in the INSERT statement helps us to match the data values in the VALUES clause. The number of columns mentioned in the column list and their data types must exactly match the data values specified in the VALUES clause, or else an error will occur. Data of type CHAR, VARCHAR2 and DATE are always enclosed within single quotes. Example: 'Costner', '12-Jan-2005'. Users can use SELECT * FROM table-name to view the records inserted into the specified table. The SELECT statement is covered in detail in Section 4.6.4.

Examples of invalid INSERT statements:

INSERT INTO Customer_Details (Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name, Account_No, Account_Type, Bank_Branch)
VALUES (106, 'Costner', 'A.', 'Kevin', 3350, 'Savings', 'Brighton', '[email protected]');

The above INSERT statement is invalid because the number of values in the VALUES clause exceeds the number of columns that are to receive them.

INSERT INTO Customer_Details (Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name, Account_No, Account_Type, Bank_Branch, Cust_Email)
VALUES (106, 'Costner', 'A.', 'Kevin', 3350, 'Savings', 'Brighton');

The above INSERT statement is invalid because the number of values in the VALUES clause is less than the number of columns that are to receive them.

Assume Account_No is the primary key for the Customer_Details table.

INSERT INTO Customer_Details (Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name, Account_Type, Bank_Branch)
VALUES (106, 'Costner', 'A.', 'Kevin', 'Savings', 'Brighton');

The above INSERT statement is invalid because the column list does not include the attribute Account_No, which is the primary key. Because Account_No has not been included in the column list, SQL automatically assigns a NULL value to it. Being a primary key attribute, it cannot have a NULL value.

Insertion of NULL values

SQL supports missing, unknown or inapplicable data by means of a NULL value. A NULL value stored in a table implies that the value for that row-column intersection is missing, unknown or inapplicable. A NULL value is not an actual data value like 0, 473.83 or 'John Clark'. The NULL value occupies space. Refer to Figure 4-11.

Customer_Details Table (the NULL in Cust_Email marks a value that is unknown)
Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Account_No  Account_Type  Bank_Branch  Cust_Email
101      Smith           A.             Mike             1020        Savings       Downtown     NULL
102      Smith           S.             Graham           2348        Checking      Bridgewater  [email protected]
103      Langer          G.             Justin           3421        Savings       Plainsboro   [email protected]
104      Quails          D.             Jack             2367        Checking      Downtown     [email protected]
105      Jones           E.             Simon            2389        Checking      Brighton     [email protected]

Figure 4-11: Storing NULL Values in the Customer_Details Table

When a new row is inserted to a table, SQL automatically assigns a NULL value to any column whose name is missing or omitted from the column list in the INSERT statement.


Example:
INSERT INTO Customer_Details (Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name, Account_No, Account_Type, Bank_Branch)
VALUES (106, 'Costner', 'A.', 'Kevin', 3350, 'Savings', 'Brighton');

Resulting row:
106  Costner  A.  Kevin  3350  Savings  Brighton  NULL

An explicit assignment of a NULL value can be made by including the column in the column list and specifying NULL in the corresponding position of the values list.

Example:
INSERT INTO Customer_Details (Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name, Account_No, Account_Type, Bank_Branch, Cust_Email)
VALUES (106, 'Costner', 'A.', 'Kevin', 3350, 'Savings', 'Brighton', NULL);

Resulting row:
106  Costner  A.  Kevin  3350  Savings  Brighton  NULL

Inserting all columns: SQL permits omitting the column list from the INSERT statement. When the column list is not mentioned in the INSERT statement, by default all the columns of the table are included, in sequence from left to right.

Example:
INSERT INTO Customer_Details
VALUES (106, 'Costner', 'A.', 'Kevin', 3350, 'Savings', 'Brighton', NULL);

Resulting row:
106  Costner  A.  Kevin  3350  Savings  Brighton  NULL

Note: When the column list is omitted, the NULL keyword has to be used in the values list to explicitly assign NULL values to columns. In addition, the order of the data values must exactly match the order of the columns in the table.

Example: To insert a row into the Customer_Transaction table
INSERT INTO Customer_Transaction
VALUES (2367, '17-JAN-2005', 'Deposit', 2000.00, 14456);

Note: The Date value should be input in the format 'dd-mmm-yyyy' or 'dd-mmm-yy'.

Example of an invalid INSERT into the Customer_Details table:
INSERT INTO Customer_Details
VALUES (106, 'Costner');

In the above INSERT statement, the column list is omitted, so values for all columns have to be provided, but only the values for Cust_ID and Cust_Last_Name are given.
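The valid and invalid INSERT cases above can be reproduced with a short script. This is a sketch only, assuming Python's sqlite3 module in place of Oracle and a four-column version of Customer_Details. Unlike Oracle, SQLite does not treat PRIMARY KEY as implying NOT NULL for this column type, so NOT NULL is spelled out explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Oracle (assumption)
conn.execute("""
    CREATE TABLE Customer_Details (
        Cust_ID        NUMBER(5)    NOT NULL,
        Cust_Last_Name VARCHAR2(20) NOT NULL,
        Account_No     NUMBER(5)    NOT NULL PRIMARY KEY,
        Cust_Email     VARCHAR2(30)
    )
""")

# Cust_Email is omitted from the column list, so it is stored as NULL.
conn.execute(
    "INSERT INTO Customer_Details (Cust_ID, Cust_Last_Name, Account_No) "
    "VALUES (106, 'Costner', 3350)"
)
email = conn.execute(
    "SELECT Cust_Email FROM Customer_Details WHERE Cust_ID = 106"
).fetchone()[0]

# No column list, but only two of the four values are supplied: rejected.
try:
    conn.execute("INSERT INTO Customer_Details VALUES (107, 'Smith')")
    too_few_values_rejected = False
except sqlite3.Error:
    too_few_values_rejected = True

# The primary key column is left out, so it would receive NULL: rejected.
try:
    conn.execute(
        "INSERT INTO Customer_Details (Cust_ID, Cust_Last_Name) VALUES (108, 'Jones')"
    )
    missing_key_rejected = False
except sqlite3.IntegrityError:
    missing_key_rejected = True

print(email, too_few_values_rejected, missing_key_rejected)
```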

4.6.2. DELETE Statement

The DELETE statement can delete one or more rows from a table. Refer to Figure 4-12. Note: Even if all the rows are deleted from the table, the table definition and its column details are still stored in the database. Thus the table still exists. To erase the table definition also from the database, the DROP TABLE statement must be used. The DELETE statement cannot delete column(s) from a table. It deletes only row(s). To delete a given column from a table, the ALTER TABLE statement must be used.

DELETE FROM table-name [ where search-condition ]

Figure 4-12: The DELETE statement syntax

Example:
1. Deleting all rows of a table - Delete all current customers
DELETE FROM Customer_Details;
2. Deleting some rows of a table - Delete the customer with Cust_ID = 102 from the list of customers
DELETE FROM Customer_Details WHERE Cust_ID = 102;
3. Examples of invalid DELETE statements
DELETE * FROM Customer_Details;
Or
DELETE Cust_ID FROM Customer_Details;

Difference between TRUNCATE and DELETE statement
• TRUNCATE is a Data Definition Language (DDL) statement whereas DELETE is a Data Manipulation Language (DML) statement
• TRUNCATE deletes all records from the table whereas DELETE can be used to selectively delete records from a table by using the WHERE clause
• TRUNCATE releases the secondary storage occupied by the records of the table whereas DELETE does not do so
• Data removed using TRUNCATE cannot be recovered whereas data removed using DELETE can be recovered as long as the deletion has not been committed (a DCL statement called ROLLBACK can be used, which is covered in chapter 5)
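The recoverability of DELETE can be seen in the sketch below, which assumes Python's sqlite3 module as a stand-in for Oracle (SQLite has no TRUNCATE statement, so only the DELETE side of the comparison is shown). The row_count helper is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Oracle (assumption)
conn.execute("CREATE TABLE Customer_Details (Cust_ID INTEGER, Bank_Branch TEXT)")
conn.executemany(
    "INSERT INTO Customer_Details VALUES (?, ?)",
    [(101, "Downtown"), (102, "Bridgewater"), (105, "Brighton")],
)
conn.commit()  # make the three rows permanent


def row_count():
    """Hypothetical helper: number of rows currently visible in the table."""
    return conn.execute("SELECT COUNT(*) FROM Customer_Details").fetchone()[0]


# Selective delete using a WHERE clause.
conn.execute("DELETE FROM Customer_Details WHERE Cust_ID = 102")
after_selective_delete = row_count()

# Delete all remaining rows; the table definition itself still exists.
conn.execute("DELETE FROM Customer_Details")
after_full_delete = row_count()

# Neither DELETE has been committed, so ROLLBACK brings the rows back.
conn.rollback()
after_rollback = row_count()

print(after_selective_delete, after_full_delete, after_rollback)
```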

4.6.3. UPDATE Statement

One or more column values can be modified in the selected rows of a table using UPDATE statement. The table to be modified is mentioned immediately after the UPDATE keyword. The ‘WHERE clause’ identifies the rows of the table to be modified. The ‘SET clause’ specifies which columns have to be updated and assigns the new values for them.

UPDATE table-name
SET column-name1 = expression1, column-name2 = expression2, ...
[ WHERE search-condition ]

Figure 4-13: UPDATE statement syntax

Example:
1. Changing all rows
Until fresh instructions come in, delete the Rate_of_Interest values for all customers.
UPDATE Customer_Fixed_Deposit
SET Rate_of_Interest_in_Percent = NULL;
2. Changing some rows
For customers with a fixed deposit > 3000, increase Rate_of_Interest to 7.3%.
UPDATE Customer_Fixed_Deposit
SET Rate_of_Interest_in_Percent = 7.3
WHERE Amount_in_Dollars > 3000;
3. Changing more than one column value
Change the Email_ID and Rate_of_Interest of the customer with Cust_ID = 105.
UPDATE Customer_Fixed_Deposit
SET Cust_Email = '[email protected]', Rate_of_Interest_in_Percent = 7.4
WHERE Cust_ID = 105;

4.6.4. SELECT Statement

The SELECT statement helps us in retrieving data from the database and returns the resultant set of record(s) in the form of query results. Refer to Figure 4-14.

SELECT [ ALL / DISTINCT ] column-name1, column-name2, ...
FROM table-specification
[ WHERE search-condition ]
[ GROUP BY grouping column ]
[ HAVING search-condition ]
[ ORDER BY sort-specification ]
Figure 4-14: SELECT statement syntax

The result of a SQL query is a table of data, having one or more rows and columns. Refer to Figure 4-15.


Query (Step 1):
SELECT Cust_ID, Account_No FROM Customer_Details;

DBMS and Database (Step 2)

Query Results (Step 3):
Cust_ID  Account_No
101      1020
102      2348
103      3421
104      2367
105      2389

Figure 4-15: The tabular picture of SQL query results

4.6.4.1. Simple SELECT Statement

The SELECT statement is used to select either some or all the columns from a table. The asterisk (*) is a wildcard character that is used to denote all columns. Avoiding the use of SELECT * is a good programming practice. It is better to list the column names explicitly.

Example:
1. Selecting all columns
List all information about all the customers
SELECT Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name, Account_No, Account_Type, Bank_Branch, Cust_Email
FROM Customer_Details;
Or
SELECT * FROM Customer_Details;
2. Selecting some columns
List Cust_ID, Account_No of all customers
SELECT Cust_ID, Account_No
FROM Customer_Details;


4.6.4.2. Avoiding duplicates (DISTINCT)

By default, the SELECT statement retrieves all rows that satisfy the query, and the result may therefore contain duplicate rows. Specifying the DISTINCT keyword before the column list eliminates duplicate rows from the result set returned by the SELECT statement. The default keyword is ALL.

Example:
1. List all customer names
SELECT ALL Cust_Last_Name
FROM Customer_Details;
This is equivalent to:
SELECT Cust_Last_Name
FROM Customer_Details;
2. This is likely to return duplicate rows. To avoid this:
SELECT DISTINCT Cust_Last_Name
FROM Customer_Details;

4.6.4.3. Row Selection (WHERE clause)

The WHERE clause specifies a selection criterion or condition that limits the number of rows retrieved. It is a row-wise operation. Refer to Figure 4-16. For each row in the table, the search condition can produce one of three results:
• If the condition is true, then the row is included in the query results
• If the condition is false, then the row is discarded from the query results
• If the column being searched has a NULL value, the condition evaluates to NULL (unknown) and the row is excluded from the query results

Problem Statement: To select rows which have 102 in the Manager column.


Name          Manager  Test        Result
Gautam Kumar  101
D.K. Singh    102      102 = 102   TRUE
Tapas A.P.    103      103 = 102   FALSE
Vikas S.      102
S.P. Singh    NULL     NULL = 102  Unknown

Query Results
Name        Commission
D.K. Singh  1200
Vikas S.    1350

Figure 4-16: Selection of rows with the WHERE clause
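The three possible outcomes of a search condition can be checked with the rows of Figure 4-16. The sketch below assumes Python's sqlite3 module as a stand-in for Oracle; the table name Employee and the names helper are made up for illustration, since the figure does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Oracle (assumption)
conn.execute("CREATE TABLE Employee (Name TEXT, Manager INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [
        ("Gautam Kumar", 101),
        ("D.K. Singh", 102),
        ("Tapas A.P.", 103),
        ("Vikas S.", 102),
        ("S.P. Singh", None),  # None is stored as NULL
    ],
)


def names(condition):
    """Hypothetical helper: names of the rows for which the condition is TRUE."""
    sql = "SELECT Name FROM Employee WHERE " + condition + " ORDER BY Name"
    return [row[0] for row in conn.execute(sql)]


equal_102 = names("Manager = 102")       # condition TRUE for two rows
not_equal_102 = names("Manager <> 102")  # condition TRUE for two other rows
manager_is_null = names("Manager IS NULL")

print(equal_102)
print(not_equal_102)
print(manager_is_null)
```

The row with a NULL Manager appears in neither of the first two results, because both `NULL = 102` and `NULL <> 102` evaluate to unknown rather than TRUE.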

Example:
1. List all customers with an available account balance in dollars greater than $20000
SELECT Account_No, Total_Available_Balance_in_Dollars
FROM Customer_Transaction
WHERE Total_Available_Balance_in_Dollars > 20000.00;
2. List the Cust_ID, Account_No of 'James'.
SELECT Account_No, Cust_ID
FROM Customer_Details
WHERE Cust_First_Name = 'James';

Note: The comparison is case sensitive. The column names are not case sensitive; the values of the column(s) are case sensitive. For example, 'JAMES' is not the same as 'james' or 'James'.

The WHERE clause can be used with any of the comparison operators (=, >, <, >=, <=, <>) or the logical operators (AND, OR, NOT).

Expression1  { = | <> | < | <= | > | >= }  Expression2

Figure 4-17: Comparison test syntax


When SQL evaluates the values of the two expressions in the comparison test, three results can occur:
1. The test may yield a TRUE result
2. The test may yield a FALSE result
3. If either of the two expressions produces a NULL value, the comparison yields a NULL result.

Example:
1. List all Account_No where the total available balance in dollars is at least $20000.00
SELECT Account_No
FROM Customer_Transaction
WHERE Total_Available_Balance_in_Dollars >= 20000.00;
2. List all Cust_ID, Cust_Last_Name from the Customer_Details table where Account_Type is 'Savings' and Bank_Branch is 'Downtown'.
SELECT Cust_ID, Cust_Last_Name
FROM Customer_Details
WHERE Account_Type = 'Savings' AND Bank_Branch = 'Downtown';
3. List all Cust_ID, Cust_Last_Name from the Customer_Details table where neither Account_Type is 'Savings' nor Bank_Branch is 'Downtown'.
SELECT Cust_ID, Cust_Last_Name
FROM Customer_Details
WHERE NOT Account_Type = 'Savings' AND NOT Bank_Branch = 'Downtown';

4. List all Cust_ID, Cust_Last_Name where either Account_Type is 'Savings' or Bank_Branch is 'Downtown'.
SELECT Cust_ID, Cust_Last_Name
FROM Customer_Details
WHERE Account_Type = 'Savings' OR Bank_Branch = 'Downtown';

The Multi-Row INSERT statement
A multi-row INSERT statement extracts rows of data from one table and inserts them into another table. Refer to Figure 4-18.


INSERT INTO table-name [ (column-name(s)) ]
query

Figure 4-18: Multi-row INSERT statement syntax

In this form of the INSERT statement, the data values for the new rows are not explicitly specified within the statement text. Instead, the source of the new rows is a database query, as shown in Figure 4-19.

Example:
INSERT INTO OldCust_details (Account_No, Transaction_Date, Total_Available_Balance_in_Dollars)
SELECT Account_No, Transaction_Date, Total_Available_Balance_in_Dollars
FROM Customer_Transaction
WHERE Total_Available_Balance_in_Dollars > 10000.00;

Customer_Transaction records from Customer_Transaction table
Account_No  Transaction_Date  Transaction_Type  Transaction_Amount_in_Dollars  Total_Available_Balance_in_Dollars
1020        12-Jan-2005       Deposit           5000.00                        10000.00
2348        14-Jan-2005       Withdrawal        2500.00                        13500.00
3421        14-Jan-2005       Deposit           2000.00                        27234.00
2367        16-Jan-2005       Withdrawal        1200.00                        12456.00
1020        17-Jan-2005       Withdrawal        1500.00                        8500.00

Step 1: The query uses data from the Customer_Transaction table
SELECT Account_No, Transaction_Date, Total_Available_Balance_in_Dollars
FROM Customer_Transaction
WHERE Total_Available_Balance_in_Dollars > 10000

Step 2: The query results are inserted into the OldCust_details table
Account_No  Transaction_Date  Total_Available_Balance_in_Dollars
2348        14-Jan-2005       13500.00
3421        14-Jan-2005       27234.00
2367        16-Jan-2005       12456.00

Figure 4-19: Inserting Multiple Rows

A logical restriction applies to the query that appears within the multi-row INSERT statement: the query results must contain the same number of columns as the column list in the INSERT statement, and the data types must be compatible, column by column.
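The example of Figure 4-19 can be replayed as follows. This is a sketch assuming Python's sqlite3 module in place of Oracle; dates are stored as plain text and amounts as REAL, which is a simplification of the course's table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Oracle (assumption)
conn.execute("""
    CREATE TABLE Customer_Transaction (
        Account_No                         INTEGER,
        Transaction_Date                   TEXT,
        Transaction_Type                   TEXT,
        Transaction_Amount_in_Dollars      REAL,
        Total_Available_Balance_in_Dollars REAL
    )
""")
conn.executemany(
    "INSERT INTO Customer_Transaction VALUES (?, ?, ?, ?, ?)",
    [
        (1020, "12-Jan-2005", "Deposit", 5000.00, 10000.00),
        (2348, "14-Jan-2005", "Withdrawal", 2500.00, 13500.00),
        (3421, "14-Jan-2005", "Deposit", 2000.00, 27234.00),
        (2367, "16-Jan-2005", "Withdrawal", 1200.00, 12456.00),
        (1020, "17-Jan-2005", "Withdrawal", 1500.00, 8500.00),
    ],
)
conn.execute("""
    CREATE TABLE OldCust_details (
        Account_No                         INTEGER,
        Transaction_Date                   TEXT,
        Total_Available_Balance_in_Dollars REAL
    )
""")

# Multi-row INSERT: the rows come from a query instead of a VALUES clause.
conn.execute("""
    INSERT INTO OldCust_details
        (Account_No, Transaction_Date, Total_Available_Balance_in_Dollars)
    SELECT Account_No, Transaction_Date, Total_Available_Balance_in_Dollars
    FROM Customer_Transaction
    WHERE Total_Available_Balance_in_Dollars > 10000.00
""")

copied = conn.execute(
    "SELECT Account_No, Total_Available_Balance_in_Dollars "
    "FROM OldCust_details ORDER BY Account_No"
).fetchall()
print(copied)
```

The row with a balance of exactly 10000.00 is not copied, because the condition uses > rather than >=.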

4.6.4.4. BETWEEN, IN, LIKE

The BETWEEN operator includes both the end values specified.


The IN operator is used to check if a value belongs to a set of values. Note that BETWEEN and IN can be fully substituted with a combination of AND, OR, NOT. The LIKE operator is used to check for similarity of strings. When used with LIKE the use of “_” refers to exactly one unknown character; “%” refers to an unknown number of unknown characters.

test-expression [NOT] BETWEEN low-expression AND high-expression

Figure 4-20: Range test (Between) syntax

test-expression [NOT] IN (constant1, constant2…………)

Figure 4-21: Set membership test (IN) syntax

column-name [NOT] LIKE pattern [ESCAPE escape-character]

Figure 4-22: Pattern matching test (LIKE) syntax
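The three tests above can be compared with their AND/OR equivalents in a short script. The sketch assumes Python's sqlite3 module as a stand-in for Oracle and a two-column version of Customer_Details; the ids helper is made up for illustration. Note that SQLite's LIKE ignores case for ASCII letters by default, which differs from the case-sensitive comparison described in this course but does not affect these results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Oracle (assumption)
conn.execute("CREATE TABLE Customer_Details (Cust_ID INTEGER, Bank_Branch TEXT)")
conn.executemany(
    "INSERT INTO Customer_Details VALUES (?, ?)",
    [
        (101, "Downtown"),
        (102, "Bridgewater"),
        (103, "Plainsboro"),
        (104, "Downtown"),
        (105, "Brighton"),
    ],
)


def ids(condition):
    """Hypothetical helper: Cust_IDs of the rows that satisfy the condition."""
    sql = "SELECT Cust_ID FROM Customer_Details WHERE " + condition + " ORDER BY Cust_ID"
    return [row[0] for row in conn.execute(sql)]


# BETWEEN includes both end values and equals a pair of comparisons joined by AND.
between = ids("Cust_ID BETWEEN 102 AND 104")
range_with_and = ids("Cust_ID >= 102 AND Cust_ID <= 104")
not_between = ids("Cust_ID NOT BETWEEN 102 AND 104")

# IN equals a chain of equality tests joined by OR.
in_set = ids("Bank_Branch IN ('Downtown', 'Brighton')")
with_or = ids("Bank_Branch = 'Downtown' OR Bank_Branch = 'Brighton'")

# LIKE: % matches any number of characters, _ matches exactly one.
starts_with_do = ids("Bank_Branch LIKE 'Do%'")
second_char_r = ids("Bank_Branch LIKE '_r%'")

print(between, not_between, in_set, starts_with_do, second_char_r)
```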

Example:
1. List all Account_Nos with an account balance in the range $20000.00 to $30000.00.
SELECT Account_No
FROM Customer_Transaction
WHERE Total_Available_Balance_in_Dollars BETWEEN 20000.00 AND 30000.00;
Or
SELECT Account_No
FROM Customer_Transaction
WHERE Total_Available_Balance_in_Dollars >= 20000.00 AND Total_Available_Balance_in_Dollars <= 30000.00;
2. List all customers who have an account in Downtown or Brighton.
SELECT Cust_ID
FROM Customer_Details
WHERE Bank_Branch IN ('Downtown', 'Brighton');
Or

SELECT Cust_ID
FROM Customer_Details
WHERE Bank_Branch = 'Downtown' OR Bank_Branch = 'Brighton';
3. List all Accounts where the Bank_Branch name begins with a 'D' and has 'o' as the second character.
SELECT Account_No, Cust_ID, Cust_Last_Name
FROM Customer_Details
WHERE Bank_Branch LIKE 'Do%';
4. List all Accounts where the Bank_Branch column has 'o' as the second character.
SELECT Account_No, Cust_ID, Cust_Last_Name
FROM Customer_Details
WHERE Bank_Branch LIKE '_o%';
5. List all Account_Nos with a balance not in the range $20000.00 to $30000.00.
SELECT Account_No
FROM Customer_Transaction
WHERE Total_Available_Balance_in_Dollars NOT BETWEEN 20000.00 AND 30000.00;

4.6.4.5. IS NULL, IS NOT NULL

The NULL value is used to indicate that a value is not present. It is not a zero or a blank character. NULL cannot be compared to any other value. If it is compared, the result of the comparison cannot be determined, so the result of the comparison is also NULL.

column-name IS [ NOT ] NULL

Figure 4-23: NULL value test (IS NULL) syntax

Note: A NULL value is not equal to another NULL value. The result of comparing two NULL values is NULL. It is neither TRUE nor FALSE.

Example:


1. List employees who have not been assigned a Manager yet.
SELECT Employee_ID
FROM Employee_Manager
WHERE Manager_ID IS NULL;
2. List employees who have been assigned to some Manager.
SELECT Employee_ID
FROM Employee_Manager
WHERE Manager_ID IS NOT NULL;

4.6.4.6. Column titles using AS

When the SELECT statement returns a column, the title of the result column is the name of the column. If the statement includes an evaluated expression, the column title is a default name that the RDBMS gives the expression. To give meaningful column titles, use the keyword AS.

Example: List those customer accounts whose account balance is greater than $10000.00.
SELECT Account_No AS "Customer Account No.", Total_Available_Balance_in_Dollars AS "Total Balance"
FROM Customer_Transaction
WHERE Total_Available_Balance_in_Dollars > 10000.00;

4.6.4.7. Sorting Query Results (ORDER BY clause)

The rows returned as the output of an SQL query are not arranged in any particular order. If needed, we can arrange the rows returned by an SQL query using the ORDER BY clause in the SELECT statement. ORDER BY is a row-wise operation. By default, the ORDER BY clause arranges the rows of the query result in ascending order. To arrange the rows returned by the query in descending order, use the keyword DESC.

ORDER BY ---------------Column name1, Column name2, ……….. -------- ASC ------Column-number1, Column number2,……

DESC

Figure 4-24: The ORDER BY clause syntax

Example:
1. List the account numbers and their account balances of all customers in ascending order of the account balance.
SELECT Account_No, Total_Available_Balance_in_Dollars
FROM Customer_Transaction
ORDER BY Total_Available_Balance_in_Dollars;
2. List the customers and their account numbers in descending order of the account numbers.
SELECT Cust_Last_Name, Cust_First_Name, Account_No
FROM Customer_Details
ORDER BY 3 DESC;
Note: The ORDER BY clause can be followed by the column name or by the position of the column as it appears in the SELECT statement.
3. List the customers and their account numbers in descending order of the Customer Last Name and ascending order of account numbers.
SELECT Cust_Last_Name, Cust_First_Name, Account_No
FROM Customer_Details
ORDER BY Cust_Last_Name DESC, Account_No;
Or
SELECT Cust_Last_Name, Cust_First_Name, Account_No
FROM Customer_Details
ORDER BY 1 DESC, 3;

4.6.4.8. Aggregate Functions / Column Functions

SQL allows summarizing data from the database through a set of column or aggregate functions. A SQL column or aggregate function takes a complete column of data as its argument and produces a single resultant data value that summarizes the column.

Commonly used aggregate functions:
• SUM() : computes the total of a given column
• AVG() : computes the average value in a given column
• MIN() : finds the smallest value in a given column
• MAX() : finds the largest value in a given column
• COUNT() : counts the number of non-NULL values in a given column
• COUNT(*) : counts rows of query results including rows which have NULL values. If there are no rows, this function returns the value zero.

Note: Rows that have a NULL value in the relevant column are ignored by all the above aggregate functions except COUNT(*).

SUM ( [ DISTINCT ] column-name / expression )
AVG ( [ DISTINCT ] column-name / expression )
MIN ( expression )
MAX ( expression )
COUNT ( [ DISTINCT ] column-name )
COUNT ( * )

Figure 4-25: Column functions syntax
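As a worked illustration of the column functions, the sketch below applies them to the balances from the Customer_Transaction rows shown in Figure 4-19. It assumes Python's sqlite3 module as a stand-in for Oracle and keeps only two columns of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Oracle (assumption)
conn.execute("""
    CREATE TABLE Customer_Transaction (
        Account_No                         INTEGER,
        Total_Available_Balance_in_Dollars REAL
    )
""")
conn.executemany(
    "INSERT INTO Customer_Transaction VALUES (?, ?)",
    [(1020, 10000.00), (2348, 13500.00), (3421, 27234.00),
     (2367, 12456.00), (1020, 8500.00)],
)

# Each column function collapses the whole column into a single value.
minimum, maximum, total, average, row_total = conn.execute("""
    SELECT MIN(Total_Available_Balance_in_Dollars),
           MAX(Total_Available_Balance_in_Dollars),
           SUM(Total_Available_Balance_in_Dollars),
           AVG(Total_Available_Balance_in_Dollars),
           COUNT(*)
    FROM Customer_Transaction
""").fetchone()

print(minimum, maximum, total, average, row_total)
```

Here the total is 10000 + 13500 + 27234 + 12456 + 8500 = 71690, and the average is 71690 / 5 = 14338.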

Example:
1. List the minimum account balance from the Customer_Transaction table.
SELECT MIN (Total_Available_Balance_in_Dollars)
FROM Customer_Transaction;
2. List the maximum account balance from the Customer_Transaction table.
SELECT MAX (Total_Available_Balance_in_Dollars)
FROM Customer_Transaction;
3. List the average account balance of customers from the Customer_Transaction table.
SELECT AVG (Total_Available_Balance_in_Dollars)
FROM Customer_Transaction;
4. List the total number of account holders in the 'Downtown' Branch.
SELECT COUNT (*)
FROM Customer_Details
WHERE Bank_Branch = 'Downtown';

107 | P a g e

Infosys Foundation Program

Relational Database Management System

5. List total number of Customers. SELECT COUNT (*) FROM Customer_Details; 6. List number of Customers having “Savings” Account. SELECT COUNT (*) FROM Customer_Details WHERE Account_Type = ‘Savings’; 7. List the minimum and sum of all account balances from Customer_Transaction table. SELECT MIN (Total_Available_Balance_in_Dollars), SUM (Total_Available_Balance_in_Dollars) FROM Customer_Transaction; 8. List total number of unique Customer Last Names from Customer_Details table. SELECT COUNT (DISTINCT Cust_Last_Name) FROM Customer_Details; Difference between COUNT(*) and COUNT(Column-name): 9. List total number of Employees. SELECT COUNT (*) FROM Employee_Manager; 10. List total number of Employees Employee_Manager table.

who

have

been

assigned

a

Manager

from

SELECT COUNT (Manager_ID) FROM Employee_Manager; Note: COUNT (Column-Name) counts the number of non-NULL values in a column whereas COUNT (*) counts rows of query results and includes NULL values in a column
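The difference between COUNT (*) and COUNT (Column-Name) can be tried out with a small script. The following is only a sketch: it assumes Python's built-in sqlite3 module as a stand-in for the course database, and it creates just the two Employee_Manager columns needed here, with the Manager_ID values shown in the figures of this chapter.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Manager (Employee_ID INTEGER, Manager_ID INTEGER)")
conn.executemany(
    "INSERT INTO Employee_Manager VALUES (?, ?)",
    [(2345, None), (3556, None), (3620, None), (22789, 2345),
     (23456, 3556), (30456, 2345), (31234, 3556), (32345, 3620)],
)

# COUNT(*) counts every row; COUNT(Manager_ID) skips rows where Manager_ID is NULL.
all_rows, with_manager = conn.execute(
    "SELECT COUNT(*), COUNT(Manager_ID) FROM Employee_Manager"
).fetchone()
print(all_rows, with_manager)  # 8 5
```

The three employees without a manager account for the gap between the two counts.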

4.6.4.9.

GROUP BY

The GROUP BY clause is used in a SELECT statement to collect data across multiple records and group the results by one or more columns. Sometimes it is required to get information not about each row, but about each group.

Example: Consider the Customer_Loan table that has data about all the loans taken by all the customers of the bank. Assume that we want to retrieve the total loan-amount of all loans taken by each customer. Related rows can be grouped together by the GROUP BY clause by specifying a column as a grouping column. In the above example, the Cust_ID will be the grouping column. In the output table all the rows with an identical value in the grouping column will be grouped together. Hence, the number of rows in the output is equal to the number of distinct values of the grouping column.

SELECT Cust_ID, SUM(Amount_in_Dollars)
FROM Customer_Loan
GROUP BY Cust_ID;

Records from Customer_Loan table:

Cust_ID  Loan_No  Amount_in_Dollars
101      1011     8755.00
103      2010     2555.00
104      2056     3050.00
103      2015     2000.00

Query Results (GROUP BY Cust_ID):

Cust_ID  Sum(Amount_in_Dollars)
101      8755.00
103      4555.00
104      3050.00

Figure 4-26: Example of Group BY Clause

SELECT Department, COUNT (Employee_ID)
FROM Employee_Manager
GROUP BY Department;

Records from Employee_Manager table:

Employee_ID  Employee_Last_Name  Employee_Mid_Name  Employee_First_Name  Employee_Email     Department  Grade  Manager_ID
2345         Atherton            S.                 Cindy                [email protected]  HR          1      NULL
3556         George              A.                 Henry                [email protected]  Finance     1      NULL
3620         Jackson             G.                 Matt                 [email protected]  Design      1      NULL
22789        Stevenson           S.                 Crystal              [email protected]  HR          2      2345
23456        Smith               A.                 Luther               [email protected]  Finance     2      3556
30456        Langer              C.                 Christiana           [email protected]  HR          3      2345
31234        Frost               J.                 Robert               [email protected]  Finance     3      3556
32345        Austen              L.                 Jane                 [email protected]  Design      2      3620

Query Results (GROUP BY Department):

Department  Count(Employee_ID)
HR          3
Finance     3
Design      2

Figure 4-27: Example of GROUP BY clause

SELECT Manager_ID, COUNT (Employee_ID)
FROM Employee_Manager
GROUP BY Manager_ID;

Records from Employee_Manager table: the same eight records as shown in Figure 4-27.

Query Results (GROUP BY Manager_ID):

Manager_ID  Count(Employee_ID)
2345        2
3556        2
3620        1
NULL        3

Figure 4-28: Example of GROUP BY clause

If the GROUP BY clause is used in a SELECT statement, all the rows with an identical value in the grouping column are grouped together. The aggregate functions in the SELECT statement are then calculated after grouping, i.e., there is one value of the aggregate column for each value of the grouping column.

Example: Refer to Figure 4-28. In the example, the grouping is based on the Manager_ID column. There are three records with NULL values in the Manager_ID column. All three records are placed in the same group, the group with indeterminate values. This does not imply that NULL values are equal to one another; they are only treated alike for the purpose of grouping.

Note: If the GROUP BY clause has been used in a SELECT statement, only the grouping columns (columns on which grouping has been done) or aggregate functions can appear in the column list specified in the SELECT statement.

Example: Invalid SQL statement

SELECT Department, Manager_ID, COUNT(Employee_ID)
FROM Employee_Manager
GROUP BY Manager_ID;

The above SQL statement should be written as:

SELECT Department, Manager_ID, COUNT(Employee_ID)
FROM Employee_Manager
GROUP BY Manager_ID, Department;

Refer to Figure 4-29.

SELECT Department, Manager_ID, COUNT (Employee_ID)
FROM Employee_Manager
GROUP BY Manager_ID, Department;

Records from Employee_Manager table: the same eight records as shown in Figure 4-27.

Query Results (GROUP BY Manager_ID, Department):

Department  Manager_ID  Count(Employee_ID)
HR          2345        2
Finance     3556        2
Design      3620        1
HR          NULL        1
Finance     NULL        1
Design      NULL        1

Figure 4-29: An example of GROUP BY clause
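The grouping behaviour in Figures 4-28 and 4-29, including the single group formed by the NULL values, can be reproduced with a short script. This is a sketch only, assuming Python's built-in sqlite3 module as a stand-in for the course database and a cut-down Employee_Manager table holding just the three columns used by the queries.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Manager (Employee_ID INTEGER, Department TEXT, Manager_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee_Manager VALUES (?, ?, ?)",
    [(2345, "HR", None), (3556, "Finance", None), (3620, "Design", None),
     (22789, "HR", 2345), (23456, "Finance", 3556), (30456, "HR", 2345),
     (31234, "Finance", 3556), (32345, "Design", 3620)],
)

# One grouping column: all rows with a NULL Manager_ID fall into one group.
by_manager = dict(conn.execute(
    "SELECT Manager_ID, COUNT(Employee_ID) FROM Employee_Manager GROUP BY Manager_ID"
).fetchall())

# Two grouping columns: one group per distinct (Manager_ID, Department) combination.
by_both = {
    (dept, mgr): n
    for dept, mgr, n in conn.execute(
        "SELECT Department, Manager_ID, COUNT(Employee_ID) "
        "FROM Employee_Manager GROUP BY Manager_ID, Department"
    )
}

print(by_manager[None])  # 3
print(len(by_both))      # 6
```

Python's None plays the role of NULL here, so by_manager[None] is the size of the NULL group.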

4.6.4.10.

HAVING

The HAVING clause is used along with the GROUP BY clause. The HAVING clause can be used to select and reject row groups. The format of the HAVING clause is similar to the WHERE clause, consisting of the keyword HAVING followed by a search condition. The HAVING clause thus specifies a search condition for groups.

SELECT Cust_ID, SUM(Amount_in_Dollars)
FROM Customer_Loan
GROUP BY Cust_ID
HAVING SUM(Amount_in_Dollars) > 4000.00;

Records from Customer_Loan table:

Cust_ID  Loan_No  Amount_in_Dollars
101      1011     8755.00
103      2010     2555.00
104      2056     3050.00
103      2015     2000.00

Query Results (GROUP BY Cust_ID):

Cust_ID  Sum(Amount_in_Dollars)
101      8755.00
103      4555.00

Figure 4-30: An example of HAVING Clause

SELECT Department, COUNT (Employee_ID)
FROM Employee_Manager
GROUP BY Department
HAVING COUNT(Employee_ID) > 2;

Records from Employee_Manager table: the same eight records as shown in Figure 4-27.

Query Results (GROUP BY Department):

Department  Count(Employee_ID)
HR          3
Finance     3

Figure 4-31: An example of HAVING Clause

Note: The WHERE clause can be used to select and reject the individual rows that participate in a query. The HAVING clause can be used to select and reject row groups.
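The contrast between the two clauses can be seen by applying the same threshold once per row and once per group. The sketch below assumes Python's built-in sqlite3 module as a stand-in for the course database and uses the Customer_Loan records from Figure 4-30; the WHERE query is an illustrative addition, not one of the course examples.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Loan (Cust_ID INTEGER, Loan_No INTEGER, Amount_in_Dollars REAL)"
)
conn.executemany(
    "INSERT INTO Customer_Loan VALUES (?, ?, ?)",
    [(101, 1011, 8755.00), (103, 2010, 2555.00), (104, 2056, 3050.00), (103, 2015, 2000.00)],
)

# WHERE tests individual rows: only loan 1011 exceeds 4000.00 on its own.
where_rows = conn.execute(
    "SELECT Cust_ID, Loan_No FROM Customer_Loan "
    "WHERE Amount_in_Dollars > 4000.00 ORDER BY Loan_No"
).fetchall()

# HAVING tests whole groups: customer 103 qualifies because 2555.00 + 2000.00 > 4000.00.
having_rows = conn.execute(
    "SELECT Cust_ID, SUM(Amount_in_Dollars) FROM Customer_Loan "
    "GROUP BY Cust_ID HAVING SUM(Amount_in_Dollars) > 4000.00 ORDER BY Cust_ID"
).fetchall()

print(where_rows)   # [(101, 1011)]
print(having_rows)  # [(101, 8755.0), (103, 4555.0)]
```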

4.6.4.11.

Retrieval using UNION

The UNION operation combines the rows from two sets of query results. By default, the UNION operation eliminates duplicate rows as part of its processing.

Example:

SELECT Cust_ID FROM Customer_Fixed_Deposit
UNION
SELECT Cust_ID FROM Customer_Loan;

Refer to Figure 4-32. To retain duplicate rows in a UNION operation, specify the ALL keyword immediately following the word UNION.

Example:

SELECT Cust_ID FROM Customer_Fixed_Deposit
UNION ALL
SELECT Cust_ID FROM Customer_Loan;

Refer to Figure 4-33.

Records from Customer_Fixed_Deposit table:

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Cust_Email         Fixed_Deposit_No  Amount_in_Dollars  Rate_of_Interest_in_Percent
101      Smith           A.             Mike             [email protected]  2011              8055.00            6.5
103      Langer          G.             Justin           [email protected]  2015              2060.00            6.5
104      Quails          D.             Jack             [email protected]  3010              3050.00            6.5

Records from Customer_Loan table:

Cust_ID  Loan_No  Amount_in_Dollars
101      1011     8755.00
103      2010     2555.00
104      2056     3050.00
103      2015     2000.00

Query Results (UNION):

Cust_ID
101
103
104

Figure 4-32: Using UNION to combine query results

Records from Customer_Fixed_Deposit and Customer_Loan tables: the same records as shown in Figure 4-32.

Query Results (UNION ALL):

Cust_ID
101
103
104
103
101
103
104

Figure 4-33: Using UNION ALL to combine query results

There are some restrictions on the tables that can be combined by a UNION operation:

• The SELECT statements combined using UNION or UNION ALL must contain the same number of columns
• The data type of each column in the first table must be the same as the data type of the corresponding column in the second table. The data width and column name can differ
• Neither of the two tables can be sorted with the ORDER BY clause. However, the combined query results can be sorted

Note: Eliminating duplicate rows from query results is a time-consuming process, especially if the query results contain a large number of rows. If one is sure that the UNION operation cannot produce duplicate rows, one should specifically use the UNION ALL operation because the query will execute much more quickly.
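The effect of the ALL keyword, and the rule that only the combined result may be sorted, can be checked with a few lines. The sketch assumes Python's built-in sqlite3 module as a stand-in for the course database; only the Cust_ID column of each table is created, with the values from Figure 4-32.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_Fixed_Deposit (Cust_ID INTEGER)")
conn.execute("CREATE TABLE Customer_Loan (Cust_ID INTEGER)")
conn.executemany("INSERT INTO Customer_Fixed_Deposit VALUES (?)", [(101,), (103,), (104,)])
conn.executemany("INSERT INTO Customer_Loan VALUES (?)", [(101,), (103,), (104,), (103,)])

# UNION removes duplicate rows; the trailing ORDER BY sorts the combined result.
union_ids = [row[0] for row in conn.execute(
    "SELECT Cust_ID FROM Customer_Fixed_Deposit "
    "UNION SELECT Cust_ID FROM Customer_Loan ORDER BY Cust_ID"
)]

# UNION ALL keeps every row from both queries (3 + 4 rows).
union_all_ids = [row[0] for row in conn.execute(
    "SELECT Cust_ID FROM Customer_Fixed_Deposit "
    "UNION ALL SELECT Cust_ID FROM Customer_Loan ORDER BY Cust_ID"
)]

print(union_ids)      # [101, 103, 104]
print(union_all_ids)  # [101, 101, 103, 103, 103, 104, 104]
```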

4.6.4.12.

Retrieval using INTERSECT

The INTERSECT operation selects the rows that are common to two sets of query results. Refer to Figure 4-34.

Example:

SELECT Cust_ID FROM Customer_Fixed_Deposit
INTERSECT
SELECT Cust_ID FROM Customer_Loan;

Records from Customer_Fixed_Deposit and Customer_Loan tables: the same records as shown in Figure 4-32.

Query Results (INTERSECT):

Cust_ID
101
103
104

Figure 4-34: Using INTERSECT to combine query results

4.6.5.

Sub-Queries

A sub-query is a query within a query. The results of the sub-query are used by the DBMS to determine the results of the higher-level query that contains the sub-query. Usually, the sub-query appears within the WHERE or HAVING clause of another SQL statement.

SELECT [ ALL / DISTINCT ] column-name1, column-name2, ------
FROM table-specification
[ WHERE search-condition ]
[ GROUP BY grouping column ]
[ HAVING search-condition ]
[ ORDER BY sort-specification ]

Figure 4-35: Basic sub-query syntax

The sub-query is enclosed in parentheses, but otherwise it has a form similar to that of a SELECT statement, with a FROM clause and optional WHERE, GROUP BY, and HAVING clauses. The form of these clauses in a sub-query is identical to that in a SELECT statement, and they perform their normal functions when used within a sub-query.

4.6.5.1.

Independent Sub-Queries

• Inner Query is independent of Outer Query
• Inner Query is executed first and the results are stored
• Outer Query then runs on the stored results

Example: To list the Cust_ID and Loan_No for all Customers who have taken a loan of amount greater than the loan amount of Customer (Cust_ID = 104).

SELECT Cust_ID, Loan_No
FROM Customer_Loan
WHERE Amount_in_Dollars > (SELECT Amount_in_Dollars
                           FROM Customer_Loan
                           WHERE Cust_ID = 104);

Step 1: The sub-query is executed on the Customer_Loan table.

SELECT Amount_in_Dollars
FROM Customer_Loan
WHERE Cust_ID = 104;

Records from Customer_Loan table:

Cust_ID  Loan_No  Amount_in_Dollars
101      1011     8755.00
103      2010     2555.00
104      2056     3050.00
103      2015     2000.00

Result of the sub-query: 3050.00

Step 2: 3050.00 is compared with the values in Amount_in_Dollars of the Customer_Loan table.

Query Result:

Cust_ID  Loan_No
101      1011

Figure 4-36: How an independent sub-query executes
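The two steps of Figure 4-36 can be mimicked by running the inner query on its own and feeding its stored result to the outer query. This is a sketch under the assumption that Python's built-in sqlite3 module stands in for the course database; it illustrates the logical order of evaluation, not how any particular DBMS actually optimizes the statement.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Loan (Cust_ID INTEGER, Loan_No INTEGER, Amount_in_Dollars REAL)"
)
conn.executemany(
    "INSERT INTO Customer_Loan VALUES (?, ?, ?)",
    [(101, 1011, 8755.00), (103, 2010, 2555.00), (104, 2056, 3050.00), (103, 2015, 2000.00)],
)

# Step 1: run the inner query once and store its single value.
stored_amount = conn.execute(
    "SELECT Amount_in_Dollars FROM Customer_Loan WHERE Cust_ID = 104"
).fetchone()[0]

# Step 2: the outer query compares every row against the stored value.
two_step = conn.execute(
    "SELECT Cust_ID, Loan_No FROM Customer_Loan WHERE Amount_in_Dollars > ?",
    (stored_amount,),
).fetchall()

# The nested form from the text gives the same result in one statement.
nested = conn.execute(
    "SELECT Cust_ID, Loan_No FROM Customer_Loan WHERE Amount_in_Dollars > "
    "(SELECT Amount_in_Dollars FROM Customer_Loan WHERE Cust_ID = 104)"
).fetchall()

print(stored_amount, two_step, nested)  # 3050.0 [(101, 1011)] [(101, 1011)]
```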

The inner query, which retrieves the Amount_in_Dollars of Cust_ID 104, can be executed independently of the outer query; hence the name independent sub-query. The inner query needs to be executed only once, since it returns one constant value irrespective of the outer query. In the above example, the inner query is executed first and its result is stored; the outer query then tests each row of the Customer_Loan table against the stored value. The inner query is thus executed only once, while the condition of the outer query is evaluated as many times as there are rows in the Customer_Loan table.

Example:

1. List customer names of all customers who have taken a loan > $3000.00.

SELECT Cust_Last_Name, Cust_Mid_Name, Cust_First_Name
FROM Customer_Details
WHERE Cust_ID IN (SELECT Cust_ID
                  FROM Customer_Loan
                  WHERE Amount_in_Dollars > 3000.00);

2. List customer names of all customers who have the same Account_type as Customer 'Jones Simon'.

SELECT Cust_Last_Name, Cust_Mid_Name, Cust_First_Name
FROM Customer_Details
WHERE Account_Type = (SELECT Account_Type
                      FROM Customer_Details
                      WHERE Cust_Last_Name = 'Jones'
                      AND Cust_First_Name = 'Simon');

3. List customer names of all customers who do not have a Fixed Deposit.

SELECT Cust_Last_Name, Cust_Mid_Name, Cust_First_Name
FROM Customer_Details
WHERE Cust_ID NOT IN (SELECT Cust_ID
                      FROM Customer_Fixed_Deposit);

4. List customer names of all customers who do not have both a Fixed Deposit and a loan at any of the Bank Branches, i.e., customers who have either a Fixed Deposit or a loan but not both. The result will also include customers who have neither a fixed deposit nor a loan.

SELECT Cust_Last_Name, Cust_Mid_Name, Cust_First_Name
FROM Customer_Details
WHERE Cust_ID NOT IN (SELECT Cust_ID
                      FROM Customer_Loan
                      WHERE Cust_ID IN (SELECT Cust_ID
                                        FROM Customer_Fixed_Deposit));

4.6.5.2.

Co-Related Sub-Queries

In co-related sub-queries, SQL performs the sub-query once for each row of the main query. The inner query always refers to one or more columns from the table of the outer query. Refer to Figure 4-37.

Example: To list all Customers who have a fixed deposit of amount less than the sum of all their loans.

Query:

SELECT Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name
FROM Customer_Fixed_Deposit
WHERE Amount_in_Dollars < (SELECT SUM (Amount_in_Dollars)
                           FROM Customer_Loan
                           WHERE Customer_Loan.Cust_ID = Customer_Fixed_Deposit.Cust_ID);

Figure 4-37: A Correlated Query

Explanation of the query: The inner query is repeated once for every record of the outer query. The outer query uses the Customer_Fixed_Deposit table. Refer to Figure 4-38.

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name  Cust_Email         Fixed_Deposit_No  Amount_in_Dollars  Rate_of_Interest_in_Percent
101      Smith           A.             Mike             [email protected]  2011              8055.00            6.5
103      Langer          G.             Justin           [email protected]  2015              2060.00            6.5
104      Quails          D.             Jack             [email protected]  3010              3050.00            6.5

Figure 4-38: Customer_Fixed_Deposit table

The inner query uses the Customer_Loan table. Refer to Figure 4-39.

Cust_ID  Loan_No  Amount_in_Dollars
101      1011     8755.00
103      2010     2555.00
104      2056     3050.00
103      2015     2000.00

Figure 4-39: Customer_Loan table

The Customer_Fixed_Deposit table has three records, so the inner query will be repeated three times. This is similar to the nested FOR loop that has been covered in the Programming Fundamentals course. In each step below, the rows are read from the tables shown in Figure 4-38 and Figure 4-39.

For the first record in the Customer_Fixed_Deposit table:

1. Step 1: The record with the value of 101 in the Cust_ID column of the Customer_Fixed_Deposit table is read.
2. Step 2: All records with a value of 101 in the Cust_ID column of the Customer_Loan table are retrieved and their Amount_in_Dollars values are summed up. In the example, there is only one record with a value of 101 in the Cust_ID column and the Amount_in_Dollars value is $8755.00.
3. Step 3: This value is compared with $8055.00. Since the target value of $8055.00 is less than the Amount_in_Dollars value of $8755.00, the record with the Cust_ID value of 101 is part of the query result.

   8055.00 < 8755.00 (True): the record with Cust_ID = 101 from Customer_Fixed_Deposit will occur in the query results.

For the second record in the Customer_Fixed_Deposit table:

1. Step 1: The record with the value of 103 in the Cust_ID column of the Customer_Fixed_Deposit table is read.
2. Step 2: All records with a value of 103 in the Cust_ID column of the Customer_Loan table are retrieved and their Amount_in_Dollars values are summed up. In the example, there are two records with a value of 103 in the Cust_ID column and the sum of their Amount_in_Dollars values is $4555.00.
3. Step 3: This value is compared with $2060.00. Since the target value of $2060.00 is less than the sum(Amount_in_Dollars) value of $4555.00, the record with the Cust_ID value of 103 is part of the query result.

   2060.00 < 4555.00 (True): the record with Cust_ID = 103 from Customer_Fixed_Deposit will occur in the query results.

For the third record in the Customer_Fixed_Deposit table:

1. Step 1: The record with the value of 104 in the Cust_ID column of the Customer_Fixed_Deposit table is read.
2. Step 2: All records with a value of 104 in the Cust_ID column of the Customer_Loan table are retrieved and their Amount_in_Dollars values are summed up. In the example, there is only one record with a value of 104 in the Cust_ID column and the Amount_in_Dollars value is $3050.00.
3. Step 3: This value is compared with $3050.00. Since the target value of $3050.00 is equal to the Amount_in_Dollars value of $3050.00, the record with the Cust_ID value of 104 is not part of the query result.

   3050.00 < 3050.00 (False): the record with Cust_ID = 104 from Customer_Fixed_Deposit will not occur in the query results.

Output of the co-related query:

Cust_ID  Cust_Last_Name  Cust_Mid_Name  Cust_First_Name
101      Smith           A.             Mike
103      Langer          G.             Justin

Figure 4-40: Output of co-related query
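The row-by-row evaluation walked through above can be written out as an explicit loop and compared with the co-related query itself. The sketch assumes Python's built-in sqlite3 module as a stand-in for the course database and keeps only three columns of Customer_Fixed_Deposit; the loop is a teaching device that mirrors the nested FOR loop analogy, not a description of how a DBMS executes the statement internally.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Fixed_Deposit "
    "(Cust_ID INTEGER, Cust_Last_Name TEXT, Amount_in_Dollars REAL)"
)
conn.execute("CREATE TABLE Customer_Loan (Cust_ID INTEGER, Amount_in_Dollars REAL)")
conn.executemany(
    "INSERT INTO Customer_Fixed_Deposit VALUES (?, ?, ?)",
    [(101, "Smith", 8055.00), (103, "Langer", 2060.00), (104, "Quails", 3050.00)],
)
conn.executemany(
    "INSERT INTO Customer_Loan VALUES (?, ?)",
    [(101, 8755.00), (103, 2555.00), (104, 3050.00), (103, 2000.00)],
)

# The co-related query from Figure 4-37 (fewer columns selected).
correlated = conn.execute(
    "SELECT Cust_ID, Cust_Last_Name FROM Customer_Fixed_Deposit "
    "WHERE Amount_in_Dollars < (SELECT SUM(Amount_in_Dollars) FROM Customer_Loan "
    "WHERE Customer_Loan.Cust_ID = Customer_Fixed_Deposit.Cust_ID) "
    "ORDER BY Cust_ID"
).fetchall()

# The same logic as an explicit loop: the inner query runs once per outer row.
looped = []
outer_rows = conn.execute(
    "SELECT Cust_ID, Cust_Last_Name, Amount_in_Dollars "
    "FROM Customer_Fixed_Deposit ORDER BY Cust_ID"
).fetchall()
for cust_id, last_name, deposit in outer_rows:
    loan_total = conn.execute(
        "SELECT SUM(Amount_in_Dollars) FROM Customer_Loan WHERE Cust_ID = ?", (cust_id,)
    ).fetchone()[0]
    # SUM over no rows is NULL (None); such a customer never satisfies the comparison.
    if loan_total is not None and deposit < loan_total:
        looped.append((cust_id, last_name))

print(correlated)  # [(101, 'Smith'), (103, 'Langer')]
```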

Note: For each row of the Customer_Fixed_Deposit table to be tested by the WHERE clause of the main query, the Cust_ID column (which appears in the sub-query as an outer reference) has a different value. Thus SQL carries out this sub-query once for each row in the Customer_Fixed_Deposit table. A sub-query containing an outer reference is called a correlated sub-query because its results are correlated with each individual row of the main query. For the same reason, an outer reference is sometimes called a correlated reference.

Example: List customer IDs of all customers who have both a Fixed Deposit and a loan at any of the Bank Branches.

SELECT Cust_ID
FROM Customer_Details
WHERE Cust_ID IN (SELECT Cust_ID
                  FROM Customer_Loan
                  WHERE Customer_Loan.Cust_ID = Customer_Details.Cust_ID)
AND Cust_ID IN (SELECT Cust_ID
                FROM Customer_Fixed_Deposit
                WHERE Customer_Fixed_Deposit.Cust_ID = Customer_Details.Cust_ID);

4.6.6.

JOINS

Join operations take two tables and return another table as a result.

Cartesian Product / Cross Join

Cross joins return all rows from the first table. Each row from the first table is combined with all rows from the second table. Cross joins are also known as the Cartesian product [41] (or just the product) of two tables. The columns of the product table are all the columns of the first table, followed by all the columns of the second table. Refer to Figure 4-41.

[41] Cartesian product: A mathematical term that, when applied to relational databases, refers to the result obtained by joining all the rows of one table with all the rows of another table in every possible combination.

Table 1

A   B   C
a1  b1  c1
a2  b2  c2

Table 2

X   Y
x1  y1
x2  y2

Product of Table1 and Table2 (Cartesian product, m * n rows):

A   B   C   X   Y
a1  b1  c1  x1  y1
a1  b1  c1  x2  y2
a2  b2  c2  x1  y1
a2  b2  c2  x2  y2

Figure 4-41: The Cartesian product of two tables
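The m * n row count of Figure 4-41 can be confirmed with a join that has no join condition. The sketch assumes Python's built-in sqlite3 module as a stand-in for the course database; listing two tables in the FROM clause without a WHERE clause is used here to produce the cross join.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (A TEXT, B TEXT, C TEXT)")
conn.execute("CREATE TABLE Table2 (X TEXT, Y TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [("a1", "b1", "c1"), ("a2", "b2", "c2")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", [("x1", "y1"), ("x2", "y2")])

# No join condition: every row of Table1 is paired with every row of Table2.
product = conn.execute("SELECT * FROM Table1, Table2 ORDER BY A, X").fetchall()

print(len(product))  # 4, i.e. 2 * 2
print(product[0])    # ('a1', 'b1', 'c1', 'x1', 'y1')
```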

4.6.6.1.

SELF JOIN

Joining a table with itself is a self-join.

Example:

Problem Statement: To list all the Employees (Employee_ID, Employee_Last_Name, Employee_First_Name) along with their Managers (Manager_ID, Manager_Last_Name, Manager_First_Name).

Query:

SELECT Emp.Employee_ID as "Employee ID",
       Emp.Employee_Last_Name as "Employee Last Name",
       Emp.Employee_First_Name as "Employee First Name",
       Emp.Manager_ID as "Manager ID",
       Manager.Employee_Last_Name as "Manager Last Name",
       Manager.Employee_First_Name as "Manager First Name"
FROM Employee_Manager Emp, Employee_Manager Manager
WHERE Emp.Manager_ID = Manager.Employee_ID;

Processing of the Query:

Step 1: The table Employee_Manager has two aliases (another name), Emp and Manager.

Step 2: The Manager_ID attribute of Emp (alias for Employee_Manager) is matched with the Employee_ID attribute of Manager (alias for Employee_Manager). The figure shows the matching of only two records; the other records are matched similarly. Both Emp and Manager contain the same eight Employee_Manager records as shown in Figure 4-27.

Step 3: The columns that appear in the output table are specified in the column list used with the SELECT statement.

Query Results:

Employee ID  Employee Last Name  Employee First Name  Manager ID  Manager Last Name  Manager First Name
22789        Stevenson           Crystal              2345        Atherton           Cindy
23456        Smith               Luther               3556        George             Henry
30456        Langer              Christiana           2345        Atherton           Cindy
31234        Frost               Robert               3556        George             Henry
32345        Austen              Jane                 3620        Jackson            Matt

Figure 4-42: Output of SELF JOIN
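The self-join of Figure 4-42 can be run on a cut-down copy of the table. The sketch assumes Python's built-in sqlite3 module as a stand-in for the course database and keeps only the ID, last name and Manager_ID columns; the unquoted column aliases are a simplification made for this example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Manager "
    "(Employee_ID INTEGER, Employee_Last_Name TEXT, Manager_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee_Manager VALUES (?, ?, ?)",
    [(2345, "Atherton", None), (3556, "George", None), (3620, "Jackson", None),
     (22789, "Stevenson", 2345), (23456, "Smith", 3556), (30456, "Langer", 2345),
     (31234, "Frost", 3556), (32345, "Austen", 3620)],
)

# The same table appears twice under the aliases Emp and Manager.
pairs = conn.execute(
    "SELECT Emp.Employee_ID, Emp.Employee_Last_Name, "
    "       Emp.Manager_ID, Manager.Employee_Last_Name AS Manager_Last_Name "
    "FROM Employee_Manager Emp, Employee_Manager Manager "
    "WHERE Emp.Manager_ID = Manager.Employee_ID "
    "ORDER BY Emp.Employee_ID"
).fetchall()

for row in pairs:
    print(row)
# First row printed: (22789, 'Stevenson', 2345, 'Atherton')
```

The three employees whose Manager_ID is NULL do not appear in the result, because a NULL never satisfies the equality in the WHERE clause.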

4.6.6.2.

INNER JOINS

An inner join between two (or more) tables is the subset of their Cartesian product whose rows satisfy the join condition in the WHERE clause. Inner joins use a comparison operator like = or <> to match rows from two tables based on the values in common columns from each table. Inner joins include equi-joins: joins in which the joining condition is based on equality between values in the common columns.

Example:

SELECT Table1.Emp_ID, Table1.City, Table2.Cust_ID, Table2.City
FROM Table1, Table2
WHERE Table1.City = Table2.City;

Table1

Emp_ID  CITY
A1      New York
A2      NULL
A3      Chicago
A4      Chicago
A5      Paris

Table2

Cust_ID  CITY
B1       New York
B2       New York
B3       NULL
B4       Chicago
B5       Moscow

Output Table (INNER JOIN):

Table1.Emp_ID  Table1.City  Table2.Cust_ID  Table2.City
A1             New York     B1              New York
A1             New York     B2              New York
A3             Chicago      B4              Chicago
A4             Chicago      B4              Chicago

Figure 4-43: An Example of inner join
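The equi-join of Figure 4-43 can be reproduced as follows. The sketch assumes Python's built-in sqlite3 module as a stand-in for the course database and loads the Table1 and Table2 rows from the figure.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Emp_ID TEXT, City TEXT)")
conn.execute("CREATE TABLE Table2 (Cust_ID TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("A1", "New York"), ("A2", None), ("A3", "Chicago"), ("A4", "Chicago"), ("A5", "Paris")],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?)",
    [("B1", "New York"), ("B2", "New York"), ("B3", None), ("B4", "Chicago"), ("B5", "Moscow")],
)

# Equi-join on City; only the two ID columns are selected to keep the output short.
inner = conn.execute(
    "SELECT Table1.Emp_ID, Table2.Cust_ID FROM Table1, Table2 "
    "WHERE Table1.City = Table2.City "
    "ORDER BY Table1.Emp_ID, Table2.Cust_ID"
).fetchall()

print(inner)  # [('A1', 'B1'), ('A1', 'B2'), ('A3', 'B4'), ('A4', 'B4')]
```

A2 and B3 are not paired with each other even though both cities are NULL, since NULL = NULL does not evaluate to TRUE.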

4.6.6.3.

OUTER JOINS

An inner join provides only those rows that satisfy the WHERE condition. However, it is sometimes worthwhile to retrieve, in addition to the rows that match the WHERE clause, the rows that have no match in the column being compared. An outer join is then used to retrieve the rows with an unmatched value in the relevant column. Refer to Figure 4-44.

Constructing a FULL OUTER JOIN:

• Begin with the INNER JOIN of the two tables, using matching columns
• For each row of the left table that is not matched by any row in the right table, add one row to the query results, using the values of the columns in the left table, and assuming a NULL value for all columns of the right table
• For each row of the right table that is not matched by any row in the left table, add one row to the query results, using the values of the columns in the right table, and assuming a NULL value for all columns of the left table

Table1 and Table2: the same records as shown in Figure 4-43.

Outer_Join Table:

Table1.Emp_ID  Table1.City  Table2.Cust_ID  Table2.City
A1             New York     B1              New York     (INNER JOIN)
A1             New York     B2              New York     (INNER JOIN)
A3             Chicago      B4              Chicago      (INNER JOIN)
A4             Chicago      B4              Chicago      (INNER JOIN)
A5             Paris        NULL            NULL         (unmatched row of Table1)
A2             NULL         NULL            NULL         (unmatched row of Table1)
NULL           NULL         B5              Moscow       (unmatched row of Table2)
NULL           NULL         B3              NULL         (unmatched row of Table2)

Figure 4-44: An example of OUTER JOIN

Note: Full Outer Join is supported by Oracle 9i and later versions.

4.6.6.4.

LEFT OUTER JOIN

Constructing a LEFT OUTER JOIN:

• Begin with the INNER JOIN of the two tables, using matching columns
• For each row of the left table that is not matched by any row in the right table, add one row to the query results, using the values of the columns in the left table, and assuming a NULL value for all columns of the right table

Refer to Figure 4-45.

Note: The LEFT OUTER JOIN thus includes NULL-extended copies of the unmatched rows from the first (left) table but does not include any unmatched rows from the second (right) table.

Example: The syntax given is Oracle specific.

SELECT Table1.Emp_ID, Table1.City, Table2.Cust_ID, Table2.City
FROM Table1, Table2
WHERE Table1.City = Table2.City (+);

Table1 and Table2: the same records as shown in Figure 4-43.

Left_Outer_Join Table:

Table1.Emp_ID  Table1.City  Table2.Cust_ID  Table2.City
A1             New York     B1              New York     (INNER JOIN)
A1             New York     B2              New York     (INNER JOIN)
A3             Chicago      B4              Chicago      (INNER JOIN)
A4             Chicago      B4              Chicago      (INNER JOIN)
A5             Paris        NULL            NULL         (unmatched row of Table1)
A2             NULL         NULL            NULL         (unmatched row of Table1)

Figure 4-45: An example of LEFT OUTER JOIN
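The result in Figure 4-45 can be reproduced outside Oracle with the ANSI form of the join. This sketch swaps in the LEFT OUTER JOIN ... ON syntax for the Oracle-specific (+) operator, because it runs on Python's built-in sqlite3 module, which is assumed here as a stand-in for the course database and does not accept (+).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the course database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Emp_ID TEXT, City TEXT)")
conn.execute("CREATE TABLE Table2 (Cust_ID TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("A1", "New York"), ("A2", None), ("A3", "Chicago"), ("A4", "Chicago"), ("A5", "Paris")],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?)",
    [("B1", "New York"), ("B2", "New York"), ("B3", None), ("B4", "Chicago"), ("B5", "Moscow")],
)

# ANSI syntax: every Table1 row is kept; unmatched rows get NULL (None) for Table2 columns.
left_join = conn.execute(
    "SELECT Table1.Emp_ID, Table2.Cust_ID "
    "FROM Table1 LEFT OUTER JOIN Table2 ON Table1.City = Table2.City "
    "ORDER BY Table1.Emp_ID, Table2.Cust_ID"
).fetchall()

print(left_join)
# [('A1', 'B1'), ('A1', 'B2'), ('A2', None), ('A3', 'B4'), ('A4', 'B4'), ('A5', None)]
```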

4.6.6.5.

RIGHT OUTER JOIN

Constructing a RIGHT OUTER JOIN:

• Begin with the INNER JOIN of the two tables, using matching columns
• For each row of the right table that is not matched by any row in the left table, add one row to the query results, using the values of the columns in the right table, and assuming a NULL value for all columns of the left table

Refer to Figure 4-46.

Note: The RIGHT OUTER JOIN thus includes NULL-extended copies of the unmatched rows from the second (right) table but does not include any unmatched rows from the first (left) table.

Example: The syntax given is Oracle specific.

SELECT Table1.Emp_ID, Table1.City, Table2.Cust_ID, Table2.City
FROM Table1, Table2
WHERE Table1.City (+) = Table2.City;

Table1 and Table2: the same records as shown in Figure 4-43.

Right_Outer_Join Table:

Table1.Emp_ID  Table1.City  Table2.Cust_ID  Table2.City
A1             New York     B1              New York     (INNER JOIN)
A1             New York     B2              New York     (INNER JOIN)
A3             Chicago      B4              Chicago      (INNER JOIN)
A4             Chicago      B4              Chicago      (INNER JOIN)
NULL           NULL         B5              Moscow       (unmatched row of Table2)
NULL           NULL         B3              NULL         (unmatched row of Table2)

Figure 4-46: An example of RIGHT OUTER JOIN

4.6.7.

Queries using EXISTS / NOT EXISTS

4.6.7.1.

EXISTS

The EXISTS checks whether a sub-query produces any row(s) of results. Consider a nested query. If the query following the EXISTS returns at least one row, the EXISTS returns TRUE and stops further execution of the inner SELECT statement. The outer query will be executed only if the EXISTS returns true. If the inner query produces no rows, the EXISTS returns FALSE and the outer query will not be executed. The EXISTS test cannot produce a NULL value. Example: 1. List all Customers who have at least one Fixed Deposit more than $3000.00. SELECT Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name FROM Customer_Details CD WHERE EXISTS (SELECT * FROM Customer_Fixed_Deposit CFD

129 | P a g e

Infosys Foundation Program

Relational Database Management System

WHERE CFD.Amount_in_Dollars > 3000.00 AND CFD.Cust_ID = CD.Cust_ID); Note: CD is the alias for Customer_Details. CFD is the alias for Customer_Fixed_Deposit.

2. List all Customers who have both a Fixed Deposit and a Loan at the Bank.

SELECT Cust_ID
FROM Customer_Fixed_Deposit
WHERE EXISTS (SELECT *
              FROM Customer_Loan
              WHERE Customer_Loan.Cust_ID = Customer_Fixed_Deposit.Cust_ID);

4.6.7.2. NOT EXISTS

The logic of the EXISTS test can be reversed by using the NOT EXISTS form. In this case, the test is TRUE if the sub-query produces no rows, and FALSE otherwise.

Example: List all Customers who do not have a single Fixed Deposit over $3000.00.

SELECT Cust_ID, Cust_Last_Name, Cust_Mid_Name, Cust_First_Name
FROM Customer_Details CD
WHERE NOT EXISTS (SELECT *
                  FROM Customer_Fixed_Deposit CFD
                  WHERE CFD.Amount_in_Dollars > 3000.00
                  AND CFD.Cust_ID = CD.Cust_ID);
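The two tests can be tried side by side in a minimal sketch. It runs on SQLite through Python's sqlite3 module instead of Oracle, and the table contents, the reduced column list and the function name are assumptions made only for illustration.

```python
import sqlite3

def exists_demo():
    """Return (customers with a deposit over 3000, customers without one)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customer_Details (Cust_ID INTEGER PRIMARY KEY, Cust_Last_Name TEXT);
        CREATE TABLE Customer_Fixed_Deposit (Cust_ID INTEGER, Amount_in_Dollars REAL);
        INSERT INTO Customer_Details VALUES (101,'Smith'), (102,'Jones'), (103,'Brown');
        INSERT INTO Customer_Fixed_Deposit VALUES (101, 5000.00), (101, 1000.00),
                                                  (102, 2500.00);
    """)
    # The same correlated sub-query is used with EXISTS and with NOT EXISTS.
    template = """
        SELECT Cust_ID FROM Customer_Details CD
        WHERE {test} (SELECT * FROM Customer_Fixed_Deposit CFD
                      WHERE CFD.Amount_in_Dollars > 3000.00
                        AND CFD.Cust_ID = CD.Cust_ID)
        ORDER BY Cust_ID
    """
    with_fd = [r[0] for r in con.execute(template.format(test="EXISTS"))]
    without_fd = [r[0] for r in con.execute(template.format(test="NOT EXISTS"))]
    con.close()
    return with_fd, without_fd
```

Every customer lands in exactly one of the two lists, including customer 103, who has no Fixed Deposit at all and therefore satisfies NOT EXISTS.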

4.6.8. The Order of Execution of a SELECT statement

If a SELECT statement contains WHERE, GROUP BY, HAVING and ORDER BY clauses, the order of execution is as follows:
1. The WHERE clause is applied first, and the rows for which the search condition in the WHERE clause returns TRUE are retained.
2. Next the GROUP BY clause is applied. It groups the rows selected by the WHERE clause such that all the rows in each group have the same value for the column in the GROUP BY clause.
3. Next the HAVING clause is applied. It retains the row groups for which the search condition in the HAVING clause returns TRUE.
4. Lastly the query result is sorted in the order specified in the ORDER BY clause.
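The four steps can be traced on a small example. This is a sketch only: it runs on SQLite through Python's sqlite3 module, and the Fixed_Deposit table, its rows and the function name are hypothetical.

```python
import sqlite3

def clause_order_demo():
    """Run one query that uses WHERE, GROUP BY, HAVING and ORDER BY together."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Fixed_Deposit (Cust_ID INTEGER, Amount INTEGER);
        INSERT INTO Fixed_Deposit VALUES
            (101, 5000), (101, 500), (102, 2500), (102, 2000),
            (103, 3000), (104, 900), (104, 800);
    """)
    rows = con.execute("""
        SELECT Cust_ID, SUM(Amount) AS Total
        FROM Fixed_Deposit
        WHERE Amount > 1000          -- 1. drops the 500, 900 and 800 rows
        GROUP BY Cust_ID             -- 2. groups remaining rows: 101, 102, 103
        HAVING SUM(Amount) > 4000    -- 3. drops group 103 (total 3000)
        ORDER BY Total DESC          -- 4. sorts the surviving groups
    """).fetchall()
    con.close()
    return rows
```

Customer 101 is reported with a total of 5000 rather than 5500, which shows that the WHERE clause removed the 500 row before the groups were formed.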

4.7. Views

A view is a virtual table in the database defined by a query. A view does not exist in the database as a stored set of data values. The rows and columns of data visible through the view are produced by the query that defines the view.

CREATE VIEW view-name [(column-name1, column-name2, ...)] AS query

Figure 4-47: The CREATE VIEW statement syntax

4.7.1. Horizontal View

A horizontal view restricts a user's access to selected rows of a table.

CREATE VIEW view_cust AS
SELECT * FROM Customer_Details
WHERE Cust_ID IN (101,102,103);

Figure 4-48: Horizontal View

4.7.2. Vertical View

A vertical view restricts a user's access to selected columns of a table.

CREATE VIEW view_cust AS
SELECT Cust_ID, Account_No, Account_Type
FROM Customer_Details;

Figure 4-49: Vertical View
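Both kinds of view can be created and queried in a short sketch. It uses SQLite through Python's sqlite3 module; the sample rows and the view names view_cust_rows and view_cust_cols are hypothetical (two different names are used so that both views can exist at the same time).

```python
import sqlite3

def view_demo():
    """Create a horizontal and a vertical view and query them."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customer_Details (
            Cust_ID INTEGER PRIMARY KEY, Cust_Last_Name TEXT,
            Account_No INTEGER, Account_Type TEXT);
        INSERT INTO Customer_Details VALUES
            (101, 'Smith', 1001, 'Savings'), (102, 'Jones', 1002, 'Current'),
            (103, 'Brown', 1003, 'Savings'), (104, 'Costner', 1004, 'Savings');
        CREATE VIEW view_cust_rows AS
            SELECT * FROM Customer_Details WHERE Cust_ID IN (101, 102, 103);
        CREATE VIEW view_cust_cols AS
            SELECT Cust_ID, Account_No, Account_Type FROM Customer_Details;
    """)
    # Horizontal view: only the selected rows are visible.
    row_ids = [r[0] for r in
               con.execute("SELECT Cust_ID FROM view_cust_rows ORDER BY Cust_ID")]
    # Vertical view: only the selected columns are visible.
    cursor = con.execute("SELECT * FROM view_cust_cols")
    columns = [d[0] for d in cursor.description]
    before = len(cursor.fetchall())
    # A view stores no data of its own, so a new base-table row shows up at once.
    con.execute("INSERT INTO Customer_Details VALUES (105, 'Lee', 1005, 'Current')")
    after = con.execute("SELECT COUNT(*) FROM view_cust_cols").fetchone()[0]
    con.close()
    return row_ids, columns, before, after
```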

4.7.3. DROP VIEW Statement

The DROP VIEW statement is used to drop a view.

DROP VIEW view-name

Figure 4-50: DROP VIEW statement syntax

4.7.4. Joined Views

Joined views are used to simplify multi-table queries. A joined view draws its data from two or more different tables and presents the query results as a single virtual table. Once the view is defined, one can use a single-table query against the view for requests that would otherwise each require a multi-table join.

CREATE VIEW Cust_View AS
SELECT Customer_Details.Cust_Last_Name, Customer_Details.Cust_First_Name,
       Fixed_Deposit_No, Amount_in_Dollars
FROM Customer_Details, Customer_Fixed_Deposit
WHERE Customer_Details.Cust_ID = Customer_Fixed_Deposit.Cust_ID;

Figure 4-51: Joined Views

A view can be referenced like a real table in a SELECT, INSERT, DELETE, or UPDATE statement. However, more complex views cannot be updated; they are read only views.

4.7.5. VIEW Updates

A view can be updated if the query that defines the view meets all of the following restrictions:
• DISTINCT must not be specified; that is, duplicate rows must not be eliminated from the query results
• The FROM clause must specify only one updateable table; the view must have a single underlying source table
• The SELECT list cannot contain expressions, calculated columns, or column functions
• The WHERE clause must not include a sub-query; only simple row-by-row search conditions may be used
• The SELECT list must include all the columns specified with the NOT NULL constraint

4.7.6. Checking View Updates (CHECK OPTION)

If a view is defined by a query that includes the WHERE clause, only rows that meet the search criteria are visible in the view. Other rows may be present in the source table(s) from which the view is derived, but they are not visible through the view.

Example:

CREATE VIEW view_customer AS
SELECT Cust_ID, Cust_Last_Name, Account_No, Account_Type, Bank_Branch
FROM Customer_Details
WHERE Bank_Branch = 'Downtown';

Figure 4-52: Creation of a simple view

INSERT INTO view_customer
VALUES (115, 'Costner', 107, 'Savings', 'Bridgewater');

Figure 4-53: Insertion in a simple view

Note: This is a perfectly valid SQL statement, and the RDBMS inserts a new row with the specified column values into the Customer_Details table. However, the newly inserted row does not meet the search condition for the view. As a result, if one runs this query immediately after the INSERT statement the newly added row does not show up in the view.

SELECT Cust_ID, Cust_Last_Name, Bank_Branch FROM view_customer;

SQL allows the DBMS to detect and prevent this type of INSERT or UPDATE through the view if the view is created with the CHECK OPTION. The CHECK OPTION is specified in the CREATE VIEW statement, as shown below:

CREATE VIEW view_customer AS
SELECT Cust_ID, Cust_Last_Name, Account_No, Account_Type, Bank_Branch
FROM Customer_Details
WHERE Bank_Branch = 'Downtown'
WITH CHECK OPTION;

Figure 4-54: Create view with CHECK OPTION

4.7.7. Advantages of Views

• Security: A user can be permitted to access the database only through a small set of views that contain the specific data the user is authorized to see
• Query simplicity: A view can draw data from several different tables and present it as a single table, thus effectively turning multi-table queries into single-table queries. Internally the RDBMS uses multi-table queries
• Structural simplicity: Views can give a user a personalized view of the database structure, presenting the database as a set of virtual tables that make sense to the user
• Data integrity: If data is accessed and entered through a view, the DBMS can automatically check the data to ensure that it meets specified integrity constraints

4.7.8. Disadvantages of Views

• Performance: The DBMS translates the queries against the view into queries against the underlying source tables. If a view is defined by a multi-table query, then even a simple query against the view becomes a complicated join, and it may take a long time to complete. The same translation overhead applies to insert, delete and update operations made through the view
• Update restrictions: When a user tries to update rows of a view, the DBMS must translate the request into an update on rows of the underlying source tables. This is possible for simple views, but more complicated views cannot be updated

4.8. Data Control Language (DCL)

DCL statements are used to control access to the database and the data in it. They are used to enforce data security.

4.8.1. Granting Privileges

The GRANT statement is used to grant security privileges on database objects to specific users. Normally, the GRANT statement is used by the owner of the table or view to give other users access to the data.

GRANT SELECT / INSERT / DELETE / UPDATE / ALL PRIVILEGES
ON table-name
TO user-name / PUBLIC
[ WITH GRANT OPTION ]

Figure 4-55: The GRANT statement syntax

Example:

GRANT SELECT, INSERT ON Customer_Details TO Edwin;
GRANT ALL PRIVILEGES ON Customer_Loan TO JACK;
GRANT ALL ON Customer_Loan TO PUBLIC;

4.8.1.1. Passing Privileges (GRANT OPTION)

A GRANT statement with the WITH GRANT OPTION clause conveys, along with the specified privileges, the right to grant those privileges to other users.

[Diagram: users EDWIN, JACK and BORIS; step 1 is labelled WITH GRANT OPTION and step 2 is labelled GRANT]

Figure 4-56: Using the GRANT OPTION


4.8.2. Revoking Privileges (REVOKE)

The REVOKE statement is used to revoke privileges previously granted with the GRANT statement.

REVOKE SELECT / INSERT / DELETE / UPDATE / ALL PRIVILEGES
ON table-name
FROM user-name / PUBLIC

Figure 4-57: The REVOKE statement syntax

Example:

REVOKE SELECT, INSERT ON Customer_Details FROM Edwin;
REVOKE ALL PRIVILEGES ON Customer_Loan FROM JACK;
REVOKE ALL ON Customer_Loan FROM PUBLIC;

[Diagram: users EDWIN, JACK and BORIS; step 1 is labelled WITH GRANT OPTION, step 2 is labelled GRANT and step 3 is labelled REVOKE]

Figure 4-58: REVOKE with CASCADE
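The idea of passing a privilege on and revoking it with a cascade can be modelled in a few lines of Python. This is a toy model and not a DBMS API: the PrivilegeTable class, the user OWNER and the chain OWNER to EDWIN to JACK and BORIS are assumptions (one possible reading of Figures 4-56 and 4-58), and whether a revoke cascades, and with which syntax, depends on the DBMS.

```python
class PrivilegeTable:
    """Toy model (not a real DBMS API) of one privilege on one table."""

    def __init__(self, owner):
        self.owner = owner
        self.grants = {}  # grantee -> (grantor, may_grant_further)

    def can_grant(self, user):
        return user == self.owner or self.grants.get(user, (None, False))[1]

    def grant(self, grantor, grantee, with_grant_option=False):
        if not self.can_grant(grantor):
            raise PermissionError(f"{grantor} may not grant this privilege")
        self.grants[grantee] = (grantor, with_grant_option)

    def revoke(self, grantor, grantee):
        """Revoke a grant and, as a cascade, every grant that was passed on from it."""
        if self.grants.get(grantee, (None,))[0] != grantor:
            return  # nothing granted by this grantor to this grantee
        del self.grants[grantee]
        for user, (source, _) in list(self.grants.items()):
            if source == grantee:
                self.revoke(grantee, user)

    def has_privilege(self, user):
        return user == self.owner or user in self.grants


def grant_option_demo():
    table = PrivilegeTable("OWNER")
    table.grant("OWNER", "EDWIN", with_grant_option=True)  # step 1
    table.grant("EDWIN", "JACK")                           # step 2
    table.grant("EDWIN", "BORIS")                          # step 2
    before = [table.has_privilege(u) for u in ("EDWIN", "JACK", "BORIS")]
    table.revoke("OWNER", "EDWIN")                         # step 3, cascades
    after = [table.has_privilege(u) for u in ("EDWIN", "JACK", "BORIS")]
    return before, after
```

In this model JACK cannot pass the privilege on, because it was granted to him without the grant option; an attempt to do so raises PermissionError.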


4.9. Best Practices

1. Do not use SELECT *. This is time-consuming and reduces performance. Instead, list out each field that is required.
2. It is potentially dangerous to use SELECT * in embedded SQL, i.e. SQL embedded in an application program, because the meaning of the asterisk (*) might change. Example: if a column is added to or dropped from some table.
3. While evaluating NULL in the WHERE clause of a query, use IS NULL as opposed to = NULL.
4. If one is sure that the UNION operation cannot produce duplicate rows, use UNION ALL as opposed to UNION because the query will execute much more quickly.
5. If the GROUP BY clause has been used in a SELECT statement, then use only the grouping columns (columns on which grouping has been done) or aggregate functions in the column list of the SELECT statement.
6. Rows that have a NULL value in the relevant column are ignored by all the aggregate functions except COUNT(*).
7. An index is most appropriate when queries against a table are more frequent than INSERT and UPDATE operations.
8. EXISTS is beneficial when the most selective filter is in the parent query. This allows the selective predicates in the parent query to be applied before filtering the rows against the EXISTS criteria.
9. IN is most beneficial when the most selective filter appears in the sub-query and there are indexes on the join columns.

Tips to write a good query:

Tip 1:

SELECT account_no, trans_date, amount
FROM transaction
WHERE amount + 3000 < 5000;

Replace the above query with the following:

SELECT account_no, trans_date, amount
FROM transaction
WHERE amount < 2000;

Reason: Avoid unnecessary computational overhead in queries. Keeping the column on its own on one side of the comparison also makes it easier for the DBMS to use an index on that column.

Tip 2:

SELECT quantity, AVG(actual_price)
FROM item
GROUP BY quantity
HAVING quantity > 40;

Replace the above query with the following:

SELECT quantity, AVG(actual_price)
FROM item
WHERE quantity > 40
GROUP BY quantity;

Reason: The WHERE clause filters the rows from the table according to the search condition. Then the GROUP BY clause is applied only on the filtered rows. It saves time. If, as opposed to this, the rows are grouped first and the row groups are then filtered using the HAVING clause, the query takes longer to execute.
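That the two forms of the Tip 2 query return the same rows can be checked with a small sketch. It uses SQLite through Python's sqlite3 module; the sample rows are hypothetical, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

def where_vs_having():
    """Run the Tip 2 query in its HAVING form and in its WHERE form."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE item (quantity INTEGER, actual_price REAL);
        INSERT INTO item VALUES (50, 10.0), (50, 20.0), (30, 5.0), (60, 8.0);
    """)
    # Groups every row first, then discards the group with quantity 30.
    having = con.execute("""
        SELECT quantity, AVG(actual_price) FROM item
        GROUP BY quantity HAVING quantity > 40 ORDER BY quantity
    """).fetchall()
    # Discards the quantity 30 row first, then groups only what is left.
    where = con.execute("""
        SELECT quantity, AVG(actual_price) FROM item
        WHERE quantity > 40 GROUP BY quantity ORDER BY quantity
    """).fetchall()
    con.close()
    return having, where
```

The rewrite is safe here because the condition refers only to the grouping column; a condition on an aggregate such as AVG(actual_price) must stay in the HAVING clause.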

Tip 3:

Problem Statement: To retrieve the average salary for 'presidents' and 'managers'.

SELECT job, AVG(sal)
FROM emp
GROUP BY job
HAVING job = 'president' OR job = 'manager';

Replace the above query with the following:

SELECT job, AVG(sal)
FROM emp
WHERE job = 'president' OR job = 'manager'
GROUP BY job;

Reason: As in Tip 2, the WHERE clause filters the rows before they are grouped, so the GROUP BY clause works only on the rows that are needed. Grouping all the rows first and then filtering the row groups with the HAVING clause takes more time.

Tip 4:

Problem Statement: To select records from the debit_transactions table and the credit_transactions table where tran_date is '31-DEC-99'.

SELECT acct_num, balance_amt
FROM debit_transactions
WHERE tran_date = '31-DEC-99'
UNION
SELECT acct_num, balance_amt
FROM credit_transactions
WHERE tran_date = '31-DEC-99';

Replace the above query with the following:

SELECT acct_num, balance_amt
FROM debit_transactions
WHERE tran_date = '31-DEC-99'
UNION ALL
SELECT acct_num, balance_amt
FROM credit_transactions
WHERE tran_date = '31-DEC-99';

Reason: Eliminating duplicate rows from query results is a very time-consuming process, especially if the query results contain a large number of rows. If one is sure that the UNION operation cannot produce duplicate rows, one should specifically use the UNION ALL operation because the query will execute much more quickly.

Tip 5:

Problem Statement: To determine if any transaction was made on '25-JAN-2005'.

SELECT COUNT(*)
FROM Customer_Transaction
WHERE Transaction_Date = '25-JAN-2005';

Replace the above query with the following:

SELECT Cust_Last_Name, Cust_Mid_Name, Cust_First_Name
FROM Customer_Details
WHERE EXISTS (SELECT Cust_ID
              FROM Customer_Transaction
              WHERE Transaction_Date = '25-JAN-2005');

Reason: COUNT(*) has to count every row that satisfies the condition, which can be a time-consuming operation. EXISTS only checks whether the sub-query produces any rows: as soon as the sub-query returns one row, the EXISTS test returns TRUE and the inner SELECT statement need not be executed any further. It thus minimizes overhead. Note that the sub-query here does not refer to the outer query, so the outer query returns every customer if at least one transaction was made on that date, and no rows otherwise.
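Best practices 3 and 4 can be observed directly in a small sketch. It uses SQLite through Python's sqlite3 module; the rows are hypothetical and the date filter of Tip 4 is left out to keep the example short.

```python
import sqlite3

def null_and_union_demo():
    """Compare = NULL with IS NULL, and UNION with UNION ALL."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE debit_transactions (acct_num INTEGER, balance_amt INTEGER);
        CREATE TABLE credit_transactions (acct_num INTEGER, balance_amt INTEGER);
        INSERT INTO debit_transactions VALUES (1, 100), (2, 200), (3, NULL);
        INSERT INTO credit_transactions VALUES (2, 200), (4, 400);
    """)

    def scalar(sql):
        return con.execute(sql).fetchone()[0]

    result = {
        # "= NULL" is never TRUE, so no row is counted.
        "equals_null": scalar(
            "SELECT COUNT(*) FROM debit_transactions WHERE balance_amt = NULL"),
        "is_null": scalar(
            "SELECT COUNT(*) FROM debit_transactions WHERE balance_amt IS NULL"),
        # UNION removes the duplicate (2, 200); UNION ALL keeps both copies.
        "union": len(con.execute(
            "SELECT acct_num, balance_amt FROM debit_transactions "
            "UNION SELECT acct_num, balance_amt FROM credit_transactions").fetchall()),
        "union_all": len(con.execute(
            "SELECT acct_num, balance_amt FROM debit_transactions "
            "UNION ALL SELECT acct_num, balance_amt FROM credit_transactions").fetchall()),
    }
    con.close()
    return result
```

The differing row counts show why UNION ALL is a safe replacement only when the two queries cannot produce the same row.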


4.10. Summary

• The CREATE TABLE statement creates a table and defines its columns, PRIMARY KEY, FOREIGN KEY(s) and other constraints like UNIQUE and NOT NULL
• The DROP TABLE statement removes a previously created table from the database
• The ALTER TABLE statement can be used to add a column to an existing table, modify a column definition, and add/drop a PRIMARY KEY, FOREIGN KEY and other constraints like UNIQUE and NOT NULL
• The CREATE INDEX statement can be used to define indexes, which speed up database queries but add overheads to database updates
• If a SELECT statement contains WHERE, GROUP BY, HAVING and ORDER BY clauses, the order of execution is as follows:
  o The WHERE clause is applied first, and the rows for which the search condition in the WHERE clause returns TRUE are retained.
  o Next the GROUP BY clause is applied. It groups the rows selected by the WHERE clause such that all the rows in each group have the same value for the column in the GROUP BY clause.
  o Next the HAVING clause is applied. It retains the row groups for which the search condition in the HAVING clause returns TRUE.
  o Lastly the query result is sorted in the order specified in the ORDER BY clause.
• DCL statements are used to control access to the database and the data in it. They are used to enforce data security


5. On-Line Transaction Processing (OLTP)

5.1. Purpose

The main responsibilities of a modern information system are:
• To simulate[42] the manual system
• To record every transaction that the organization undertakes
• To capture the day-to-day activities in the life cycle of an enterprise
• To help the organization make quick and correct decisions based on the data
• To protect the data from unauthorized access
• To recover the data in case of failures

Every organization requires some on-line application system or server to manage its daily activities. These systems help in recording the transactions that the organization carries out with its employees, customers and vendors. It is impossible to imagine an enterprise without an on-line transaction system. To build an efficient on-line transaction system, it is necessary to know how these systems are built, the difficulties that may be encountered and how to overcome them. In this cyber-age, we need to know how to protect data from unauthorized usage and how to recover the data in case of failures.

5.2. Transaction

A transaction is an interaction between different users, between different systems, or between a user and a system. A transaction is a logical unit of work which takes the database from one consistent state to another consistent state. While moving from one consistent state to another, the database may pass through multiple discrete steps. At the end of the transaction, the database either goes back to its original state (in the case of failure) or moves to the new consistent state (in the case of success). Consider the following examples:

Example 1: Drawing money from a bank account is one transaction. This transaction has multiple steps:
• Insert the ATM card into the ATM machine
• Enter the PIN number
• Machine validates the PIN number
• Choose the appropriate menu for money withdrawal
• The ATM machine checks the account balance to ensure that all banking business rules are strictly followed
• After doing all the checks, the ATM machine dispenses the exact amount and updates the records accordingly

[42] Simulate: To make a model.

If any of the above steps fails for any reason, the transaction itself fails and the records are not modified. This means that the system goes back to its original state and no change is made to the system. If ALL the steps have been carried out successfully, the records are updated accordingly and the system goes to a state where it is ready for the next transaction.

Example 2: A person wants to transfer money from account Acc1 to account Acc2. This transaction has the following steps:
• Insert the ATM card into the ATM machine
• Enter the PIN number
• Machine validates the PIN number
• Choose the appropriate menu for money transfer
• Enter information of the beneficiary account
• The ATM machine checks the account balance to ensure that all banking business rules are strictly followed
• After verifying the balance, the amount is debited from account Acc1 and the records are updated accordingly
• The deducted amount is deposited to account Acc2 and the records are updated accordingly

From the above steps, it is evident that a transaction may have 'n' number of physical steps. A transaction is successful only if ALL the steps are carried out successfully; it cannot be divided into smaller tasks that succeed or fail on their own. The successful completion of the transaction is called the COMMIT state. After this state, the changes are permanent and irreversible. If ANY ONE step fails, the complete transaction fails and the system is taken back to the original state that existed before the beginning of the transaction. This process of going back to the original state is called ROLLBACK. If the transaction rolls back, it reaches the ABORT state.
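The money transfer of Example 2 can be sketched as a database transaction that either commits both updates or rolls both back. The sketch uses SQLite through Python's sqlite3 module; the account table, the opening balances and the rule that a balance may not become negative are assumptions made for illustration.

```python
import sqlite3

def transfer(con, source, target, amount):
    """Move amount from source to target as one transaction; True if committed."""
    try:
        with con:  # COMMIT if the block succeeds, ROLLBACK if it raises
            con.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                        (amount, source))
            (balance,) = con.execute(
                "SELECT balance FROM account WHERE acc_no = ?", (source,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # forces the rollback
            con.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                        (amount, target))
        return True
    except ValueError:
        return False

def transfer_demo():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
    con.executemany("INSERT INTO account VALUES (?, ?)",
                    [("Acc1", 1500), ("Acc2", 1000)])
    con.commit()
    committed = transfer(con, "Acc1", "Acc2", 500)     # succeeds
    rolled_back = transfer(con, "Acc1", "Acc2", 5000)  # fails and is undone
    balances = dict(con.execute("SELECT acc_no, balance FROM account"))
    con.close()
    return committed, rolled_back, balances
```

After the failed second transfer, the debit that had already been applied to Acc1 is undone, so the balances are the ones left by the first, committed transfer.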

[Diagram: transaction states and the transitions between them]
• BEGIN -> Active (while executing)
• Active -> Partially completed (after the final statement has been executed)
• Active -> Failed (when normal execution can't proceed)
• Partially completed -> Committed (after successful completion)
• Failed -> Aborted (after rolling back and restoration to the previous state)

Figure 5-1: Transaction state transition diagram

5.3. Transaction Systems

Transaction processing (TP) systems, which mimic real-life systems such as salary processing, library, banking, airline and defence missile systems, are divided into three categories.

5.3.1. Batch Transaction Processing System

In a batch transaction processing system, a set of application programs works on a set of input data to produce the desired output. In this process there is absolutely NO human interaction. The best example of batch processing is the salary slip generation application. The salary slip generation program reads data like employee name, grade, basic salary, date of joining, overtime for the week, loss of pay, loans, recoveries, etc., from the database and generates the salary slips. Mail is sent to the employees automatically, without any intervention by the users.

5.3.2. On-line Transaction Processing System (OLTP)

In an OLTP system, the user interacts with the system continuously through a computer or a terminal. Some examples of on-line systems are the airline reservation system, the railway reservation system, the banking ATM machine, the library application, etc. In these kinds of systems, the user needs to enter pre-defined inputs like flight number, train number, date of journey, amount to withdraw, book access number, return date, etc. Based on these pre-defined inputs, the system produces pre-defined outputs like the confirmed tickets, or non-availability of tickets, the issuing of a library book for a certain period, etc. We shall study the OLTP system in depth in this chapter.

5.3.3. Real time Transaction Processing System

This system is the most complicated among all the transaction systems. It is capable of handling unexpected inputs and producing outputs that are not pre-defined. Examples: an air traffic control system or a missile defense system. These systems are capable of handling a sudden change in the air pressure, the temperature, the wind direction, or the speed and direction of the target, and can change their output based on these inputs. Real time applications are similar to on-line systems except that the input and the output to the system are not always pre-defined.

5.4. Transaction Properties

Every transaction system must possess the following characteristics:

Atomicity: Transactions should either completely succeed or completely fail. If, for any reason, the system crashes before the completion of the transaction, the database state should not change. The data which was involved in the transaction should be restored to the previous consistent state in the database. The transaction is indivisible, which means it cannot be divided further into sub tasks.

Consistency: Transactions must preserve database consistency or stability. A transaction transforms the database from one consistent state to another consistent state.

Isolation: A transaction's operations like INSERT, SELECT, UPDATE and DELETE should not interfere with other transactions, or in other words they should not interfere with the transactions of other users of the database. The database system should reveal the individual changes made by a transaction only after the transaction has completed successfully.

Durability: Once a transaction completes (commits), the changes made to the database are permanent and available to all the transactions that follow it.

These properties are called the ACID properties (from the first letters of the above characteristics).

5.5. Requirements for an OLTP System

In addition to ACID properties, OLTP systems have additional requirements to meet. In the following sections, these requirements are discussed.

5.5.1. Integrity

All the data entering the system must be validated for its correctness and adherence to the organization's business rules. This is implemented in an RDBMS through three types of integrity checks.

Domain integrity involves the implementation of business domain specific rules. Example: Suppose an organization decides not to hire anyone who is older than 58 years or younger than 20 years. This can be implemented using the CHECK constraint. Subtracting one DATE from another in Oracle yields a number of days, so the age in years is computed here with MONTHS_BETWEEN.

CREATE TABLE Employee (
    Emp_number      NUMBER(6) CONSTRAINT pk_employee PRIMARY KEY,
    Emp_Name        VARCHAR2(25) NOT NULL,
    Dept_number     NUMBER(5) REFERENCES DEPARTMENT(Dept_number),
    Date_of_Birth   DATE NOT NULL,
    Date_of_joining DATE DEFAULT sysdate,
    CHECK (MONTHS_BETWEEN(Date_of_joining, Date_of_Birth) / 12 >= 20
           AND MONTHS_BETWEEN(Date_of_joining, Date_of_Birth) / 12 <= 58)
);

Entity integrity is implemented using the primary key constraint. Entity integrity refers to the fact that a particular attribute uniquely identifies the physical entity. Example: For each employee, the employee number uniquely identifies an employee. This means employee number 0007 always represents the details of employee Gopalakrishnan S. Hence entity integrity enforces that primary keys can have neither null values nor duplicate values.

Referential integrity is implemented using the relationships between primary keys and foreign keys of tables within a database. This ensures consistency of data. Referential integrity demands that the value of every foreign key present in every table is matched by the value of a primary key in another table. This relationship is called a parent-child relationship. For example, every employee of the organization must belong to a valid department. Hence the department number column of the employee table refers to the department number column of the department table. One cannot insert a value into the department number column of the employee table unless the value is present in the department number column of the department table (except for NULL values). If NULL is a value in the foreign key column, it represents the 'unknown state' and is not a violation of referential integrity. In the above example, the department table is the parent table and the employee table is the child table.

After referential integrity has been enforced, a primary key value might be deleted from the parent table. This would violate referential integrity, because the child table might still contain records with the original parent table primary key value. For example, the employee table contains the department number as the foreign key column, which refers to the primary key of the department table. If a department number is deleted from the department table while the employee table still contains that department number, referential integrity is violated. To avoid such situations, the following restrictions can be put on the foreign key columns of the child table at the time of creation.
• ON DELETE RESTRICT: Do not allow the parent table data to be deleted if it is referred to in the child table. For example, if department number 10 is referred to in the employee table, then do not allow department number 10 to be deleted from the department table. This is the default behaviour in Oracle when no ON DELETE clause is specified
• ON DELETE SET NULL: On deletion of the parent data, set a NULL value in the child table wherever the deleted data is referred to. For example, if department number 10 is referred to in the employee table and it is deleted from the department table, set NULL values in the corresponding department number columns of the employee table (wherever department number 10 was referred to)
• ON DELETE SET DEFAULT: Set the default values in the child records on deletion of the parent records. For example, if department number 10 is referred to in the employee table and it is deleted from the department table, set default values (say 00) in the employee table wherever department number 10 was referred to
• ON DELETE CASCADE: Delete all the matching records from the child table on deletion of the parent record in the parent table. For example, if department number 10 is referred to in the employee table and it is deleted from the department table, delete all the records in the employee table wherever department number 10 was referred to
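The effect of the different ON DELETE options can be compared in a small sketch. It uses SQLite through Python's sqlite3 module rather than Oracle (SQLite enforces foreign keys only after PRAGMA foreign_keys = ON); the simplified department and employee tables and the function name are assumptions made for illustration.

```python
import sqlite3

def delete_department(action):
    """Delete department 10 under the given ON DELETE action and report the result."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    con.execute("CREATE TABLE department (dept_number INTEGER PRIMARY KEY)")
    con.execute(f"""
        CREATE TABLE employee (
            emp_number  INTEGER PRIMARY KEY,
            dept_number INTEGER REFERENCES department(dept_number) ON DELETE {action})
    """)
    con.execute("INSERT INTO department VALUES (10), (20)")
    con.execute("INSERT INTO employee VALUES (1, 10), (2, 10), (3, 20)")
    try:
        con.execute("DELETE FROM department WHERE dept_number = 10")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"  # the parent row is still referred to by child rows
    rows = con.execute(
        "SELECT emp_number, dept_number FROM employee ORDER BY emp_number").fetchall()
    con.close()
    return outcome, rows
```

Calling delete_department with "RESTRICT", "SET NULL" and "CASCADE" in turn shows the delete being rejected, the child rows being set to NULL, and the child rows being removed.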


5.5.2. Concurrency

Concurrency means allowing different transactions to execute simultaneously. The biggest challenge of having a concurrent system is maintaining consistency in the system in spite of multiple transactions executing simultaneously. Consider a simple example of how concurrency affects consistency. Assume there are only three seats available between Bangalore and Singapore on a particular day and around ten people are trying to book a ticket on the same flight on the same day. If the system allows these transactions to occur simultaneously without any control, all ten people can book those three seats from different locations. Ten people get a ticket each when there were just three seats available! This is a BIG violation of the consistency or integrity of the system. The following sections list the consistency problems that can arise when transactions occur simultaneously.

5.5.2.1. Lost Update

Let us understand the lost update concept using a banking application. Assume a person named Hilary holds an account in the Capital Bank, San Jose branch, with an account balance of USD 1500. One fine day she deposits USD 2000 in cash to her account. While the branch clerk is updating her account at San Jose, her husband Kevin deposits USD 1800 to her account at almost the same time in San Francisco. The clerk in San Francisco is not aware of the other transaction and adds this amount to the balance of USD 1500 that was read earlier. Her balance is now updated to USD 3300, thereby ignoring her last deposit of USD 2000 at San Jose; the correct balance would have been USD 5300. We have lost one update which happened on her account! This problem has occurred because two transactions are working on the same resource without knowing about each other's activity. Figure 5-2 gives the snapshot of main memory[43] in which these two transactions operate on Hilary's account. Note that:
• This diagram is not an RDBMS table. It is a snapshot of main memory at that point of time
• The Balance column indicates the current available balance of Hilary when the two transactions are running concurrently

[43] Main Memory: Please recall CHSSC concepts - All the read and write operations happen in main memory before they are written to hard disks.

Time: 10:22 to 10:29
Hilary's Deposit: Read Balance (1500); Balance = 1500 + 2000; Write new Balance (3500); Commit
Kevin's Deposit: Read Balance (1500); Balance = 1500 + 1800; Write new Balance (3300); Commit
Balance: 1500, then 3500, then 3300

Figure 5-2: Lost Update
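The interleaving in Figure 5-2 can be replayed step by step in plain Python. This is a sketch, not database code: local variables stand in for the values each transaction holds in main memory, and the exact order of the steps is an assumption consistent with both transactions reading a balance of 1500.

```python
def lost_update():
    """Replay the interleaved deposits; Hilary's update is overwritten."""
    balance = 1500
    hilary_read = balance         # Hilary's transaction reads 1500
    kevin_read = balance          # Kevin's transaction also reads 1500
    balance = hilary_read + 2000  # Hilary writes 3500 and commits
    balance = kevin_read + 1800   # Kevin writes 3300, overwriting Hilary's update
    return balance

def serial_execution():
    """Run the same two deposits one after the other."""
    balance = 1500
    balance = balance + 2000      # Hilary's deposit completes first
    balance = balance + 1800      # Kevin's deposit starts from the new balance
    return balance
```

The interleaved version ends with 3300, while the serial version ends with the correct 5300.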

5.5.2.2. Dirty Read

Let us revisit the example discussed for the lost update case with a slightly different scenario, as shown in Figure 5-3. This time, Hilary again tries to deposit USD 2000 to her account, but for some technical reason the transaction is not successful. We know that after a transaction ends, the data is either in its prior state or in the new state. So her deposit transaction is aborted and her balance is rolled back to the original value of USD 1500. Unfortunately, Kevin's transaction in San Francisco read the balance as USD 3500 in main memory before the rollback of Hilary's transaction, and went on to write a balance of USD 5300. Kevin's transaction has read dirty (uncommitted) data. This kind of problem is known as a dirty read.

Time: 11:22 to 11:29
Hilary's Deposit: Read Balance (1500); Balance = 1500 + 2000; Write new Balance (3500); Rollback
Kevin's Deposit: Read Balance (3500); Balance = 3500 + 1800; Write new Balance (5300); Commit
Balance: 1500, then 3500, then 5300

Figure 5-3: Dirty Read
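The same replay technique shows the dirty read of Figure 5-3. Again this is only a sketch in plain Python: the two variables that separate committed from uncommitted data are an assumption used to make the rollback visible.

```python
def dirty_read():
    """Replay Figure 5-3: Kevin builds on a value that is later rolled back."""
    committed = 1500
    uncommitted = committed + 2000  # Hilary writes 3500 but has not committed
    kevin_read = uncommitted        # Kevin reads the uncommitted (dirty) value
    uncommitted = None              # Hilary rolls back; 3500 was never committed
    committed = kevin_read + 1800   # Kevin writes 5300 and commits
    return committed

def read_committed_only():
    """Kevin reads only committed data, so the rolled-back deposit has no effect."""
    committed = 1500
    kevin_read = committed          # the uncommitted 3500 is not visible
    committed = kevin_read + 1800
    return committed
```

The first function leaves USD 5300 in the account although only Kevin's USD 1800 was actually deposited; the second leaves the correct USD 3300.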

5.5.2.3. Incorrect Summary

Let us consider another scenario where Hilary wants to transfer USD 500 to her sister Evelyn's account in the same branch. After deducting USD 500, Hilary's balance will be USD 1000. Evelyn's account balance was USD 1500 before and will become USD 2000 with the addition of USD 500. At almost the same time, the bank branch manager starts another transaction to calculate the total sum available in the bank through customer deposits. This program calculates the sum by reading Hilary's balance as USD 1500 (before deduction of the transfer amount) and Evelyn's balance as USD 2000 (after addition of the transfer amount). This program concludes that the sum is USD 3500 (the sum of Hilary's balance and Evelyn's balance), but actually it is only USD 3000. This problem is known as incorrect summary. The snapshot of main memory for these transactions is shown in Figure 5-4.

Time runs from 12:22 to 12:33, top to bottom.

Hilary's Transfer             | Balance | Summary Transaction
------------------------------|---------|------------------------------
Read Hilary's Balance (1500)  | 1500    |
Balance = 1500 - 500          |         | Sum = 0
                              |         | Read Hilary's Balance (1500)
Write new Balance (1000)      | 1000    | Sum = Sum + Balance (1500)
Read Evelyn's Balance (1500)  | 1500    |
Balance = 1500 + 500          |         |
Write new Balance (2000)      | 2000    |
Commit                        | 2000    |
                              |         | Read Evelyn's Balance (2000)
                              | 3500    | Sum = Sum + Balance (3500)
                              |         | Write Sum (3500)
                              |         | Commit

Figure 5-4: Incorrect Summary
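The effect can be reproduced with a few lines of Python. This is a minimal sketch that replays the interleaving described above on an in-memory dictionary; the dictionary and variable names are assumptions for illustration only.

```python
# In-memory stand-in for the two account rows.
accounts = {"Hilary": 1500, "Evelyn": 1500}

# Summary transaction starts and reads Hilary's balance before the deduction.
total = 0
total += accounts["Hilary"]          # reads 1500

# Transfer transaction runs to completion in between.
accounts["Hilary"] -= 500            # 1000
accounts["Evelyn"] += 500            # 2000

# Summary transaction resumes and reads Evelyn's balance after the addition.
total += accounts["Evelyn"]          # reads 2000

actual_total = sum(accounts.values())
```

Here `total` ends up as 3500 while `actual_total` is 3000: the transferred USD 500 has been counted twice.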

5.5.2.4. Phantom Record

Let us consider a snapshot of two transactions running at almost the same time, as shown in Figure 5-5.

One transaction is counting the number of accounts held by the bank. The other is creating new accounts. Although two accounts (Simon's and Mike's) are created and committed before the "Total Accounts" transaction completes, they are missed by its count. To the "Total Accounts" transaction, these newly inserted rows behave like phantoms, inconsistently appearing and disappearing: they are absent when it reads the table but would appear if it read the table again. Such rows are called phantom records because, like a ghost, they are invisible to the transaction that should have seen them.

Time runs from 13:22 to 13:30, top to bottom.

Create account                                  | Total Accounts
------------------------------------------------|------------------------------------------------------
                                                | Read the total number of accounts in the bank as total
Create account for Simon with a deposit of 500  |
Create account for Mike with a deposit of 1000  |
Commit                                          |
                                                | Write Total
                                                | Commit

Figure 5-5: Phantom Record
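As a small illustration of the phantom record problem, the sketch below replays this schedule on a Python list. The three pre-existing account names are hypothetical; only Simon and Mike come from the text.

```python
# Hypothetical existing accounts; the list stands in for the accounts table.
accounts = ["Hilary", "Evelyn", "Kevin"]

# "Total Accounts" transaction reads the count.
total = len(accounts)

# "Create account" transaction inserts two rows and commits.
accounts.append("Simon")
accounts.append("Mike")

# "Total Accounts" transaction writes the count it read earlier and commits.
written_total = total
rows_now = len(accounts)
```

The value written (`written_total`) no longer matches the number of rows present at commit time (`rows_now`), because the two inserted rows were never seen by the counting transaction.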

If we observe these problems closely, all of them arise from the interleaving of transactions. The solution is to make the transactions execute one after another. This is called serialization. Serialization of transactions can be achieved by imposing the following rules on transactions:

1. If a row is being modified, do not allow any other transaction to read, update or delete that row until the first transaction completes.
2. If a transaction is reading a particular row, prevent other transactions from making any changes to that row until the first transaction completes.
3. If a transaction is reading some data, do not allow any other transaction to insert new rows into the same table until the first transaction completes. This avoids problems such as phantom records.

Serialization can be achieved using locking or time-stamping techniques.

5.6. Locks

Locking is a technique for controlling access to resources such as a database, tablespace[44], table, row or column. While a resource is locked by one transaction, other transactions have restricted access or no access to it, depending on the locking mode. Locking is one of the most widely used techniques in commercial RDBMS products for achieving consistency while supporting concurrent transactions.

[44] Tablespace: The logical part of the database which represents a collection of structures, such as tables, created by various users.

Basically resources can be locked either in Shared (S) mode for Read purpose or in Exclusive (X) mode for Update, Delete or Insert purpose.

5.6.1. Shared Lock (S)

This locking mode allows higher transaction concurrency. When a resource such as a table or a row is locked in shared mode by one transaction, all other transactions can still read the locked resource, but they cannot update or modify it. Usually a SELECT operation takes an S lock on the resources it reads.

5.6.2. Exclusive Lock (X)

This is the most restrictive lock. Once a transaction places an X lock on a resource, no other transaction can place any kind of lock on it. The resource is reserved exclusively for the first transaction, and no other transaction can use it for read or write operations. Hence the X lock allows the least concurrency. Usually INSERT, UPDATE and DELETE operations place an X lock on a resource before writing, modifying or deleting it.

Figure 5-6 shows the compatibility of these locks. It can be interpreted as follows:

• If transaction T1 locks resource A (database/tablespace/table/row/field) in shared (S) mode, another transaction T2 can also lock the same resource A in shared (S) mode.
• If transaction T1 locks resource A in shared (S) mode, another transaction T2 cannot lock the same resource A in exclusive (X) mode until T1 releases its S lock on resource A.
• If transaction T1 locks resource A in X mode, no other transaction can lock resource A in any mode (S or X) until T1 releases its X lock on resource A.

Resource A: Transaction T2 (rows) against Transaction T1 (columns)

    | X   | S
----|-----|-----
X   | No  | No
S   | No  | Yes

Figure 5-6: Share - Exclusive Lock Matrix

In Figure 5-6, "No" represents incompatibility of the locks and "Yes" represents compatibility of the locks.

5.7. Granularity of Locking

Granularity of locking refers to the level at which a resource can be locked. Take a database as an example. A database is made up of multiple tablespaces. Each tablespace hosts multiple tables. Within a table there are multiple rows and fields, as shown in Figure 5-7. It is possible to lock a:

• Database
• Tablespace
• Table
• Row
• Field

If an RDBMS product is capable of explicitly locking a single field of a table, its locking granularity is at field level. If it can lock only down to the row level, its locking granularity is row level. The finer the granularity of locking, the higher the concurrency.

In the above hierarchy, the database, tablespace, table and row are all ancestors of the field. Similarly, the database and tablespace are ancestors of the table. Tablespaces, tables, rows and fields are descendants of the database. In the same way, rows and fields are descendants of the table.

S and X locks alone cannot achieve the best possible concurrency. Let us consider the following scenario and analyze the concurrency that is possible. Assume that in a banking application a transaction called BalanceUpdate has locked row R7 of table ACC_DETAILS in X mode in order to update the account balance. Because of this X lock, no other transaction can acquire either an S or an X lock on row R7 or on any of its fields.

Database DB_BANK_DETAILS
    TableSpace TS_CUST_DETAILS
        Table CUST_MAST
        Table ACC_DETAILS
            Row R7 (ROW LOCKED IN EXCLUSIVE MODE BY TRANSACTION BalanceUpdate)
    TableSpace TS_LOAN_DETAILS
        Table LOAN_MAST
        Table INTEREST_MAST
        Table LOAN_DETAILS
    TableSpace TS_BRANCH_DETAILS
        Table BRANCH_MAST

Figure 5-7: Granularity of Locking

Let us assume the following scenarios:

• A transaction called BalanceEnquiry requires a lock on the first row (R1) of table ACC_DETAILS in S mode.
• A transaction called SummaryReport requires a lock on the complete table ACC_DETAILS in S mode.

In the ideal case:

• The system should allow transaction BalanceEnquiry to lock the first row of table ACC_DETAILS.
• The system should prevent transaction SummaryReport from acquiring a lock on table ACC_DETAILS.

Transaction SummaryReport should be prevented from acquiring a lock on table ACC_DETAILS because row R7 of that table is already locked by transaction BalanceUpdate in X mode. If SummaryReport were allowed to acquire an S lock on table ACC_DETAILS, we might encounter the dirty read problem. Along the same lines, no other transaction may acquire an S or X lock on the tablespace TS_CUST_DETAILS or the database DB_BANK_DETAILS, because row R7 is part of both.

In other words, although only row R7 of table ACC_DETAILS was locked explicitly in X mode, the table ACC_DETAILS, the tablespace TS_CUST_DETAILS and the entire database DB_BANK_DETAILS were locked implicitly in X mode, so that no ancestor of R7 could be locked by another transaction. This implicit locking of the complete database now also prevents transaction BalanceEnquiry from locking row R1 of table ACC_DETAILS. This is a serious threat to the concurrency of transactions. The solution to this problem is discussed in the next section.

5.8. Intent Locking

In intent locking, only the intention to lock is expressed at the ancestor nodes of the required resource, and the resource at the lower level is locked explicitly only when required.

Consider the example discussed in Section 5.7. There, row R7 of table ACC_DETAILS had to be locked explicitly in X mode, but all its ancestors were implicitly locked in the same mode (refer to Figure 5-7). This reduced concurrency considerably. To overcome this problem, the transaction needs to express only its intention of locking the database DB_BANK_DETAILS, the tablespace TS_CUST_DETAILS and the table ACC_DETAILS in X mode, and in turn lock row R7 explicitly in X mode. This concept is called intent locking.

Other transactions can still express their intention of exclusive or shared locking on database DB_BANK_DETAILS, tablespace TS_CUST_DETAILS or table ACC_DETAILS, and explicitly lock any row other than R7 in either X or S mode. Intent locking thus increases concurrency and also avoids the implicit locking of ancestor resources. Intent locking is therefore also called parent-child locking: the intention to lock is expressed at the parent level, and the child resource is locked explicitly. Intent locking is classified into Intent Shared (IS) locking and Intent Exclusive (IX) locking.

5.8.1. Intent Share (IS)

This lock expresses the intention to share the requested node. It also allows the requester to explicitly lock the descendants of this node in S or IS mode.

Example: Transaction SummaryReport, explained in Section 5.7, can lock the entire database DB_BANK_DETAILS and the tablespace TS_CUST_DETAILS in IS mode and the table ACC_DETAILS explicitly in S mode.

5.8.2. Intent Exclusive (IX)

This lock expresses the intention to have exclusive access to the requested node and allows the requester to explicitly lock the descendants in IX or X mode.

Example: Transaction BalanceUpdate, explained in Section 5.7, can lock the database DB_BANK_DETAILS, the tablespace TS_CUST_DETAILS and the table ACC_DETAILS in IX mode and lock row R7 explicitly in X mode.

5.8.3. Shared Intent Exclusive (SIX)

The combination of a shared lock and an intent exclusive lock is referred to as a Shared Intent Exclusive lock, or SIX lock. A SIX lock (pronounced as the separate letters S I X rather than like the number six) indicates an S lock at the current level plus an intention to insert, delete or update data at a lower level of granularity. Think of a SIX lock as an S lock plus an IX lock, as shown in Figure 5-8. Only one transaction can be granted a SIX lock on a table at a time.

[Diagram: within DATABASE DB_BANK_DETAILS, TABLE ACC_DETAILS carries an S lock on ACC_DETAILS together with an IX lock on its rows.]

Figure 5-8: SIX Lock

A SIX lock on a table indicates an intention to read all of the rows in the table and to delete, update or insert a few of them. The S lock within the SIX lock at the table level covers all of the rows. Rows that are updated obtain X locks, but only after IX intention locks have been obtained on the pages[45] that contain them.

[45] Page: A part of a table. Usually multiple rows are stored in one page.

A SIX lock is stronger than an S lock or an IX lock. When a transaction obtains a SIX lock on a table, only that transaction can modify data in the table. In this respect a SIX lock resembles an X lock. With a SIX lock, however, other transactions that want to read some of the data (reading at the row or page level after obtaining an IS lock on the table) are allowed to proceed, so concurrency is better than with an X lock.

If other transactions hold an S lock on a row or page that the SIX transaction wants to modify, the SIX transaction must wait until those S locks are released before it can modify the data. Other transactions that want to read all of the data (obtain an S lock on the table) or that want to write to any portion of the data are not allowed to proceed until the SIX lock is released. A SIX lock is also called a share sub-exclusive lock.

A complete compatibility matrix of these locks is shown in Figure 5-9. In Figure 5-9, "No" represents incompatibility of the locks and "Yes" represents compatibility of the locks.

Resource A: Transaction T2 (rows) against Transaction T1 (columns)

    | X   | S   | IS  | IX  | SIX
----|-----|-----|-----|-----|-----
X   | No  | No  | No  | No  | No
S   | No  | Yes | Yes | No  | No
IS  | No  | Yes | Yes | Yes | Yes
IX  | No  | No  | Yes | Yes | No
SIX | No  | No  | Yes | No  | No

Figure 5-9: Complete Lock Matrix

This lock matrix can be interpreted as follows:

• If resource A (tablespace/table/row) is locked in X mode by transaction T1, no other transaction can lock resource A in any mode.
• If resource A is locked in S mode by transaction T1, another transaction T2 can lock the same resource A in S or IS mode, but it cannot lock it in IX, SIX or X mode.
• If resource A is locked in IS mode by transaction T1, another transaction T2 can lock the same resource A in S, IS, IX or SIX mode, but it cannot lock it in X mode.
• If resource A is locked in IX mode by transaction T1, another transaction T2 can lock the same resource A in IS or IX mode, but it cannot lock it in S, X or SIX mode.
• If resource A is locked in SIX mode by transaction T1, another transaction T2 can lock the same resource A in IS mode, but it cannot lock it in S, X, IX or SIX mode.

A SIX lock is a combination of S and IX. Hence a SIX lock is compatible only with those locks that are compatible with both S and IX. Since IS is the only lock compatible with both S and IX, a SIX lock is compatible with the IS lock only.

The biggest problem with the locking technique is that it may lead to deadlock.
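The matrix in Figure 5-9 can be encoded as a small lookup table. The following Python sketch is only an illustration; the names `COMPATIBLE` and `can_grant` are assumptions made for this example and do not correspond to any particular RDBMS product.

```python
MODES = ("IS", "IX", "S", "SIX", "X")

# For each held mode: the set of modes another transaction may still acquire.
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every lock already held
    on the resource by other transactions."""
    return all(requested in COMPATIBLE[held] for held in held_modes)


# SIX behaves like S and IX held together: it is compatible only with the
# modes that both S and IX are compatible with.
six_from_parts = COMPATIBLE["S"] & COMPATIBLE["IX"]
```

For example, `can_grant("S", ["S", "IS"])` is true, while `can_grant("IX", ["S"])` is false, and `six_from_parts` contains only `"IS"`.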

5.8.4. Case study for Intent Locks

Objective: To study the compatibility of locks.

Assumptions: A database db has two tables (files), f1 and f2.

• File f1 has pages p11, p12 and p13.
• File f2 has pages p21, p22 and p23.
• Page p11 has 2 records, r111 and r112.
• Page p12 has 2 records, r121 and r122.
• Page p13 has 2 records, r131 and r132.
• Page p21 has 2 records, r211 and r212.
• Page p22 has 2 records, r221 and r222.
• Page p23 has 2 records, r231 and r232.

Consider the following situation:

• Transaction T1 wants to update record r111 and record r211.
• Transaction T2 wants to update all records on page p12.
• Transaction T3 wants to read record r112 and the entire file f2.
• Transactions T4 and T5 start only after all the other transactions have committed. Transaction T4 wants to modify r111 and transaction T5 wants to read record r112.

Problem statement: Specify

• The locks which will be acquired by the transactions
• The order in which the locks will be acquired by the transactions
• The order in which the locks will be released by the transactions

Solution (each transaction's requests, in the order in which it issues them):

T1: IX(db), IX(f1), IX(p11), X(r111), IX(f2), IX(p21), X(r211), perform the update, Unlock(r211), Unlock(p21), Unlock(f2), Unlock(r111), Unlock(p11), Unlock(f1), Unlock(db)

T2: IX(db), IX(f1), X(p12), Unlock(p12), Unlock(f1), Unlock(db)

T3: IS(db), IS(f1), IS(p11), S(r112), S(f2), Unlock(r112), Unlock(p11), Unlock(f1), Unlock(f2), Unlock(db)

Assumption: Transactions T4 and T5 start after transactions T1, T2 and T3 have committed and released their locks.

T4: SIX(db), SIX(f1), IX(p11), X(r111), Unlock(r111), Unlock(p11), Unlock(f1), Unlock(db)

T5: IS(db), IS(f1), IS(p11), S(r112), Unlock(r112), Unlock(p11), Unlock(f1), Unlock(db)

Learnings from the above case study:

• Locks are acquired from the root down to the node one wants to lock (top to bottom).
• S or X mode locks are applied only at a very fine granularity (only on the specific node that the user wishes to read or update).
• Locks are released from bottom to top.
• Lock compatibility must be checked whenever one transaction already holds a lock on a node and another transaction wants to acquire a lock on the same node.
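The top-down acquisition and bottom-up release order can be sketched as a small helper. This is a hypothetical illustration in Python (the function name `lock_plan` and its arguments are assumptions); it only computes the order of requests and does not perform any real locking or compatibility checking.

```python
# Intent mode taken on every ancestor for a given explicit mode on the target.
INTENT = {"S": "IS", "X": "IX"}


def lock_plan(path, mode):
    """Given the path from the root to the target node and the explicit mode
    wanted on the target ("S" or "X"), return (acquire, release):
    acquire is a list of (mode, node) from root to target,
    release is the list of nodes from target back up to the root."""
    *ancestors, target = path
    acquire = [(INTENT[mode], node) for node in ancestors] + [(mode, target)]
    release = [node for _, node in reversed(acquire)]
    return acquire, release


# Transaction T5 from the case study: read record r112.
t5_acquire, t5_release = lock_plan(["db", "f1", "p11", "r112"], "S")
```

For T5 this yields IS(db), IS(f1), IS(p11), S(r112) followed by the unlocks r112, p11, f1, db, which matches the T5 sequence in the case study.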

5.9. Deadlock

Deadlock is a situation in which one transaction is waiting for another transaction to release a resource it needs, and vice versa. Each transaction waits forever for the other to release the resource. This is shown in Figure 5-10:

Time  | Transaction BalanceUpdate     | Transaction LoanUpdate
------|-------------------------------|-----------------------------
10:22 | Lock ACC_DETAILS              | Lock LOAN_DETAILS
10:23 | Update ACC_DETAILS            | Update LOAN_DETAILS
10:24 | Try for lock on LOAN_DETAILS  | Try for lock on ACC_DETAILS
10:25 | Wait for lock                 | Wait for lock
10:26 | Wait for lock                 | Wait for lock
10:27 | Wait for lock                 | Wait for lock
10:28 | Wait for lock                 | Wait for lock

Figure 5-10: Deadlock

In the above diagram, transaction BalanceUpdate locked the table ACC_DETAILS in X mode at 10:22 and is waiting to acquire a lock on table LOAN_DETAILS in order to update it. But transaction LoanUpdate already holds the X lock on table LOAN_DETAILS and is waiting for table ACC_DETAILS, which is locked by transaction BalanceUpdate. The two transactions will wait indefinitely for each other to release the locked resources. This is known as deadlock.

If a deadlock occurs, one of the participating transactions must be rolled back to allow the other to proceed. There are various methods for choosing which transaction to roll back when a deadlock is detected. Usually the choice is based on:

• How long the transactions have been running
• Data already updated by the transaction
• Data that remains to be updated by the transaction

Schemes are also available for preventing deadlock. Most RDBMS products allow deadlocks to occur and resolve them when they are detected.
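One common way to detect a deadlock is to look for a cycle among the "waits for" relationships between transactions. The text does not say which detection method a given product uses, so the Python sketch below is only an assumption-laden illustration: it supposes that each transaction waits for at most one other transaction, and the function name `find_deadlock` is hypothetical.

```python
def find_deadlock(waits_for):
    """waits_for maps a transaction name to the transaction it is waiting on.
    Returns the list of transactions forming a cycle, or None if no cycle."""
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            return seen[seen.index(current):]
    return None


# Situation of Figure 5-10: each transaction waits for the other.
cycle = find_deadlock({
    "BalanceUpdate": "LoanUpdate",
    "LoanUpdate": "BalanceUpdate",
})
```

Here `cycle` contains both transactions, so one of them would have to be chosen as the victim and rolled back. A chain without a cycle, such as `{"T1": "T2"}`, returns `None`.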

5.10. Security

Security is one of the best implemented strategies in RDBMS. Security is implemented in RDBMS packages using:

1. USERID and PASSWORD to prevent users from gaining unauthorized access
2. Grant and Revoke statements (Data Control Language) to provide restricted access control to resources like tables
3. Database views to restrict access to sensitive data
4. Encryption[46] of data to avoid unauthorized access

[46] Encryption: The process of manipulation of data to prevent accurate interpretation by all but those for whom the data is intended.

5.11. Recovery

A database might be left in an inconsistent state by:

• An application error
• Power failure
• O/S or database failure
• Network failure
• Hardware or media failures

If the database is in an inconsistent state, it must be restored to a consistent state. Recovery can be achieved using either log files or backups of the database.

The simplest backup technique is 'dumping'. The entire content of the database is backed up to secondary devices such as tapes on a regular basis. The backup must be taken when the database is in a consistent state, so no transaction that modifies the database can be running during the backup. Dumping can take a long time, and in a production environment transactions usually cannot be stopped for that long, so it cannot be performed as often as one would like. This type of backup is called a cold backup and is usually done periodically, for example once a week or once a month, at night when transaction activity is at its lowest. The tapes can then be used in case of complete hard disk failure. This is shown in Figure 5-11.

[Diagram labels: Main Memory; Data Transfer; Database Hard Disk (Data File 1, Data File 2); Log Information; Log File Hard Disk (Log File 1, Log File 2); Applying Log; Back-up Tape Device. The log file contains: Trans ID, Timestamp, Old Value, New Value.]

Figure 5-11: Database Backup

If the database backup is taken while transactions are running, it is called a hot backup. Hot backups are usually incremental in nature, which means that only the data modified since the last backup is captured. A hot backup usually takes less time and is done on a daily basis. Hot and cold backups are useful only in the case of media or hard disk failure. These backups cannot be used for:

• Unplanned power shutdown
• Sudden breakdown in the O/S or database
• Memory failure

These kinds of failures are called instance failures. Instance failures can be handled by making use of transaction log or redo log files. These are explained further in the following sections.

5.12. Transaction Log

The transaction log (also called the journal log or redo log) is a physical file. Usually the transaction ID, the timestamp of the transaction, and the old and new values of the data are stored in the transaction log file. The RDBMS is therefore aware of the state of the database, that is, the before and after images of the data for each transaction. Once the database has been returned to a consistent state, the log may be truncated to remove committed transactions. Normally two techniques are used to maintain the log files.

5.12.1. Deferred update

Deferred update, or NO UNDO/REDO, is an algorithm for recovering from transaction failures caused by O/S, application, power, memory and machine failures. While a transaction runs, no updates made by that transaction are written to the database; they are captured only in the log files.

On commit, the data changes are applied to the database using the log files. This process is called "re-doing". On rollback, the data changes captured in the log files are discarded and no changes are made to the database.

On system restart after a failure due to any of the above reasons, if the transaction had not committed, the contents of the log files are discarded and the transaction is restarted. If it had committed before the crash, the log file contents are applied to the database after restart. The sequences of deferred update are explained in Figure 5-12 and Figure 5-13.

Time  | Transaction     | Disk Before | Disk After   | Log
------|-----------------|-------------|--------------|----------------
10:22 | Start           | 6           | 6            | Start Timestamp
10:23 | Read field F1   | 6           | 6            |
10:24 | Update F1 to 23 | 6           | 6            | (6,23)
10:25 | Read field F2   | 12          | 12           |
10:24 | Update F2 to 45 | 12          | 12           | (12,45)
10:25 | Commit          |             | F1=23, F2=45 | Commit

Figure 5-12: Deferred Update

From Figure 5-12 it is evident that when the transaction updates field F1 to 23 and field F2 to 45 in main memory, the log file holds the old value and the new value of each field, while the database disk file still holds the old values. The contents of the database are modified using the log file only after the transaction commits. The process of re-doing the transaction from the log is sometimes referred to as 'roll forward'. The disadvantage of the deferred update technique is increased recovery time in case of system failure.

Flowchart steps:

1. START. Update the record in memory, then update the logs.
2. Has the system crashed?
   o NO: Is the transaction committed? If YES, make the changes permanent in the database using the log. If NO, discard the log data.
   o YES: Restart the system. Do you find a commit in the log? If YES, make the changes permanent in the database using the log. If NO, discard the log data.
3. STOP.

Figure 5-13: Sequences of Deferred Update
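The restart logic of deferred update can be sketched in a few lines of Python. This is an illustrative model only: the log record layout (tuples), the transaction name `T1` and the function name `recover_deferred` are assumptions for this example, while the field values follow Figure 5-12.

```python
def recover_deferred(disk, log):
    """NO-UNDO/REDO recovery: the disk holds only old values, so nothing is
    undone; new values are applied only for transactions with a commit record."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    recovered = dict(disk)
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, field, old_value, new_value = rec
            recovered[field] = new_value      # redo (roll forward)
    return recovered


disk = {"F1": 6, "F2": 12}                    # disk was never touched
log_uncommitted = [
    ("start", "T1"),
    ("write", "T1", "F1", 6, 23),
    ("write", "T1", "F2", 12, 45),
]
log_committed = log_uncommitted + [("commit", "T1")]

after_crash_before_commit = recover_deferred(disk, log_uncommitted)
after_crash_after_commit = recover_deferred(disk, log_committed)
```

Without a commit record the log entries are simply ignored and the disk keeps F1=6, F2=12; with a commit record the new values F1=23, F2=45 are applied from the log.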

5.12.2. Immediate Update

Immediate update, or UNDO/REDO, is another algorithm for recovering from transaction failures caused by O/S, application, power, memory or machine failure.

While a transaction runs, updates made by that transaction can be written to the database directly. However, both the original and the new data must be stored in the log BEFORE the new data is written to the database.

On commit, all the changes to the database are made permanent and the log contents are discarded. On rollback, the old values are restored using the log entries, and all the changes that the transaction has made to the database disk are discarded. This process is called "un-doing".

When the system restarts after a crash, the database changes of committed transactions are made permanent, and the original values are restored from the log files for uncommitted transactions. A transaction snapshot is shown in Figure 5-14, and the sequences of the immediate update process are shown in Figure 5-15.

Time  | Transaction     | Disk Before | Disk After   | Log
------|-----------------|-------------|--------------|----------------
10:22 | Start           | 6           | 6            | Start Timestamp
10:23 | Read column F1  | 6           | 6            |
10:24 | Update F1 to 23 | 6           | 23           | (6,23)
10:25 | Read column F2  | 12          | 12           |
10:24 | Update F2 to 45 | 12          | 45           | (12,45)
10:25 | Commit          |             | F1=23, F2=45 | Commit

Figure 5-14: Immediate Update

From Figure 5-14 it is evident that when the transaction updates field F1 to 23 and field F2 to 45 in main memory, the log file holds the old value and the new value of each field. At the same time, the database disk file is also modified to reflect the new values, even before the transaction commits. If for any reason the transaction fails to commit, the values in the database disk files are restored to the old values using the log file. The process of undoing changes using the log files is frequently referred to as rollback. The disadvantage of the immediate update technique is frequent I/O operations while the transaction is active.

Flowchart steps:

1. START. Update the record in memory, update the logs, then update the database on disk.
2. Has the system crashed?
   o NO: Is the transaction committed? If YES, make the changes permanent and discard the log data. If NO, undo the changes in the database using the log.
   o YES: Restart the system. Do you find a commit in the log? If YES, make the changes permanent. If NO, undo the changes in the database using the log.
3. STOP.

Figure 5-15: Sequences in Immediate Update
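The restart logic of immediate update can be modelled in the same way. The Python sketch below is a simplified illustration under stated assumptions: log records are tuples, the transaction name `T1` and the function name `recover_immediate` are hypothetical, and the field values follow Figure 5-14.

```python
def recover_immediate(disk, log):
    """UNDO/REDO recovery: restore old values for uncommitted transactions
    (scanning the log backwards), then re-apply new values for committed ones."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    recovered = dict(disk)
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            recovered[rec[2]] = rec[3]        # undo: put back the old value
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            recovered[rec[2]] = rec[4]        # redo: make the new value permanent
    return recovered


log_uncommitted = [
    ("start", "T1"),
    ("write", "T1", "F1", 6, 23),
    ("write", "T1", "F2", 12, 45),
]

# Crash after F1 reached the disk but before the transaction committed.
disk_at_crash = {"F1": 23, "F2": 12}
undone = recover_immediate(disk_at_crash, log_uncommitted)

# Same disk state, but the commit record made it into the log.
redone = recover_immediate(disk_at_crash, log_uncommitted + [("commit", "T1")])
```

In the first case the disk is restored to F1=6, F2=12; in the second case both new values, F1=23 and F2=45, end up on disk.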

5.12.3. Check-Points

Commercial RDBMS products usually use neither pure deferred update nor pure immediate update, because of their disadvantages. In these products, the database is updated at fixed intervals of time, say every 2 minutes, irrespective of whether transactions have committed or not. Updating the database at fixed intervals of time is called check-pointing. At the check point, the contents of the log files are applied to the database. Transactions may be committed or uncommitted at the check point. If a transaction later rolls back, the database is restored to its original state using the log files. As already explained, this process is called "un-doing". If the transaction commits, its changes are made permanent, again using the log files. This process is called "re-doing". Hence check-point based updates use both the roll forward and the rollback mechanisms. To some extent this technique speeds up recovery from instance failures. For example, consider the snapshot of the database shown in Figure 5-16.

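[Diagram: a timeline marked Ts (Start), Tc (Check Point) and Tf (Failure, memory crash) with five transactions: T1 (operations A, B, C), T2 (A, B, C), T3 (A, B, C), T4 (A, B) and T5 (A, B). Log file contents: T1: ABC, T2: AB, T4: AB, T2: C, T3: ABC, T5: AB. Database contents written at the check point: T1: ABC, T2: AB, T4: AB.]

Figure 5-16: Checkpoint Scenario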

Let us analyze the situation on system restart after the crash:

1. Transaction T1 committed before the check point and was written to the database, so no changes are required in the database.
2. Transaction T2 committed before the system failure but was only partially written to the database at the check point. After restart, the remaining parts of T2 must be written to the database using the log files.
3. Transaction T3 began after the check point, so its changes were not written to the database, but it completed successfully before the crash. The complete transaction needs to be written to the database using the log file.
4. Transaction T4 began before the check point, so part of T4 was written to the database. The crash happened before T4 committed. Hence its changes to the database must be undone using the log files and the old values restored.
5. The log entries of transaction T5 need to be discarded and the transaction restarted, as it started after the check point and therefore left no traces in the database.

The recovery scenario is explained in Figure 5-17:

[Diagram: the same timeline (Start, Check Point, Memory Crash) showing that T1, T2 and T3 commit while T4 and T5 do not. Log file contents: T1: ABC, T2: AB, T4: AB, T2: C, T3: ABC, T5: AB. Recovery actions applied to the database:]

Transaction | Recovery action
------------|----------------
T1          | No changes
T2          | Redo C
T3          | Redo ABC
T4          | Undo AB
T5          | Discard AB

Figure 5-17: Recovery from Crash

Note: The system cannot be restored using the log files alone after hard disk failure(s). Only backups of the data files and log files can save databases from media failures.
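The five cases analyzed above follow a simple decision rule based on when a transaction started and committed relative to the check point. The Python sketch below illustrates that rule; the function name `recovery_action` and the numeric times are hypothetical, chosen only so that T1 to T5 fall into the same positions as in Figure 5-16.

```python
def recovery_action(start, commit, checkpoint):
    """Classify a transaction after a crash.
    start, checkpoint: times; commit: commit time, or None if the
    transaction had not committed when the system crashed."""
    if commit is not None and commit <= checkpoint:
        return "no changes"   # fully written to the database at the check point
    if commit is not None:
        return "redo"         # committed after the check point: roll forward from log
    if start < checkpoint:
        return "undo"         # uncommitted, partly written at the check point
    return "discard"          # uncommitted, nothing written to the database


CHECKPOINT = 5                # hypothetical time of the check point
transactions = {              # name: (start, commit); times are made up
    "T1": (1, 3),
    "T2": (2, 7),
    "T3": (6, 9),
    "T4": (4, None),
    "T5": (8, None),
}
actions = {
    name: recovery_action(start, commit, CHECKPOINT)
    for name, (start, commit) in transactions.items()
}
```

With these assumed times, `actions` maps T1 to "no changes", T2 and T3 to "redo", T4 to "undo" and T5 to "discard", in line with Figure 5-17.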

Examples:

1. Recovery using deferred update in a single-user environment

Consider the read and write operations of two transactions T1 and T2 given below:

T1: read_item(A), read_item(D), write_item(D)
T2: read_item(B), write_item(B), read_item(D), write_item(D)

The system log at the time of the crash is as given below:

<start T1>
<write_item, T1, D, 20>
<commit T1>
<start T2>
<write_item, T2, B, 10>
<write_item, T2, D, 25>
---------------- System crash

Solution: Transaction T1 commits before the system crash. The operations of transaction T1 are therefore redone (that is, the contents of the log files are applied to the data file). The log entries for transaction T2 are ignored by the system because T2 is not committed.

2. Recovery using check points (concurrent transactions considered)

Consider the read and write operations of transactions T1, T2, T3 and T4 given below:

T1: read_item(A), read_item(B), write_item(B)
T2: read_item(C), write_item(C), read_item(B), write_item(B)
T3: read_item(A), write_item(A), read_item(E), write_item(E)
T4: read_item(C), write_item(C), read_item(A), write_item(A)

The system log at the time of the crash is as given below:

<start T1>
<write_item, T1, B, 20>
<commit T1>
<checkpoint>
<start T4>
<write_item, T4, C, 15>
<write_item, T4, A, 20>
<commit T4>
<start T2>
<write_item, T2, C, 12>
<start T3>
<write_item, T3, A, 30>
<write_item, T2, B, 25>
----------------------- System crash

Solution: Transaction T1 committed before the checkpoint. Therefore no operation is performed on account of it. Transaction T4 is redone because its commit point is after the last system checkpoint. Transaction T2 and T3 are ignored because they did not reach their commit points.

5.13. Summary

• All transactions should be:
   o Atomic
   o Consistent
   o Isolated
   o Durable
• OLTP applications should ensure:
   o Integrity
   o Concurrency
   o Security
   o Recovery
• Integrity of the RDBMS application can be maintained using:
   o Entity Integrity
   o Referential Integrity
   o Domain Integrity
• While allowing concurrency, one may face problems in maintaining consistency. The four major problems encountered are:
   o Lost Updates
   o Dirty Read
   o Incorrect Summary
   o Phantom records
• Consistency can be implemented using serialization techniques like:
   o Locking
   o Time-stamping
• The locking technique can lead to the deadlock problem
• The time-stamping technique can lead to many rollbacks
• Security is implemented in RDBMS using:
   o User ID / Password
   o Grant and Revoke commands
   o Views
• Two types of recovery mechanism can be implemented in the RDBMS application:
   o Media failures using back-up strategy
   o Instance recovery using transaction log files
• Two types of backups are possible:
   o Cold backup
   o Hot backup
• Three types of updates to the database are possible, using the transaction log files:
   o Immediate update
   o Deferred update
   o Check-point based updates


6. Introduction to PL/SQL

6.1. Need for PL/SQL

SQL is a flexible, efficient fourth generation language [47]. It has built-in features to create, manipulate and control a relational database, but it lacks the procedural capabilities of a programming language. PL/SQL is a technology built into Oracle which provides all the features available in SQL together with the procedural logic expected of any programming language. Hence a programmer can loop through a set of records in the underlying table and manipulate one record at a time, applying the intended business logic to it. PL/SQL is considered an extension to SQL developed by Oracle.

As an Oracle database can be hosted on heterogeneous platforms such as UNIX and Windows, code written in PL/SQL can also run on all of these platforms. This introduces a kind of platform independence.

In PL/SQL a group of DML statements can be combined and executed as a transaction which does some logical unit of work, thereby transforming the database from one consistent state to another consistent state. Changes introduced by the set of statements can be permanently saved to the database (committing the transaction) or can be rolled back (undoing the transaction). If the set of statements submitted for execution fails in the middle of the transaction for reasons such as a system crash, memory crash or unplanned power shutdown, the database is automatically restored to its earlier consistent state.

PL/SQL also allows a programmer to write DDL, DML and DCL statements. This material discusses how to write DML and DCL statements in PL/SQL; writing DDL statements in PL/SQL is beyond the scope of this courseware.

[47] A 4GL is typically non-procedural and designed so that end users can specify what they want without having to know how the computer will process their requirement.


Moreover, the equivalent counterpart of PL/SQL in SQL Server is T-SQL. Similarly, other RDBMS products may have their own procedural extension to SQL. We have selected PL/SQL, the Oracle technology, for our discussion.

6.2. PL/SQL Architecture

PL/SQL technology is available in the Oracle server environment as well as in some of the Oracle application development tools, such as Oracle Forms and Oracle Reports. Both environments expect a valid PL/SQL block to be submitted. Procedural statements are executed by the procedural statement executor module present in the PL/SQL engine, and SQL statements are executed by the SQL statement executor present in the Oracle server.

6.3. PL/SQL block structure

Every PL/SQL block has the following structure. BEGIN and END are mandatory keywords. DECLARE and EXCEPTION are optional keywords; optional keywords are enclosed within square brackets [ ].

[ DECLARE ]
BEGIN
[ EXCEPTION ]
END;

6.4. Comments in PL/SQL

There are two ways of placing comments in PL/SQL:
1. Single line comment (--)
2. Multi line comment (/* */)

A single line comment starts with a double hyphen (--) and runs to the end of the line. Multiline comments follow the C style: they begin with /* and end with */. Multiline comments should not be nested.

DECLARE
   --Declaration section
BEGIN
   /* Executable section */
   NULL;
END;

6.5. Anonymous PL/SQL blocks

As these PL/SQL blocks do not have a name of their own, they are called anonymous PL/SQL blocks. Hence an anonymous PL/SQL block cannot be invoked from anywhere else in our PL/SQL code, nor from within itself. Anonymous PL/SQL blocks are not stored permanently in the database, but we can type the code in a text file and save it as part of the file system. Other forms of PL/SQL blocks, such as PROCEDURES and FUNCTIONS, are stored permanently in the Oracle database. These are also called NAMED PL/SQL blocks.

Every anonymous PL/SQL block can have three sections:
1. Declaration section
2. Executable section
3. Exception section

All statements that fall between the DECLARE and BEGIN keywords constitute the declaration section. All PL/SQL variables are declared in the declaration section.

All statements that fall between the BEGIN keyword and the EXCEPTION keyword (or the END keyword, when there is no exception section) constitute the executable section. Any valid SQL and PL/SQL statements can be present in the executable section.

Statements written between the EXCEPTION and END keywords form the exception section. All runtime errors can be handled with the help of this section.

6.5.1. Declaration section

The declaration section is an optional section within an anonymous PL/SQL block. It is used for declaring PL/SQL variables along with their datatypes. All the datatypes available in SQL are supported in PL/SQL too. For example, a variable to store the current system date and time is declared in the declaration section as shown below.

DECLARE
   v_datetime TIMESTAMP;
   …
BEGIN
   …
END;

Any number of variables can be declared in the declaration section; there is no upper limit. However, each variable needs its own declaration: a comma-separated list of names sharing one datatype is not allowed. For example,

DECLARE
   v_customername, v_suppliername VARCHAR2(30);
BEGIN
   …
END;

The above declaration is INVALID.

The maximum length of a variable name is 30 characters. Variables can also be initialized with a value when they are declared. If a variable is not initialized, its default value is NULL. Declared variables can be referred to in the executable section and the exception section; that is, the scope or lifetime of a declared variable covers both of these sections. Variables cannot be declared in the executable section or the exception section itself.
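The rules above can be tried out with a short block. This is only a minimal sketch, assuming a SQL PLUS session; the initial value 'Acme' is a hypothetical value chosen for illustration.

```sql
SET SERVEROUTPUT ON

DECLARE
   -- One declaration per variable; a comma-separated list of names is not allowed.
   v_customername VARCHAR2(30);             -- not initialized, so it holds NULL
   v_suppliername VARCHAR2(30) := 'Acme';   -- hypothetical initial value
BEGIN
   IF v_customername IS NULL THEN
      DBMS_OUTPUT.PUT_LINE('v_customername is NULL');
   END IF;
   DBMS_OUTPUT.PUT_LINE('v_suppliername = ' || v_suppliername);
END;
/
```

Both variables are declared once in the declaration section and then referred to in the executable section.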

6.5.2. Executable section

The executable section is a mandatory section. Any valid SQL and PL/SQL statements can be present in the executable section. If no executable statement is present in this section, the PL/SQL block is invalid. Hence at least a NULL statement should be present in this section to make the block valid; NULL is a valid executable statement in PL/SQL.

BEGIN
   NULL;
END;

The above PL/SQL block is valid and does nothing.

BEGIN
END;

The above PL/SQL block is invalid and would throw an error.

6.5.3. Exception section

The exception section is an optional section. It is used to trap runtime errors generated during the execution of a PL/SQL program. To understand what an exception is, let us assume that we try to divide a numeric value by zero. This statement does not throw a compilation error, but during execution the PL/SQL runtime engine detects the error. Such a runtime error is called an exception. The exception section can also be used to define an alternative routine or a recovery mechanism which needs to be executed when a runtime error occurs. Any valid SQL and PL/SQL statements can be present in the exception section.
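The divide-by-zero case described above can be sketched as follows. This is an illustrative example only; the variable names and the message text are assumptions, while ZERO_DIVIDE is the predefined PL/SQL exception raised for division by zero.

```sql
SET SERVEROUTPUT ON

DECLARE
   v_total  NUMBER := 100;
   v_count  NUMBER := 0;
   v_result NUMBER;
BEGIN
   v_result := v_total / v_count;   -- compiles, but raises ZERO_DIVIDE at runtime
   DBMS_OUTPUT.PUT_LINE('Result: ' || v_result);   -- skipped once the exception is raised
EXCEPTION
   WHEN ZERO_DIVIDE THEN
      -- alternative routine executed instead of failing the block
      DBMS_OUTPUT.PUT_LINE('Cannot divide by zero');
END;
/
```

Control moves from the failing statement straight to the handler, so the block still completes successfully.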

6.6. PL/SQL block execution

6.6.1. How can a PL/SQL block be executed?

Type the PL/SQL block as shown below at the SQL prompt. After typing BEGIN, press Enter. A new line number is generated for every new line we type. To stop generating line numbers, place a full stop on a line of its own after the last line.

SQL> BEGIN
  2  DBMS_OUTPUT.PUT_LINE('Hello World');
  3  END;
  4  .

Now, to execute the PL/SQL block, type (/) at the SQL prompt. The most recently typed PL/SQL block is executed.

SQL> /
PL/SQL procedure successfully completed.

DBMS_OUTPUT is a package [48] supplied by Oracle, and PUT_LINE is a procedure within that package which helps us echo information on the screen. But in this case, nothing is displayed on the screen. To enable the display of output on the screen, type the command shown below. This command has to be typed once for every new session; that is, every time you connect to the Oracle server using SQL PLUS, type the command once.

SQL> SET SERVEROUTPUT ON
SQL> /
Hello World
PL/SQL procedure successfully completed.
SQL>

Typing the (/) symbol again executes the PL/SQL block present in the editor buffer and displays the output 'Hello World'.

6.6.2. Another way of executing the PL/SQL block

Save the PL/SQL block in a file named First.sql as shown below. Once the file is saved, we can execute the content of the file using @filename or @””.

SQL> SAVE First.sql
Created file First.sql
SQL> @First.sql
Hello World
PL/SQL procedure successfully completed.

[48] PACKAGE: A collection of procedures and functions bundled together with a name


6.7. Named PL/SQL blocks

A PL/SQL block which has a name is called a named PL/SQL block. A named PL/SQL block has a special header section which specifies whether it is a PROCEDURE or a FUNCTION. The syntax for implementing a PROCEDURE is as follows:

CREATE [ OR REPLACE ] PROCEDURE block_name(p_param DATATYPE)
IS/AS
   --declaration of variables
BEGIN
   --SQL and PL/SQL statements
EXCEPTION
   --error handling
END;

A PROCEDURE is used for implementing an action and a FUNCTION is used for computing and returning a value. Only the header of a FUNCTION has the special RETURN clause, which specifies the type of data returned by the function.

CREATE [ OR REPLACE ] FUNCTION block_name(p_param DATATYPE)
RETURN datatype
IS/AS
   --declaration of variables
BEGIN
   --SQL and PL/SQL statements
EXCEPTION
   --error handling
END;

We can pass one or more parameters as input to both procedures and functions. A procedure may return zero, one or more outputs (through its parameters), whereas a function must return a value through its RETURN clause. When a named PL/SQL block is submitted to the Oracle server, it is not executed immediately; rather, it is compiled and stored permanently in the database for later execution. Procedures and functions can be invoked within an anonymous PL/SQL block.
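To make the two syntax templates concrete, here is a minimal sketch of one function and one procedure, followed by an anonymous block that invokes them. The names net_price and greet, their parameters and the values passed are hypothetical and are not part of the course schema.

```sql
SET SERVEROUTPUT ON

-- Hypothetical function: computes and returns the price after a percentage discount.
CREATE OR REPLACE FUNCTION net_price(p_price NUMBER, p_discount NUMBER)
RETURN NUMBER
IS
BEGIN
   RETURN p_price - (p_price * p_discount / 100);
END;
/

-- Hypothetical procedure: performs an action (echoes a greeting).
CREATE OR REPLACE PROCEDURE greet(p_name VARCHAR2)
IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('Hello ' || p_name);
END;
/

-- Both named blocks are invoked from an anonymous block.
BEGIN
   greet('Joe');
   DBMS_OUTPUT.PUT_LINE('Net price: ' || net_price(200, 10));
END;
/
```

The two CREATE statements only compile and store the blocks; nothing is printed until the anonymous block calls them. The procedure is called as a statement, while the function is used inside an expression because it returns a value.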

6.8. Variables and datatypes

In the declaration section of an anonymous PL/SQL block we can declare variables. The syntax for declaring variables in PL/SQL is as shown below.

variable_name [CONSTANT] datatype [ NOT NULL ] [ := value ]

The coding convention followed for variable names is to start with "v_". For example, to declare a variable to store the employee name we declare v_empname.

DECLARE
   v_empname VARCHAR2(30);
…

The variable name is immediately followed by the datatype of the variable, which decides the nature of data stored within that variable during runtime. Uninitialized variables are initialized with NULL by default, irrespective of the datatype of the variable. If a variable should not start with the value NULL, we can initialize it with some value, as shown below.

DECLARE
   v_empname VARCHAR2(30) := 'Joe';
…

Using the assignment operator (:=) we can initialize a PL/SQL variable with a value. An equivalent of the assignment operator is the DEFAULT keyword, which can also be used to initialize a variable with a value, as shown below.

DECLARE
   v_empname VARCHAR2(30) DEFAULT 'Joe';
…

How do we declare a variable which should not hold a NULL value at any point of time during execution of a PL/SQL block?

DECLARE
   v_empname VARCHAR2(30) NOT NULL := 'Joe';
…

The NOT NULL constraint implements this requirement, as shown above. Whenever we declare a PL/SQL variable with a NOT NULL constraint, we have to initialize the variable with some value. Any later reference to this variable within the PL/SQL block is then guaranteed to find a non-NULL value.

We can also declare constants, as shown below. CONSTANTs have to be initialized with some value during the declaration. The coding convention followed for constant names is to start with "c_".

DECLARE
   c_discount CONSTANT NUMBER := 10;
…

Constants can be assigned a NULL value; NOT NULL variables cannot. The value of a constant cannot be changed in the program, whereas the value of a variable which has a NOT NULL constraint can be changed to any value except NULL.
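The difference between a constant and a NOT NULL variable can be seen in a short block. This is a sketch under assumed values; v_price, v_netprice and the figures used are hypothetical.

```sql
SET SERVEROUTPUT ON

DECLARE
   c_discount CONSTANT NUMBER := 10;    -- percentage; fixed for the whole block
   v_price    NUMBER NOT NULL := 200;   -- hypothetical list price; must be initialized
   v_netprice NUMBER;                   -- uninitialized, so it starts as NULL
BEGIN
   v_price    := 250;                   -- allowed: a NOT NULL variable can take a new non-NULL value
   v_netprice := v_price - (v_price * c_discount / 100);
   DBMS_OUTPUT.PUT_LINE('Net price: ' || v_netprice);
   -- c_discount := 20;   -- rejected: a constant cannot be assigned to
   -- v_price := NULL;    -- rejected: a NOT NULL variable cannot take NULL
END;
/
```

The two commented-out assignments are left in place to show which changes the declarations rule out.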

6.8.1. Scalar datatype - Character

Scalar variables are variables which hold a single value during runtime. The character datatype is of two types, CHAR and VARCHAR2, and the length of each can be specified in bytes (items 1 and 2) or in characters (items 3 and 4).

1. CHAR (n): Fixed length character datatype. n is an integer which stands for the number of bytes to be allocated. With a single-byte character set, n is therefore also the number of alphanumeric characters that can be stored. If no size is mentioned, one character can be stored. For example, assume a variable declared as shown below.

DECLARE
   v_empname CHAR(10) := 'Joe';
…

Irrespective of the actual number of characters stored (3 characters), 10 memory spaces are allocated with the above declaration; the 7 spaces not occupied by the value are padded with blanks.

2. VARCHAR2 (n): Variable length character datatype. n is an integer which stands for the maximum number of bytes to be allocated. With a single-byte character set, n is therefore also the maximum number of alphanumeric characters that can be stored. For example, assume a variable declared as shown below.

DECLARE
   v_empname VARCHAR2 (10) := 'Joe';
…

Only 3 memory spaces are allocated and 3 characters are stored in the variable v_empname. The maximum number of alphanumeric characters which can be stored in the variable is 10.

3. CHAR (n CHAR): This declaration was introduced to support internationalization and globalized databases, in which a character may need more than one byte of storage. For example, let us say that storing a Chinese character needs 4 bytes.

DECLARE
   v_empname CHAR(10 CHAR);
…

With the above declaration, regardless of the number of bytes needed for every character, we can store 10 characters, including multi-byte characters. It exhibits the fixed length nature seen earlier for CHAR (n).

4. VARCHAR2 (n CHAR): This is similar to CHAR (n CHAR) but it exhibits the variable length nature of VARCHAR2 (n).

DECLARE
   v_empname VARCHAR2(10 CHAR);
…

A few valid variable declarations are shown below:

DECLARE
   v_departmentname VARCHAR2(20) := 'Civil';
   v_instructorname CHAR(20) NOT NULL := ' Bob Hockins';
   v_applicantname VARCHAR2(20 CHAR) DEFAULT 'Joel';
   c_coursename CONSTANT CHAR(20 CHAR) := 'AutoCAD';
…..
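The fixed length and variable length behaviour can be observed with the LENGTH function. The block below is a minimal sketch; the variable names v_fixed and v_variable are chosen only for this illustration.

```sql
SET SERVEROUTPUT ON

DECLARE
   v_fixed    CHAR(10)     := 'Joe';   -- blank-padded to 10 characters
   v_variable VARCHAR2(10) := 'Joe';   -- holds just the 3 characters assigned
BEGIN
   DBMS_OUTPUT.PUT_LINE('CHAR length     : ' || LENGTH(v_fixed));
   DBMS_OUTPUT.PUT_LINE('VARCHAR2 length : ' || LENGTH(v_variable));
END;
/
```

The CHAR variable reports a length of 10 because of the trailing blanks, while the VARCHAR2 variable reports 3.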


6.8.2. Scalar datatype - PLS_INTEGER

A variable declared with the PLS_INTEGER datatype can store positive or negative whole numbers, or 0. Arithmetic operations involving PLS_INTEGER provide the best performance in PL/SQL. Assigning a real value to a variable declared as PLS_INTEGER does not throw an error; the value stored is rounded to the nearest integer.

6.8.3. Scalar datatype - NUMBER

A variable declared with the NUMBER (P, S) datatype can store integer and floating point values, where P is the total number of digits allowed and S is the number of digits to the right of the decimal point. Note that the decimal point itself is ignored in the calculation of width. For example, NUMBER(7,2) can hold values up to 99999.99. A maximum of 38 digits can be stored in NUMBER.

DECLARE
   v_semester PLS_INTEGER;
   v_durationinhours NUMBER(7,2);
…
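The rounding behaviour of both numeric datatypes can be checked with the two variables declared above. This is a small sketch; the assigned values are arbitrary and chosen only to show the rounding.

```sql
SET SERVEROUTPUT ON

DECLARE
   v_semester        PLS_INTEGER;
   v_durationinhours NUMBER(7,2);
BEGIN
   v_semester        := 2.6;         -- real value, rounded to the nearest integer
   v_durationinhours := 12345.678;   -- rounded to 2 digits after the decimal point
   DBMS_OUTPUT.PUT_LINE('v_semester        = ' || v_semester);
   DBMS_OUTPUT.PUT_LINE('v_durationinhours = ' || v_durationinhours);
END;
/
```

The block prints 3 for v_semester and 12345.68 for v_durationinhours.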

6.8.4. Scalar datatype - Boolean

A variable declared with the BOOLEAN datatype can store the boolean values TRUE, FALSE and NULL. These variables can be used for decision making while writing conditional statements.

DECLARE
   v_test BOOLEAN;
BEGIN
   v_test := 'TRUE';   --Wrong
   v_test := TRUE;     --Correct
END;

While explicitly assigning a boolean value to a boolean variable, care should be taken that the value is not enclosed within single quotes ('). The above code snippet demonstrates this.

Do not attempt to print or display the value stored in a boolean variable directly, as DBMS_OUTPUT.PUT_LINE () does not accept a BOOLEAN argument.
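When the state of a boolean variable has to be shown, for example while debugging, one common workaround is to translate it into text first. The sketch below uses a CASE expression for this; it is an illustration, not a built-in conversion.

```sql
SET SERVEROUTPUT ON

DECLARE
   v_test BOOLEAN := TRUE;
BEGIN
   -- Translate the three possible boolean states into printable text.
   DBMS_OUTPUT.PUT_LINE(
      'v_test is ' ||
      CASE
         WHEN v_test IS NULL THEN 'NULL'
         WHEN v_test THEN 'TRUE'
         ELSE 'FALSE'
      END);
END;
/
```

The NULL case is tested first because a boolean variable that was never assigned is neither TRUE nor FALSE.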


6.8.5. Scalar Datatype - Date

The DATE datatype stores the century, year, month, day, hour, minute and second. Fractional seconds are not available in the DATE datatype.

DECLARE
   v_billingdate DATE := SYSDATE;
BEGIN
   DBMS_OUTPUT.PUT_LINE(TO_CHAR(v_billingdate, 'DD-MON-YYYY HH:MI:SS'));
END;
SQL> /
24-JAN-2010 10:01:39
PL/SQL procedure successfully completed.
SQL>

Predefined format models can be applied to DATE variables using the TO_CHAR () function, so as to print the date according to the format mentioned, as shown in the above example.

6.8.6. Scalar Datatype - Timestamp

The TIMESTAMP datatype stores date and time details much like the DATE datatype, and in addition it provides fractional seconds of up to nine digits (the default is six).

DECLARE
   v_billingdate TIMESTAMP := SYSTIMESTAMP;
BEGIN
   DBMS_OUTPUT.PUT_LINE(v_billingdate);
END;
SQL> /
24-JAN-10 10.34.25.929000 PM
PL/SQL procedure successfully completed.
SQL>

By default, whenever we print a TIMESTAMP variable, both the date and the time are printed without the need for the TO_CHAR () function.

DECLARE
   v_billingdate TIMESTAMP(9) := SYSTIMESTAMP;
BEGIN
   DBMS_OUTPUT.PUT_LINE(v_billingdate);
END;
SQL> /
24-JAN-10 10.36.54.413000000 PM
PL/SQL procedure successfully completed.
SQL>

6.9. DBMS_OUTPUT package

DECLARE
   v_departmentname VARCHAR2(20) := 'Civil';
   v_dateofjoining DATE := SYSDATE;
   v_registrationdate TIMESTAMP := SYSTIMESTAMP;
BEGIN
   DBMS_OUTPUT.PUT_LINE(v_departmentname);
   DBMS_OUTPUT.PUT_LINE(v_dateofjoining);
   DBMS_OUTPUT.PUT_LINE(v_registrationdate);
END;
SQL> /
Civil
24-JAN-10
24-JAN-10 10.47.03.368000 PM

DBMS_OUTPUT is an Oracle supplied package which has a set of procedures defined within it. One of these is the PUT_LINE () procedure, which echoes on the screen the string passed to it as an argument within parentheses. We use this procedure to display messages from an anonymous PL/SQL block, mainly for debugging purposes. When a PL/SQL block is executed, any DBMS_OUTPUT.PUT_LINE () messages are placed in an output buffer, whose content is displayed on the screen when the program completes its execution.

6.9.1. DBMS_OUTPUT procedures

DBMS_OUTPUT.ENABLE and DBMS_OUTPUT.DISABLE are procedures which respectively enable and disable the transfer of information to the output buffer and hence its display.

The DBMS_OUTPUT.PUT () procedure merely places the information in the output buffer, on the current line, without an end-of-line marker. Information which is never followed by an end-of-line marker is not displayed on the screen.

The DBMS_OUTPUT.PUT_LINE () procedure places the information in the output buffer followed by an end-of-line marker, on every invocation. The completed line, including any information placed earlier by PUT () and not yet terminated, becomes available for display.

The DBMS_OUTPUT.NEW_LINE procedure places an end-of-line marker in the output buffer, so that information placed earlier by PUT () becomes a complete line available for display. Even when it is invoked repeatedly in succession, only one end-of-line appears on the screen. This procedure takes no argument, so it cannot be used to transfer any message to the output buffer.

6.9.2. DBMS_OUTPUT procedures usage

BEGIN
   DBMS_OUTPUT.PUT('An intelligent programmer ');
   DBMS_OUTPUT.PUT('finds programming in ');
   DBMS_OUTPUT.PUT_LINE('PL/SQL ');
   DBMS_OUTPUT.PUT('interesting ');
END;
SQL> /
An intelligent programmer finds programming in PL/SQL
PL/SQL procedure successfully completed.

In the above block the text 'interesting ' is not displayed, because it is never followed by an end-of-line marker.

BEGIN
   DBMS_OUTPUT.PUT('An intelligent programmer ');
   DBMS_OUTPUT.PUT('finds programming in ');
   DBMS_OUTPUT.NEW_LINE;
   DBMS_OUTPUT.PUT_LINE('PL/SQL ');
   DBMS_OUTPUT.PUT('interesting ');
   DBMS_OUTPUT.NEW_LINE;
END;
SQL> /
An intelligent programmer finds programming in
PL/SQL
interesting
PL/SQL procedure successfully completed.

7. PL/SQL basics and constructs

7.1. %TYPE anchored declarations

Usage 1: An anchored declaration is a way of associating a database column definition with a PL/SQL variable.

DECLARE
   -- variablename tablename.columnname%TYPE;   --Syntax
   v_courseid course.courseid%TYPE;
....

The primary advantage of an anchored declaration is that changes to the column precision or datatype definition in the database do not affect the PL/SQL block which deals with those values. For example, instead of declaring v_courseid as VARCHAR2(6), we have declared it in the above PL/SQL block as COURSE.COURSEID%TYPE, where COURSE is the name of the base table and COURSEID is the column present in the COURSE table, both separated by a dot (.) and followed by %TYPE, which stands for copying the datatype definition alone. If COURSEID changes from VARCHAR2 (6) to VARCHAR2 (8) at a later point of time, during maintenance, the PL/SQL block is not affected in any way: the next time we compile and execute the PL/SQL block, the change in the datatype definition is reflected automatically. This eases maintenance from the programmer's point of view.

At the same time, if there is any NOT NULL constraint or CHECK constraint associated with that database column, those constraints are NOT applied to the PL/SQL variable defined using the anchored declaration.

Usage 2: Another use of %TYPE is to reuse the datatype of an earlier declared PL/SQL variable.

DECLARE
   v_projectscore NUMBER(3) NOT NULL := 67;
   v_assignmentscore v_projectscore%TYPE := 28;
....

In the above PL/SQL block, v_projectscore is a variable declared with a NUMBER datatype and a NOT NULL constraint, and initialized with the value 67. If another variable needs to be declared with the same datatype and constraints, we can use %TYPE. Thus v_assignmentscore is declared as v_projectscore%TYPE, which makes v_assignmentscore a NUMBER variable too, with the NOT NULL constraint also applied. Hence v_assignmentscore also has to be initialized. Note that the value of v_projectscore, 67, is not copied to v_assignmentscore.
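The second usage can be run as a complete block without any table. The sketch below simply completes the declarations shown above with an executable section.

```sql
SET SERVEROUTPUT ON

DECLARE
   v_projectscore    NUMBER(3) NOT NULL := 67;
   v_assignmentscore v_projectscore%TYPE := 28;   -- same datatype and NOT NULL constraint, own value
BEGIN
   DBMS_OUTPUT.PUT_LINE('Project score    : ' || v_projectscore);
   DBMS_OUTPUT.PUT_LINE('Assignment score : ' || v_assignmentscore);
END;
/
```

The two variables print 67 and 28 respectively, which confirms that only the definition is copied, not the value.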

7.2. Bind variables

Bind variables are declared in the SQL PLUS host environment. These variables are used to pass runtime values out of one or more PL/SQL programs to the host environment. A bind variable is declared using the VARIABLE keyword followed by the name of the bind variable and its datatype. This declaration is done outside any PL/SQL block. For example, a bind variable named g_courseid is declared as shown below. The convention followed for bind variable names is to start with "g_". These variables are alive only for the current session in which they are declared. Soon after it is declared, a bind variable holds a NULL value; it cannot be given an initial value in the declaration. Within the PL/SQL block g_courseid is assigned the value C001. Note the difference in addressing a bind variable within PL/SQL: it has to be prefixed with a colon (:).

SQL> SET SERVEROUTPUT ON
SQL> VARIABLE g_courseid VARCHAR2(4);
SQL> BEGIN
   :g_courseid := 'C001';
END;
/
PL/SQL procedure successfully completed.

Changes made to the bind variable are visible within the PL/SQL block as well as outside it, once the PL/SQL block has completed execution. Thus we can use DBMS_OUTPUT.PUT_LINE (:g_courseid) to print the value present in a bind variable inside a PL/SQL block. To display the value present in the bind variable in the SQL PLUS environment, use the PRINT command as shown below.

SQL> PRINT g_courseid

G_COURSEID
----------------
C001

The PRINT command displays the content of the named bind variable; typed without a name, it displays all bind variables. To view the list of all bind variables declared in a session, type VARIABLE at the SQL prompt and press Enter. We cannot declare bind variables with a DATE datatype, and we cannot declare the size of a NUMBER bind variable.
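The same steps work for a numeric bind variable. The script below is a sketch for SQL PLUS; the variable name g_totalfee and the fee amounts are hypothetical.

```sql
SET SERVEROUTPUT ON
VARIABLE g_totalfee NUMBER

BEGIN
   :g_totalfee := 1500 + 250;   -- hypothetical fee components
   DBMS_OUTPUT.PUT_LINE('Inside the block : ' || :g_totalfee);
END;
/

PRINT g_totalfee
```

The value assigned inside the block is still available to PRINT after the block has finished, which is what makes bind variables useful for handing results back to the host environment.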

7.3. Substitution variables

Substitution variables are declared in the SQL PLUS environment. These variables are used to pass runtime values into one or more PL/SQL programs. Values for these variables are defined using the DEFINE command. Irrespective of the type of data assigned to them, all substitution variables are treated as CHAR datatype. For example, a substitution variable named g_courseid is defined and initialized with the value C001 as shown below.

SQL> SET SERVEROUTPUT ON
SQL> DEFINE g_courseid = 'C001';
SQL> DECLARE
  2  v_courseid VARCHAR2(4);
  3  BEGIN
  4  v_courseid :='&g_courseid';
  5  DBMS_OUTPUT.PUT_LINE(v_courseid);
  6  END;
  7  /

A substitution variable has to be given a value when it is defined with the DEFINE command in the SQL PLUS environment. While referring to the substitution variable in PL/SQL, prefix it with an ampersand (&) symbol. The value defined in SQL PLUS is then substituted into the PL/SQL block, so that v_courseid is assigned the value C001. PL/SQL does not ask us to enter any value for g_courseid. We cannot change the value of g_courseid within PL/SQL. Thus, for the mere transfer of data values from SQL PLUS to PL/SQL, we can use substitution variables. These variables are also alive only for the current session in which they are declared. The output of the above PL/SQL block is shown below.

old 4: v_courseid:= '&g_courseid';
new 4: v_courseid:= 'C001';
PL/SQL procedure successfully completed.

7.4. Accepting input in PL/SQL

To accept input in PL/SQL, use a name prefixed with an ampersand (&) symbol in place of the value. When we execute the PL/SQL block, the system prompts us to enter a value. The below PL/SQL block demonstrates the same.

SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  2  v_courseid VARCHAR2(4);
  3  BEGIN
  4  v_courseid:='&v';
  5  DBMS_OUTPUT.PUT_LINE(v_courseid);
  6  END;
  7  /

When the above PL/SQL block is executed, it asks the user to enter the value for v. The value entered is assigned to v_courseid, which is displayed by the subsequent DBMS_OUTPUT.PUT_LINE () statement.

Enter value for v: C001
old 4: v_courseid:='&v';
new 4: v_courseid:='C001';
PL/SQL procedure successfully completed.

PL/SQL is not interactive. The following code snippet demonstrates this.

SQL> DECLARE
  2  v_customername VARCHAR2(20);
  3  v_qtyrequired NUMBER;
  4  BEGIN
  5  v_customername := '&v_customername';
  6  DBMS_OUTPUT.PUT_LINE('Customer Name : '||v_customername);
  7  v_qtyrequired := &v_qtyrequired;
  8  DBMS_OUTPUT.PUT_LINE('Required Qty : '||v_qtyrequired);
  9  END;
 10  /
Enter value for v_customername: JAMES
old 5: v_customername := '&v_customername';
new 5: v_customername := 'JAMES';
Enter value for v_qtyrequired: 20
old 7: v_qtyrequired := &v_qtyrequired;
new 7: v_qtyrequired := 20;
Customer Name : JAMES
Required Qty : 20
PL/SQL procedure successfully completed.

While executing the above PL/SQL block, the system asks us to enter both the customer name and the quantity required, and only then displays the entered details. Even though a DBMS_OUTPUT.PUT_LINE () statement immediately follows the line which accepts the customer name, the system does not display the customer name immediately after accepting it. Instead, it accepts values for all input variables (i.e., those preceded by an ampersand (&) symbol) and finally displays both the customer name and the required quantity. This is because SQL PLUS replaces every substitution variable before the block is sent to the server for execution. A PL/SQL programmer should be aware of this behavior.

Note:
• Even if '&' is present in a commented line, it is processed and a value is prompted for.
• If it is a character input, enclose the substitution variable within single quotes (' '), as in '&v_customername'.

7.5. SET VERIFY ON/OFF

By default, for every substitution that happens within the PL/SQL block, SQL PLUS displays two lines, the old line and the new line. This helps us identify how the substitution has happened and how many substitutions have happened. We can suppress this display by typing the command SET VERIFY OFF in the SQL PLUS environment. To enable the display of substitutions again, we can use the SET VERIFY ON command. The below PL/SQL block and its output demonstrate the same.

SQL> SET VERIFY OFF
SQL> DECLARE
   v_courseid VARCHAR2(4);
BEGIN
   v_courseid:='&v';
   DBMS_OUTPUT.PUT_LINE(v_courseid);
END;
/
Enter value for v: C002
C002
PL/SQL procedure successfully completed.

7.6. Operators and Expressions

The operators on which our discussion will focus are as follows:
1. Concatenation operator ( || )
2. Arithmetic operators ( +, -, *, /, ** )
3. Relational operators ( =, !=, <, >, <=, >= )
4. Logical operators ( AND, OR and NOT )

Using these operators, expressions can be framed.

7.6.1. Concatenation operator

The concatenation operator attaches or concatenates two or more strings together. For example, in the below PL/SQL block, v_applicantname is declared and initialized with the value John. The value 10 is appended to it, and the concatenated value is reassigned to the same variable v_applicantname and displayed on the screen.

DECLARE
   v_applicantname VARCHAR2(10) := 'John';
BEGIN
   v_applicantname := v_applicantname || '10';
   DBMS_OUTPUT.PUT_LINE('value of v_applicantname : '|| v_applicantname);
END;

value of v_applicantname : John10

This operator is especially used in formatted outputs.

7.6.2. Arithmetic operator - Addition

The below PL/SQL block demonstrates the usage of the plus (+) operator, used for addition. v_hostelfee is declared but not initialized, so it holds the value NULL. When the plus (+) operator is applied to this variable and another operand, the value 500, the result is NULL, because NULL added to any NUMBER value yields NULL.

DECLARE
   v_hostelfee NUMBER;
BEGIN
   v_hostelfee := v_hostelfee + 500;   -- NULL + 500
   DBMS_OUTPUT.PUT_LINE('value of v_hostelfee : '|| v_hostelfee);
END;

value of v_hostelfee : (NULL)

Use only numeric and date datatypes with arithmetic operators.
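When a NULL operand should be treated as zero, a common approach is to wrap it in the NVL function before the addition. The block below is a sketch of that idea applied to the same variable; using 0 as the replacement value is an assumption that suits a fee total.

```sql
SET SERVEROUTPUT ON

DECLARE
   v_hostelfee NUMBER;   -- not initialized, so it holds NULL
BEGIN
   v_hostelfee := NVL(v_hostelfee, 0) + 500;   -- NULL replaced by 0, so 0 + 500
   DBMS_OUTPUT.PUT_LINE('value of v_hostelfee : ' || v_hostelfee);
END;
/
```

This time the block prints 500 instead of an empty value.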

7.6.3. Arithmetic operator - Exponentiation

The example below demonstrates the exponentiation operator (**), used here to compute 3 raised to the power 5. The PL/SQL variable v_number is declared and initialized with the value 3. The statement v_number := v_number ** 5 in the executable part of the block evaluates 3 ** 5 at runtime (3 raised to the power 5 equals 243), and the resulting value is displayed.

DECLARE
    v_number NUMBER := 3;
BEGIN
    v_number := v_number ** 5;
    DBMS_OUTPUT.PUT_LINE('value of v_number : ' || v_number);
END;

value of v_number : 243

7.6.4. Usage of Arithmetic operators with DATE variables

The PL/SQL block below demonstrates the plus (+) operator with the DATE datatype. The variable v_today is assigned a date value. An executable statement adds one day to it and assigns the new date to another variable, v_tomorrow.

SQL> SET SERVEROUTPUT ON

DECLARE
    v_today    DATE := '31-MAR-2009';
    v_tomorrow DATE;
BEGIN
    v_tomorrow := v_today + 1;
    DBMS_OUTPUT.PUT_LINE('Tomorrow''s date is ' || v_tomorrow);
END;
/

Tomorrow's date is 01-APR-09
PL/SQL procedure successfully completed.
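The example above relies on the session's default date format to turn the string into a DATE and back. The sketch below shows the same arithmetic with explicit TO_DATE and TO_CHAR conversions, plus a subtraction of two dates. The format mask DD-MM-YYYY is an illustrative choice, not something the course prescribes.

```sql
DECLARE
    -- Explicit conversion, independent of the session's date format
    v_today    DATE := TO_DATE('31-03-2009', 'DD-MM-YYYY');
    v_tomorrow DATE;
BEGIN
    v_tomorrow := v_today + 1;     -- adding a number adds that many days
    DBMS_OUTPUT.PUT_LINE('Tomorrow''s date is ' || TO_CHAR(v_tomorrow, 'DD-MM-YYYY'));
    -- Subtracting one DATE from another gives the difference in days
    DBMS_OUTPUT.PUT_LINE('Days between: ' || (v_tomorrow - v_today));
END;
/
```

This prints Tomorrow's date is 01-04-2009 followed by Days between: 1.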

7.7. Nested PL/SQL blocks

A PL/SQL block can be defined within another PL/SQL block. The block defined inside is called a nested PL/SQL block. A nested PL/SQL block can also have a declaration section, an executable section and an exception section. One or more PL/SQL blocks can be present within an anonymous PL/SQL block. Nested PL/SQL blocks can be placed in the executable section or in the exception handling section. The example below shows how a nested PL/SQL block can be defined.

DECLARE
    v_departmentid NUMBER := 10;
BEGIN
    DECLARE
        v_seatsavailable NUMBER := 20;
    BEGIN
        DBMS_OUTPUT.PUT_LINE('The value of v_departmentid: ' || v_departmentid);
        DBMS_OUTPUT.PUT_LINE('The value of v_seatsavailable: ' || v_seatsavailable);
    END;
    DBMS_OUTPUT.PUT_LINE('The value of v_departmentid: ' || v_departmentid);
    DBMS_OUTPUT.PUT_LINE('The value of v_seatsavailable: ' || v_seatsavailable);
END;
/

The outer block has a variable named v_departmentid initialized with the value 10. The nested block has a variable named v_seatsavailable initialized with the value 20. The scope of v_departmentid covers the block in which it is declared and the nested PL/SQL block, whereas v_seatsavailable is visible only within the nested block in which it is declared. Hence the last statement of the outer block, which refers to v_seatsavailable, leads to the compilation error shown below.

ERROR at line 11:
ORA-06550: line 11, column 57:
PLS-00201: identifier 'V_SEATSAVAILABLE' must be declared
ORA-06550: line 11, column 1:
PL/SQL: Statement ignored

The different ways in which PL/SQL blocks can be nested are shown in the PL/SQL blocks below.

DECLARE
    -- declaration of variables in the enclosing block
BEGIN
    -- SQL and PL/SQL statement(s)
    DECLARE
        -- declaration of variables in the nested block
    BEGIN
        -- SQL & PL/SQL statement(s) in nested block
    END;
    DECLARE
        -- declaration of variables in the nested block
    BEGIN
        -- SQL & PL/SQL statement(s) in nested block
    END;
    -- SQL and PL/SQL statement(s)
END;

DECLARE
    -- declaration of variables in the enclosing block
BEGIN
    -- SQL and PL/SQL statement(s)
    DECLARE
        -- declaration of variables in the nested block
    BEGIN
        -- SQL & PL/SQL statement(s) in nested block
        DECLARE
            -- declaration of variables
        BEGIN
            -- SQL and PL/SQL statement(s)
        END;
    END;
    -- SQL and PL/SQL statement(s)
END;

Overlapping of nested blocks is not allowed. The PL/SQL block below shows such an invalid arrangement.

DECLARE
    -- declaration of variables in the enclosing block
BEGIN
    -- SQL and PL/SQL statement(s)
    DECLARE
        -- declaration of variables in the nested block
    BEGIN
        -- SQL & PL/SQL statement(s) in nested block
        DECLARE
            -- declaration of variables
    END;
        BEGIN
            -- SQL and PL/SQL statement(s)
        END;
    -- SQL and PL/SQL statement(s)
END;

7.7.1. Scope of variables

PL/SQL variables declared in the DECLARE section are visible in the executable section and the EXCEPTION section of the same block. Variables declared in a nested block exist only within that nested block. Variables declared in the outermost block are visible in all the nested blocks. The code snippet below demonstrates the concept of scope of variables. v_courseid is declared in the outer block and assigned the value C001. Another variable with the same name, v_courseid, is declared in the first inner block and assigned the value C002. When we display the value of v_courseid in the inner block, which value is displayed?

DECLARE
    v_courseid VARCHAR2(4) := 'C001';
BEGIN
    DECLARE
        v_courseid VARCHAR2(4) := 'C002';
    BEGIN
        DBMS_OUTPUT.PUT_LINE(v_courseid);
    END;
    DECLARE
        v_durationinhours NUMBER(4) := 3;
    BEGIN
        DBMS_OUTPUT.PUT_LINE(v_courseid);
    END;
    DBMS_OUTPUT.PUT_LINE(v_courseid);
END;

C002
C001
C001

The first inner block displays C002, because a variable declared in a nested block always takes precedence over an outer block variable of the same name. Hence the v_courseid holding C001 cannot be accessed by its plain name within that nested PL/SQL block. The second inner block and the outer block do not redeclare v_courseid, so both display C001. A simple analogy: just as local variables take precedence over global variables in the C language, variables declared within a nested block take precedence over those of the enclosing block.

7.7.2. Qualifying identifiers

Anonymous PL/SQL blocks can be qualified with identifiers (or names), also called labels. Enclose the identifier in << and >> angle brackets. For example, <<branch>> is one PL/SQL block within which there is a nested <<course>> PL/SQL block. These qualifying identifiers are useful when variables with the same name are declared in more than one block.

<<branch>>
DECLARE
    v_seatsavailable NUMBER := 10;
BEGIN
    <<course>>
    DECLARE
        v_seatsavailable NUMBER := 20;
    BEGIN
        DBMS_OUTPUT.PUT_LINE(' branch seats available: ' || branch.v_seatsavailable);
        DBMS_OUTPUT.PUT_LINE(' course seats available: ' || course.v_seatsavailable);
    END;
    DBMS_OUTPUT.PUT_LINE(' branch seats available: ' || branch.v_seatsavailable);
    DBMS_OUTPUT.PUT_LINE(' course seats available: ' || course.v_seatsavailable);
END;

v_seatsavailable is declared within the branch block as well as in the nested course block. If we specify the variable without any qualifier, it refers to the variable declared in the current block; using the qualifier is not mandatory. With the qualified names, both the variable declared in the outer block <<branch>> and the one declared in the inner block <<course>> can be accessed from within the nested block. The label course, however, is in scope only inside the nested block, so the last statement of the outer block, which refers to course.v_seatsavailable, fails to compile. The outcome of the above code snippet is as shown below.

ERROR at line 13:
ORA-06550: line 13, column 63:
PLS-00219: label 'COURSE' reference is out of scope
ORA-06550: line 13, column 6:
PL/SQL: Statement ignored
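The compilation error above comes from the reference to course.v_seatsavailable placed after the nested block has ended. The block below is a sketch of a variant that compiles: both qualified references stay inside the nested block, and the outer block refers only to its own variable. It assumes SET SERVEROUTPUT ON.

```sql
<<branch>>
DECLARE
    v_seatsavailable NUMBER := 10;
BEGIN
    <<course>>
    DECLARE
        v_seatsavailable NUMBER := 20;
    BEGIN
        -- Inside the nested block both labels are in scope
        DBMS_OUTPUT.PUT_LINE('branch seats available: ' || branch.v_seatsavailable);
        DBMS_OUTPUT.PUT_LINE('course seats available: ' || course.v_seatsavailable);
    END;
    -- After the nested block ends, only the branch variable can be referenced
    DBMS_OUTPUT.PUT_LINE('branch seats available: ' || branch.v_seatsavailable);
END;
/
```

The three lines printed show the values 10, 20 and 10 respectively.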

7.8. PL/SQL conditional constructs

PL/SQL supports several conditional constructs, which are discussed in this section.

7.8.1. IF THEN – END IF syntax

IF condition THEN
    action;
END IF;

The above PL/SQL conditional construct checks whether a condition is TRUE. If it is TRUE, the statements enclosed between IF THEN and END IF are executed once. The condition can evaluate to TRUE, FALSE or NULL; the statements are skipped when it is FALSE or NULL.

DECLARE
    v_maxscore NUMBER := 25;
    v_projectscore NUMBER :=&v;
BEGIN
    IF v_projectscore > v_maxscore THEN
        DBMS_OUTPUT.PUT_LINE('Invalid project score');
    END IF;
END;

SQL> /
Enter value for v: 26
old 3: v_projectscore NUMBER :=&v;
new 3: v_projectscore NUMBER :=26;
Invalid project score
PL/SQL procedure successfully completed.

7.8.2. IF THEN ELSE – END IF syntax

IF condition THEN
    action-true;
ELSE
    action-false;
END IF;

The above PL/SQL conditional construct executes one set of statements (the action-true part) if the condition evaluates to TRUE. If the condition evaluates to FALSE or NULL, the statements associated with the false part (action-false) are executed. When can the condition evaluate to NULL? If any variable referred to in the condition holds NULL, the result of the condition is NULL, and the false part is executed.

SQL> ed
Wrote file afiedt.buf

DECLARE
    v_num NUMBER;
BEGIN
    IF v_num > 10 THEN
        DBMS_OUTPUT.PUT_LINE('TRUE');
    ELSE
        DBMS_OUTPUT.PUT_LINE('FALSE OR NULL');
    END IF;
END;

SQL> /
FALSE OR NULL
PL/SQL procedure successfully completed.

In the above example, the variable v_num is declared but not initialized. The condition v_num > 10 therefore reduces to NULL > 10, and the result of the condition is NULL.
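Because a comparison with NULL never yields TRUE, a NULL value has to be detected with the IS NULL operator. The block below is a minimal sketch of that check; the messages are illustrative, and SET SERVEROUTPUT ON is assumed.

```sql
DECLARE
    v_num NUMBER;                  -- not initialized, so it holds NULL
BEGIN
    -- IS NULL evaluates to TRUE or FALSE, never to NULL
    IF v_num IS NULL THEN
        DBMS_OUTPUT.PUT_LINE('v_num is NULL');
    ELSE
        DBMS_OUTPUT.PUT_LINE('v_num has a value');
    END IF;
END;
/
```

Here the true part runs and the block prints v_num is NULL.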

7.8.3. Usage of inequality operator ( != or <> )

The example below uses the inequality operator to check whether two strings differ.

SQL> ed
Wrote file afiedt.buf

DECLARE --Comparing VARCHAR2 datatypes
    v_string1 VARCHAR2(20) := 'Foundation program';
    v_string2 VARCHAR2(20) := 'Foundation program';
BEGIN
    IF v_string1 <> v_string2 THEN
        DBMS_OUTPUT.PUT_LINE('Both are unequal');
    ELSE
        DBMS_OUTPUT.PUT_LINE('Both are equal');
    END IF;
END;

SQL> /
Both are equal
PL/SQL procedure successfully completed.

Predict the output of the above code snippet after making each of the following changes.
• Change the declaration of v_string1 alone to CHAR(20) and check the output
• Change the declarations of both v_string1 and v_string2 to CHAR(20) and check the output

7.8.4. IF THEN ELSIF – END IF syntax

IF condition THEN
    action;
ELSIF condition THEN
    action;
[ELSE
    action;]
END IF;

The above conditional construct is also called the IF ELSIF ladder. More than one condition can be checked one after the other, and a set of executable statements can be associated with every condition. If any one condition evaluates to TRUE, the block of statements associated with it is executed and no further condition is checked. Note the spelling of ELSIF: there is no E after ELS. This is the correct syntax. If none of the conditions in the IF ELSIF ladder is TRUE, control moves to the ELSE part placed at the end and the statements in that ELSE block are executed.

The ELSE block is optional in the above syntax and is hence enclosed in square brackets. Only one END IF is needed at the end of the construct. Another variation, the IF ELSE IF ladder, serves the same purpose with a different syntax, as shown below.

IF condition THEN
    action;
ELSE
    IF condition THEN
        action;
    [ELSE
        action;]
    END IF;
END IF;

Here every IF has its own END IF. Examples of both forms are given below.

DECLARE
    v_maxscore NUMBER := 25;
    v_projectscore NUMBER :=&v;
BEGIN
    IF v_projectscore > v_maxscore THEN
        DBMS_OUTPUT.PUT_LINE('Project score cannot be greater than 25');
    ELSIF v_projectscore < 0 THEN
        DBMS_OUTPUT.PUT_LINE('Project score cannot be -ve');
    ELSE
        DBMS_OUTPUT.PUT_LINE('Valid project score');
    END IF;
END;

Enter value for v: 20
Valid project score
PL/SQL procedure successfully completed.

DECLARE
    v_maxscore NUMBER := 25;
    v_projectscore NUMBER :=&v;
BEGIN
    IF v_projectscore > v_maxscore THEN
        DBMS_OUTPUT.PUT_LINE('Project score cannot be greater than 25');
    ELSE
        IF v_projectscore < 0 THEN
            DBMS_OUTPUT.PUT_LINE('Project score cannot be -ve');
        ELSE
            DBMS_OUTPUT.PUT_LINE('Valid project score');
        END IF;
    END IF;
END;

SQL> /
Enter value for v: 22
Valid project score
PL/SQL procedure successfully completed.

7.8.5. LOOP ... END LOOP

Whenever a set of statements has to be executed repeatedly, the LOOP ... END LOOP construct can be used. The syntax of this construct is shown below.

LOOP
    action;
END LOOP;

To transfer control outside the LOOP ... END LOOP construct, use the EXIT WHEN clause. Otherwise, control stays within the LOOP ... END LOOP construct indefinitely.

LOOP
    action;
    EXIT WHEN condition;
END LOOP;

Once the condition specified in the EXIT WHEN clause is TRUE, control is transferred outside the LOOP ... END LOOP construct. If the condition evaluates to FALSE or NULL, control is retained within the construct. The example below demonstrates the usage of the LOOP ... END LOOP construct.

DECLARE
    v_seatsallocated NUMBER := 1;
BEGIN
    LOOP
        DBMS_OUTPUT.PUT_LINE('Seats allocated ' || v_seatsallocated);
        v_seatsallocated := v_seatsallocated + 1;
        EXIT WHEN v_seatsallocated > 5;
    END LOOP;
END;

Seats allocated 1
Seats allocated 2
Seats allocated 3
Seats allocated 4
Seats allocated 5
PL/SQL procedure successfully completed.

7.8.6. Numeric FOR Loop

FOR countervariable IN low_number .. high_number
LOOP
    action;
END LOOP;

PL/SQL supports the numeric FOR loop, in which the counter variable is declared implicitly and initialized with low_number. After this initialization, the statements in the body of the FOR loop are executed once. The counter variable is then incremented by one and compared with high_number. As long as it does not exceed high_number, the body is executed again; otherwise control is transferred outside the loop. The counter variable cannot be incremented by 2 or 3 or any other value; it is always incremented by one.

BEGIN
    FOR v_num IN 1 .. 4
    LOOP
        DBMS_OUTPUT.PUT_LINE('Seats allocated : ' || v_num);
    END LOOP;
END;

Seats allocated : 1
Seats allocated : 2
Seats allocated : 3
Seats allocated : 4
PL/SQL procedure successfully completed.

The counter variable exists only within the FOR loop, and a reference to it outside the FOR loop results in a scope error.

SQL> ed
Wrote file afiedt.buf
  1  BEGIN
  2  FOR v_num IN 1 .. 4
  3  LOOP
  4  DBMS_OUTPUT.PUT_LINE('Seats allocated : '|| v_num);
  5  END LOOP;
  6  DBMS_OUTPUT.PUT_LINE('Seats allocated : '|| v_num);
  7* END;
SQL> /
DBMS_OUTPUT.PUT_LINE('Seats allocated : '|| v_num);
*
ERROR at line 6:
ORA-06550: line 6, column 45:
PLS-00201: identifier 'V_NUM' must be declared
ORA-06550: line 6, column 1:
PL/SQL: Statement ignored

When low_number and high_number are the same, the body of the FOR loop is still executed once, as the example below demonstrates.

SQL> ed
Wrote file afiedt.buf

BEGIN
    FOR v_num IN 1 .. 1
    LOOP
        DBMS_OUTPUT.PUT_LINE('Seats allocated : ' || v_num);
    END LOOP;
END;

SQL> /
Seats allocated : 1
PL/SQL procedure successfully completed.

Try to identify what happens if you declare a variable named v_num in the above PL/SQL block.
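Since the counter of a numeric FOR loop always advances by one, a different step is usually obtained by deriving a second value from the counter inside the loop body. The block below is a sketch of that workaround for a step of 2; the message text is illustrative and SET SERVEROUTPUT ON is assumed.

```sql
BEGIN
    FOR v_num IN 1 .. 4
    LOOP
        -- The counter itself runs 1, 2, 3, 4; multiplying it gives 2, 4, 6, 8
        DBMS_OUTPUT.PUT_LINE('Step value : ' || (v_num * 2));
    END LOOP;
END;
/
```

The block prints the step values 2, 4, 6 and 8, one per line.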

7.8.7. Numeric FOR Loop – with REVERSE option

Another variation of the numeric FOR loop in PL/SQL uses the REVERSE keyword, with which the counter starts from high_number and ends at low_number. The syntax is shown below.

FOR countervariable IN REVERSE low_number .. high_number
LOOP
    action;
END LOOP;

The example below demonstrates the usage of the REVERSE keyword in a FOR loop.

BEGIN
    FOR v_num IN REVERSE 1 .. 4
    LOOP
        DBMS_OUTPUT.PUT_LINE('Seats allocated : ' || v_num);
    END LOOP;
END;

Seats allocated : 4
Seats allocated : 3
Seats allocated : 2
Seats allocated : 1
PL/SQL procedure successfully completed.

Note: Even with REVERSE, the lower number must always be written first.
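The note above can be checked with a small experiment. The sketch below writes the bounds in descending order. In the Oracle releases this course targets, a range whose lower bound exceeds its upper bound is treated as empty, so the loop body is expected not to run at all. That expectation is an assumption worth verifying on your own installation.

```sql
BEGIN
    -- Bounds written high .. low: the range is empty, REVERSE does not swap them
    FOR v_num IN REVERSE 4 .. 1
    LOOP
        DBMS_OUTPUT.PUT_LINE('Seats allocated : ' || v_num);
    END LOOP;
    DBMS_OUTPUT.PUT_LINE('Loop finished');
END;
/
```

If the range is indeed treated as empty, only the Loop finished line appears.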

7.8.8. WHILE Loop

PL/SQL also provides the traditional WHILE loop, with the same entry-controlled nature.

WHILE condition
LOOP
    action;
END LOOP;

The statements enclosed within WHILE LOOP ... END LOOP are executed only if the condition is TRUE. As long as the condition is TRUE, control is retained inside the loop; once the condition evaluates to FALSE or NULL, control is transferred outside. The counter variable used in the condition has to be declared explicitly. The example below demonstrates the usage of the WHILE loop.

SQL> ed
Wrote file afiedt.buf

DECLARE
    v_num NUMBER := 1;
BEGIN
    WHILE v_num <= 4
    LOOP
        DBMS_OUTPUT.PUT_LINE('v_num: ' || v_num);
        v_num := v_num + 1;
    END LOOP;
END;

SQL> /
v_num: 1
v_num: 2
v_num: 3
v_num: 4
PL/SQL procedure successfully completed.

7.9. Using SQL statements in PL/SQL

The SQL data manipulation statements learnt earlier (SELECT, INSERT, UPDATE and DELETE) can be used directly in PL/SQL. This section illustrates their usage and highlights the differences in syntax, if any.

7.9.1. Using SELECT statements in PL/SQL

SELECT select_list
INTO variable_list
FROM table_list
[WHERE where_clause];

SELECT statements can be used in PL/SQL with an additional clause, the INTO clause. The INTO clause specifies the list of PL/SQL variables into which the selected values are moved. The query must return exactly one row into the variable list.

DECLARE
    v_ename emp.ename%TYPE;
    v_sal   emp.sal%TYPE;
    v_job   emp.job%TYPE;
BEGIN
    --select statement fetches the employee name, salary and
    --job of an employee where employee no equal to 7934
    SELECT ename, sal, job INTO v_ename, v_sal, v_job FROM emp WHERE empno=7934;
    DBMS_OUTPUT.PUT_LINE('The employee name is ' || v_ename);
    DBMS_OUTPUT.PUT_LINE('Salary ' || v_sal);
    DBMS_OUTPUT.PUT_LINE('Job ' || v_job);
END;

SQL> /
The employee name is MILLER
Salary 1300
Job CLERK
PL/SQL procedure successfully completed.

The above example uses anchored declarations to declare three variables, v_ename, v_sal and v_job. The SELECT statement fetches the name, salary and job of the employee with employee number 7934 into these PL/SQL variables, and the block displays them. Note that the datatype of every column in the select list must match the datatype of the corresponding PL/SQL variable. If the SELECT statement does not find a row in the underlying table that matches the condition in the WHERE clause, a NO_DATA_FOUND exception is thrown.

SQL>
  1  DECLARE
  2  v_ename emp.ename%TYPE;
  3  v_sal emp.sal%TYPE;
  4  v_job emp.job%TYPE;
  5  BEGIN
  6  --select statement fetches the employee name, salary and
  7  --job of an employee where employee no equal to 1234
  8  SELECT ename,sal,job INTO v_ename,v_sal,v_job FROM emp WHERE empno=1234;
  9  DBMS_OUTPUT.PUT_LINE('The employee name is '||v_ename);
 10  DBMS_OUTPUT.PUT_LINE('Salary '||v_sal);
 11  DBMS_OUTPUT.PUT_LINE('Job '||v_job);
 12* END;

SQL> /
DECLARE
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at line 8

If the SELECT statement fetches more than one row, a TOO_MANY_ROWS exception is thrown.

SQL>
  1  DECLARE
  2  v_ename emp.ename%TYPE;
  3  v_sal emp.sal%TYPE;
  4  v_job emp.job%TYPE;
  5  BEGIN
  6  SELECT ename,sal,job INTO v_ename,v_sal,v_job FROM emp;
  7* END;
SQL> /
DECLARE
*
ERROR at line 1:
ORA-01422: exact fetch returns more than requested number of rows
ORA-06512: at line 6

Exceptions are discussed in the subsequent chapters. Refer to those chapters for more details.

7.10. Composite datatype

A composite datatype helps us store more than one value. This is similar to the concept of a structure in the C language, wherein a programmer can store either homogeneous or heterogeneous values in contiguous memory locations within a variable. PL/SQL allows us to create record variables that can store multiple column values. Using the %ROWTYPE anchored declaration, we can declare a record variable based on a database table definition; that is, we prefix the %ROWTYPE keyword with a database table name.

7.10.1. %ROWTYPE

--recordvariablename tablename%ROWTYPE;
v_branchrec branch%ROWTYPE;

The names of the individual fields within the record variable are the same as the column names of the database table. None of the constraints specified during table creation are applied to the individual fields of the record variable. Only the column names and their datatypes are copied and retained. To refer to an individual field or column within a record, after creating the record variable, use the following syntax.

recordvariable.columnname

Referring to the record variable name alone does not print the entire record; each field has to be referenced individually. If any underlying column definition is modified, the change is reflected in the structure of the record variable the next time the PL/SQL block is run or compiled.

DECLARE
    v_branchrec branch%ROWTYPE;
BEGIN
    SELECT * INTO v_branchrec FROM branch WHERE branchid='B1';
    DBMS_OUTPUT.PUT_LINE(v_branchrec.branchid);
    DBMS_OUTPUT.PUT_LINE(v_branchrec.branchname);
END;
/

B1
Computer Science
PL/SQL procedure successfully completed.

7.10.2. Using INSERT statements in PL/SQL

INSERT INTO table_name [(column_list)]
{VALUES (value_list) | select_statement};

The syntax of the INSERT statement in PL/SQL is shown above. INSERT statements can be written in PL/SQL with the values hard coded directly. We can also accept values such as the branchid, branchname and seatsavailable from the end user and insert them into the branch table.

--Inserting values to branch table directly by providing values
BEGIN
    INSERT INTO branch VALUES ('B4', 'Microbiology', 10, 40);
END;

DECLARE
    v_branchid       branch.branchid%TYPE := '&bid';
    v_branchname     branch.branchname%TYPE := '&bname';
    v_seatsavailable branch.seatsavailable%TYPE := &seats;
BEGIN
    INSERT INTO branch(branchid, branchname, seatsavailable)
    VALUES (v_branchid, v_branchname, v_seatsavailable);
END;
/

7.10.3. Using UPDATE statements in PL/SQL

UPDATE table_name
SET column_name = {(select_statement) | value}
    [, column_name = value]
[WHERE where_clause];

The syntax for updating a record or a set of records in PL/SQL is shown above. We can write an UPDATE statement with a hard coded value for a straightforward update, or accept the new value from the end user and substitute the PL/SQL variable in the appropriate place. The PL/SQL block below demonstrates the usage of the UPDATE statement in PL/SQL.

BEGIN
    UPDATE branch SET seatsavailable=12 WHERE branchid='B4';
END;
/

7.10.4. Using DELETE statements in PL/SQL

DELETE FROM table_name
[WHERE where_clause];

The syntax for deleting a record or a set of records in PL/SQL is shown above. We can write a DELETE statement with a hard coded value for a straightforward deletion, or accept the value from the end user and substitute the PL/SQL variable in the appropriate place.

BEGIN
    DELETE FROM branch WHERE branchid='B4';
END;
/

8. PL/SQL EXCEPTIONS

8.1. Introduction

In this chapter, we discuss how exceptions are handled in PL/SQL. When a PL/SQL block is compiled, errors such as incorrect syntax are reported as compilation errors. A PL/SQL block that does not have any compilation error might still throw runtime errors, or exceptions, while the PL/SQL runtime engine executes it. For example, when an arithmetic expression leads to division by zero, the ZERO_DIVIDE runtime exception is thrown by the PL/SQL block. These exceptions can be trapped by writing appropriate exception handlers in the exception section of the PL/SQL block. An exception is thus an identifier in PL/SQL that is raised during the execution of a PL/SQL block. Whenever an exception arises, control leaves the main body of the block and is transferred to the EXCEPTION section of the PL/SQL block. For example, if the exception is thrown in the nth line of a PL/SQL block, control does not return to the (n+1)th line. Execution continues in the exception handler, and then in the outer block if the block is nested.

8.2. How to handle exception?

Exceptions are handled in the exception part of a PL/SQL block. If an exception is not trapped in the exception part of a PL/SQL block, it is propagated to the calling environment. Also note that exceptions can be raised in the declaration part, the executable part and the exception part.
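One consequence of the propagation rule above is worth seeing in code: an exception raised in the declaration part of a block is not handled by that block's own exception part, but propagates to the enclosing block. The sketch below assumes SET SERVEROUTPUT ON and uses VALUE_ERROR, a predefined exception covered later in this chapter. The variable name and messages are illustrative.

```sql
BEGIN
    DECLARE
        -- 'C001' does not fit in VARCHAR2(2): VALUE_ERROR is raised here,
        -- in the declaration part, at runtime
        v_code VARCHAR2(2) := 'C001';
    BEGIN
        DBMS_OUTPUT.PUT_LINE(v_code);
    EXCEPTION
        WHEN VALUE_ERROR THEN
            -- Not reached: this handler covers only the executable part
            DBMS_OUTPUT.PUT_LINE('Handled in inner block');
    END;
EXCEPTION
    WHEN VALUE_ERROR THEN
        DBMS_OUTPUT.PUT_LINE('Handled in outer block');
END;
/
```

The block prints Handled in outer block. Without the outer handler, the error would reach the calling environment.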

8.3. Exception syntax

EXCEPTION
    WHEN exception1 [OR exception2 . . .] THEN
        statement1;
        statement2;
        . . .
    [WHEN exception3 [OR exception4 . . .] THEN
        statement1;
        statement2;
        . . .]
    [WHEN OTHERS THEN
        statement1;
        statement2;
        . . .]
END;

The above code snippet shows the syntax for writing exception handlers. The EXCEPTION keyword starts the exception handling section. A PL/SQL programmer can define several exception handlers, each with its own set of actions. The identifier of the runtime exception to be handled is placed between the WHEN and THEN keywords. If the same handler has to deal with more than one exception, the optional OR keyword is placed between the exception identifier names. The OR keyword cannot be replaced by the AND keyword. When an exception occurs, only one of the exception handlers is executed before leaving the EXCEPTION block, after which control is transferred to the outer block or to the calling environment. A programmer anticipates many runtime error situations that a PL/SQL block might come across, but an unanticipated runtime error can still occur. The solution is a WHEN OTHERS exception handler, which takes care of all other, unanticipated runtime errors.
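The syntax above can be sketched with a concrete block. The example reuses the emp table and the employee number 1234 from the earlier SELECT examples. It combines two predefined exceptions with OR and adds a WHEN OTHERS handler. SQLERRM, a built-in function that returns the message of the current error, is not covered in this text and is used here only for illustration.

```sql
DECLARE
    v_ename emp.ename%TYPE;
BEGIN
    SELECT ename INTO v_ename FROM emp WHERE empno = 1234;
    DBMS_OUTPUT.PUT_LINE('The employee name is ' || v_ename);
EXCEPTION
    -- One handler shared by two exceptions
    WHEN NO_DATA_FOUND OR TOO_MANY_ROWS THEN
        DBMS_OUTPUT.PUT_LINE('Query did not return exactly one row');
    -- Catch-all for any other runtime error; must be the last handler
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Unexpected error: ' || SQLERRM);
END;
/
```

If no employee has the number 1234, as in the earlier example, the first handler should run and the block then ends normally.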

8.4. Exception Types

Exceptions can be classified into the following types.
1. Predefined oracle server exceptions
2. Non-predefined oracle server exceptions
3. User defined exceptions

8.4.1. Raising exceptions

In general, exceptions can be raised in the executable section or in the exception section of a PL/SQL block. Predefined exceptions are raised implicitly whenever the corresponding situation arises, and the PL/SQL runtime engine executes the statements associated with the trapped predefined exception. We can also raise our own exceptions explicitly in the executable section or in the exception section.

8.5. Predefined oracle server exception

Predefined exceptions are raised implicitly whenever an anticipated error situation arises while executing the statements in the PL/SQL block. The PL/SQL runtime engine executes the statements associated with the trapped predefined exception.

For example, division by zero is a predefined error situation. Whenever it arises, control immediately moves to the exception section, searching for the ZERO_DIVIDE exception identifier. If the programmer has written a block of statements under this ZERO_DIVIDE exception identifier, we say that the programmer has defined an exception handler. The PL/SQL runtime engine executes the statements associated with this exception handler, after which control is transferred to the enclosing outer block, if any, or to the calling SQL environment.

Oracle Error   Predefined Exception   Description
ORA-01403      NO_DATA_FOUND          SELECT statement matches no rows
ORA-01422      TOO_MANY_ROWS          SELECT statement matches more than one row
ORA-00001      DUP_VAL_ON_INDEX       Unique constraint violated
ORA-01476      ZERO_DIVIDE            Division by zero
ORA-06502      VALUE_ERROR            Truncation, arithmetic error
ORA-01722      INVALID_NUMBER         Conversion to a number failed. Ex. '2A' is not valid

To trap a predefined oracle server exception, we need to know its standard name. The Oracle documentation lists the identifier names of the predefined oracle server exceptions.

8.5.1. NO_DATA_FOUND predefined exception

NO_DATA_FOUND is a predefined oracle server exception that is raised implicitly whenever a SELECT statement enclosed in a PL/SQL block fails to find a matching record in the underlying table. Note that this exception is NOT raised if an INSERT, UPDATE or DELETE statement does not affect any rows. It is raised only when a SELECT statement in a PL/SQL block returns no rows.

DECLARE
    v_branchid branch.branchid%TYPE;
    v_seats    branch.seatsavailable%TYPE;
BEGIN
    v_branchid := '&branchid';
    SELECT seatsavailable INTO v_seats FROM branch WHERE branchid LIKE v_branchid;
    DBMS_OUTPUT.PUT_LINE('Seats Available: ' || v_seats);
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        DBMS_OUTPUT.PUT_LINE('Invalid Branch ID');
END;

In the example, if the given branchid is present in the branch table, the PL/SQL block displays the number of seats available; otherwise it displays "Invalid Branch ID".
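Because an UPDATE or DELETE that matches no rows does not raise NO_DATA_FOUND, the number of affected rows has to be checked separately. The sketch below uses the implicit cursor attribute SQL%ROWCOUNT, which is not introduced in this section. The branch id 'ZZ' is a made-up value assumed not to exist in the branch table.

```sql
BEGIN
    UPDATE branch SET seatsavailable = 12 WHERE branchid = 'ZZ';
    -- SQL%ROWCOUNT holds the number of rows affected by the last DML statement
    IF SQL%ROWCOUNT = 0 THEN
        DBMS_OUTPUT.PUT_LINE('No branch was updated');
    END IF;
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        -- Not reached for an UPDATE that matches no rows
        DBMS_OUTPUT.PUT_LINE('Invalid Branch ID');
END;
/
```

When the branch id does not exist, the IF branch reports the situation. The exception handler stays unused, in line with the note above.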

8.5.2. TOO_MANY_ROWS predefined exception

TOO_MANY_ROWS is a predefined oracle server exception that is thrown implicitly whenever a SELECT statement fetches more than one row.

--Given a valid supplierid identify whether he supplies one item or more
--than one
SET SERVEROUTPUT ON
DECLARE
    v_supplierid  itemsupplier.supplierid%TYPE := '&v_supplierid';
    v_supplierrec itemsupplier%ROWTYPE;
BEGIN
    SELECT * INTO v_supplierrec FROM itemsupplier WHERE supplierid = v_supplierid;
    DBMS_OUTPUT.PUT_LINE('Supplier ' || v_supplierid || ' supplies only one item');
EXCEPTION
    WHEN TOO_MANY_ROWS THEN
        DBMS_OUTPUT.PUT_LINE('Supplier ' || v_supplierid || ' supplies more than one item');
END;

The above PL/SQL block works with the itemsupplier table, which records which suppliers supply which items. Since the same supplier can supply more than one item, fetching records from the itemsupplier table by supplierid might return more than one record. This raises the TOO_MANY_ROWS predefined exception, and the block displays a message stating that the supplier supplies more than one item.

8.5.3. DUP_VAL_ON_INDEX predefined exception

The DUP_VAL_ON_INDEX predefined exception is thrown whenever we try to insert a duplicate value into a column that must be unique, such as the primary key column of a table.

DECLARE
    v_student student%ROWTYPE;
BEGIN
    v_student.studentid := &studentid;
    v_student.applicationid := &applicationid;
    v_student.currentsemester := &currentsemester;
    v_student.branchid := '&branchid';
    v_student.userid := '&userid';
    v_student.password := '&password';
    v_student.residentialstatus := '&resstatus';
    INSERT INTO student VALUES(v_student.studentid, v_student.applicationid,
        v_student.currentsemester, v_student.branchid, v_student.userid,
        v_student.password, v_student.residentialstatus);
EXCEPTION
    WHEN DUP_VAL_ON_INDEX THEN
        DBMS_OUTPUT.PUT_LINE('Duplicate Student ID');
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Transaction Failed');
END;

In the above PL/SQL block, if the student id is duplicated while inserting a student record, we receive the message "Duplicate Student ID".

8.5.4. VALUE_ERROR predefined exception

The VALUE_ERROR predefined exception is raised in two different scenarios.

1. When a value is too large for the variable that receives it. For example, v_studentid is declared as VARCHAR2(6). If we enter more than 6 characters as input, the value does not fit into the variable. PL/SQL does not silently truncate it; the assignment raises VALUE_ERROR.

    DECLARE
        v_studentid  VARCHAR2(6);
        v_studentrec student%ROWTYPE;
    BEGIN
        v_studentid := '&v_studentid';
        SELECT * INTO v_studentrec
        FROM student
        WHERE studentid = v_studentid;
        DBMS_OUTPUT.PUT_LINE('Student Name '||v_studentrec.studentname);
    EXCEPTION
        WHEN VALUE_ERROR THEN
            DBMS_OUTPUT.PUT_LINE('Entered input is very large');
    END;

2. When a PL/SQL statement expects a numeric value but the value supplied by the user contains characters that cannot be converted to a number.

8.5.5. INVALID_NUMBER predefined exception

INVALID_NUMBER is raised when a SQL statement has to convert a character string to a number and the string is not a valid number. A typical case is an insert into a table in which a character value is entered (by mistake) for a column that expects a numeric value.

    BEGIN
        --Inserting departmentid, departmentname, headofdepartment
        --into department table
        INSERT INTO department VALUES ('X', 'BioMedical', 'I101');
    EXCEPTION
        WHEN INVALID_NUMBER THEN
            DBMS_OUTPUT.PUT_LINE('Not a valid number');
    END;

In the above example the department table expects a numeric department id, but a character value ('X') is supplied while inserting the record. When the block is executed, INVALID_NUMBER is raised and the handler prints 'Not a valid number'.

8.6. Non-predefined oracle server exception

Every Oracle error has an error code and an error message, but predefined exception names are not available for all runtime error situations. Such unnamed runtime errors can be trapped either with the WHEN OTHERS exception handler, or by associating an exception identifier with the error using the PRAGMA EXCEPTION_INIT compiler directive and then handling the error by that name.

    DECLARE
        e_Missing_Null EXCEPTION;
        PRAGMA EXCEPTION_INIT(e_Missing_Null, -1400);
    BEGIN
        INSERT INTO department VALUES (40, NULL, 'I101');
    EXCEPTION
        WHEN e_Missing_Null THEN
            DBMS_OUTPUT.PUT_LINE('Missing value for a NOT NULL column');
    END;

At compile time, the PRAGMA EXCEPTION_INIT directive associates an Oracle error number with the specified exception identifier. Once the association is made, we can handle that error by writing an exception handler for the identifier.

In the above example, whenever a NOT NULL constraint is violated while inserting or updating a record, error code -1400 (ORA-01400) is raised at runtime with an Oracle defined error message. There is no predefined exception identifier for this error. e_Missing_Null is an exception identifier declared in the declaration section, and PRAGMA EXCEPTION_INIT associates it with the Oracle error number -1400.

A department record must have a value for the department name; NULL cannot be inserted into that column. As the above PL/SQL block tries to insert NULL for the department name, the NOT NULL constraint violation is raised implicitly, handled in the exception section, and the message "Missing value for a NOT NULL column" is displayed.
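The same technique applies to any other unnamed Oracle error. The sketch below names ORA-02292 (integrity constraint violated, child record found). It assumes that branch.departmentid is a foreign key referencing department.departmentid without ON DELETE CASCADE, which the course text does not state explicitly; e_Child_Exists is an identifier chosen here for illustration.

```sql
SET SERVEROUTPUT ON
DECLARE
    e_Child_Exists EXCEPTION;                      -- illustrative name
    PRAGMA EXCEPTION_INIT(e_Child_Exists, -2292);  -- ORA-02292: child record found
BEGIN
    -- Fails with ORA-02292 if branches still refer to this department
    -- (assumes a foreign key from branch to department).
    DELETE FROM department WHERE departmentid = 40;
    DBMS_OUTPUT.PUT_LINE(SQL%ROWCOUNT||' department(s) deleted');
EXCEPTION
    WHEN e_Child_Exists THEN
        DBMS_OUTPUT.PUT_LINE('Department still has branches; remove them first');
END;
/
```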

8.7. User-defined exception

Exceptions that are specific to the business requirements can be implemented with user-defined exceptions. A user-defined exception identifier is declared in the declaration section, raised with the statement RAISE exceptionidentifier; and handled by an exception handler written for that identifier.

In the example below, e_Invalid_Departmentid is a user-defined exception identifier declared in the declaration section. The SELECT statement in the executable section counts the records in the department table that have the given departmentid. If no record exists with the given department id, v_count is set to 0.

    DECLARE
        v_departmentid department.departmentid%TYPE;
        v_count NUMBER;
        e_Invalid_Departmentid EXCEPTION;
    BEGIN
        v_departmentid := '&v_departmentid';
        SELECT count(*) INTO v_count
        FROM department
        WHERE departmentid = v_departmentid;
        IF v_count = 0 THEN
            RAISE e_Invalid_Departmentid;
        END IF;
        DBMS_OUTPUT.PUT_LINE('Valid Department id');
    EXCEPTION
        WHEN e_Invalid_Departmentid THEN
            DBMS_OUTPUT.PUT_LINE('Invalid Department id');
    END;

If v_count is zero, e_Invalid_Departmentid is raised in the executable part and trapped in the exception handling part, which prints 'Invalid Department id'.

8.8. WHEN OTHERS exception handler

Exceptions that are not handled by any other exception handler are caught by the WHEN OTHERS exception handler. WHEN OTHERS can be used to handle all kinds of exceptions, irrespective of whether they are predefined, non-predefined or user-defined.

    DECLARE
        v_departmentid department.departmentid%TYPE;
        v_count NUMBER;
        e_Invalid_Departmentid EXCEPTION;
    BEGIN
        v_departmentid := '&v_departmentid';
        SELECT count(*) INTO v_count
        FROM department
        WHERE departmentid = v_departmentid;
        IF v_count = 0 THEN
            RAISE e_Invalid_Departmentid;
        END IF;
        DBMS_OUTPUT.PUT_LINE('Valid Department id');
    EXCEPTION
        WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE('Invalid Department id');
    END;

For persons familiar with Java, a simple analogy is that this is similar to catching the generic Exception class. We can have only one WHEN OTHERS exception handler within the exception section of a PL/SQL block.

WHEN OTHERS must be the last exception handler in the exception section, as it covers all the errors that are not handled by the other handlers of the PL/SQL block in which it is defined. When blocks are nested, always place a WHEN OTHERS exception handler in the outermost block.

8.9. Using SQLCODE and SQLERRM

With WHEN OTHERS we can only trap unknown or unexpected runtime errors. To find out the Oracle error code and the error message because of which the PL/SQL block failed, we use SQLCODE and SQLERRM. They help us identify the reason behind the exception raised. A programmer might, for instance, want to insert the reason for the failure of a PL/SQL block into an audit_log table that tracks the errors which happened over a period of time (not dealt with in the code snippet).

    DECLARE
        e_Missing_Null EXCEPTION;
        PRAGMA EXCEPTION_INIT(e_Missing_Null, -1400);
        v_sqlcode   NUMBER;
        v_sqlerrmsg VARCHAR2(255);
    BEGIN
        INSERT INTO department VALUES (40, NULL, 'I101');
    EXCEPTION
        WHEN OTHERS THEN
            v_sqlcode   := SQLCODE;
            v_sqlerrmsg := SUBSTR(SQLERRM, 1, 255);
            DBMS_OUTPUT.PUT_LINE('SQLCODE '||v_sqlcode);
            DBMS_OUTPUT.PUT_LINE('SQLERRM '||v_sqlerrmsg);
    END;

SQLCODE and SQLERRM can be used both in the executable part and in the exception part of a PL/SQL block; outside an exception handler SQLCODE returns 0. SQLCODE gives the numeric value of the Oracle error code, and SQLERRM gives the Oracle error code together with the message associated with it. The maximum length of the SQLERRM message is 512 characters. The above example follows the best practice of assigning SQLCODE and SQLERRM to local variables of the PL/SQL block and then using those variables. As SQLCODE and SQLERRM are procedural functions, we cannot use them directly inside a SQL statement.
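The audit logging mentioned above could look like the following sketch. The audit_log table is not defined anywhere in this material, so its columns (logged_on, error_code, error_message) are assumptions made purely for illustration.

```sql
-- Hypothetical table, assumed to exist for this sketch:
--   CREATE TABLE audit_log (logged_on DATE, error_code NUMBER, error_message VARCHAR2(512));
DECLARE
    v_sqlcode   NUMBER;
    v_sqlerrmsg VARCHAR2(512);
BEGIN
    INSERT INTO department VALUES (40, NULL, 'I101');
EXCEPTION
    WHEN OTHERS THEN
        -- Copy the values into local variables first:
        -- SQLCODE and SQLERRM cannot be referenced inside the INSERT itself.
        v_sqlcode   := SQLCODE;
        v_sqlerrmsg := SUBSTR(SQLERRM, 1, 512);
        INSERT INTO audit_log (logged_on, error_code, error_message)
        VALUES (SYSDATE, v_sqlcode, v_sqlerrmsg);
        COMMIT;
END;
/
```

Note that the COMMIT in the handler also commits any other pending work of the session, so a real implementation would need to decide where the transaction boundary belongs.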


8.10. RAISE_APPLICATION_ERROR built in procedure

RAISE_APPLICATION_ERROR is a built in procedure used to issue application specific error messages in a manner consistent with other Oracle errors. If we want to define a customized error message and report it to the caller in one shot, without writing a separate exception handler for it, we can use this built in procedure. Its two mandatory parameters are the error number and the error message. The error number has to be in the range of -20000 to -20999. The error message can be up to 2048 bytes long.

    DECLARE
        v_departmentid department.departmentid%TYPE;
        v_count NUMBER;
    BEGIN
        v_departmentid := '&v_departmentid';
        SELECT count(*) INTO v_count
        FROM department
        WHERE departmentid = v_departmentid;
        IF v_count = 0 THEN
            RAISE_APPLICATION_ERROR(-20000, 'Invalid Department id');
        END IF;
        DBMS_OUTPUT.PUT_LINE('Valid Department id');
    END;

In the above example, given a non-existent department id, the block fails with the error ORA-20000 and the message "Invalid Department id" raised by RAISE_APPLICATION_ERROR. The procedure does not itself issue a ROLLBACK. However, when the error is not handled and reaches the calling environment, the uncommitted changes made by the failed block are undone.
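An error raised through RAISE_APPLICATION_ERROR can also be trapped by name, by combining it with PRAGMA EXCEPTION_INIT from section 8.6. The sketch below is illustrative only: the error number -20001, the identifier e_Invalid_Department and the department id 999 (assumed not to exist, with departmentid assumed numeric) are all chosen for the example.

```sql
SET SERVEROUTPUT ON
DECLARE
    e_Invalid_Department EXCEPTION;                       -- illustrative name
    PRAGMA EXCEPTION_INIT(e_Invalid_Department, -20001);  -- illustrative number
    v_count NUMBER;
BEGIN
    SELECT count(*) INTO v_count
    FROM department
    WHERE departmentid = 999;   -- assumed to be a non-existent department id
    IF v_count = 0 THEN
        RAISE_APPLICATION_ERROR(-20001, 'Invalid Department id 999');
    END IF;
    DBMS_OUTPUT.PUT_LINE('Valid Department id');
EXCEPTION
    WHEN e_Invalid_Department THEN
        -- SQLCODE and SQLERRM carry the number and text passed to the procedure
        DBMS_OUTPUT.PUT_LINE(SQLCODE||' / '||SQLERRM);
END;
/
```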

8.11. Exception Propagation

An exception can be raised in the declaration section, the executable section or the exception section of a PL/SQL block. Only an exception raised in the executable section can be handled in the same PL/SQL block. An exception raised in the declaration section or in the exception section of a PL/SQL block can be handled only in the outer block in which that block is enclosed. If it is not handled in the outer block, the runtime engine checks whether the outer block is itself enclosed in another block, and so on. This is called propagation of the exception. If none of the blocks handles the exception, it is propagated to the calling environment.


8.11.1. Exception raised in the declaration section

In the PL/SQL block below the exception is raised in the declaration section. As the programmer assigns a character value to a numeric variable, the VALUE_ERROR predefined exception is raised. Even though the same block has a WHEN OTHERS handler, an exception raised in the declaration part can be handled only in an outer block. As there is no outer block here, the error reaches the calling environment, which displays it as a numeric or value error.

    DECLARE
        v_seatsavailable NUMBER(3) := 'ABC';
    BEGIN
        DBMS_OUTPUT.PUT_LINE(v_seatsavailable);
    EXCEPTION
        WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE('Value error occurred');
    END;

    DECLARE
    *
    ERROR at line 1:
    ORA-06502: PL/SQL: numeric or value error: character to number conversion error
    ORA-06512: at line 2

In the code snippet below the same PL/SQL block is enclosed in another block. The WHEN OTHERS handler of the outer block handles the exception raised in the declaration part of the inner block and prints 'Other error'. 'Completed' is not printed, because control moves directly to the handler of the outer block.

    BEGIN

        DECLARE
            v_seatsavailable NUMBER(3) := 'ABC';
        BEGIN
            DBMS_OUTPUT.PUT_LINE(v_seatsavailable);
        EXCEPTION
            WHEN OTHERS THEN
                DBMS_OUTPUT.PUT_LINE('Value error occurred');
        END;

        DBMS_OUTPUT.PUT_LINE('Completed');
    EXCEPTION
        WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE('Other error');
    END;
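One way to keep such an error inside the reach of the block's own handlers is to declare the variable without an initial value and assign it in the executable section instead. This is a minimal sketch of that idea, not the only possible approach:

```sql
SET SERVEROUTPUT ON
DECLARE
    v_seatsavailable NUMBER(3);      -- no initial value in the declaration
BEGIN
    -- The failing conversion now happens in the executable section,
    -- so the handler of this same block can trap it.
    v_seatsavailable := 'ABC';
    DBMS_OUTPUT.PUT_LINE(v_seatsavailable);
EXCEPTION
    WHEN VALUE_ERROR THEN
        DBMS_OUTPUT.PUT_LINE('Value error occurred');
END;
/
```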

8.11.2. Exception raised in the executable section

e_Invalid_Departmentid is an exception raised in the executable part of the inner PL/SQL block using the RAISE statement, and it is handled in that same block. Hence the following code prints 'Invalid Departmentid' and then 'Successful completion', as execution of the outer block continues normally.

    DECLARE
        e_Invalid_Departmentid EXCEPTION;
    BEGIN
        BEGIN
            RAISE e_Invalid_Departmentid;
        EXCEPTION
            WHEN e_Invalid_Departmentid THEN
                DBMS_OUTPUT.PUT_LINE('Invalid Departmentid');
        END;
        DBMS_OUTPUT.PUT_LINE('Successful completion');
    END;

8.11.3. Exception raised in the exception section

In the code snippet below, e_Invalid_Itemid is raised in the inner block and handled in the same block. The e_Invalid_Itemid handler in turn raises another exception named e_Invalid_Customerid. As this exception is raised in the exception section of a PL/SQL block, the WHEN e_Invalid_Customerid handler of the same block cannot trap it; it has to be handled in the exception section of the outer block.

    DECLARE
        e_Invalid_Itemid EXCEPTION;
        e_Invalid_Customerid EXCEPTION;
    BEGIN
        BEGIN
            RAISE e_Invalid_Itemid;
        EXCEPTION
            WHEN e_Invalid_Itemid THEN
                RAISE e_Invalid_Customerid;
            WHEN e_Invalid_Customerid THEN
                DBMS_OUTPUT.PUT_LINE('Invalid Customerid');
        END;
    END;

As the outer block has no exception section, the PL/SQL runtime engine treats e_Invalid_Customerid as an unhandled user-defined exception. As that exception was raised while e_Invalid_Itemid was being handled, the engine reports e_Invalid_Itemid as an unhandled user-defined exception as well. Hence, if we execute the above PL/SQL block, the error message "unhandled user-defined exception" is shown twice in the output.

    DECLARE
    *
    ERROR at line 1:
    ORA-06510: PL/SQL: unhandled user-defined exception
    ORA-06512: at line 9
    ORA-06510: PL/SQL: unhandled user-defined exception

The above PL/SQL block is slightly modified and shown below, with e_Invalid_Customerid handled in the outer block.

    DECLARE
        e_Invalid_Itemid EXCEPTION;
        e_Invalid_Customerid EXCEPTION;
    BEGIN
        BEGIN
            RAISE e_Invalid_Itemid;
        EXCEPTION
            WHEN e_Invalid_Itemid THEN
                RAISE e_Invalid_Customerid;
            WHEN e_Invalid_Customerid THEN
                DBMS_OUTPUT.PUT_LINE('Invalid Customerid in the nested block');
        END;
    EXCEPTION
        WHEN e_Invalid_Customerid THEN
            DBMS_OUTPUT.PUT_LINE('Invalid Customerid in the outer block');
    END;


9. PL/SQL cursors

9.1. Cursors

Every SQL statement submitted to the Oracle server processes zero or more rows. The rows affected by the statement are kept temporarily in a special area in the memory of the Oracle server. This temporary area is called the private SQL work area. It holds the rows affected by the statement, the count of the rows affected and a pointer to the parsed statement. A cursor is thus a private SQL work area. Every SQL statement executed by the Oracle server has a separate private SQL work area associated with it. More than one row can be kept in the private SQL work area, but only one row can be processed at a time. Some authors also call this area the context area. The set of rows currently held by the cursor is called the active set.

When Oracle manages the cursor operations by itself, as it does for single row SELECT statements and for DML statements, the cursor is called an implicit cursor. When the programmer manages the cursor operations, it is called an explicit cursor. Managing a cursor involves allocating memory for the work area, opening the work area, fetching the records from the work area, and closing or releasing the work area after the processing is done.

9.2. Implicit cursors

Whenever an INSERT, UPDATE or DELETE statement is executed, a PL/SQL implicit cursor is created by default and the rows are processed through it. Likewise, when we write a SELECT ... INTO statement, which deals with only one row, an implicit cursor is created and managed by the Oracle server.


9.3. Implicit cursor attributes

    Implicit cursor attribute    Meaning
    SQL%ROWCOUNT                 Number of records affected by the most recent SQL statement
    SQL%FOUND                    Evaluates to TRUE if the most recent SQL statement affects one or more rows
    SQL%NOTFOUND                 Evaluates to TRUE if the most recent SQL statement does not affect any rows
    SQL%ISOPEN                   Always evaluates to FALSE because PL/SQL closes implicit cursors immediately after they are executed

After a successful insertion of one row, the values of the implicit cursor attributes are as shown below.

    SQL%ISOPEN      FALSE
    SQL%FOUND       TRUE
    SQL%NOTFOUND    FALSE
    SQL%ROWCOUNT    1

After an update that modifies at least one row, the values of the implicit cursor attributes are as shown below.

    SQL%ISOPEN      FALSE
    SQL%FOUND       TRUE
    SQL%NOTFOUND    FALSE
    SQL%ROWCOUNT    Depends on number of rows updated

After a deletion that removes at least one row, the values of the implicit cursor attributes are as shown below.

    SQL%ISOPEN      FALSE
    SQL%FOUND       TRUE
    SQL%NOTFOUND    FALSE
    SQL%ROWCOUNT    Depends on number of rows deleted

If an UPDATE or DELETE statement affects no rows, no exception is raised; SQL%FOUND is FALSE, SQL%NOTFOUND is TRUE and SQL%ROWCOUNT is 0.

Do not test SQL%NOTFOUND immediately after a SELECT ... INTO statement to find out whether the statement returned a row. When a SELECT ... INTO statement returns no rows, the NO_DATA_FOUND predefined exception is raised and control never reaches that test. If we check the implicit cursor attributes inside the NO_DATA_FOUND exception handler, their values are as shown below.

    SQL%ISOPEN      FALSE
    SQL%FOUND       FALSE
    SQL%NOTFOUND    TRUE
    SQL%ROWCOUNT    0
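The point can be seen in a short block. The sketch below assumes that no department has the id 999 and that the department table has a departmentname column, as suggested by the earlier insert examples.

```sql
SET SERVEROUTPUT ON
DECLARE
    v_departmentname department.departmentname%TYPE;
BEGIN
    SELECT departmentname INTO v_departmentname
    FROM department
    WHERE departmentid = 999;   -- assumed to match no row

    -- When no row matches, control has already moved to the handler,
    -- so this test can never report the failure.
    IF SQL%NOTFOUND THEN
        DBMS_OUTPUT.PUT_LINE('This line is never printed');
    END IF;
    DBMS_OUTPUT.PUT_LINE('Department: '||v_departmentname);
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        DBMS_OUTPUT.PUT_LINE('No such department; SQL%ROWCOUNT = '||SQL%ROWCOUNT);
END;
/
```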


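9.4. Implicit cursor example

The example below shows the usage of implicit cursor attributes after an UPDATE statement. The date is converted explicitly with TO_DATE so that the comparison does not depend on the date format settings of the session.

    BEGIN
        UPDATE instructor
        SET remaininghours = NULL
        WHERE dateofjoining > TO_DATE('10-JAN-2007', 'DD-MON-YYYY');
        DBMS_OUTPUT.PUT_LINE(SQL%ROWCOUNT||' rows updated');
        IF SQL%NOTFOUND THEN
            DBMS_OUTPUT.PUT_LINE('Nobody joined after 10-JAN-2007');
        END IF;
        COMMIT;
    END;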

9.5. Explicit Cursors

When we want to write a SELECT statement in PL/SQL that deals with more than one row, we have to use an explicit cursor. A SELECT statement that is given a name in the declaration part with the CURSOR keyword declares an explicit cursor. As mentioned earlier, we need to manage explicit cursors on our own. The operations that the developer has to perform to manage an explicit cursor, and thereby gain complete control over it, are discussed in the subsequent sections.

9.6. Operations on explicit cursor

The operations on an explicit cursor are as follows:

1. Declaring the cursor
2. Opening the cursor
3. Fetching the cursor
4. Closing the cursor

Let's have a closer look at all these activities in the sections below.

9.6.1. Declaring the cursor

Use the CURSOR keyword to start the cursor declaration, followed by the cursor identifier name and then the IS keyword. Note that this cursor identifier need not be declared separately anywhere else in the PL/SQL block. After the IS keyword we can write any SQL query. All the SQL queries supported in the SQL environment are supported in the cursor declaration too.

    CURSOR c1 IS
        SELECT branchid
        FROM branch
        WHERE departmentid IN (SELECT departmentid FROM department);

    CURSOR c2 IS
        SELECT b.branchid, b.branchname, d.headofdepartment
        FROM branch b, department d
        WHERE b.departmentid = d.departmentid
        AND d.departmentid > 20;

    CURSOR c3 IS
        SELECT departmentid, count(*)
        FROM branch
        WHERE departmentid > 20
        GROUP BY departmentid;

It is not necessary to write an INTO clause in the cursor declaration, as it does not make any sense there. Even if by chance an INTO clause is used in the cursor declaration, the PL/SQL compiler would not throw any error. Place holders for the resultant values are needed only when we actually fetch the records from the active set one after the other, and this is taken care of by the FETCH statement (discussed later). The mere declaration of a cursor does not identify the active set.

9.6.2. Opening the cursor

    OPEN cursorname;

Example:

    OPEN c1;
    OPEN c2;
    OPEN c3;

The above code snippet shows the syntax for opening a cursor. The cursor identifier used in the cursor declaration has to be specified while opening the cursor. We can open cursors both in the executable part and in the exception part of a PL/SQL block. If the cursor is already open, the PL/SQL runtime engine raises the CURSOR_ALREADY_OPEN predefined exception.

The SELECT statement associated with the cursor declaration is executed only when we open the cursor. Thus the OPEN command prepares the cursor for use, identifies the active set associated with the given SQL query, and positions the cursor before the first row. If the SQL query returns no rows from the database, no exception is raised. We need to use the explicit cursor attributes to test the outcome after a fetch. The cursor attributes discussed earlier for implicit cursors are available for explicit cursors as well: the SQL prefix of every implicit cursor attribute is replaced with the respective cursor identifier name.


Within a PL/SQL block a cursor can be opened any number of times. Every time we open the cursor a different active set may be identified, depending on the current state of the records in the database. Do not try to reopen a cursor without closing it first, as this raises the CURSOR_ALREADY_OPEN exception discussed earlier. The safest way of opening a cursor is shown below: we check whether the cursor is already open and, if not, we open it.

    IF NOT c1%ISOPEN THEN
        OPEN c1;
    END IF;

Assuming that C1 is the explicit cursor which we are dealing with, the values of the explicit cursor attributes before we open the cursor are as shown in the table below:

    C1%ISOPEN      FALSE
    C1%FOUND       INVALID_CURSOR exception
    C1%NOTFOUND    INVALID_CURSOR exception
    C1%ROWCOUNT    INVALID_CURSOR exception

After we open the cursor and before we fetch any record, the values of the explicit cursor attributes are as shown in the table below:

    C1%ISOPEN      TRUE
    C1%FOUND       NULL
    C1%NOTFOUND    NULL
    C1%ROWCOUNT    0

Do not use implicit cursor attributes like SQL%FOUND to test the outcome of explicit cursors. They reflect the outcome of the most recently executed SQL statement, if any, and not the outcome of the explicit cursor. If no SQL statement has been executed yet, SQL%FOUND, SQL%NOTFOUND and SQL%ROWCOUNT evaluate to NULL.

9.6.3. Fetching records from the cursor

The syntax for fetching records from the cursor is as shown below.

    FETCH cursorname INTO listofvariables | PL/SQL record variable

Example:

    FETCH c1 INTO v_branchid;
    FETCH c2 INTO v_branchid, v_branchname, v_headofdepartment;
    FETCH c3 INTO v_departmentid, v_count;
    FETCH c3 INTO v_branchrec;

Immediately after opening the cursor, we can start fetching the records from the active set identified. If we try to fetch records from an unopened cursor, an INVALID_CURSOR exception is raised, so make sure to specify the name of a cursor which is already opened. After the INTO keyword we specify the list of variables into which the values have to be populated. The number of variables must be equal to the number of columns in the SELECT statement of the cursor, and the datatype of each variable must be compatible with the datatype of the corresponding column.

Usually we place the FETCH statement within a LOOP .. END LOOP construct, as the same statement has to be executed repeatedly to fetch all the subsequent records until we reach the last record. Here we assume that our active set has more than one record, which is the reason why we have opted for an explicit cursor. Whenever the FETCH is successful, %FOUND is set to TRUE, and when it is unsuccessful, %NOTFOUND is set to TRUE. To transfer the control outside the LOOP .. END LOOP construct we therefore use an EXIT WHEN statement immediately after the FETCH: as soon as %NOTFOUND is TRUE, control leaves the loop.

Before we fetch the first record from the cursor, the values of the explicit cursor attributes are as shown in the table below:

    C1%ISOPEN      TRUE
    C1%FOUND       NULL
    C1%NOTFOUND    NULL
    C1%ROWCOUNT    0

After we successfully fetch the first record from an explicit cursor, the values of the explicit cursor attributes are as shown in the table below:

    C1%ISOPEN      TRUE
    C1%FOUND       TRUE
    C1%NOTFOUND    FALSE
    C1%ROWCOUNT    1

For every subsequent successful fetch the explicit cursor attributes keep the same values, except C1%ROWCOUNT, which is incremented by 1. After the first unsuccessful fetch from the explicit cursor, the values of the explicit cursor attributes are as shown in the table below:

    C1%ISOPEN      TRUE
    C1%FOUND       FALSE
    C1%NOTFOUND    TRUE
    C1%ROWCOUNT    n

where n is the total number of records in the active set, all of which have been fetched by then.
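The attribute values listed in the tables above can be observed with a small block such as the following sketch. Since BOOLEAN values cannot be concatenated into the text given to DBMS_OUTPUT.PUT_LINE, the block uses a local helper function, bool_text, which is a name introduced here for illustration only.

```sql
SET SERVEROUTPUT ON
DECLARE
    CURSOR c1 IS SELECT branchid FROM branch;
    v_branchid branch.branchid%TYPE;

    -- Illustrative helper: converts a BOOLEAN (possibly NULL) to printable text
    FUNCTION bool_text(p_flag BOOLEAN) RETURN VARCHAR2 IS
    BEGIN
        IF p_flag IS NULL THEN
            RETURN 'NULL';
        ELSIF p_flag THEN
            RETURN 'TRUE';
        ELSE
            RETURN 'FALSE';
        END IF;
    END bool_text;
BEGIN
    -- Only %ISOPEN may be referenced before the cursor is opened
    DBMS_OUTPUT.PUT_LINE('Before OPEN : ISOPEN='||bool_text(c1%ISOPEN));
    OPEN c1;
    DBMS_OUTPUT.PUT_LINE('After OPEN  : FOUND='||bool_text(c1%FOUND)||
                         ' ROWCOUNT='||c1%ROWCOUNT);
    LOOP
        FETCH c1 INTO v_branchid;
        DBMS_OUTPUT.PUT_LINE('After FETCH : FOUND='||bool_text(c1%FOUND)||
                             ' ROWCOUNT='||c1%ROWCOUNT);
        EXIT WHEN c1%NOTFOUND;
    END LOOP;
    CLOSE c1;
    DBMS_OUTPUT.PUT_LINE('After CLOSE : ISOPEN='||bool_text(c1%ISOPEN));
END;
/
```

The last 'After FETCH' line printed belongs to the unsuccessful fetch, so its ROWCOUNT equals the number of rows in the branch table.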

9.6.4. Closing the cursor

The code snippet below shows the syntax of the close cursor statement.

    CLOSE cursorname;

Example:

    CLOSE c1;
    CLOSE c2;
    CLOSE c3;

The cursor name specified in the CLOSE statement is the name of the cursor to be closed. If we try to close a cursor which is already closed, an INVALID_CURSOR exception is raised. The memory allocated to an explicit cursor is released only when we close the cursor. Usually a programmer closes the cursor once the processing of the records in the active set is complete. The cursor can be reopened later, if required. Do not attempt to fetch records from a closed cursor, as this also leads to an INVALID_CURSOR exception. The safest way of closing a cursor is to check first whether it is open:

    IF c1%ISOPEN THEN
        CLOSE c1;
    END IF;

9.7. Explicit cursor – Simple loop

    DECLARE
        CURSOR c1 IS
            SELECT branchid, seatsavailable
            FROM branch
            WHERE departmentid IN (SELECT departmentid FROM department);
        v_branchid branch.branchid%TYPE;
        v_seatsavailable branch.seatsavailable%TYPE;
    BEGIN
        OPEN c1;
        LOOP
            FETCH c1 INTO v_branchid, v_seatsavailable;
            EXIT WHEN c1%NOTFOUND;
            UPDATE branch
            SET seatsavailable = v_seatsavailable + 1
            WHERE branchid = v_branchid;
            DBMS_OUTPUT.PUT_LINE(v_branchid);
        END LOOP;
        CLOSE c1;
        COMMIT;
    END;
    /

The above code snippet is an example of an explicit cursor implemented with the LOOP ... END LOOP; construct. The explicit cursor is used to increment the number of seats available by one in all the branches associated with every department of a university. The cursor is declared in the declaration part, where all the branches associated with every department are identified. The cursor is then opened and every record of the identified active set is fetched into the appropriate PL/SQL variables. For every branch id present in the active set, an UPDATE statement increments the seats available by one. Finally we close the cursor, which releases the private SQL work area allocated to it, and commit the changes.

9.8. Explicit cursor – With Group by clause

    DECLARE
        CURSOR cur_branch IS
            SELECT branchname, COUNT(*) AS no_of_applicant_opted
            FROM applicant c, branch b
            WHERE c.optedbranch = b.branchid
            GROUP BY branchname;
        v_branchname branch.branchname%TYPE;
        v_noofapplicants NUMBER;
    BEGIN
        OPEN cur_branch;
        DBMS_OUTPUT.PUT_LINE('Branch Name No. of Application Opted');
        LOOP
            FETCH cur_branch INTO v_branchname, v_noofapplicants;
            EXIT WHEN cur_branch%NOTFOUND;
            DBMS_OUTPUT.PUT(v_branchname||' ');
            DBMS_OUTPUT.PUT(v_noofapplicants);
            DBMS_OUTPUT.NEW_LINE;
        END LOOP;
        CLOSE cur_branch;
    END;
    /

The above code snippet demonstrates the usage of the GROUP BY clause in a cursor declaration. The example displays the branch name and the total number of applicants who opted for every branch. The aggregate function COUNT(*) used in the cursor declaration is given an alias name. The alias is not needed when we fetch into separate variables as above, but it is required when the value of that column has to be accessed through a record variable, as shown in the following sections.

9.9. Explicit cursor attributes

• cursorname%ISOPEN – Is the cursor open?
• cursorname%ROWCOUNT – How many rows have been fetched so far?
• cursorname%NOTFOUND – Has a fetch failed?
• cursorname%FOUND – Has a row been fetched?

The values present in these explicit cursor attributes are discussed along with the cursor operations.

9.10. Using record variables with explicit cursors

The example below demonstrates how to use a record variable to access and display the records of the active set. v_curvar is a record variable declared with %ROWTYPE to hold both the branch name and the number of applicants who opted for that branch.

    CURSOR cur_branch IS
        SELECT branchname, COUNT(*) AS no_of_applicant_opted
        FROM applicant c, branch b
        WHERE c.optedbranch = b.branchid
        GROUP BY branchname;
    v_curvar cur_branch%ROWTYPE;
    v_newrec v_curvar%TYPE;

The record variable declared is used in the FETCH statement after the INTO keyword. When a cursor deals with a large number of columns, record variables make our life easier, as no separate variable has to be declared for every column. If we need yet another record variable with the same structure, we can declare it by giving the name of the new record variable followed by the earlier record variable and %TYPE. Thus, in the above declaration, v_curvar and v_newrec have the same record structure.

    DECLARE
        CURSOR cur_branch IS
            SELECT branchname, COUNT(*) AS no_of_applicant_opted
            FROM applicant c, branch b
            WHERE c.optedbranch = b.branchid
            GROUP BY branchname;
        v_curvar cur_branch%ROWTYPE;
    BEGIN
        OPEN cur_branch;
        DBMS_OUTPUT.PUT_LINE('Branch Name No of Application Opted');
        LOOP
            FETCH cur_branch INTO v_curvar;
            EXIT WHEN cur_branch%NOTFOUND;
            DBMS_OUTPUT.PUT(v_curvar.branchname||' ');
            DBMS_OUTPUT.PUT(v_curvar.no_of_applicant_opted);
            DBMS_OUTPUT.NEW_LINE;
        END LOOP;
        CLOSE cur_branch;
    END;
    /

9.11. Navigating cursors with WHILE LOOP

The example below demonstrates how to process an explicit cursor with the WHILE construct. This construct executes a set of statements repeatedly as long as the specified condition evaluates to TRUE. Hence we need an explicit cursor attribute which is TRUE as long as we are able to fetch records from the active set. As discussed earlier, cursorname%FOUND is such an attribute. However, this attribute is NULL immediately after we open the cursor, and becomes TRUE only after the first successful fetch. Hence the FETCH statement has to be written twice, once before the WHILE construct and once inside it.

    DECLARE
        CURSOR cur_branch IS
            SELECT branchname, COUNT(*) AS no_of_applicant_opted
            FROM applicant c, branch b
            WHERE c.optedbranch = b.branchid
            GROUP BY branchname;
        v_curvar cur_branch%ROWTYPE;
    BEGIN
        OPEN cur_branch;
        FETCH cur_branch INTO v_curvar;
        DBMS_OUTPUT.PUT_LINE('Branch Name No of Application Opted');
        WHILE cur_branch%FOUND
        LOOP
            DBMS_OUTPUT.PUT(v_curvar.branchname||' ');
            DBMS_OUTPUT.PUT(v_curvar.no_of_applicant_opted);
            DBMS_OUTPUT.NEW_LINE;
            FETCH cur_branch INTO v_curvar;
        END LOOP;
        CLOSE cur_branch;
    END;
    /

9.12. Cursor FOR LOOP

The cursor FOR construct helps us to process an explicit cursor and, at the same time, relieves the PL/SQL programmer from the burden of dealing with the various cursor operations such as opening the cursor, fetching the records, checking the exit condition and closing the cursor. These cursor operations are taken care of implicitly by the cursor FOR construct, allowing the programmer to concentrate on the implementation of the business logic.

Below is the syntax of the cursor FOR loop. The set of statements to be executed repeatedly is enclosed within LOOP .. END LOOP; recname is the name of a record variable which is implicitly declared for us by the cursor FOR construct. It need not be declared in the declaration section of the PL/SQL block.

    FOR recname IN cursorname
    LOOP
        ..
        ..
    END LOOP;

Thus the operations implicitly taken care of by the cursor FOR loop are:

• Implicit open, fetch, exit condition check, close
• Implicit record variable declaration

    DECLARE
        CURSOR cur_branch IS
            SELECT branchname, COUNT(*) AS no_of_applicant_opted
            FROM applicant c, branch b
            WHERE c.optedbranch = b.branchid
            GROUP BY branchname;
    BEGIN
        DBMS_OUTPUT.PUT_LINE('Branch Name No of Applicants Opted');
        FOR v_curvar IN cur_branch
        LOOP
            DBMS_OUTPUT.PUT(v_curvar.branchname||' ');
            DBMS_OUTPUT.PUT(v_curvar.no_of_applicant_opted);
            DBMS_OUTPUT.NEW_LINE;
        END LOOP;
    END;
    /

As we can see in the above code snippet, v_curvar is a record variable that is implicitly declared, and the cursor operations are implicitly taken care of by the cursor FOR loop construct.

9.13. Implicit cursor FOR LOOP

Not only the record variable but even the cursor itself can be implicitly declared, by placing the cursor's query in the FOR loop itself. The code snippet below demonstrates this. Because such a cursor has no name, we cannot refer by name to the private SQL work area that is set aside for it.

SQL> BEGIN
  DBMS_OUTPUT.PUT_LINE('Branch Name No of Applicants Opted');
  FOR v_curvar IN (SELECT branchname, COUNT(*) AS no_of_applicant_opted
                   FROM applicant c, branch b
                   WHERE c.optedbranch = b.branchid
                   GROUP BY branchname)
  LOOP
    DBMS_OUTPUT.PUT(v_curvar.branchname||' ');
    DBMS_OUTPUT.PUT(v_curvar.no_of_applicant_opted);
    DBMS_OUTPUT.NEW_LINE;
  END LOOP;
END;
/

Thus the query is placed within parentheses in the FOR loop itself. All the other implicit cursor operations are also available with this construct.

9.14. Cursor related predefined Oracle server exceptions

This section discusses two predefined exceptions: INVALID_CURSOR and CURSOR_ALREADY_OPEN.

9.14.1. INVALID_CURSOR exception

The INVALID_CURSOR predefined exception is thrown in two different situations:
• When we try to fetch records from an unopened cursor
• When we try to close a cursor which is already closed

The example below demonstrates the first situation. Cursor C1 identifies the departmentid corresponding to various branches, but we try to fetch records from C1 without opening it.

SQL> DECLARE
  CURSOR c1 IS
    SELECT departmentid FROM department
    WHERE departmentid IN (SELECT departmentid FROM branch);
  v_departmentid department.departmentid%TYPE;
BEGIN
  -- c1 has not been opened, so this FETCH raises INVALID_CURSOR
  FETCH c1 INTO v_departmentid;
  WHILE c1%FOUND
  LOOP
    DBMS_OUTPUT.PUT_LINE(v_departmentid);
    FETCH c1 INTO v_departmentid;
  END LOOP;
  CLOSE c1;
  COMMIT;
EXCEPTION
  WHEN INVALID_CURSOR THEN
    DBMS_OUTPUT.PUT_LINE('Invalid cursor exception thrown');
END;
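The second situation, closing a cursor that is already closed, can be sketched in the same way. This is an illustrative block only: it selects from Oracle's built-in DUAL table rather than the course tables, and it assumes SET SERVEROUTPUT ON.

```sql
DECLARE
  -- hypothetical cursor; any valid query would do
  CURSOR c1 IS SELECT dummy FROM dual;
BEGIN
  OPEN c1;
  CLOSE c1;
  -- c1 is already closed, so this second CLOSE raises INVALID_CURSOR
  CLOSE c1;
EXCEPTION
  WHEN INVALID_CURSOR THEN
    DBMS_OUTPUT.PUT_LINE('Invalid cursor exception thrown on second CLOSE');
END;
/
```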

9.14.2. CURSOR_ALREADY_OPEN exception

The example below demonstrates when the CURSOR_ALREADY_OPEN exception is thrown. As we have learnt, a cursor can be opened and closed any number of times within a PL/SQL block, but before reopening a cursor we have to make sure that it is closed. Opening a cursor which is already open throws the CURSOR_ALREADY_OPEN exception.

SQL> DECLARE
  CURSOR c1 IS
    SELECT departmentid FROM department
    WHERE departmentid IN (SELECT departmentid FROM branch);
  v_departmentid department.departmentid%TYPE;
BEGIN
  OPEN c1;
  FETCH c1 INTO v_departmentid;
  WHILE c1%FOUND
  LOOP
    -- c1 is still open, so this OPEN raises CURSOR_ALREADY_OPEN
    OPEN c1;
    DBMS_OUTPUT.PUT_LINE(v_departmentid);
    FETCH c1 INTO v_departmentid;
  END LOOP;
  CLOSE c1;
  COMMIT;
EXCEPTION
  WHEN CURSOR_ALREADY_OPEN THEN
    DBMS_OUTPUT.PUT_LINE('Cursor already open exception thrown');
END;
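One way to avoid this exception is to test the %ISOPEN cursor attribute before opening the cursor. The following is a minimal sketch under stated assumptions: it uses the built-in DUAL table instead of the course tables, and the variable name v_dummy is hypothetical.

```sql
DECLARE
  CURSOR c1 IS SELECT dummy FROM dual;
  v_dummy dual.dummy%TYPE;
BEGIN
  OPEN c1;
  -- open the cursor again only if it is not already open
  IF NOT c1%ISOPEN THEN
    OPEN c1;
  END IF;
  FETCH c1 INTO v_dummy;
  DBMS_OUTPUT.PUT_LINE('Fetched: ' || v_dummy);
  CLOSE c1;
END;
/
```

Here the second OPEN is skipped because %ISOPEN is TRUE, so CURSOR_ALREADY_OPEN is not raised.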

9.15. Parameterized cursors

We can pass one or more parameters (arguments) to a parameterized cursor. This lets us identify different active sets at runtime by passing different input values. The parameters are passed when the cursor is opened; they can either be hardcoded, as in the example below, or be accepted as input from the user. By coding convention, parameter names start with p_.

CURSOR cursorname (parameter datatype) IS query;

Every formal parameter mentioned in the cursor declaration should have a corresponding actual parameter in the OPEN statement, and the datatypes of the formal and actual parameters should match. Do not specify a size for a formal parameter in the cursor declaration. For example, although p_branchid holds a NUMBER of size 3, it is declared simply as NUMBER, not NUMBER(3), in the cursor declaration. Merely passing the parameters is not sufficient: they must be used in the WHERE clause of the query in the cursor declaration, since it is this condition that identifies different active sets for different inputs.

SQL> DECLARE
  CURSOR c1 (p_branchid NUMBER) IS
    SELECT branchid, seatsavailable
    FROM branch
    WHERE branchid = p_branchid;
  v_branchid branch.branchid%TYPE;
  v_seatsavailable NUMBER(3);
BEGIN
  OPEN c1(1001);
  LOOP
    FETCH c1 INTO v_branchid, v_seatsavailable;
    EXIT WHEN c1%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE(v_branchid||' '||v_seatsavailable);
  END LOOP;
  CLOSE c1;
END;
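To illustrate how different parameter values produce different active sets, the same cursor can be opened more than once with different arguments. The sketch below combines a parameterized cursor with the cursor FOR loop; the branch id 1002 is a hypothetical value assumed to exist alongside 1001.

```sql
DECLARE
  CURSOR c1 (p_branchid NUMBER) IS
    SELECT branchid, seatsavailable
    FROM branch
    WHERE branchid = p_branchid;
BEGIN
  -- first active set: rows for branch 1001
  FOR rec IN c1(1001)
  LOOP
    DBMS_OUTPUT.PUT_LINE(rec.branchid || ' ' || rec.seatsavailable);
  END LOOP;
  -- second active set: rows for branch 1002 (hypothetical value)
  FOR rec IN c1(1002)
  LOOP
    DBMS_OUTPUT.PUT_LINE(rec.branchid || ' ' || rec.seatsavailable);
  END LOOP;
END;
/
```

The cursor FOR loop opens and closes c1 implicitly on each pass, so no explicit CLOSE is needed between the two uses.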

9.16. Explicit cursor – FOR UPDATE

CURSOR cursorname IS SELECT .. FROM .. FOR UPDATE [OF column_reference] [NOWAIT];

The syntax of the SELECT statement in the cursor declaration is the same as we have seen earlier, with an additional FOR UPDATE clause. FOR UPDATE must be the last clause, placed even after ORDER BY (if any). When we plan to update the records present in the active set, we can use the FOR UPDATE clause in the SELECT statement of the cursor declaration to gain an exclusive row lock on those records. The rows then cannot be modified by other users who intend to operate on the same set of records. The user who has applied the exclusive row lock has to relinquish it by executing either a COMMIT or a ROLLBACK statement, so that other users can modify the records present in the active set.

If some other session has already acquired an exclusive row lock on one or more of the records, the current session has to wait for those locks to be released by the other session. This might lead to an indefinite wait. To avoid this, we can include a NOWAIT clause in the cursor declaration. With NOWAIT, Oracle checks whether the records are locked by another session; if they are, it throws an Oracle error immediately and terminates the execution of the PL/SQL block.
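The NOWAIT behavior can be handled inside the block instead of letting the block terminate. The sketch below is one possible approach, not the only one: it associates a user-named exception (e_row_locked, a hypothetical name) with ORA-00054, the error Oracle raises when a resource is busy and NOWAIT was specified, and it assumes the EMP table with columns empno and sal used elsewhere in this chapter.

```sql
DECLARE
  e_row_locked EXCEPTION;
  -- ORA-00054: resource busy and acquire with NOWAIT specified
  PRAGMA EXCEPTION_INIT(e_row_locked, -54);
  CURSOR c1 IS
    SELECT empno, sal
    FROM emp
    FOR UPDATE OF sal NOWAIT;
BEGIN
  -- the row locks are requested when the cursor is opened
  OPEN c1;
  DBMS_OUTPUT.PUT_LINE('Row locks acquired');
  CLOSE c1;
  -- end the transaction so that the locks are released
  ROLLBACK;
EXCEPTION
  WHEN e_row_locked THEN
    DBMS_OUTPUT.PUT_LINE('Rows are locked by another session; try again later');
END;
/
```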

9.17. FOR UPDATE cursor declaration

CURSOR cursorname IS SELECT ... FROM ... FOR UPDATE [OF column_reference] [WAIT n];

We can even instruct the PL/SQL runtime engine to wait for n seconds, as shown in the above syntax. If the rows are not unlocked within n seconds, an Oracle error is returned.

SQL> DECLARE
  CURSOR c1 IS
    SELECT empno, sal FROM emp
    FOR UPDATE OF sal;
  v_empno emp.empno%TYPE;
  v_sal emp.sal%TYPE;
BEGIN
  FOR rec IN c1
  LOOP
    UPDATE emp SET sal = sal + 100 WHERE empno = rec.empno;
  END LOOP;
  COMMIT;
END;

The above example increments the salary of all employees in the EMP table by 100. The list of columns after the FOR UPDATE clause specifies the column(s) to be updated. Do not try to update derived columns or columns based on aggregate functions in the SELECT query, as this is not possible. For example, in the code snippet below max(hoursremaining) is an aggregate function based column, meaning there is no column in the instructor table with the name max(hoursremaining).

SQL> DECLARE
 2 CURSOR c1 IS SELECT instructorid, max(hoursremaining)
 3 as maximumhours FROM instructor
 4 GROUP BY instructorid FOR UPDATE OF maximumhours;
 5 BEGIN
 6 FOR rec IN c1
 7 LOOP
 8 DBMS_OUTPUT.PUT_LINE(rec.instructorid ||' '||rec.maximumhours);
 9 END LOOP;
 10 END;
ERROR at line 6:
ORA-06550: line 2, column 76:
PL/SQL: ORA-01786: FOR UPDATE of this query expression is not allowed
ORA-06550: line 2, column 15:
PL/SQL: SQL Statement ignored

9.18. WHERE CURRENT OF clause

WHERE CURRENT OF cursorname;

Whenever we use FOR UPDATE in the cursor declaration, we are allowed to use the WHERE CURRENT OF clause in the UPDATE statement. This clause states that the update has to happen at the current row pointed to by the explicit cursor. In other words, the update is applied only to the row which has most recently been fetched. The code snippet below demonstrates the usage of the WHERE CURRENT OF clause.

SQL> DECLARE
  CURSOR c1 IS
    SELECT empno, sal FROM emp
    FOR UPDATE OF sal;
  v_empno emp.empno%TYPE;
  v_sal emp.sal%TYPE;
BEGIN
  FOR rec IN c1
  LOOP
    -- update only the row most recently fetched by c1
    UPDATE emp SET sal = sal + 100 WHERE CURRENT OF c1;
  END LOOP;
  COMMIT;
END;

Since the empno column distinguishes one row from another in the active set, we can also use empno in the WHERE condition of the UPDATE statement to identify the row uniquely, instead of WHERE CURRENT OF cursorname; this was our earlier implementation. Thus SELECT with FOR UPDATE can still be implemented without the WHERE CURRENT OF clause. Updates are allowed on columns which are not mentioned in the FOR UPDATE clause, but this is not a good programming practice.

10. Transaction processing in PL/SQL

Transaction processing allows multiple users to work on the database concurrently. At the same time it ensures that each user sees a consistent version of the data and that all the changes are applied in the right order. There is no need to write extra code to prevent problems with multiple users accessing data concurrently. Oracle uses locks to control concurrent access to data, and it locks only the minimal amount of data necessary, for the least possible time.

10.1. Using COMMIT statement in PL/SQL

The COMMIT statement in PL/SQL marks the end of the current transaction. This statement can be used both in the executable section and in the exception section. It makes the changes made during that transaction permanent and visible to all users. Transactions are not tied to PL/SQL BEGIN ... END blocks: a block can contain multiple transactions, and a transaction can span multiple blocks, so a single block may not contain even one complete transaction.

SQL> BEGIN
  UPDATE emp SET sal = sal + 100 WHERE empno = 7935;
END;

With reference to the above PL/SQL block, other DML statements might have been executed before this UPDATE statement. There is therefore no assurance that this is the first DML statement which modifies the database, so it may or may not be the beginning of a transaction. Moreover, there is no COMMIT statement in the PL/SQL block, so the block does not end the transaction either.

SQL> DECLARE
  --assume declaration of appropriate variables and exceptions
BEGIN
  COMMIT;
  --Generation of bill and insertion of record to billing table
  INSERT INTO billing VALUES (1002, 2345610001, 'C2', 62, '21-Mar-09', 'creditcard');
  COMMIT;
  --updation of stock in the item table
  UPDATE item SET qtyonhand = qtyonhand - 1 WHERE itemid = 'STN001';
  UPDATE item SET qtyonhand = qtyonhand - 1 WHERE itemid = 'BAK001';
  COMMIT;
EXCEPTION
  --assume appropriate exceptions are handled
END;

With reference to the above PL/SQL block, the first commit statement ends the earlier transaction. A DML statement for generation of bill starts the transaction. The commit statement beneath that ends the generation of bill transaction. Updation of stock in the item table is another transaction which ends with a commit statement. Thus a PL/SQL block can contain multiple transactions.

10.2. Using ROLLBACK statement in PL/SQL

A ROLLBACK statement in a PL/SQL block ends the current transaction and undoes any changes made during that transaction. Thus mistakes such as deleting a wrong row can be undone with the help of this statement. This statement can be used both in the executable section and in the exception section.

SQL> DECLARE
  --assume that the itemid is unique
  v_itemid item.itemid%TYPE := 'STN003';
  --assume that an itemname called Pen already exists in the ITEM table
  v_itemname item.itemname%TYPE := 'Pen';
  v_itemrec item%ROWTYPE;
BEGIN
  UPDATE item SET qtyonhand = qtyonhand + 50 WHERE itemid = 'STN001';
  INSERT INTO item (itemid, itemname) VALUES (v_itemid, v_itemname);
  SELECT * INTO v_itemrec FROM item WHERE itemname = v_itemname;
  DBMS_OUTPUT.PUT_LINE('Item name is unique');
EXCEPTION
  WHEN TOO_MANY_ROWS THEN
    DBMS_OUTPUT.PUT_LINE('Item name duplicated');
    ROLLBACK;
END;

As shown in the above PL/SQL block, if we try to duplicate item records by inserting a new item with an existing item name, the SELECT ... INTO statement returns more than one row and the TOO_MANY_ROWS exception is thrown. The message “Item name duplicated” is printed on the screen, after which the transaction is rolled back immediately, undoing the above insertion and updation.

10.3. Using SAVEPOINT in PL/SQL

The SAVEPOINT statement lets us roll back part of a transaction instead of the whole transaction. Savepoints are similar to the bookmarks that we create while reading a book, which let us return to a particular location at any point of time.

SAVEPOINT names and marks the current point in the processing of a transaction. When we want to roll back to a specific point, we can do so by using the savepoint name with the ROLLBACK statement.

SQL> INSERT INTO emp VALUES( 1004, 6000);
1 row created.
SQL> SAVEPOINT S1;
Savepoint created.
SQL> UPDATE emp SET sal=1000 WHERE empno=1002;
1 row updated.
SQL> SAVEPOINT S2;
Savepoint created.
SQL> DELETE FROM emp WHERE empno=1003;
1 row deleted.
SQL> SAVEPOINT S3;
Savepoint created.

In this example we have done an insertion, an updation and a deletion on the EMP table, with a savepoint created after every DML operation: S1 after inserting an employee record, S2 after updating the sal of an employee record and S3 after deleting an employee record. If we now simply say ROLLBACK at the SQL prompt, all the changes made to the EMP table are undone.

If instead we want to retain the earlier INSERTION and UPDATION and discard the DELETION alone, we need to say ROLLBACK TO S2; at the SQL prompt. As we have rolled back to S2, any savepoints created after S2 are cleared. Thus S3 is cleared.

If we want to retain the INSERTION alone, discarding the UPDATION and DELETION, we need to say ROLLBACK TO S1. As we have rolled back to S1, any savepoints created after S1 are cleared. Thus both S2 and S3 are cleared. The same behavior is exhibited within a PL/SQL block also.

Another important point is that savepoint names are undeclared identifiers in a PL/SQL block. This means that there is no need for any separate declaration of the names used as savepoints. The number of savepoints for each session is unlimited. A savepoint exists only within the transaction in which it is created; it is cleared when that transaction ends with COMMIT or ROLLBACK.
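Since the same behavior applies within a PL/SQL block, the SQL prompt session above can be written as a block. The following is a minimal sketch assuming the same two-column EMP table (empno, sal) and the same employee numbers as in the example above.

```sql
BEGIN
  INSERT INTO emp VALUES (1004, 6000);
  SAVEPOINT s1;    -- savepoint names need no declaration
  UPDATE emp SET sal = 1000 WHERE empno = 1002;
  SAVEPOINT s2;
  DELETE FROM emp WHERE empno = 1003;
  SAVEPOINT s3;
  -- discard the DELETE only; the INSERT and UPDATE are retained and s3 is cleared
  ROLLBACK TO s2;
  -- make the retained INSERT and UPDATE permanent
  COMMIT;
END;
/
```

A plain ROLLBACK in place of ROLLBACK TO s2 would undo all three DML statements.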

10.4. Concurrency control

A simple way to think of Oracle read consistency is:

• Readers do not wait for writers (or other readers of the same data)

John the writer performs an update on a record while Jack, who is reading the same record at the same time, sees a consistent version of the data. Even though John has updated the supplier name to XYZ, Jack cannot see this change, as John has not committed. At the same time, Jack need not wait until John completes the updation. This shows that readers do not wait for writers or other readers of the same data.

• Writers do not wait for readers (of the same data)

John the writer does not wait until the reader Jack completes reading a record. While John is writing a record, Jack can still read the same record, but only the consistent, committed version of the data is given to him. This shows that writers do not wait for readers of the same data.

• Writers only wait for other writers if they attempt to update identical rows in concurrent transactions

Two writers cannot update the same record at the same time. Only the writer who first gains exclusive access to the record is allowed to modify it; the others have to wait. In the scenario shown, John gains exclusive access to the record first and updates the supplier name. Jack tries to update the same record later, but as John has gained exclusive access, Jack has to wait until John releases the lock on the record. This shows that writers only wait for other writers of the same data.
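The three rules can be tried out with two SQL*Plus sessions. The script below is only a sketch: the supplier table, its columns (supplierid, suppliername) and the values used are hypothetical, and the comments indicate which session issues each statement.

```sql
-- Session 1 (John): update a row but do not commit yet
UPDATE supplier SET suppliername = 'XYZ' WHERE supplierid = 'S1001';

-- Session 2 (Jack): returns immediately with the old, committed name;
-- the reader neither waits for the writer nor sees the uncommitted change
SELECT suppliername FROM supplier WHERE supplierid = 'S1001';

-- Session 2 (Jack): this statement waits, because John holds the row lock
UPDATE supplier SET suppliername = 'ABC' WHERE supplierid = 'S1001';

-- Session 1 (John): ending the transaction releases the lock,
-- after which Jack's waiting UPDATE proceeds
COMMIT;
```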

11. On Line Analytical Processing (OLAP)

Data is one of the most valuable assets of any organization or enterprise. The operational activities of an organization include the day-to-day business processes necessary to run it. Systems that support such processes are called On Line Transaction Processing (OLTP) systems. Operational data is highly structured data that is continuously generated and stored in what are typically called operational, transactional or OLTP databases. An organization’s success also depends on its ability to analyze data and to make intelligent decisions that would potentially affect its future. Systems that facilitate such analysis are called On Line Analytical Processing (OLAP) systems.

An OLTP application rarely requires historical data. An OLAP application requires historical data because an analysis is generally based on a substantial amount of historical data to enable trend analysis and future predictions. An OLTP transaction is characterized by several users creating, updating or retrieving individual records whereas OLAP application is characterized by higher level views of the data. Thus the focus of OLTP and OLAP are fundamentally different. The following section gives the difference between OLTP and OLAP.

11.1. Difference between OLTP and OLAP

| | OLTP | OLAP |
|---|---|---|
| Definition | On Line Transaction Processing | On Line Analytical Processing |
| Data | Dynamic (day to day transaction / operational data) | Static (historical data) |
| Data Atomicity | Data is stored at microscopic level | Data is aggregated or summarized and stored at the higher level |
| Normalization | Normalized databases to facilitate insertion, deletion and updation | De-normalized databases to facilitate queries and analysis |
| History | Old data is purged or archived | Historical data stored to enable trend analysis and future predictions |
| Queries | Simple queries and updates. Queries use small amounts of data (one record or a few records). Example: update account balance, enroll for a course | Complex queries. Queries use large amounts of data. Example: total annual sales for north region, total monthly sales for north region |
| Updates | Updates are frequent | Updates are infrequent |
| Response time | Fast response time is important. Data must be up-to-date, consistent at all times | Transactions are slow. Queries consume a lot of bandwidth |
| Joins in queries | Joins are more and complex as tables are normalized | Joins are few and simple as tables are de-normalized |
| | An OLTP system aims at one specific process. Example: ordering from an online store | An OLAP integrates data from different processes. Example: combines sales, inventory and purchasing data |
| Data models | Complex data models, many tables | Simple data models, fewer tables |
| Focus | OLTP focuses on performance | OLAP focuses on flexibility and broader scope |
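The difference in query style can be illustrated with the examples named in the table. The statements below are only a sketch: the account table (accountno, balance) and the sales table (saledate, region, amount) are hypothetical and are not part of the course schema.

```sql
-- OLTP style: touches a single record and must respond quickly
UPDATE account
SET balance = balance - 500
WHERE accountno = 1001;

-- OLAP style: scans a large amount of historical data and summarizes it
-- (total annual sales for the north region)
SELECT EXTRACT(YEAR FROM saledate) AS sale_year,
       SUM(amount) AS total_sales
FROM sales
WHERE region = 'North'
GROUP BY EXTRACT(YEAR FROM saledate)
ORDER BY sale_year;
```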

A practical solution to enable analytical processes is to implement a data warehouse.

11.2. Data Warehouse A data warehouse is a repository which stores integrated information for efficient querying and analysis. Data warehouse has data collected from multiple, disparate sources of an organization. It is the basis for decision support and data analysis systems.

11.2.1. Why data warehouse is needed?

• Analysis requires millions of records of data which are historical in nature
• Data is collected from heterogeneous sources (e.g. RDBMS, flat files, etc.)
• Need to make quick and effective strategic decisions

In essence, it is a copy of the organization’s operational data adequately modified to support the needs of analytical processes and stored outside the operational database.

11.2.2. Characteristics of Data Warehouse

According to Bill Inmon, known as the father of Data Warehousing, a data warehouse is a subject oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.

• Subject-oriented: means that all data pertinent to a subject / business area are collected and stored as a single unit
• Integrated: means that data from multiple disparate sources are transformed and stored in a globally accepted fashion
• Static/non-volatile: means data once entered into the warehouse does not change. It is periodically added if required
• Time variant: Data warehouse maintains historical data which are used to analyze the business or market trends and facilitate future predictions

[Figure 11-1: Data warehouse architecture. Tiers shown: Data Sources (operational databases, flat files); Data Warehouse Server (ETL process, data warehouse, data marts); OLAP Servers (MOLAP, ROLAP); Presentation Tier (output: data mining, reporting, analysis).]

11.2.3. Data Warehousing Terminology

Data sources: An organization has many functional units with their own data. Data from all such sources has to be consolidated and put into a consistent form that reflects the business of the organization as a whole. These sources of data for a data warehouse are known as data sources or operational data sources.

Metadata: Metadata is data about the data. Metadata is the layer of the data warehouse which stores information such as the source data, transformed data, date and time of data extraction, target databases, date and time of data loading, etc.

Measure attributes: A measure is a numerical value that can be summarized or aggregated. Example: Consider an inventory application. Assume that the inventory store sells twenty products in one day, each for 5 dollars. It thus generates 100 dollars in total sales for the day, so sales dollars is one measure. The store owner might also want to know the number of customers the store had that day: did 5 customers buy 4 products each, or did one customer buy twenty products? Thus, customer count is another measure.

Dimension attributes: Dimensions can be defined as the perspectives used for looking at the data. The question “How do you want your data to be seen?” identifies the dimensions. Some examples of dimensions are:
• Product
• Time
• Location
• Customer Age
• Customer Income

There is almost always a time dimension in anything which is being analyzed. Considering the example given for measure attributes, sales of a product can be analyzed by day, by month, by quarter, by half year or by year. Sales can also be analyzed by category or by product. The time, product and geographic dimensions are very common.

Data that can be modeled as dimension attributes and measure attributes is called multidimensional data.
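The relationship between measures and dimensions can be sketched as a query: measures are the aggregated values, and dimensions are the columns by which they are grouped. The daily_sales table and its columns (saledate, customerid, quantity, unitprice) below are hypothetical.

```sql
-- Measures: sales dollars and customer count
-- Dimension: time (analyzed here by day)
SELECT saledate,
       SUM(quantity * unitprice) AS sales_dollars,
       COUNT(DISTINCT customerid) AS customer_count
FROM daily_sales
GROUP BY saledate
ORDER BY saledate;
```

For the store in the example above, a day with twenty products sold at 5 dollars each would show 100 as sales_dollars, while customer_count would distinguish one customer buying twenty products from five customers buying four each.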

11.2.4. Data Collection for Data Warehouse Applications

Extraction, transformation and loading (ETL): This is the most important step in Data Warehousing.

Definition of ETL: Extract, Transform and Load together are described as the process of selecting, migrating, transforming, cleansing and converting mapped data from the operational environment to the data warehouse environment.

Data needs to be taken from various disparate sources to the data preparation area. This process is known as data extraction. The data preparation area, also known as the data staging area, consists of relational tables. Data from the various heterogeneous sources is altered into a uniform format and put into the relational tables of the data preparation area, from where it can be readily loaded into the data warehouse database. This process is known as transformation. The data is then loaded into the fact table and dimension tables in the data warehouse database. This process is known as loading. Refer to Figure 11-2.

[Figure 11-2: Extraction, Transformation and Loading Process. Data is periodically extracted from the source systems, cleansed and transformed in the data staging area and loaded into the data warehouse, which users then query.]
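The transform and load steps can be sketched in SQL, assuming the extracted data already sits in staging tables. All names below are hypothetical: stg_product and stg_sales stand for staging tables, and dim_product, dim_time and fact_sales stand for warehouse tables. Real ETL is usually done with dedicated tools, so this is only an illustration of the idea.

```sql
-- Transform and load a dimension: cleanse names into a uniform format
INSERT INTO dim_product (productid, productname, category)
SELECT DISTINCT productid,
       INITCAP(TRIM(productname)),
       UPPER(TRIM(category))
FROM stg_product
WHERE productid IS NOT NULL;

-- Transform and load the fact table: convert the text date,
-- look up the time key and summarize to one row per day and product
INSERT INTO fact_sales (timeid, productid, quantity, amount)
SELECT t.timeid,
       s.productid,
       SUM(s.quantity),
       SUM(s.quantity * s.unitprice)
FROM stg_sales s, dim_time t
WHERE t.sale_day = TO_DATE(s.saledate_text, 'DD-MON-YYYY')
GROUP BY t.timeid, s.productid;

COMMIT;
```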

11.2.5. Storing of data in Data warehouse

Dimensional Modeling: Dimensional modeling is also known as star schema, because in dimensional modeling there is a large central fact table with many dimension tables surrounding it.

Fact Tables: Each data warehouse or data mart includes one or more fact tables. A fact table is the central table of a star or snowflake schema. This central table captures the data that measures the organization's business operations. Fact tables usually contain large numbers of rows.

One of the main features of the fact table is that it holds numerical data, or facts, which can be summarized to give information about the operational history of the organization. A fact table also contains a multipart index, each part of which is a foreign key to the primary key of a related dimension table. The dimension tables contain the attributes of the fact records. Fact tables should not contain attributes which hold descriptive information.

Dimension Tables: The attributes in these tables describe the fact records in the fact table. A dimension table contains attributes which summarize the useful information required by the analyst, as well as attributes providing descriptive information. Some attributes have hierarchies. For example, a dimension containing information about products may contain a hierarchy that separates products into categories, with each of these categories further subdivided by manufacturer.

Cube: OLAP tools allow you to turn data stored in relational databases into meaningful, easy to navigate business information by creating a data cube. The dimensions of a cube represent distinct categories for analyzing business data. Categories such as time, geography or product line breakdowns are typical cube dimensions.

Dimension hierarchies: Refer to Figure 11-3. The product dimension contains individual products. Products are further divided into categories, and further divided according to manufacturer. The dimension table stores the hierarchy for the dimension.

[Figure 11-3: Dimension Hierarchies. Time hierarchy levels: Day, Month, Quarter, Year. Product hierarchy levels: Products_Manufacturer, Products_Category, Products.]
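A dimension hierarchy lets the same measure be summarized at several levels. One way to sketch this in Oracle SQL is GROUP BY ROLLUP, which produces subtotals for each level of the listed columns. The tables fact_sales (timeid, amount) and dim_time (timeid, sale_month, sale_quarter, sale_year) are hypothetical.

```sql
-- Totals per month, subtotals per quarter and per year, and a grand total
SELECT t.sale_year,
       t.sale_quarter,
       t.sale_month,
       SUM(f.amount) AS total_sales
FROM fact_sales f, dim_time t
WHERE f.timeid = t.timeid
GROUP BY ROLLUP (t.sale_year, t.sale_quarter, t.sale_month)
ORDER BY t.sale_year, t.sale_quarter, t.sale_month;
```

In the result, rows where sale_month is NULL carry the quarter subtotals, and rows where sale_quarter is also NULL carry the year subtotals.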

Available Schemas for dimensional modeling:
• Star Schema
• Snowflake Schema

Star Schema: It is the simplest data warehouse schema. It resembles a star. The center of the star consists of one or more fact tables and the points radiating from the center are the dimension tables. Refer to Figure 11-4.

[Figure 11-4: Star Schema. A central fact table surrounded by four dimension tables.]
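A small star schema can be sketched in SQL as follows. The table and column names (dim_time, dim_product, fact_sales and their columns) are hypothetical and chosen only for illustration; the composite primary key of the fact table plays the role of the multipart index made up of foreign keys to the dimension tables.

```sql
CREATE TABLE dim_time (
  timeid       NUMBER PRIMARY KEY,
  sale_day     DATE,
  sale_month   NUMBER(2),
  sale_quarter NUMBER(1),
  sale_year    NUMBER(4)
);

CREATE TABLE dim_product (
  productid    NUMBER PRIMARY KEY,
  productname  VARCHAR2(50),
  category     VARCHAR2(30),
  manufacturer VARCHAR2(50)
);

-- The fact table holds only keys and numerical facts, no descriptive attributes
CREATE TABLE fact_sales (
  timeid    NUMBER REFERENCES dim_time (timeid),
  productid NUMBER REFERENCES dim_product (productid),
  quantity  NUMBER,
  amount    NUMBER(10,2),
  PRIMARY KEY (timeid, productid)
);

-- A typical query: one join per dimension, then aggregate the facts
SELECT t.sale_year, p.category, SUM(f.amount) AS total_sales
FROM fact_sales f, dim_time t, dim_product p
WHERE f.timeid = t.timeid
  AND f.productid = p.productid
GROUP BY t.sale_year, p.category;
```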

Snowflake Schema: It is a complex data warehouse schema. The snowflake schema consists of a single, central fact table, which is surrounded by dimension hierarchies which are normalized. Each level of the dimension is represented in a table. Refer to Figure 11-5.

[Figure 11-5: Snowflake Schema. A central fact table (e.g. Sales) with the dimension tables Products and Customers; further tables Products_Category and Products_Manufacturer appear on the product side, and Cities and Countries on the customer side.]

Disadvantages of Snowflake Schema:
• It increases the number of dimension tables
• It requires more foreign key joins
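Both disadvantages can be seen in a sketch of the product side of Figure 11-5. The column names below are hypothetical, and the query assumes that a fact table named sales with columns productid and amount already exists.

```sql
-- One table per level of the product hierarchy
CREATE TABLE products_manufacturer (
  manufacturerid   NUMBER PRIMARY KEY,
  manufacturername VARCHAR2(50)
);

CREATE TABLE products_category (
  categoryid     NUMBER PRIMARY KEY,
  categoryname   VARCHAR2(30),
  manufacturerid NUMBER REFERENCES products_manufacturer (manufacturerid)
);

CREATE TABLE products (
  productid   NUMBER PRIMARY KEY,
  productname VARCHAR2(50),
  categoryid  NUMBER REFERENCES products_category (categoryid)
);

-- Three joins are needed to reach the manufacturer level,
-- compared with one join to a single product dimension in a star schema
SELECT m.manufacturername, SUM(f.amount) AS total_sales
FROM sales f, products p, products_category c, products_manufacturer m
WHERE f.productid = p.productid
  AND p.categoryid = c.categoryid
  AND c.manufacturerid = m.manufacturerid
GROUP BY m.manufacturername;
```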

11.2.6. Reporting of a Data warehouse application

A data mart is a subset of a data warehouse which focuses on a single area of data and is organized for quick analysis. It can be a small data warehouse itself.

11.2.6.1. Advantages of Data Marts

• It focuses on presentation rather than the organization of data
• It facilitates data reporting
• It provides meaningful reports to the users pertaining to their business area, thereby allowing them to view and concentrate only on the data that is related to their business area. Example: providing sales data to the sales department, providing financial data to the financial department
• It makes the data design simpler and easier. It breaks the whole design into several smaller sub units, which is beneficial to the customers and to the team that is involved in development. It is also easier to maintain.
• Reporting of data becomes faster and more efficient, because reporting is generally done at the sub unit level and data marts assist in faster retrieval compared to querying the entire data warehouse
• It helps in incrementally building up the enterprise data warehouse
• It helps to ensure security

[Figure 11-6: Each end user works with a focused subset of Data Warehouse called Data Mart. The Data Warehouse feeds Data Mart1, Data Mart2 and Data Mart3, which are used by End User 1, End User 2 and End User 3.]

Several data marts can be built, each for a particular business area provided they all conform to the data warehouse architecture from where they get the data for reporting. Data marts can be used in conjunction with each other. Refer to Figure 11-6.

11.2.7. Difference between Data Warehouse and Data Mart

| Data Warehouse | Data Mart |
|---|---|
| A data warehouse is a repository which stores integrated information from multiple disparate sources for efficient querying and analysis | A data mart is a subset of a data warehouse which focuses on a single area of data and is organized for quick analysis |
| It mainly focuses on the organization of data and offers little focus on the presentation of data | It focuses mainly on the presentation of data to the customers rather than the way in which the data is organized in the data warehouse |
| There is usually a central data warehouse system | There can be several data marts that operate on the central data warehouse |
| Data Warehouse is used on an enterprise level | Data Mart is used on a business division / department level |
| Data Warehouse contains data from heterogeneous sources for analysis | Data Mart only contains the required subject specific data for local analysis |

11.2.8. Popular tools available for data warehousing

Reporting / Analysis Tools:
• MicroStrategy: DSS Agent / Server
• Cognos: Impromptu
• Brio Technology: Brio Query
• Seagate Software: Crystal Reports
• MS-SQL Server 2005 SQL Server Reporting Services (SSRS)

ETL:
• Oracle Warehouse Builder
• Informatica: PowerCenter
• Acta: ActaWorks
• MS-SQL Server 2005 SQL Server Integration Services (SSIS)

Databases:
• MDDB
  o Oracle
  o MS-SQL Server 2005 SQL Server Analysis Services (SSAS)


11.3. Summary

• An OLAP application requires historical data because an analysis is generally based on a substantial amount of historical data to enable trend analysis and future predictions
• A data warehouse is a repository which stores integrated information for efficient querying and analysis
• The extract, transform and load (ETL) process is the process of selecting, migrating, transforming, cleansing and converting mapped data from the operational environment to the data warehouse environment
• A data mart is a subset of a data warehouse which focuses on a single area of data and is organized for quick analysis
• Star schema is the simplest data warehouse schema. It resembles a star. The center of the star consists of one or more fact tables and the points radiating from the center are the dimension tables
• The snowflake schema consists of a single, central fact table, which is surrounded by normalized dimension hierarchies
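The star schema summarized above can be sketched with Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_branch`), columns and rows are hypothetical and chosen only to show one central fact table joined to its dimension tables.

```python
import sqlite3


def create_star_schema(conn):
    """One central fact table whose keys point at two dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_branch  (branch_id  INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            branch_id  INTEGER REFERENCES dim_branch(branch_id),
            amount     REAL
        );
        INSERT INTO dim_product VALUES (1, 'Loan');
        INSERT INTO dim_product VALUES (2, 'Card');
        INSERT INTO dim_branch VALUES (10, 'Bangalore');
        INSERT INTO dim_branch VALUES (20, 'Mysore');
        INSERT INTO fact_sales VALUES (1, 10, 100.0);
        INSERT INTO fact_sales VALUES (1, 20, 50.0);
        INSERT INTO fact_sales VALUES (2, 10, 25.0);
        """
    )


def sales_by_city(conn):
    """Aggregate the fact table along the branch dimension."""
    return conn.execute(
        """
        SELECT b.city, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_branch AS b ON b.branch_id = f.branch_id
        GROUP BY b.city
        ORDER BY b.city
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_star_schema(connection)
    print(sales_by_city(connection))
```

In a snowflake schema the `dim_branch` table would itself be normalized further, for example into separate branch and city tables.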


Appendix-A

Boyce Codd Normal Form (BCNF)

A relation is said to be in Boyce Codd Normal Form (BCNF) if and only if all the determinants are candidate keys. BCNF is a stronger form of 3NF: every BCNF relation is in 3NF, but not every 3NF relation is in BCNF. Let us understand this concept using a slightly different Result table structure.

Student#   EmailID             Course#   Marks
101        [email protected]   M4        82
102        [email protected]   M4        62
101        [email protected]   H6        79
103        [email protected]   C3        65
104        [email protected]   B3        77
102        [email protected]   P3        68
105        [email protected]   P3        89
103        [email protected]   B4        54
105        [email protected]   H6        87
104        [email protected]   M4        65

RESULT Table

Figure: Overlapping Candidate Key (Student#, Course# and EmailID, with Course# shared by both candidate keys)

In the RESULT table, we have two candidate keys, namely (Student#, Course#) and (Course#, EmailID). Course# is common to both candidate keys. Hence these candidate keys are called "overlapping candidate keys", as shown above.
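The two candidate keys can be derived mechanically from the functional dependencies of the table. The following is a minimal sketch in Python, assuming the dependencies Student# → EmailID, EmailID → Student# and (Student#, Course#) → Marks hold in RESULT; the function names `closure` and `candidate_keys` are illustrative only.

```python
from itertools import combinations

ATTRS = frozenset({"Student#", "EmailID", "Course#", "Marks"})

# Assumed functional dependencies of RESULT, written as (determinant, dependent).
FDS = [
    (frozenset({"Student#"}), frozenset({"EmailID"})),
    (frozenset({"EmailID"}), frozenset({"Student#"})),
    (frozenset({"Student#", "Course#"}), frozenset({"Marks"})),
]


def closure(attrs, fds):
    """All attributes functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


if __name__ == "__main__":
    for key in candidate_keys(ATTRS, FDS):
        print(sorted(key))
```

Under these assumptions the search finds exactly two keys, and both contain Course#, which is what makes them overlapping.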


The non-key attribute, Marks, is non-transitively and fully functionally dependent on the key attributes. Hence this table is in 3NF. But it is not in BCNF because there are four determinants in this relation, namely:
• Student# (Student# decides EmailID)
• EmailID (EmailID decides Student#)
• Student# Course# (decides the rest of the attributes in the RESULT table)
• Course# EmailID (decides the rest of the attributes in the RESULT table)

Not all of the above determinants are candidate keys. EmailID decides Student#, but EmailID on its own is not a candidate key. Similarly, Student# decides the EmailID of a student, but Student# alone is not a candidate key. Only the combinations Student# Course# and Course# EmailID are candidate keys. To make this table BCNF, we need to split this table into the following structure:

STUDENT TABLE

Student#   EmailID
101        [email protected]
102        [email protected]
103        [email protected]
104        [email protected]
105        [email protected]

Second table (Student#, Course#, Marks)

Student#   Course#   Marks
101        M4        82
102        M4        62
101        H6        79
103        C3        65
104        B3        77
102        P3        68
105        P3        89
103        B4        54
105        H6        87
104        M4        65

Figure: Boyce Codd Normal Form

Now both the tables are not only in 3NF, but also in BCNF because all the determinants are candidate keys. In the first table, Student# decides EmailID and EmailID decides Student#, and both are candidate keys. In the second table, Student# Course# is the only determinant and it is the candidate key. Hence the design satisfies the BCNF definition that every determinant must be a candidate key.

Note: If the table has only one non-composite candidate key and if it is in 3NF, then the table will also be in BCNF.
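The BCNF test (every determinant must be a candidate key) can be checked for the original table and for the two tables of the decomposition. This is a simplified sketch in Python: it only considers the dependencies listed in `FDS` whose attributes all lie inside the relation being tested, which is enough for this example but is not a general projection of dependencies. The name `MARKS` for the second table is an assumption, since the text does not name it.

```python
# Assumed functional dependencies, written as (determinant, dependent).
FDS = [
    (frozenset({"Student#"}), frozenset({"EmailID"})),
    (frozenset({"EmailID"}), frozenset({"Student#"})),
    (frozenset({"Student#", "Course#"}), frozenset({"Marks"})),
]

RESULT = frozenset({"Student#", "EmailID", "Course#", "Marks"})
STUDENT = frozenset({"Student#", "EmailID"})
MARKS = frozenset({"Student#", "Course#", "Marks"})  # hypothetical table name


def closure(attrs, fds):
    """All attributes functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(relation, fds):
    """Determinants inside `relation` that do not determine the whole relation."""
    local = [(lhs, rhs) for lhs, rhs in fds if lhs | rhs <= relation]
    return [
        lhs
        for lhs, rhs in local
        if not rhs <= lhs and not relation <= closure(lhs, local)
    ]


if __name__ == "__main__":
    for name, relation in (("RESULT", RESULT), ("STUDENT", STUDENT), ("MARKS", MARKS)):
        offenders = [sorted(lhs) for lhs in bcnf_violations(relation, FDS)]
        print(name, "violating determinants:", offenders)
```

For RESULT the check reports Student# and EmailID as determinants that are not keys, while both tables of the decomposition come back with no violations.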

Basically, 2NF and 3NF take away the redundancy and anomalies which exist between the key and non-key attributes. BCNF, on the other hand, takes away the redundancy and anomalies which exist among the key attributes. At Infosys, we rarely (around 1% of database design) normalize the databases to BCNF.

Embedded SQL

Purpose

To blend SQL language statements directly into a program written in a host programming language, such as C, Pascal, COBOL, FORTRAN or PL/I, use embedded SQL statements. The following techniques are used to embed the SQL statements:
• SQL statements are intermixed with statements of the host language in the source program. This embedded SQL source program is submitted to a SQL precompiler, which processes the SQL statements
• Variables of the host programming language can be referenced in the embedded SQL statements, allowing values calculated by the program to be used by the SQL statements
• Program language variables are also used by the embedded SQL statements to receive the results of SQL queries, allowing the program to use and process the retrieved values
• Special program variables are used to assign NULL values to database columns and to support the retrieval of NULL values from the database

Why Embedded SQL?

SQL has the following limitations:
• No provision to declare variables
• No unconditional branching/jump statement
• No IF statement to test conditions
• No FOR, DO or WHILE statements to construct loops
• No block structure

In order to understand an embedded SQL program, one has to be familiar with the following terminology:

EXEC SQL: Every embedded SQL statement begins with an introducer that flags it as a SQL statement. The IBM SQL products use the introducer exec sql for most host languages.


SQLCA: The sqlca (SQL Communication Area) is a data structure that contains error variables and status indicators. By examining the SQLCA, the application program can determine the success or failure of its embedded SQL statements.

    exec sql include sqlca;

This statement tells the SQL precompiler to include a SQL Communication Area in the program. As the RDBMS executes each embedded SQL statement, it sets the value of the variable sqlcode in the SQLCA to indicate the completion status of the statement.
• A sqlcode of zero indicates successful completion of the statement
• A negative sqlcode indicates a serious error that prevented the statement from executing correctly
• A positive sqlcode indicates a warning condition. The most common warning, with a value of 100, is the out-of-data warning returned when a program tries to retrieve the next row of query results and no more rows are left to retrieve

Host variables: A host variable is a program variable. It is declared using the data types of the programming language, such as C, and manipulated by programming language statements. A host variable is also used in embedded SQL statements to store data into, or retrieve data from, the database. To identify the host variable, the variable is prefixed by a colon (:) when it appears in an embedded SQL statement. A host variable can appear in an embedded SQL statement wherever a constant can appear. The two embedded SQL statements begin declare section and end declare section bracket the host variable declarations and are non-executable.
Use of host variables to store data into the database:
• The input provided by the user using the standard input device is stored in the host variables
• The values of the host variables are then written to the database using the INSERT SQL statement

Use of host variables to retrieve data from the database:
• The data values retrieved from the database using the SELECT SQL statement are held in the host variables
• The contents of the host variables are then displayed on the standard output device using functions such as printf() in C

Indicator variables: To store NULL values in the database or retrieve NULL values from the database, embedded SQL allows each host variable to have a companion host indicator variable. In an embedded SQL statement, the host variable and the indicator variable together specify a single SQL-style value, as follows:
• An indicator value of zero indicates that the host variable contains a valid value
• A negative indicator value indicates that the host variable should be assumed to have a NULL value; the actual value of the host variable is irrelevant and should be disregarded
• A positive indicator value indicates that the host variable contains a valid value, which may have been rounded off or truncated

A host variable is immediately followed by the name of the corresponding indicator variable. Both variable names are preceded by a colon.

Example: A simple embedded SQL program written in C.

Problem statement: This program asks the customer for his Cust_ID, retrieves his record from the Customer_Details table and displays it on the standard output device.

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        /* inclusion of the SQL Communication Area in the program */
        exec sql include sqlca;

        /* declaration of the HOST VARIABLES */
        exec sql begin declare section;
            char  Mem_Cust_ID[5];
            char  Mem_Cust_Last_Name[25];
            char  Mem_Account_No[5];
            char  Mem_Bank_Branch[25];
            char  Mem_Cust_Email[30];
            short iBank_Branch;   /* indicator variable for Mem_Bank_Branch */
        exec sql end declare section;

        /* Prompt the user for Customer ID */
        /* %4s leaves room for the terminating '\0' in Mem_Cust_ID */
        printf("Enter Customer ID:");
        scanf("%4s", Mem_Cust_ID);

        /* execute the SQL query */
        /* HOST VARIABLES are preceded by a colon (:) e.g. :Mem_Cust_ID */
        /* HOST variable followed by a companion host indicator variable */
        /* e.g. :Mem_Bank_Branch :iBank_Branch */
        exec sql SELECT Cust_ID, Cust_Last_Name, Account_No, Bank_Branch, Cust_Email
                 INTO :Mem_Cust_ID, :Mem_Cust_Last_Name, :Mem_Account_No,
                      :Mem_Bank_Branch :iBank_Branch, :Mem_Cust_Email
                 FROM Customer_Details
                 WHERE Cust_ID = :Mem_Cust_ID;

        /* Display the retrieved data */
        /* sqlca.sqlcode contains status information of the embedded SQL statement executed */
        if (sqlca.sqlcode == 0)
        {
            printf("Customer ID: %s\n", Mem_Cust_ID);
            printf("Customer Name: %s\n", Mem_Cust_Last_Name);
            printf("Account No.: %s\n", Mem_Account_No);
            /* checking the value of the INDICATOR VARIABLE */
            if (iBank_Branch < 0)
            {
                printf("Bank Branch is NULL\n");
            }
            else
            {
                printf("Bank Branch: %s\n", Mem_Bank_Branch);
            }
            printf("Customer Email: %s\n", Mem_Cust_Email);
        }
        else if (sqlca.sqlcode == 100)
        {
            printf("No customer with that Customer ID.\n");
        }
        else
        {
            printf("SQL error: %ld\n", (long) sqlca.sqlcode);
        }

        /* returns success code to the operating system */
        return 0;
    }
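Because the C program above needs a vendor precompiler, the same lookup can be tried out with Python's built-in sqlite3 module instead. This is an analogy rather than embedded SQL: the `?` parameter plays the role of the host variable :Mem_Cust_ID, a Python None plays the role of a negative indicator variable, and the constants `SQL_OK` and `SQL_NOT_FOUND` merely imitate the sqlcode values 0 and 100. The sample rows and helper names are hypothetical, and real errors surface as sqlite3 exceptions rather than negative codes.

```python
import sqlite3

SQL_OK = 0           # imitates sqlcode 0 (success)
SQL_NOT_FOUND = 100  # imitates sqlcode 100 (no row to retrieve)

COLUMNS = ("Cust_ID", "Cust_Last_Name", "Account_No", "Bank_Branch", "Cust_Email")


def load_sample_customers(conn):
    """Create Customer_Details with two hypothetical rows, one with a NULL branch."""
    conn.execute(
        "CREATE TABLE Customer_Details ("
        "Cust_ID TEXT PRIMARY KEY, Cust_Last_Name TEXT, Account_No TEXT, "
        "Bank_Branch TEXT, Cust_Email TEXT)"
    )
    conn.executemany(
        "INSERT INTO Customer_Details VALUES (?, ?, ?, ?, ?)",
        [
            ("C001", "Rao", "A101", "Mysore", "rao@example.com"),
            ("C002", "Iyer", "A102", None, "iyer@example.com"),
        ],
    )


def fetch_customer(conn, cust_id):
    """Return (status, record); the bound parameter stands in for the host variable."""
    row = conn.execute(
        "SELECT Cust_ID, Cust_Last_Name, Account_No, Bank_Branch, Cust_Email "
        "FROM Customer_Details WHERE Cust_ID = ?",
        (cust_id,),
    ).fetchone()
    if row is None:
        return SQL_NOT_FOUND, None
    return SQL_OK, dict(zip(COLUMNS, row))


def branch_line(record):
    """None stands in for a negative indicator value (the column is NULL)."""
    if record["Bank_Branch"] is None:
        return "Bank Branch is NULL"
    return "Bank Branch: " + record["Bank_Branch"]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_sample_customers(connection)
    for customer_id in ("C001", "C002", "C999"):
        status, record = fetch_customer(connection, customer_id)
        if status == SQL_OK:
            print(customer_id, branch_line(record))
        else:
            print(customer_id, "No customer with that Customer ID.")
```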

Timestamping

Another concurrency management technique is timestamping. Every resource in the database is associated with its last successful read timestamp and its last successful write timestamp (the time of occurrence, down to milliseconds, for example 12th December 2004 11:22:33.345). Let us consider:
• An RDBMS author named Hanu modifying this course material as one transaction
• Trainees reading this course material as another transaction

If another RDBMS author, say Seema, is also modifying the same course material at the same time, it leads to lost update and phantom record conditions. If Hanu starts modifying while trainees are studying this material, it leads to dirty read or incorrect summary problems.


To avoid these problems we can follow these two rules:

Hanu can start the course modification transaction only if:
• The course material was successfully modified before this transaction started
• No trainees are currently reading this course material

Similarly, trainees can start the transaction that reads the course material only if:
• It was successfully updated before they started reading it

Let us consider an example of the database DB_BANK_DETAILS as discussed earlier. In table ACC_DETAILS, a particular row is read successfully by transaction BalanceEnquiry at 12:11:45.345 of 12th December 1945. This will be the last read timestamp of this row. If any other transaction reads this row after this time, that later time becomes the last read timestamp of the row. Similarly, every row has a last updated timestamp. If transaction BalanceUpdate updates the row R1 at 13:32:22.345 of 12th December 1945, this will be recorded as the last updated timestamp of the row R1.

A transaction can read only rows or columns that have been updated by an older transaction; if not, the transaction is rolled back. Let us assume that row R7 of the table ACC_DETAILS was successfully updated at 09:24:22.46 Hrs of 15th August 1947 by some transaction. Any transaction started after 09:24:22.46 Hrs of 15th August 1947 can read this row. Transactions started before 09:24:22.46 Hrs of 15th August 1947 need to be rolled back and started afresh to read this data. In general, the condition for a read can be defined as TS > TU, where TS is the start time of the transaction and TU is the last successful update timestamp of the resource.

A transaction can update only rows or columns that have been read and updated by an older transaction; otherwise the transaction is rolled back. So any transaction can update row R7 only if it started after the last successful update and the last successful read of that row. Assume a transaction started at 10:24:23.49 Hrs of 15th August 1947 and wishes to update row R7 at 10:29:11.34 Hrs of 15th August 1947.


It is possible to update this row only if the last successful update and the last successful read of row R7 both happened before 10:24:23.49 Hrs of 15th August 1947. If the row was read or updated after 10:24:23.49 Hrs of 15th August 1947, this transaction cannot change the value of the row and is rolled back. The generic rule for updating data is TS > TU and TS > TR, where TS is the transaction start timestamp, TU is the last successful update timestamp and TR is the last successful read timestamp.

The biggest advantage of timestamping is that it cannot lead to a deadlock, as no resources are locked. However, the timestamping technique leads to a large number of rollbacks. For this reason timestamping is not implemented as the concurrency control mechanism in most commercial RDBMS applications.

Note: Almost all commercial RDBMS packages use a locking technique as the concurrency control mechanism while maintaining consistency in the system.
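The two rules, TS > TU for a read and TS > TU and TS > TR for an update, can be sketched as follows. This is a minimal illustration in Python, not the algorithm of any particular RDBMS: the `Row` class and function names are hypothetical, times are plain numbers (seconds since midnight), and, as in the text, the timestamps stored on the row are the times at which the read or update happened.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """A resource with the two timestamps the text associates with it."""
    value: str
    last_update: float  # TU: time of the last successful update
    last_read: float    # TR: time of the last successful read


def try_read(row, ts, now):
    """Read rule TS > TU. Returns (True, value), or (False, None) for a rollback."""
    if ts > row.last_update:
        row.last_read = max(row.last_read, now)
        return True, row.value
    return False, None


def try_update(row, ts, now, new_value):
    """Update rule TS > TU and TS > TR. Returns False when a rollback is needed."""
    if ts > row.last_update and ts > row.last_read:
        row.value = new_value
        row.last_update = now
        return True
    return False


if __name__ == "__main__":
    # Row R7 was last updated and read at 09:24:22 (33862 seconds after midnight).
    r7 = Row(value="old balance", last_update=33862.0, last_read=33862.0)
    started = 37463.0  # the updating transaction starts at 10:24:23
    # It updates R7 at 10:29:11 (37751 seconds after midnight).
    print(try_update(r7, ts=started, now=37751.0, new_value="new balance"))
    # A reader that started at 10:25:23, before that update completed, must roll back.
    print(try_read(r7, ts=started + 60, now=37800.0))
```

The first call succeeds because the transaction started after the last read and update of R7. The second call is rejected because the reader started before the row's new update timestamp, which is the kind of rollback that makes the technique expensive.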


Glossary

Abstract: Conceptual/theoretical object.
Abstraction: A simplified representation of something that is potentially quite complex. It is often not necessary to know the exact details of how something works, is represented or is implemented, because it can still be used in its simplified form.
Ambiguity: Uncertainty.
Anomalies: Irregularities.
Anomaly: A departure from the expected; an abnormality.
Atomic: The smallest level to which data can be broken down and still remain meaningful.
Attribute: The literal meaning is quality, characteristic, trait or feature. Entities get their meaning in a database with the help of a set of attributes. For example, in the bank system, Cust_ID, Cust_Email etc. describe the Customer-Detail entity set.
Backup: A second copy of a file or a set of files, to be used if the primary or main file(s) are destroyed or corrupted. Backups are essential for all data, even though taking them is routine work. For critical work, two backup sets are recommended.
Business rules: The rules or the policies which govern the functioning of the application.
Business users: The users who own the application.
Cardinality of a relation: The number of rows or tuples in a table.
Centralized: Systems where the flow of data, the beginning of activities and decision making are initiated at the same central point and spread to other remote points in the organization.
Conceptual: To generalize abstract ideas from specific instances.
Concurrent Access: Performing two (or more) operations on the same piece of data at the same time.
Constraints: Restriction, limitation.
Data manipulation: The insertion of new data, modification of existing data, etc.
Data Redundancy: The same data is stored in more than one place in a database.
Decomposable: Can be further split or reduced.
Degree of a relation: The number of attributes or columns in a table.
Distinct: Not identical.


Distributed: We say that a computer system is distributed when many different types of components and objects related to an application can be situated on different computers, which are connected to a network.
Encryption: The process of transforming data in such a way that it can be interpreted only by the intended users and not by anyone else.
End User: The person for whom a system is being developed. Example: a bank teller or a bank manager is an end user of a bank system.
Entity: An entity is a "thing" or "object" in the real world that is distinguishable from other objects. Example: employee is an entity, and book can be considered to be another entity.
Flat files: Files containing records that have no structured interrelationship. Files used in programming fundamentals (PF) projects were essentially flat files.
Fourth Generation Language (4GL): A 4GL is typically non-procedural and designed so that end users can specify what they want without having to know how the computer will process their requirement.
Grant Privilege: To assign a privilege to a user or to a group.
Heterogeneous: Diverse, mixed, varied.
Heterogeneous Network: A network that consists of network interface cards, servers, workstations, operating systems and applications from many vendors, all working together as a single unit. The network usually uses different media and different types of protocols on different network links.
Homogeneous: All the same, uniform, harmonized.
Homogeneous Network: A network composed of systems of similar architecture that runs a single network layer protocol.
Inconsistency: Lacking uniformity or agreement.
Instance: Occurrence.
Integrated: United into a larger unit. Something which is brought together in order to form a working whole in a satisfactory manner.
Integrity Constraints: A set of rules to ensure the correctness and accuracy of data.
Interrelated: Interconnected.
Intuitive: Natural.
Iterative: Process of repeating the same task.
Jargon: A specialized or technical language of a profession or a trade.
Main Memory: This concept is discussed in the OS course. All the read and write operations happen in main memory before they are written to hard disks.
Model: A representation or a scaled down structure of an object.


Page: A part of a table. Usually multiple rows are stored in one page.
Participating entities: The entities which are joined by the relation.
Queries: Requests that a user makes on the database.
Recovery: Restoration, return to an original state.
Requirement specification: A document which contains the requirements for a specific application.
Revoke Privilege: To cancel or withdraw a privilege.
Schema: A description of a database. It specifies (among other things) the relations or tables, their attributes or columns, and the domains of the attributes.
Semantic: Meaning.
Shared: A type of database access which allows multiple users to log on to the database at the same time.
Simulate: To make a model.
Site: Geographical location.
Software application designer: The person who designs software applications.
SQL (Structured Query Language): A language which is used by relational databases to request or query, and to update and manage data.
Static: Something which does not change. (Example: the typical web page is static, in that the content of the web page does not change until the owner of the web page or the web master physically alters the document.)
Superset: Given two sets, X and Y, we say X is a superset of Y if all the elements of Y are also elements of X. Every set is a superset of itself. Every set is a superset of the empty set.
Table: A two dimensional space having columns and rows. A table contains a specified number of attributes or columns but can have any number of records or rows.
Tablespace: The logical part of the database which represents a collection of structures, such as tables, created by various users.
Tangible: Physical object.
Transaction: A set of processing steps which are considered as a single activity or unit of work to achieve a desired result. In a DBMS, a collection of processing steps that forms a single logical unit of work is called a transaction. A database system ensures proper execution of transactions despite failures: either the entire transaction executes, or none of it does.
Transient: Temporary, transitory, momentary.
Transitive: Indirect.


Tuple: A mathematical term for a finite sequence of n terms. For example, the sequence (1, 2, 3, 4) is a four-tuple. A tuple is the equivalent of a record. In an RDBMS, a table with n rows has n tuples.
Unauthorized: Not permitted, illegal, unlawful.
View: A virtual table in the database defined by a query.
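Several of the glossary terms (degree of a relation, cardinality of a relation, tuple and view) can be seen together in a small sketch using Python's built-in sqlite3 module. The `customer` table, the `mysore_customers` view and their rows are hypothetical examples.

```python
import sqlite3


def degree_and_cardinality(conn, name):
    """Return (number of columns, number of rows) of a table or view.

    `name` is interpolated into the SQL text, so pass only trusted identifiers.
    """
    cursor = conn.execute(f'SELECT * FROM "{name}"')
    degree = len(cursor.description)       # one description entry per column
    cardinality = len(cursor.fetchall())   # one tuple per row
    return degree, cardinality


def build_example(conn):
    """A base table plus a view, which is a virtual table defined by a query."""
    conn.executescript(
        """
        CREATE TABLE customer (cust_id TEXT PRIMARY KEY, last_name TEXT, city TEXT);
        INSERT INTO customer VALUES ('C1', 'Rao', 'Mysore');
        INSERT INTO customer VALUES ('C2', 'Iyer', 'Bangalore');
        INSERT INTO customer VALUES ('C3', 'Shah', 'Mysore');
        INSERT INTO customer VALUES ('C4', 'Das', 'Pune');
        CREATE VIEW mysore_customers AS
            SELECT cust_id, last_name FROM customer WHERE city = 'Mysore';
        """
    )


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_example(connection)
    print("customer:", degree_and_cardinality(connection, "customer"))
    print("mysore_customers:", degree_and_cardinality(connection, "mysore_customers"))
```

The view stores no rows of its own; its degree and cardinality follow from the query that defines it.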


