Sql

  • Uploaded by: Wilfredy Idarraga Cadena
  • 0
  • 0
  • January 2021
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Sql as PDF for free.

More details

  • Words: 7,824
  • Pages: 37
Loading documents preview...
QUERIES

Like I LIKE

can be a useful operator when you want to compare similar values.

The movies table contains two films with similar titles, 'Se7en' and 'Seven'. How could we select all movies that start with 'Se' and end with 'en' and have exactly one character in the middle? SELECT * FROM movies WHERE name LIKE 'Se_en';

is a special operator used with the WHERE clause to search for a specific pattern in a column.  name LIKE 'Se_en' is a condition evaluating the name column for a specific pattern.  Se_en represents a pattern with a wildcardcharacter. The _ means you can substitute any individual character here without breaking the pattern. The names Seven and Se7en both match this pattern. 1. Let's test it out. 

LIKE

In the code editor, type: SELECT * FROM movies WHERE name LIKE 'Se_en';

Like II The percentage sign % is another wildcard character that can be used with LIKE. This statement below filters the result set to only include movies with names that begin with the letter 'A': SELECT * FROM movies WHERE name LIKE 'A%';

is a wildcard character that matches zero or more missing letters in the pattern. For example: %

 

matches all movies with names that begin with letter 'A' %a matches all movies that end with 'a' A%

We can also use % both before and after a pattern: SELECT * FROM movies WHERE name LIKE '%man%';

Here, any movie that contains the word 'man' in its name will be returned in the result. is not case sensitive. 'Batman' and 'Man of Steel' will both appear in the result of the query above. 1. In the text editor, type: LIKE

SELECT * FROM movies WHERE name LIKE '%man%';

How many movie titles contain the word 'man'? What are the different use cases of the LIKEoperator? finds any values that start with the letter 'c' %c finds any values that end with the letter 'c' %re% finds values that have 're' in any position _a% finds any values that have the letter 'a' in the second index a_%_% finds any values that start with 'a' and are at least 3 characters in length. a%r finds any values that start with 'a' and end with 'r'.



LIKE c%



LIKE



LIKE



LIKE



LIEK



LIKE

2. Let's try one more. Edit the query so that it selects all the information about the movie titles that begin with the word 'The'. You might need a space in there!

Is Null By this point of the lesson, you might have noticed that there are a few missing values in the movies table. More often than not, the data you encounter will have missing values. Unknown values are indicated by NULL. It is not possible to test for NULL values with comparison operators, such as = and !=. Instead, we will have to use these operators: 

IS NULL



IS NOT NULL

To filter for all movies with an IMDb rating: SELECT name FROM movies WHERE imdb_rating IS NOT NULL;

1. Now let's do the opposite.

Write a query to find all the movies without an IMDb rating. Select only the name column! We want to query for movies that have a missing value in their imdb_rating field: SELECT name FROM movies WHERE imdb_rating IS NULL;

Notice how we used IS NULL instead of IS NOT NULL here.

Between The BETWEEN operator can be used in a WHEREclause to filter the result set within a certain range. The values can be numbers, text or dates. This statement filters the result set to only include movies with names that begin with letters 'A' up to, but not including 'J'. SELECT * FROM movies WHERE name BETWEEN 'A' AND 'J';

Here is another one:

SELECT * FROM movies WHERE year BETWEEN 1990 AND 1999;

In this statement, the BETWEEN operator is being used to filter the result set to only include movies with years between 1990 up to, and including 1999. Really interesting point to emphasize again:  

two letters is not inclusive of the 2nd letter. BETWEEN two numbers is inclusive of the 2nd number. BETWEEN

Instructions

1. Using the BETWEEN operator, write a query that selects all information about movies whose name begins with the letters 'D', 'E', and 'F'. This should be very similar to the first query in the narrative.

BETWEEN 'D' AND 'G'

should be the condition:

SELECT * FROM movies WHERE name BETWEEN 'D' AND 'G';

This will return all the names that begin with 'D', 'E', and 'F'. The upper bound should be 'G' because BETWEEN is not inclusive of the second letter. BETWEEN 'D' AND 'F'

shouldn't be the condition because it would return names that begin with 'D' and

'E', but not 'F'. And don't forget to capitalize D and G! is case-sensitive. If the condition is BETWEEN 'a' AND 'z', it would only return lowercase (a-z) results and not uppercase (A-Z). BETWEEN

2. Remove the previous query. Using the BETWEEN operator, write a new query that selects all information about movies that were released in the 1970's. In this statement, the BETWEEN operator is being used to filter the result set to only include movies with years in 1970-1979: SELECT * FROM movies WHERE year BETWEEN 1970 AND 1979;

Remember, BETWEEN two numbers is inclusive of the second number. Notice that there is a movie from 1979 in the result. Also, numeric values (INTEGER or REAL data types) don't need to be wrapped with single quotes, whereas TEXT string values do.

And Sometimes we want to combine multiple conditions in a WHERE clause to make the result set more specific and useful. One way of doing this is to use the ANDoperator. Here, we use the AND operator to only return 90's romance movies.

SELECT * FROM movies WHERE year BETWEEN 1990 AND 1999 AND genre = 'romance';   

is the 1st condition. genre = 'romance' is the 2nd condition. AND combines the two conditions. year BETWEEN 1990 AND 1999

With AND, both conditions must be true for the row to be included in the result. 1. In the previous exercise, we retrieved every movie released in the 1970's. Now, let's retrieve every movie released in the 70's, that's also well received. In the code editor, type: SELECT * FROM movies WHERE year BETWEEN 1970 AND 1979 AND imdb_rating > 8;

We are putting AND

imdb_rating > 8

on another line and indented just so it is easier to read

Or Similar to AND, the OR operator can also be used to combine multiple conditions in WHERE, but there is a fundamental difference:  

operator displays a row if all the conditions are true. OR operator displays a row if any condition is true. AND

Suppose we want to check out a new movie or something action-packed: SELECT * FROM movies WHERE year > 2014 OR genre = 'action';   

is the 1st condition. genre = 'action' is the 2nd condition. OR combines the two conditions. year > 2014

With OR, if any of the conditions are true, then the row is added to the result. Instructions

1. Let's test this out: SELECT * FROM movies WHERE year > 2014 OR genre = 'action';

This retrieves all the movies released after 2014 or in the action genre. We are putting OR genre = 'action' on another line and indented just so it is easier to read.

Order By That's it with WHERE and its operators. Moving on! It is often useful to list the data in our result set in a particular order. We can sort the results using ORDER BY, either alphabetically or numerically. Sorting the results often makes the data more useful and easier to analyze. For example, if we want to sort everything by the movie's title from A through Z: SELECT * FROM movies ORDER BY name; 

ORDER BY

is a clause that indicates you want to sort the result set by a particular

column.  name is the specified column. Sometimes we want to sort things in a decreasing order. For example, if we want to select all of the well-received movies, sorted from highest to lowest by their year: SELECT * FROM movies WHERE imdb_rating > 8 ORDER BY year DESC;

is a keyword used in ORDER BY to sort the results in descending order (high to low or Z-A).  ASC is a keyword used in ORDER BY to sort the results in ascending order (low to high or A-Z). The column that we ORDER BY doesn't even have to be one of the columns that we're displaying. 

Note: ORDER 1.

DESC

BY

always goes after WHERE (if WHERE is present).

Suppose we want to retrieve the nameand year columns of all the movies, ordered by their name alphabetically. Type the following code: SELECT name, year FROM movies ORDER BY name;

2. Your turn! Remove the previous query.

Write a new query that retrieves the name, year, and imdb_rating columns of all the movies, ordered highest to lowest by their ratings. Hint What are the columns that are selected and the table we are interested in? SELECT name, year, imdb_rating FROM movies;

Next, let's sort them.

SELECT name, year, imdb_rating FROM movies ORDER BY imdb_rating DESC;

We added DESC here because we want to sort it in a descending order. If you run this query, the result will start with movies with an IMDb rating of 9.0 all the way down to 4.2.

Limit We've been working with a fairly small table (fewer than 250 rows), but most SQL tables contain hundreds of thousands of records. In those situations, it becomes important to cap the number of rows in the result. For instance, imagine that we just want to see a few examples of records. SELECT * FROM movies LIMIT 10;

is a clause that lets you specify the maximum number of rows the result set will have. This saves space on our screen and makes our queries run faster. LIMIT

Here, we specify that the result set can't have more than 10 rows. always goes at the very end of the query. Also, it is not supported in all SQL databases. LIMIT

Instructions

1. Combining your knowledge of LIMITand ORDER highest rated movies.

BY,

write a query that returns the top 3

Select all the columns. First, what column(s) and table are we interested in? SELECT * FROM movies;

Next, sort them by rating (descending so we start from the highest). SELECT * FROM movies ORDER BY imdb_rating DESC;

Lastly, add a LIMIT cap.

SELECT * FROM movies ORDER BY imdb_rating DESC LIMIT 3;

If you run this query, the result will be 'The Dark Knight' at an impressive 9.0, 'Inception' and 'Star Wars: Episode V - The Empire Strikes Back' tying for second with a rating of 8.8.

Case A CASE statement allows us to create different outputs (usually in the SELECT statement). It is SQL's way of handling if-then logic.

Suppose we want to condense the ratings in movies to three levels:   

If the rating is above 8, then it is Fantastic. If the rating is above 6, then it is Poorly Received. Else, Avoid at All Costs.

SELECT name, CASE WHEN imdb_rating > 8 THEN 'Fantastic' WHEN imdb_rating > 6 THEN 'Poorly Received' ELSE 'Avoid at All Costs' END FROM movies;   

Each WHEN tests a condition and the following THEN gives us the string if the condition is true. The ELSE gives us the string if all the above conditions are false. The CASE statement must end with END.

In the result, you have to scroll right because the column name is very long. To shorten it, we can rename the column to 'Review' using AS: SELECT name, CASE WHEN imdb_rating > 8 THEN 'Fantastic' WHEN imdb_rating > 6 THEN 'Poorly Received' ELSE 'Avoid at All Costs' END AS 'Review' FROM movies;

1.

Let's try one on your own. Select the name column and use a CASEstatement to create the second column that is:   

'Chill' if genre = 'romance' 'Chill' if genre = 'comedy' 'Intense' in all other cases

Optional: Rename the whole CASEstatement to 'Mood' using AS. Give it your best shot! Check hint for the answer. his is the final boss! Your query should look like: SELECT name, CASE WHEN genre = 'romance' THEN 'Chill' WHEN genre = 'comedy' THEN 'Chill' ELSE 'Intense' END AS 'Mood' FROM movies;   

If the genre is romance, then it is Chill. If the genre is comedy, then it is Chill. Else, it is Intense.

Don't forget the comma after name. Here is another query that will give us the same result:

SELECT name, CASE WHEN genre = 'romance' OR genre = 'comedy' THEN 'Chill' ELSE 'Intense' END AS 'Mood' FROM movies;  

If the genre is romance or comedy, then it is Chill. Else, it is Intense.

Review Congratulations! We just learned how to query data from a database using SQL. We also learned how to filter queries to make the information more specific and useful. Let's summarize:         

is the clause we use every time we want to query information from a database. AS renames a column or table. DISTINCT return unique values. WHERE is a popular command that lets you filter the results of the query based on conditions that you specify. LIKE and BETWEEN are special operators. AND and OR combines multiple conditions. ORDER BY sorts the result. LIMIT specifies the maximum number of rows that the query will return. CASE creates different outputs. SELECT

Instructions

Feel free to experiment a bit more with the movies table before moving on! AGGREGATE FUNCTIONS

Introduction We've learned how to write queries to retrieve information from the database. Now, we are going to learn how to perform calculations using SQL. Calculations performed on multiple rows of a table are called aggregates. In this lesson, we have given you a table named fake_apps which is made up of fake mobile applications data.

Here is a quick preview of some important aggregates that we will cover in the next five exercises:     

COUNT():

count the number of rows SUM(): the sum of the values in a column MAX()/MIN(): the largest/smallest value AVG(): the average of the values in a column ROUND(): round the values in the column

Let's get started! 1. Before getting started, take a look at the data in the fake_apps table. In the code editor, type the following: SELECT * FROM fake_apps;

What are the column names? The column names are id, name, category, downloads, and price.

AGGREGATE FUNCTIONS

Count The fastest way to calculate how many rows are in a table is to use the COUNT() function. is a function that takes the name of a column as an argument and counts the number of non-empty values in that column. COUNT()

SELECT COUNT(*) FROM table_name;

Here, we want to count every row, so we pass * as an argument inside the parenthesis. 1. Let's count how many apps are in the table. In the code editor, run: SELECT COUNT(*) FROM fake_apps;

There are 200 apps. Common errors: 

Missing parenthesis.



Missing ;.

2. Add a WHERE clause in the previous query to count how many free apps are in the table. Hint Remember the WHERE statement? The following code should go inside the previous query, before the semicolon: SELECT COUNT(*) FROM fake_apps WHERE price = 0;  

indicates we want to only include rows where the following condition is true. price = 0 is the condition. WHERE

There are 73 free apps in the table.

Sum SQL makes it easy to add all values in a particular column using SUM(). is a function that takes the name of a column as an argument and returns the sum of all the values in that column. SUM()

What is the total number of downloads for all of the apps combined? SELECT SUM(downloads) FROM fake_apps;

This adds all values in the downloads column. 1. Let's find out the answer! In the code editor, type: SELECT SUM(downloads) FROM fake_apps;

There are 3,322,760 total download

Max / Min The MAX() and MIN() functions return the highest and lowest values in a column, respectively.

How many downloads does the most popular app have? SELECT MAX(downloads) FROM fake_apps;

The most popular app has 31,090 downloads! takes the name of a column as an argument and returns the largest value in that column. Here, we returned the largest value in the downloads column. MAX()

MIN()

works the same way but it does the exact opposite; it returns the smallest value.

1. What is the least number of times an app has been downloaded? In the code editor, type: SELECT MIN(downloads) FROM fake_apps;

1,387 downloads.

price of the most expensive app. Hint SELECT MAX(price) FROM fake_apps;

$14.99 is the price of the most expensive app.

Average SQL uses the AVG() function to quickly calculate the average value of a particular column. The statement below returns the average number of downloads for an app in our database: SELECT AVG(downloads) FROM fake_apps;

The AVG() function works by taking a column name as an argument and returns the average value for that column. 1. Calculate the average number of downloads for all the apps in the table. In the code editor, type: SELECT AVG(downloads) FROM fake_apps;

16,613.8 average downloads. 2. Remove the previous query.

Write a new query that calculates the average price for all the apps in the table. Hint

Which column should go inside the parenthesis? SELECT AVG(_____) FROM fake_apps;

The average price is $2.02365.

Round By default, SQL tries to be as precise as possible without rounding. We can make the result table easier to read using the ROUND()function. ROUND()

function takes two arguments inside the parenthesis:

1. a column name 2. an integer It rounds the values in the column to the number of decimal places specified by the integer. SELECT ROUND(price, 0) FROM fake_apps;

Here, we pass the column price and integer 0as arguments. SQL rounds the values in the column to 0 decimal places in the output. 1. Let's return the name column and a rounded price column. In the code editor, type: SELECT name, ROUND(price, 0) FROM fake_apps;

Hint We are selecting ROUND(price, 2. Remove the previous query.

0)

as the second column in this query.

In the last exercise, we were able to get the average price of an app ($2.02365) using this query: SELECT AVG(price) FROM fake_apps;

Now, let's edit this query so that it rounds this result to 2 decimal places. This is a tricky one! Hint

You can treat AVG(price) just like any other value and place it inside the ROUND function like so: ROUND(AVG(price), 2)

Here, AVG(price) is the 1st argument and 2 is the 2nd argument because we want to round it to two decimal places: SELECT ROUND(AVG(price), 2) FROM fake_apps;

You can treat AVG(price) just like any other value and place it inside the ROUND function like so: ROUND(AVG(price), 2)

Here, AVG(price) is the 1st argument and 2 is the 2nd argument because we want to round it to two decimal places: SELECT ROUND(AVG(price), 2) FROM fake_apps;

Group By I Oftentimes, we will want to calculate an aggregate for data with certain characteristics. For instance, we might want to know the mean IMDb ratings for all movies each year. We could calculate each number by a series of queries with different WHERE statements, like so: SELECT AVG(imdb_rating) FROM movies WHERE year = 1999; SELECT AVG(imdb_rating) FROM movies WHERE year = 2000; SELECT AVG(imdb_rating) FROM movies WHERE year = 2001;

and so on.

Luckily, there's a better way! We can use GROUP

BY

to do this in a single step:

SELECT year, AVG(imdb_rating) FROM movies GROUP BY year ORDER BY year;

is a clause in SQL that is used with aggregate functions. It is used in collaboration with the SELECT statement to arrange identical data into groups. GROUP BY

The GROUP BY statement comes after any WHEREstatements, but before ORDER 1. In the code editor, type: SELECT price, COUNT(*) FROM fake_apps GROUP BY price;

BY

Here, our aggregate function is COUNT()and we arranged price into groups.

or LIMIT.

What do you expect the result to be? The result contains the total number of apps for each price. It is organized into two columns, making it very easy to see the number of apps at each price.

2. In the previous query, add a WHEREclause to count the total number of apps that have been downloaded more than 20,000 times, at each price. Hint Remember, WHERE statement goes before the GROUP BY statement: SELECT price, COUNT(*) FROM fake_apps WHERE downloads > 20000 GROUP BY price;

3. Remove the previous query.

Write a new query that calculates the total number of downloads for each category. Select category and SUM(downloads). First, select the two columns we want: SELECT category, SUM(downloads) FROM fake_apps;

Next, group the result for each category by adding a GROUP BY: SELECT category, SUM(downloads) FROM fake_apps GROUP BY category;

Group By II Sometimes, we want to GROUP

BY

a calculation done on a column.

For instance, we might want to know how many movies have IMDb ratings that round to 1, 2, 3, 4, 5. We could do this using the following syntax: SELECT ROUND(imdb_rating), COUNT(name) FROM movies GROUP BY ROUND(imdb_rating) ORDER BY ROUND(imdb_rating);

However, this query may be time-consuming to write and more prone to error. SQL lets us use column reference(s) in our GROUP   

is the first column selected 2 is the second column selected 3 is the third column selected 1

and so on.

BY

that will make our lives easier.

The following query is equivalent to the one above: SELECT ROUND(imdb_rating), COUNT(name) FROM movies GROUP BY 1 ORDER BY 1;

Here, the 1 refers to the first column in our SELECT statement, ROUND(imdb_rating). 1. Suppose we have the query below: SELECT category, price, AVG(downloads) FROM fake_apps GROUP BY category, price;

Write the exact query, but use column reference numbers instead of column names after GROUP BY. These numbers represent the selected columns: refers to category. refers to price. 3 refers to AVG(downloads)



1



2



Now, change the GROUP BY with numbers: SELECT category, price, AVG(downloads) FROM fake_apps GROUP BY 1, 2;

Note: Even if you use column names instead of numbers, it will still be correct because these two queries are exactly the same!

Having In addition to being able to group data using GROUP groups to include and which to exclude.

BY,

SQL also allows you to filter which

For instance, imagine that we want to see how many movies of different genres were produced each year, but we only care about years and genres with at least 10 movies. We can't use WHERE here because we don't want to filter the rows; we want to filter groups. This is where HAVING comes in. is very similar to WHERE. In fact, all types of WHERE clauses you learned about thus far can be used with HAVING. HAVING

We can use the following for the problem: SELECT year, genre, COUNT(name) FROM movies GROUP BY 1, 2 HAVING COUNT(name) > 10;

 

HAVING

When we want to limit the results of a query based on values of the individual rows, use WHERE. When we want to limit the results of a query based on an aggregate property, use HAVING. statement always comes after GROUP

BY,

but before ORDER

BY

and LIMIT.

1. Suppose we have the query below: SELECT price, ROUND(AVG(downloads)), COUNT(*) FROM fake_apps GROUP BY price;

It returns the average downloads (rounded) and the number of apps – at each price point. However, certain price points don't have very many apps, so their average downloads are less meaningful. Add a HAVING clause to restrict the query to price points that have more than 10 apps. The total number of apps at each price point would be given by COUNT(*). SELECT price, ROUND(AVG(downloads)), COUNT(*) FROM fake_apps GROUP BY price HAVING COUNT(*) > 10; COUNT(*) > 10

is the condition.

Because the condition has an aggregate function in it, we have to use HAVING instead of WHERE.

Review Congratulations! You just learned how to use aggregate functions to perform calculations on your data. What can we generalize so far?     

COUNT():

count the number of rows SUM(): the sum of the values in a column MAX()/MIN(): the largest/smallest value AVG(): the average of the values in a column ROUND(): round the values in the column

    

COUNT (): cuente el número de filas SUM (): la suma de los valores en una columna MAX () / MIN (): el valor más grande / más pequeño AVG (): el promedio de los valores en una columna ROUND (): redondea los valores en la columna

  

Aggregate functions combine multiple rows together to form a single value of more meaningful information.





is a clause used with aggregate functions to combine data from one or more columns. HAVING limit the results of a query based on an aggregate property. GROUP BY

 

GROUP BY es una cláusula utilizada con funciones agregadas para combinar datos de una o más columnas. HAVING limitar los resultados de una consulta basada en una propiedad agregada.

Instructions

Feel free to experiment a bit more with the fake_apps table before moving on!

MULTIPLE TABLES

Introduction In order to efficiently store data, we often spread related information across multiple tables. For instance, imagine that we're running a magazine company where users can have different types of subscriptions to different products. Different subscriptions might have many different properties. Each customer would also have lots of associated information. We could have one table with all of the following information:         

order_id customer_id customer_name customer_address subscription_id subscription_description subscription_monthly_price subscription_length purchase_date

However, a lot of this information would be repeated. If the same customer has multiple subscriptions, that customer's name and address will be reported multiple times. If the same subscription type is ordered by multiple customers, then the subscription price and subscription description will be repeated. This will make our table big and unmanageable. So instead, we can split our data into three tables: 

orders would contain o order_id o customer_id o subscription_id o purchase_date

just the information necessary to describe what was ordered:



subscriptions would contain o subscription_id o description o price_per_month o subscription_length



customers would contain o customer_id o customer_name o address

the information to describe each type of subscription:

the information for each customer:

In this lesson, we'll learn the SQL commands that will help us work with data that is stored in multiple tables. 1. Examine these tables by pasting the following code into the editor: SELECT * FROM orders LIMIT 5; SELECT * FROM subscriptions LIMIT 5; SELECT * FROM customers LIMIT 5;

orders

(a table with information on each magazine purchase) order_id customer_id subscription_id 1

2

3

purchase date 2017-0101

2

2

2

2017-0101

3

3

1

2017-0101

subscriptions

(a table that describes each type of subscription) subscription_id description price_per_month length 1

Politics Magazine

5

12 months

2

Fashion Magazine

10

6 months

3

Sports Magazine

7

3 months

customers

(a table with customer names and contact information) customer_id

customer_name

address

1

John Smith

123 Main St

2

Jane Doe

456 Park Ave

3

Joe Schmo

798 Broadway

Combining Tables Manually Let's return to our magazine company. Suppose we have the three tables described in the previous exercise – shown in the browser on the right (we are going to try something new!):   

orders subscriptions customers

If we just look at the orders table, we can't really tell what's happened in each order. However, if we refer to the other tables, we can get a complete picture. Let's examine the order with an order_id of 2. It was purchased by the customer with a customer_id of 2. To find out the customer's name, we look at the customers table and look for the item with a customer_id value of 2. We can see that Customer 2's name is 'Jane Doe' and that she lives at '456 Park Ave'. Doing this kind of matching is called joining two tables. 1. Using the tables displayed, what is the description of the magazine ordered in order_id 1? Type your answer on line 1 of the code editor. Be sure to capitalize it the same as in the table. First, we look at the orders table. The order with an order_id of 1 is in the first row. Its subscription_id is 3. Then we look at the subscriptions table. The subscription with a subscription_id of 3 is in the third row. Its description is Sports Magazine. Answer: Sports Magazine Don't write a query for this exercise! 2. Using the tables displayed, what is the customer_name of the customer in order_id 3? Type your answer on line 2 of the code editor. Be sure to capitalize it the same as in the table.

Combining Tables with SQL Combining tables manually is time-consuming. Luckily, SQL gives us an easy sequence for this: it's called a JOIN.

If we want to combine orders and customers, we would type: SELECT * FROM orders JOIN customers ON orders.customer_id = customers.customer_id;

Let's break down this command:

1. The first line selects all columns from our combined table. If we only want to select certain columns, we can specify which ones we want. 2. The second line specifies the first table that we want to look in, orders 3. The third line uses JOIN to say that we want to combine information from orderswith customers. 4. The fourth line tells us how to combine the two tables. We want to match orderstable's customer_id column with customerstable's customer_id column. Because column names are often repeated across multiple tables, we use the syntax table_name.column_name to be sure that our requests for columns are unambiguous. In our example, we use this syntax in the ONstatement, but we will also use it in the SELECTor any other statement where we refer to column names. For example: Instead of selecting all the columns using *, if we only wanted to select orders table's order_id column and customerstable's customer_name column, we could use the following query: SELECT orders.order_id, customers.customer_name FROM orders JOIN customers ON orders.customer_id = customers.customer_id;

Instructions

1. Join orders table and subscriptionstable and select all columns. Make sure to join on the subscription_idcolumn. Suppose we do: SELECT * FROM orders LIMIT 10; SELECT * FROM subscriptions LIMIT 10;

You will notice that both orders table and subscriptions table have a subscription_idcolumn. And we want to match these two columns:  

orders.subscription_id subscriptions.subscription_id

SELECT * FROM orders JOIN subscriptions ON orders.subscription_id = subscriptions.subscription_id;

Notice that in the column subscriptions.subscription_id, the table name has a 's' in the end. 2. Don't remove the previous query.

Add a second query after your first one that only selects rows from the join where description is equal to 'Fashion Magazine'. The WHERE clause goes after the JOIN! Answer: -- Checkpoint 1 SELECT * FROM orders JOIN subscriptions ON orders.subscription_id = subscriptions.subscription_id; -- Checkpoint 2 SELECT * FROM orders JOIN subscriptions ON orders.subscription_id = subscriptions.subscription_id WHERE subscriptions.description = 'Fashion Magazine';

What is the difference between the two result tables?

MULTIPLE TABLES

Inner Joins Let's revisit how we joined orders and customers. For every possible value of customer_id in orders, there was a corresponding row of customers with the same customer_id. What if that wasn't true? For instance, imagine that our customers table was out of date, and was missing any information on customer 11. If that customer had an order in orders, what would happen when we joined the tables? When we perform a simple JOIN (often called an inner join) our result only includes rows that match our ON condition. Consider the following animation, which illustrates an inner join of two tables on table1.c2 = table2.c2:

The first and last rows have matching values of c2. The middle rows do not match. The final result has all values from the first and last rows but does not include the non-matching middle row. Instructions

1. Suppose we are working for The Codecademy Times, a newspaper with two types of subscriptions:  

print newspaper online articles

Some users subscribe to just the newspaper, some subscribe to just the online edition, and some subscribe to both. There is a newspaper table that contains information about the newspaper subscribers. Count the number of subscribers who get a print newspaper using COUNT(). Use COUNT(*) to count all rows of a table: SELECT COUNT(*) FROM newspaper;

2. Don't remove your previous query. There is also an online table that contains information about the online subscribers. Count the number of subscribers who get an online newspaper using COUNT(). Use COUNT(*) to count all rows of a table: SELECT COUNT(*) FROM online;

Suppose we do: SELECT * FROM newspaper LIMIT 10; SELECT * FROM online LIMIT 10;

You will notice that both newspaper table and online table have an id column. And we want to match these two columns:  

newspaper.id online.id

Your ON statement should look like this: ON newspaper.id = online.id

Remember to use SELECT COUNT(*) to count the rows: SELECT COUNT(*) FROM newspaper JOIN online ON newspaper.id = online.id;

Query Results

COUNT(*) 60 COUNT(*) 65 COUNT(*) 50

Left Joins What if we want to combine two tables and keep some of the un-matched rows? SQL lets us do this through a command called LEFT JOIN. A left join will keep all rows from the first table, regardless of whether there is a matching row in the second table. Consider the following animation:

The first and last rows have matching values of c2. The middle rows do not match. The final result will keep all rows of the first table but will omit the un-matched row from the second table.

This animation represents a table operation produced by the following command: SELECT * FROM table1 LEFT JOIN table2 ON table1.c2 = table2.c2;

1. 2. 3. 4.

The first line selects all columns from both tables. The second line selects table1 (the "left" table). The third line performs a LEFT JOIN on table2 (the "right" table). The fourth line tells SQL how to perform the join (by looking for matching values in column c2).

1. Let's return to our newspaper and onlinesubscribers. Suppose we want to know how many users subscribe to the print newspaper, but not to the online. Start by performing a left join of newspaper table and online table on their id columns and selecting all columns. Remember to put newspaper first to keep all rows of newspaper: SELECT * FROM newspaper LEFT JOIN online ON newspaper.id = online.id;

2. Don't remove your previous query.

In order to find which users do notsubscribe to the online edition, we need to add a WHERE clause. Add a second query after your first one that adds the following WHERE clause and condition: WHERE online.id IS NULL

This will select rows where there was no corresponding row from the onlinetable. Hint Don't remove your previous query. Add the second query so you can compare the results! The second query should look like: SELECT * FROM newspaper LEFT JOIN online ON newspaper.id = online.id WHERE online.id IS NULL;

Remember, WHERE clause goes after the LEFT

JOIN.

Primary Key vs Foreign Key Let's return to our example of the magazine subscriptions. Recall that we had three tables: orders, subscriptions, and customers. Each of these tables has a column that uniquely identifies each row of that table:   

for orders subscription_id for subscriptions customer_id for customers order_id

These special columns are called primary keys. Primary keys have a few requirements:   

None of the values can be NULL. Each value must be unique (i.e., you can't have two customers with the same customer_id in the customers table). A table can not have more than one primary key column.

Let's reexamine the orders table: order_id customer_id subscription_id purchase_date 1

2

3

2017-01-01

2

2

2

2017-01-01

3

3

1

2017-01-01

Note that customer_id (the primary key for customers) and subscription_id (the primary key for subscriptions) both appear in this. When the primary key for one table appears in a different table, it is called a foreign key. So customer_id is a primary key when it appears in customers, but a foreign key when it appears in orders. In this example, our primary keys all had somewhat descriptive names. Generally, the primary key will just be called id. Foreign keys will have more descriptive names.

Why is this important? The most common types of joins will be joining a foreign key from one table with the primary key from another table. For instance, when we join orders and customers, we join on customer_id, which is a foreign key in orders and the primary key in customers. 1. Suppose Columbia University has two tables in their database:  

The classes table contains information on the classes that the school offers. Its primary key is id. The students table contains information on all students in the school. Its primary key is id. It contains the foreign key class_id, which corresponds to the primary key of classes.

Perform an inner join of classes and students using the primary and foreign keys described above, and select all the columns. Your ON statement should include:  

(a primary key) students.class_id (a foreign key) classes.id

It should look like: SELECT * FROM classes JOIN students ON classes.id = students.class_id;

You should already know how to do this join. But in this exercise, you learned that the matching column is usually a primary key of a table and foreign key of another!

Cross Join So far, we've focused on matching rows that have some information in common. Sometimes, we just want to combine all rows of one table with all rows of another table. For instance, if we had a table of shirts and a table of pants, we might want to know all the possible combinations to create different outfits. Our code might look like this: SELECT shirts.shirt_color, pants.pants_color FROM shirts CROSS JOIN pants; 

The first two lines select the columns shirt_color and pants_color.

 

The third line pulls data from the table shirts. The fourth line performs a CROSS JOIN with pants.

Notice that cross joins don't require an ONstatement. You're not really joining on any columns! If we have 3 different shirts (white, grey, and olive) and 2 different pants (light denim and black), the results might look like this: shirt_color

pants_color

white

light denim

white

black

grey

light denim

grey

black

olive

light denim

olive

black

3 shirts × 2 pants = 6 combinations! This clothing example is fun, but it's not very practically useful. A more common usage of CROSS list of values.

JOIN

is when we need to compare each row of a table to a

Let's return to our newspaper subscriptions. This table contains two columns that we haven't discussed yet:  

start_month:

the first month where the customer subscribed to the print newspaper (i.e., 2 for February) end_month: the final month where the customer subscribed to the print newspaper

Suppose we wanted to know how many users were subscribed during each month of the year. For each month (1, 2, 3) we would need to know if a user was subscribed. Follow the steps below to see how we can use a CROSS JOIN to solve this problem. 1. Eventually, we'll use a cross join to help us, but first, let's try a simpler problem.

Let's start by counting the number of customers who were subscribed to the newspaper during March. Use COUNT(*) to count the number of rows and a WHERE clause to restrict to two conditions:  

start_month <= 3 end_month >= 3

"During March" means that the customer's starting month was in or before March and final month was in or after March: SELECT COUNT(*) FROM newspaper WHERE start_month <= 3 AND end_month >= 3;

Answer: 13 2. Don't remove the previous query.

The previous query lets us investigate one month at a time. In order to check across all months, we're going to need to use a cross join. Our database contains another table called months which contains the numbers between 1 and 12. Select all columns from the cross join of newspaper and months. Hint SELECT * FROM newspaper CROSS JOIN months;

When you get the result, make sure to scroll right to take a look at the rightmost column, month. Each customer is CROSS JOIN'ed with each month. 3. Don't remove your previous queries. Create a third query where you add a WHERE statement to your cross join to restrict to two conditions:  

start_month <= month end_month >= month

This will select all months where a user was subscribed. Hint SELECT * FROM newspaper CROSS JOIN months WHERE ________ AND ________;

Scroll down to take a look at the result of this query. Notice how it filtered from the previous CROSS

JOIN.

Scroll right to look at the month column – some months are gone now. 4. Don't remove your previous queries. Create a final query where you aggregate over each month to count the number of subscribers. Fill in the blanks in the following query: SELECT month, COUNT(*) FROM ________ CROSS JOIN ________ WHERE ________ AND ________ GROUP BY ________;

Hint

SELECT month, COUNT(*) FROM newspaper CROSS JOIN months WHERE start_month <= month AND end_month >= month GROUP BY month;

While we are at it, let's rename the second column using AS:

SELECT month, COUNT(*) AS 'subscribers' FROM newspaper CROSS JOIN months WHERE start_month <= month AND end_month >= month GROUP BY month;

Scroll down to see the final result.

Which month has the highest subscribers? June! …………… SELECT COUNT(*) FROM newspaper WHERE start_month <= 3 AND end_month >= 3;

SELECT * FROM newspaper CROSS JOIN months;

SELECT * FROM newspaper CROSS JOIN months WHERE newspaper.start_month <= month AND newspaper.end_month >= month;

SELECT month, COUNT (*) FROM newspaper CROSS JOIN months

WHERE newspaper.start_month <= month AND newspaper.end_month >= month GROUP BY month;

Union Sometimes we just want to stack one dataset on top of the other. Well, the UNION operator allows us to do that. Suppose we have two tables and they have the same columns. table1:

pokemon

type

Bulbasaur

Grass

Charmander

Fire

Squirtle

Water

table2:

pokemon Snorlax

type Normal

If we combine these two with UNION: SELECT * FROM table1 UNION SELECT * FROM table2;

The result would be: pokemon

type

Bulbasaur

Grass

Charmander

Fire

Squirtle

Water

Snorlax

Normal

SQL has strict rules for appending data:  

Tables must have the same number of columns. The columns must have the same data types in the same order as the first table.

1. Let's return to our newspaper and onlinesubscriptions. We'd like to create one big table with both sets of data. Use UNION to stack the newspaper table on top of the online table. SELECT * FROM newspaper UNION SELECT * FROM online;

With Often times, we want to combine two tables, but one of the tables is the result of another calculation. Let's return to our magazine order example. Our marketing department might want to know a bit more about our customers. For instance, they might want to know how many magazines each customer subscribes to. We can easily calculate this using our orders table: SELECT customer_id, COUNT(subscription_id) AS 'subscriptions' FROM orders GROUP BY customer_id;

This query is good, but a customer_id isn't terribly useful for our marketing department, they probably want to know the customer's name. We want to be able to join the results of this query with our customers table, which will tell us the name of each customer. We can do this by using a WITH clause. WITH previous_results AS ( SELECT ... ... ... ... ) SELECT * FROM previous_results JOIN customers ON _____ = _____;

  

The WITH statement allows us to perform a separate query (such as aggregating customer's subscriptions) previous_results is the alias that we will use to reference any columns from the query inside of the WITH clause We can then go on to do whatever we want with this temporary table (such as join the temporary table with another table)

Essentially, we are putting a whole first query inside the parentheses () and giving it a name. After that, we can use this name as if it's a table and write a new query using the first query. 1. Place the whole query below into a WITHstatement, inside parentheses (), and give it name previous_query: SELECT customer_id, COUNT(subscription_id) AS 'subscriptions' FROM orders GROUP BY customer_id

Join the temporary table previous_querywith customers table and select the following columns:  

customers.customer_name previous_query.subscriptions

Check the answer in the hint below.

Remember to use the following ON statement as part of your JOIN: ON previous_query.customer_id = customers.customer_id

And for review, AS is how you give something an alias.

Here, we are essentially giving everything inside the parentheses (the sub-query) the name of previous_query using AS. Then, previous_query is used as a temporary table that we will query from and also join with the customers table: WITH previous_query AS ( SELECT customer_id, COUNT(subscription_id) AS 'subscriptions' FROM orders GROUP BY customer_id ) SELECT customers.customer_name, previous_query.subscriptions FROM previous_query JOIN customers ON previous_query.customer_id = customers.customer_id;

Do not include ; inside of the () of your WITHstatement.

MULTIPLE TABLES

Review In this lesson, we learned about relationships between tables in relational databases and how to query information from multiple tables using SQL. Let's summarize what we've learned so far:       

will combine rows from different tables if the join condition is true. LEFT JOIN will return every row in the lefttable, and if the join condition is not met, NULL values are used to fill in the columns from the right table. Primary key is a column that serves a unique identifier for the rows in the table. Foreign key is a column that contains the primary key to another table. CROSS JOIN lets us combine all rows of one table with all rows of another table. UNION stacks one dataset on top of another. WITH allows us to define one or more temporary tables that can be used in the final query. JOIN

Instructions

Feel free to experiment a bit more with the orders, subscriptions, customers tables before moving on!

Related Documents

Sql
March 2021 0
Sql
January 2021 3
Makalah Sql
January 2021 1
Programacion Sql
March 2021 0
Makalah Sql
January 2021 3
Sql Injections
January 2021 1

More Documents from "rajen_2cool"

Sql
January 2021 3
March 2021 0
March 2021 0
Ntc-1500
January 2021 3