Querying With the SQL Select Command

Robert W. Weeks bob.weeks@cox.net, http://members.cox.net/bob.weeks

SQL is a fourth generation language designed for working with relational databases. SQL usually contains several dozen commands. The select command is the most important for most users. Its purpose is to retrieve data.

The basic form of the SQL select command:

Select columnlist
	from tablelist
	where conditions
	group by grouplist
	order by orderlist

Columnlist: The list of columns to appear in the output. To retrieve more than one column, separate the column names with commas. * (the asterisk) used as a column name means retrieve all columns. Columns may include calculated values (eg: datein - dateout, salary * 1.1) and columns calculated with aggregating functions (eg: avg(salary), sum(fee)). The order in which columns are listed affects only the order in which they are listed in the output.

Tablelist: The list of tables that participate in the query. Separate multiple table names with commas. Note that if a column from a table appears in the condition, it needs to appear in the Tablelist even though it may not appear in the Columnlist. This is common with join conditions. Exception: Subqueries are a separate query. The order in which tables are listed doesn't matter.

Conditions: Expressions that serve to limit the number of rows returned from a table, or to join rows from two or more tables. Examples:

where name = "John Doe" Only rows where the name column contains "John Doe" will be included in the output. Character data is enclosed in quotation marks or apostrophes.
where pprice > 15.95 Only rows where the pprice column contains a value greater than 15.95 will be included in the output. Numeric data isn't surrounded with quotation marks.
where pprice >= 15.95 Only rows where the pprice column contains a value greater than or equal to 15.95 will be included in the output.
where pdate > 03/01/95 Only rows where the pdate column is greater than (after) March 1, 1995 will be included.
Where pdate >= 3/1/95 and pdate < 4/1/95 Only rows where the pdate column is greater than or equal to March 1, 1995 and at the same time the pdate column is less than April 1, 1995 will be included.
Where color = "Red" or color = "Blue" Only rows where the color column contains Red or the color column contains Blue will be included. This produces output that contains both red and blue cars, for example.
where catalog.id = tape.id An example of a join condition. (The notation catalog.id means the id column from the catalog table.) In this case, SQL will match rows from the catalog table with rows from the tape table that make this statement true. When a match is found, SQL creates a row in the output for each match. For example, the catalog table may have one row where the id column contains 1. The tape table may have three rows where the id column contains 1. SQL will produce three rows in the output, as there are three cases that match this condition.
where member.member in (select member from rental) A subquery. In this example, SQL first performs the subquery (in parenthesis). This subquery produces a temporary list (or table) that contains all the values found in the member column of the rental table. Then, SQL uses the in operator to compare member.member with the temporary list. If member.member is found in the list, the expression is true. SQL can also use not in.

Conditions may be combined together with or, and, and not. Example: where color = "Red" and pdate >= 03/01/95 and pdate < 04/01/95.

Grouplist: If specified, SQL groups the rows in the output by columns in this list. Grouping is most often used with aggregating functions to produce summaries of data. SQL produces one row for each unique value found in the grouping column, and the function is calculated for all rows that belong to the group.

Orderlist: If specified, SQL sorts the output rows by the columns in this list. Example: order by state, city

Aggregating (summary) functions: Sum, count, min, max, and avg are the most common. These function are applied to a column as in select sum(salary), which sums the values in the salary column. Aggregating functions may be used alone, as in this example:

Select sum(salary) from employee

Select sum(salary) from employee where department = "Engineering"

Select count(*) from employee

The first example sums the salary column from the employee table, for all rows in the table. The second example sums the salary column from the employee table, but only for those rows where the department column contains the value Engineering. Presumably, this will return a smaller value than the first example. Count is used with * as the argument, which means to count the rows.

Aggregating functions are often used with the group by part of the select command. This example:

select department, sum(salary) from employee group by department

produces an output table with two columns: a department name, and the sum of the salary column for all the rows in that department:

Department      Sum of Salary
--------------  ---------------
Engineering     350,000
Marketing       125,000
Production      568,349

Sample Pine Valley Video Club Database

Member Table

MEMBERID NAME         ADDRESS         DATEJOINED
     1   John Doe     123 N. Main     02/05/94  
     2   Jane Wilson  100 N. Broadway 03/10/94  
     3   Bob Smith    100 E. Elm      03/15/94  
     4   Sue Williams 100 W. Oak      04/10/94  

Tape Table

TAPEID CATALOGID SUPPLIERID PURCHASEDA PURCHASEPR
     1         1          1 01/05/94        15.57
     2         1          1 01/05/94        15.57
     3         1          2 01/15/94        14.95
     4         2          2 01/05/94        15.95
     5         2          2 01/05/94        15.95
     6         2          1 02/01/94        15.95
     7         3          2 02/01/94        15.95
     8         4          2 02/01/94        15.95
     9         4          1 02/01/94        15.95
    10         5          1 02/01/94        24.58
    11         6          2 02/01/94         4.58
    12         7          3 02/05/94        15.45
    13         8          2 02/05/94        21.57
    14         9          1 02/05/94        14.95
    15        10          1 02/05/94        15.95

Supplier Table

SUPPLIERID NAME           ADDRESS        
         1 ABC Tape Co.   100 N. Main    
         2 Acme Tape Co.  100 N. Broadway
         3 Tapes 'R' Us   100 E. Pine    

Charge Table

CHARGECODE  CHARGE
  A         3.00
  B         2.00
  C         1.15

Catalog Table

CATALOGID TITLE           RATING CHARGECODE
        1 Forrest Gump    PG-13  A         
        2 Batman Forever  PG-13  A         
        3 The Lion King   G      A         
        4 Apollo 13       PG-13  A         
        5 Wyatt Earp      R      B         
        6 Tombstone       R      B         
        7 City Slickers   PG     C         
        8 101 Dalmations  G      C         
        9 Dumb and Dumber PG-13  C         
       10 Casablanca      G      C         

Rental Table

MEMBERID  TAPEID DATERENTED      DATEDUE      DATERETURN RENTALFEE
       1       3 04/01/94        04/05/94     04/03/94   3.00
       1       8 04/10/94        04/14/94     04/12/94   3.00
       1      10 04/10/94        04/14/94     04/12/94   2.00
       2       3 04/25/94        04/29/94     04/28/94   3.00
       1       7 05/01/94        05/05/94     05/02/94   3.00
       2       7 05/03/94        05/07/94     05/10/94   3.00
       1       4 05/15/94        05/19/94     05/18/94   3.00
       1       7 05/15/94        05/19/94     05/18/94   3.00
       1       9 05/15/94        05/19/94     05/18/94   3.00
       1       2 06/01/94        06/05/94     06/04/94   3.00
       2       3 06/01/94        06/05/94                3.00
       3      13 06/01/94        06/05/94     06/06/94   1.00
       1       6 06/01/94        06/05/94                3.00

SQL Select Command Examples

List the members' names.

select name from member

NAME
John Doe
Jane Wilson
Bob Smith
Sue Williams

This is about the simplest SQL query possible, retrieving all rows for a single column from a table. This query retrieves all rows from the member table.

List all the members' names and the date they joined.

select name, datejoined
from member

NAME                  DATEJOINED        
John Doe              02/05/94          
Jane Wilson           03/10/94          
Bob Smith             03/15/94          
Sue Williams          04/10/94          

Separate more than one column name with commas.

List all the members.

Select *
from member

MEMBERID NAME                  ADDRESS                    DATEJOINED        
       1 John Doe              123 N. Main                02/05/94          
       2 Jane Wilson           100 N. Broadway            03/10/94          
       3 Bob Smith             100 E. Elm                 03/15/94          
       4 Sue Williams          100 W. Oak                 04/10/94          

We used * to as a shortcut for all columns from the member table.

List all the members, sorted by name.

select *
from member 
order by name

MEMBER NAME           ADDRESS              DATEJOINED
3      Bob Smith      100 E. Elm           03/15/94 
2      Jane Wilson    100 N. Broadway      03/10/94 
1      John Doe       123 N. Main          02/05/94 
4      Sue Williams    100 W. Oak          04/10/94 

Without using order by, SQL presents the rows in no particular order. The order of rows means nothing to a relational database. Humans, of course, may find it convenient to have rows shown in some order.

On what date did member number 3 join?

select datejoined
	from member
	where memberid = 3

DATEJOINED        
03/15/94          

The question asks for the date only, so that is the only column requested. Generally it's best to output only those columns needed to answer the question; extra columns may contribute to information overload. The where part of the statement serves to determine which rows in the member table appear in the output. In this case, only rows for which the MemberID column contains the value 3 will be in the output. Since MemberID is the primary key of the member table, it's good that we had just one row of output! (What would it mean if there were two or more rows of output?)

List the names of the members who joined the club in February of 1994.

select name from member 
	where datejoined >= 02/01/94 and datejoined < 03/01/94

NAME        
John Doe    

Note the use of two date conditions joined by and to select dates within a certain date range.

Members often ask for G-rated movies for family viewing. List the titles of the tapes with a G rating.

select title
from catalog 
	where rating = "G"

TITLE         
The Lion King 
101 Dalmations
Casablanca    

Surround character data, such as a movie rating, with quotation marks or apostrophes (different implementations of SQL vary). Don't use them with numeric data. Notice that if we select rows based on the value in a column (the rating column in this example), it does not mean that we must include the column in the output. What is the meaning of the word tape in this context?

List the titles of the tapes that are rated either G or PG-13.

select title
from catalog
where rating = "G" or rating = "PG-13"

TITLE                               
Forrest Gump                        
Batman Forever                      
The Lion King                       
Apollo 13                           
101 Dalmations                      
Dumb and Dumber                     
Casablanca                          

List the tapes which are currently rented out.

select * from rental
	where datereturn is null

MEMBERID   TAPEID DATERENTED   DATEDUE    DATERETURN  RENTALFEE
       2        3 06/01/94     06/05/94                    3.00
       1        6 06/01/94     06/05/94                    3.00

Is null selects rows where the column datereturn is empty (has not had a value entered), meaning that the member hasn't yet returned the tape. What is the meaning of the word tape in this context?

How many tapes do we have?

select count(*)
	from tape

COUNT
 15

Count(*) counts the number of rows. What is the meaning of a row in the tape table?

How much have we spent on tapes?

select sum(purchasepr) from tape

SUM OF PURCHASEPR
      238.87

This shows the use of a summary operator or aggregating function to calculate a statistic for a column of data. Be sure to distinguish summing from counting. Count counts the number of rows, while sum adds values in a column. Very often a row in a table corresponds to some real-world thing. For example, a row in the tape table corresponds to a video tape on the shelf of the store. If we want to know how many tapes we have, count the number of rows in the tape table.

What is the average price paid for a tape?

select avg(purchasepr)
	from tape

AVERAGE OF PURCHASEPR
     15.92

This shows the use of a summary operator or aggregating function to calculate a statistic for a column of data.

How many members are there?

select count(*) from member

COUNT
  4

What is the meaning of a row in the member table? What does it mean to count the rows?

How many tapes have been rented? What was the total rental revenue?

select count(*), sum(rentalfee)
	from rental

COUNT      SUM OF RENTALFEE
 13            36.00

How many tapes were rented in the month of April 1994? What was the total rental revenue?

select count(*), sum(rentalfee)
	from rental
	where daterented >= 4/1/94
	and daterented < 5/1/94

COUNT       SUM OF RENTALFEE
4              11.00

Not a very busy month. But no matter how many tapes were rented, this SQL query will tell us how many and for how much.

List tapes. Show the tape ID, purchase date, and movie title. Sort the output by the purchase date.

select tape.tapeid, tape.purchaseda, catalog.title
	from tape, catalog
	where tape.catalogid = catalog.catalogid
	order by purchaseda

TAPEID PURCHASEDA         TITLE                               
     1 01/05/94           Forrest Gump                        
     2 01/05/94           Forrest Gump                        
     4 01/05/94           Batman Forever                      
     5 01/05/94           Batman Forever                      
     3 01/15/94           Forrest Gump                        

(Not all results shown.) Because this query draws data from more than one table, we must tell SQL how to join rows in the tables. The join condition where tape.catalogid = catalog.catalogid does that. This tells SQL to join rows that contain common catalog ID values. Without the join condition, this query fails to produce the desired result.

List members' names along with names of movies they have rented. Sort by member name, then movie title, then date rented. Show date rented and date returned.

select member.name, catalog.title, rental.daterented, rental.datereturn 
	from member, catalog, rental, tape 
	where member.memberid = rental.memberid 
	and rental.tapeid = tape.tapeid 
	and tape.catalogid = catalog.catalogid 
	order by member.name, catalog.title, daterented

NAME             TITLE            DATERENTED  DATERETURN        
Bob Smith        101 Dalmations   06/01/94    06/06/94          
Jane Wilson      Forrest Gump     04/25/94    04/28/94          
Jane Wilson      Forrest Gump     06/01/94      /  /            
Jane Wilson      The Lion King    05/03/94    05/10/94          
John Doe         Apollo 13        04/10/94    04/12/94          
John Doe         Apollo 13        05/15/94    05/18/94          
John Doe         Batman Forever   05/15/94    05/18/94          
John Doe         Batman Forever   06/01/94      /  /            
John Doe         Forrest Gump     04/01/94    04/03/94          
John Doe         Forrest Gump     06/01/94    06/04/94          
John Doe         The Lion King    05/01/94    05/02/94          
John Doe         The Lion King    05/15/94    05/18/94          
John Doe         Wyatt Earp       04/10/94    04/12/94          

Note the use of the tablename.columnname notation, which is necessary as some of the tables have columns with the same name. Here our where condition provided the linkage between the member and rental tables (common memberid column), the linkage between the rental and tape tables (common tapeid column), and the linkage between the tape and catalog tables (common catalogid column). How do we know that these tables need to be linked in this manner? One way is by examining the entity-relationship diagram for this database.

How many tapes were rented by John Doe?

select count(*)
	from rental, member
	where rental.memberid = member.memberid
	and name = "John Doe"

COUNT
 9

Is it valid to refer to members by their names as does this query?

Our supplier Acme Tape Co. has called us to say that they have discovered mice in their warehouse, and that the tapes we purchased during January 1994 have been infected with the deadly hantavirus. We need to contact the members who have rented these tapes.

select member.name, member.address
	from member, rental, tape, supplier
	where member.memberid = rental.memberid
	and rental.tapeid = tape.tapeid
	and tape.supplierid = supplier.supplierid
	and supplier.name = "Acme Tape Co."
	and tape.purchaseda >= 1/1/94 
	and tape.purchaseda < 2/1/94

NAME                  ADDRESS                   
John Doe              123 N. Main               
Jane Wilson           100 N. Broadway           
Jane Wilson           100 N. Broadway           
John Doe              123 N. Main               

If we add the word distinct after the word select, SQL will eliminate the duplicates. It would be helpful if this database included members' telephone numbers.

List our tapes along with the number of copies we have of each tape.

select catalogid, count(*)
	from tape
	group by catalogid

CATALOGID    COUNT
    1         3
    2         3
    3         1
    4         2
    5         1
    6         1
    7         1
    8         1
    9         1
   10         1

Group by tells SQL to create one row of output for each unique value found in the group by column (catalogid in this case). Then, calculate the summary operator for each row. The summary operator in this example counts the number of rows. Each row corresponds to a copy of a tape. But what does catalogid mean? What is the movie title for catalogid number 1?

List our tapes along with the number of copies we have of each tape. This time show the movie titles.

select catalog.title, count(*) 
	from catalog, tape 
	where catalog.catalogid = tape.catalogid 
	group by catalog.title

  TITLE                       COUNT
  Forrest Gump                  3
  Batman Forever                3
  The Lion King                 1
  Apollo 13                     2
  Wyatt Earp                    1
  Tombstone                     1
  City Slickers                 1
  101 Dalmations                1
  Dumb and Dumber               1
  Casablanca                    1

In this case, we join the tape table with the catalog table so that we can display movie titles rather than catalog numbers

Count the number of times that each of our tapes has been rented.

select tape.tapeid, catalog.title, count(*) 
	from catalog, tape, rental 
	where catalog.catalogid = tape.catalogid 
	and tape.tapeid = rental.tapeid 
	group by rental.tapeid

TAPEID TITLE            COUNT
     2 Forrest Gump       1
     3 Forrest Gump       3
     4 Batman Forever     1
     6 Batman Forever     1
     7 The Lion King      3
     8 Apollo 13          1
     9 Apollo 13          1
    10 Wyatt Earp         1
    13 101 Dalmations     1

Count the number of times that each of our movie titles has been rented.

select catalog.catalogid, catalog.title, count(*) 
	from catalog, tape, rental 
	where catalog.catalogid = tape.catalogid 
	and tape.tapeid = rental.tapeid 
	group by catalog.catalogid

CATALOGID TITLE           COUNT
        1 Forrest Gump      4
        2 Batman Forever    2
        3 The Lion King     3
        4 Apollo 13         2
        5 Wyatt Earp        1
        8 101 Dalmations    1

What is the average price paid for a tape from each supplier? How many tapes have we purchased from each supplier?

select supplier.name, count(*), avg(tape.purchasepr) 
	from supplier, tape 
	where supplier.supplierid = tape.supplierid 
	group by supplier.name

  NAME                   COUNT    AVERAGE OF PURCHASEPR
  ABC Tape Co.             7            16.93
  Acme Tape Co.            7            14.99
  Tapes 'R' Us             1            15.45

The group by tells SQL to create one row of output for each supplier name. The summary statistics are calculated for each row.

List the tapes which are currently rented for each member and the due date.

select member.name, catalog.title, rental.daterented, rental.daterented + 4 
	from rental, tape, catalog, member 
	where empty(rental.datereturn) 
	and rental.tapeid = tape.tapeid 
	and tape.catalogid = catalog.catalogid 
	and rental.memberid = member.memberid 
	order by member.name

NAME            TITLE           DATERENTED    EXP_4         
Jane Wilson     Forrest Gump    06/01/94      06/05/94      
John Doe        Batman Forever  06/01/94      06/05/94      

Are there any members who have not yet rented a tape?

select * from member 
	where memberid not in 
		(select distinct memberid from rental)

  MEMBERID NAME         ADDRESS         DATEJOINED
         4 Sue Williams 100 W. Oak      04/10/94  

This query works in this way: The "inner" or nested select (select distinct memberid from rental) selects those members who have rented a tape. That's because the meaning of a row in the rental table is that a certain member rented a certain tape. Given this result, that is, the members who have rented a tape, all we have to do is select the members from the member table who don't appear in this list of members who have rented a tape. The clause where memberid not in selects those rows from the member table where the memberid isn't in the list of members who have rented tapes.

Produce a list of G-rated movies, but include only those movies that a customer (say customer number 2) has not yet rented.

select * from catalog 
	where rating = "G" 
	and catalogid not in 
		(Select catalogid from tape, rental 
			where rental.memberid = 2 
			and tape.tapeid = rental.tapeid)

  CATALOGID TITLE                RATING CHARGECODE
          8 101 Dalmations       G      C         
         10 Casablanca           G      C         

The nested select part of this command generates a list of the catalogid values of tapes that member number 2 has rented. The not in operator selects rows from the catalog table where the catalogid is not in this list.

Members who have rented Forrest Gump have also rented what other movies?.

Select title from catalog
	where catalogid in
		(select distinct catalog.catalogid from catalog, tape, rental
		where tape.catalogid = catalog.catalogid
		and tape.tapeid = rental.tapeid
		and rental.memberid in
			(select distinct member.memberid from member, tape, rental, catalog
			where catalog.title = "Forrest Gump"
			and tape.catalogid = catalog.catalogid
			and tape.tapeid = rental.tapeid
			and rental.memberid = member.memberid))

TITLE
Forrest Gump
Batman Forever
The Lion King
Apollo 13
Wyatt Earp