What's Wrong With This Code (#6)

Joe Developer is working on a new application for a book publisher. Authors can publish zero or more books. Books can have zero or more authors.

Book and Author schema

Joe wrote a query to get a total count of all authors, and a total count of all books. Joe read on the Internet that the DISTINCT keyword is good to use in these scenarios.

SELECT DISTINCT
  COUNT(Authors.Id) AS TotalAuthors,
  COUNT(Books.Id) As TotalBooks
FROM
  Authors  
  FULL JOIN AuthorsBooks AB ON Authors.Id = AB.AuthorID
  FULL JOIN Books ON AB.BookID = Books.Id

The problem is - the numbers seem too high. What's wrong? Is it easy to fix?

posted on Monday, September 25, 2006 11:26 PM by scott

Comments

Monday, September 25, 2006 9:11 PM by Eric W. Bachtal

# re: What's Wrong With This Code (#6)

DISTINCT at the SELECT level will only ensure that unique rows appear in the result set. In this example, DISTINCT should be used within each COUNT() to ensure that duplicates are eliminated before the counts are calculated:

SELECT
COUNT(DISTINCT Authors.Id) AS TotalAuthors,
COUNT(DISTINCT Books.Id) As TotalBooks
FROM
Authors
FULL JOIN AuthorsBooks AB ON Authors.Id = AB.AuthorID
FULL JOIN Books ON AB.BookID = Books.Id
Monday, September 25, 2006 10:00 PM by Julian Kuiters

# re: What's Wrong With This Code (#6)

DISTINCT should be place within the count function, this way you are only counting the distinct values within each column.

SELECT
COUNT(DISTINCT Authors.Id) AS TotalAuthors,
COUNT(DISTINCT Books.Id) As TotalBooks
FROM
Authors
FULL JOIN AuthorsBooks AB ON Authors.Id = AB.AuthorID
FULL JOIN Books ON AB.BookID = Books.Id

Monday, September 25, 2006 10:17 PM by scott

# re: What's Wrong With This Code (#6)

Julian, Eric: Correct!

I also updated the post to fix a small typo (not in the code).
Monday, September 25, 2006 10:41 PM by Wilhelm Svenselius

# re: What's Wrong With This Code (#6)

Um, I might be missing something, but wouldn't it be much (much!!) faster and simpler to just SELECT COUNT(*) from the Authors and Books tables in order to get the author and book counts?
Tuesday, September 26, 2006 6:26 AM by scott

# re: What's Wrong With This Code (#6)

It could be, Wilhelm [but then it wouldn't be interesting to look at for bugs :)]
Tuesday, September 26, 2006 9:04 AM by Rick Glos

# re: What's Wrong With This Code (#6)

In fact, I think AuthorsBooks is just confusing the developer. If you want 1 resultset containing a count of authors and a count of books the query would be better written like this:

-- use count(1) not count(*) to speed up
select
(select count(1) from authors) as [AuthorsCount]
, (select count(1) from books) as [BooksCount]
Tuesday, September 26, 2006 1:56 PM by yaip

# re: What's Wrong With This Code (#6)

I am a bit ignorant about this. What tool have you used for the ER diagram?
Tuesday, September 26, 2006 7:12 PM by Christopher Steen

# Link Listing - September 26, 2006

NHibernate 1.2 Beta1 Released! [Via: Ayende Rahien ] WSCF 0.7: Looking for three *thorough* testers!...
Tuesday, September 26, 2006 9:05 PM by scott

# re: What's Wrong With This Code (#6)

@ Rick: Good one, I like that.

@ yaip: This is the database diagram tool in SQL Server 2005. It is similar to the same tool in SQL 2000, just a slightly different look.
Monday, October 02, 2006 12:11 PM by marshall

# re: What's Wrong With This Code (#6)

@rick - count(*) and count(1) make no difference. unless that's an access thing of course but turn all the profiling options on in sqlserver and you'll see no difference.

just trying to squash that myth