LINQ Mysteries: The Distinct Function

Recently I had to use the Distinct function introduced in LINQ. My surprise was that depending on where you put the Distinct clause you will receive different results.

Let us take the following example: Let dtAnswers be a DataTable that has two columns, named answer_value and answer_comment. What I was seeking as a result was to return the count of the different answer_value values from all the rows. So knowing how it is done in SQL, I’ve wrote the following LINQ query in code:

Dim nDifferentValues As Integer = _
(From answer in dtAnswers.Rows _
Distinct Select answer("answer_value").ToString()).Count()

My surprise was that this will return always the same thing, no matter what rows I have and what values I have in the answer_value column. So after struggling several hours I’ve decided to try the Function syntax of LINQ:

Dim nDifferentValues As Integer = _
(From answer in dtAnswers.Rows _
Select answer("answer_value").ToString()).Distinct.Count()

Now this returns results correctly.
My guess is that the first query computes distinct on the rows first (by reference) and after that from the result selects answer("answer_value").ToString(). While the second query first selects answer("answer_value").ToString() and after that computes distinct.
So be careful where you put you LINQ functions 😉