Cannot directly use the IF function on a MySQL query that uses GROUP BY

Published: | by Julian Knight Reading time ~6 min.
📖 Posts | 📎 Development | 🔖 MySQL, SQL

Cannot directly use the IF function on a query that uses GROUP BY.

This tip provides an alternative that is cross-db. It is extracted from:

http://surfnet.dl.sourceforge.net/sourceforge/souptonuts/README_mysql.txt SQL Characteristic Functions: Do it without if, case, or GROUP_CONCAT.

Yes, there is use for this … if statements sometimes cause problems when used in combination. The simple secret, and it’s also why they work in almost all databases, is the following functions:

  • sign (x) returns -1,0, +1 for values x < 0, x = 0, x > 0 respectively
  • abs( sign( x) ) returns 0 if x = 0 else, 1 if x > 0 or x < 0
  • 1-abs( sign( x) ) complement of the above, since this returns 1 only if x = 0 Quick example:
sign(-1) = -1,  abs( sign(-1) ) = 1,  1-abs(sign(-1) ) = 0

Data for full example:

CREATE TABLE exams (
pkey int(11) NOT NULL auto_increment,
name varchar(15),
exam int,
score int,
PRIMARY KEY  (pkey)
);
insert into exams (name,exam,score) values ('Bob',1,75);
insert into exams (name,exam,score) values ('Bob',2,77);
insert into exams (name,exam,score) values ('Bob',3,78);
insert into exams (name,exam,score) values ('Bob',4,80);
insert into exams (name,exam,score) values ('Sue',1,90);
insert into exams (name,exam,score) values ('Sue',2,97);
insert into exams (name,exam,score) values ('Sue',3,98);
insert into exams (name,exam,score) values ('Sue',4,99);
mysql&gt; select * from exams;
+------+------+------+-------+
| pkey | name | exam | score |
+------+------+------+-------+
|    1 | Bob  |    1 |    75 |
|    2 | Bob  |    2 |    77 |
|    3 | Bob  |    3 |    78 |
|    4 | Bob  |    4 |    80 |
|    5 | Sue  |    1 |    90 |
|    6 | Sue  |    2 |    97 |
|    7 | Sue  |    3 |    98 |
|    8 | Sue  |    4 |    99 |
+------+------+------+-------+
8 rows in set (0.00 sec)
mysql&gt; select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4
from exams group by name;
+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob  |    75 |    77 |    78 |    80 |
| Sue  |    90 |    97 |    98 |    99 |
+------+-------+-------+-------+-------+
2 rows in set (0.00 sec)

Note, the above pivot table was created with one select statement You may think IF’s would be cleaner. NO, WATCH OUT!

Look the following gives INCORRECT RESULTS!

select name,
if(exam=1,score,null) as exam1,
if(exam=2,score,null) as exam2,
if(exam=3,score,null) as exam3,
if(exam=4,score,null) as exam4
from exams group by name;
+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob  |    75 |  NULL |  NULL |  NULL |
| Sue  |    90 |  NULL |  NULL |  NULL |
+------+-------+-------+-------+-------+
2 rows in set (0.00 sec)

Note the above gives indeterminate results. Paul DuBois [ paul at snake.net ] showed me the correct way to

perform this select statement. According to him When you include a GROUP BY clause in a query, the only values you

can select are the grouped columns or summary values calculated

from the groups. If you display additional columns, they’re not

tied to the grouped columns and the values displayed for them are

indeterminate. If you rewrite the query like this, you get the correct result:

select name,
sum(if(exam=1,score,null)) as exam1,
sum(if(exam=2,score,null)) as exam2,
sum(if(exam=3,score,null)) as exam3,
sum(if(exam=4,score,null)) as exam4
from exams group by name;
+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob  |    75 |    77 |    78 |    80 |
| Sue  |    90 |    97 |    98 |    99 |
+------+-------+-------+-------+-------+
mysql&gt; select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
sum(score*(1-abs(sign(exam- 2)))) -
sum(score*(1-abs(sign(exam- 1)))) as delta_1_2,
sum(score*(1-abs(sign(exam- 3)))) -
sum(score*(1-abs(sign(exam- 2)))) as delta_2_3,
sum(score*(1-abs(sign(exam- 4)))) -
sum(score*(1-abs(sign(exam- 3)))) as delta_3_4
from exams group by name;
+------+-------+-------+-------+-------+-----------+-----------+-----------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 |delta_3_4 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+
| Bob  |    75 |    77 |    78 |    80 |         2 |         1 |2 |
| Sue  |    90 |    97 |    98 |    99 |         7 |         1 |1 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+
2 rows in set (0.00 sec)

Above delta_1_2 shows the difference between the first and second exams, with the numbers

being positive because both Bob and Sue improved their score with each exam. Calculating

the deltas here shows it’s possible to compare two rows, not columns which is easily done

with the standard SQL statements but rows in the original table.

mysql&gt;select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam-1)))) as delta_1_2,
sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam-2)))) as delta_2_3,
sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam-3)))) as delta_3_4,
sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam-1))))  +
sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam-2))))  +
sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam-3))))  as TotalIncPoints
from exams group by name;
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 |delta_3_4 | TotalIncPoints |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
| Bob  |    75 |    77 |    78 |    80 |         2 |         1 |2 |              5 |
| Sue  |    90 |    97 |    98 |    99 |         7 |         1 |1 |              9 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
2 rows in set (0.00 sec)

TotalIncPoints shows the sum of the deltas.

select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam-1)))) as delta_1_2,
sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam-2)))) as delta_2_3,
sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam-3)))) as delta_3_4,
sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam-1))))  +
sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam-2))))  +
sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam-3))))  as TotalIncPoints,
(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/4 as AVG
from exams group by name;
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 |delta_3_4 | TotalIncPoints | AVG   |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
| Bob  |    75 |    77 |    78 |    80 |         2 |         1 |2 |              5 | 77.50 |
| Sue  |    90 |    97 |    98 |    99 |         7 |         1 |1 |              9 | 96.00 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
2 rows in set (0.00 sec)

It’s possible to combine Total Increasing Point TotalIncPoints with AVG. In fact, it’s possible to combine all of the example cuts of the data into one SQL statement, which provides additional options for displaying data on your page

select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))))/2  as AVG1_2,
(sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))))/2 as AVG2_3,
(sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/2 as AVG3_4,
(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/4 as AVG
from exams group by name;
+------+-------+-------+-------+-------+--------+--------+--------+-------+
| name | exam1 | exam2 | exam3 | exam4 | AVG1_2 | AVG2_3 | AVG3_4 | AVG|
+------+-------+-------+-------+-------+--------+--------+--------+-------+
| Bob  |    75 |    77 |    78 |    80 |  76.00 |  77.50 |  79.00 |77.50 |
| Sue  |    90 |    97 |    98 |    99 |  93.50 |  97.50 |  98.50 |96.00 |
+------+-------+-------+-------+-------+--------+--------+--------+-------+
2 rows in set (0.00 sec)

Exam scores are listing along with moving averages…again it’s all with one select statement.


comments powered by Disqus