Cannot directly use the IF function on a MySQL query that uses GROUP BY
Published:
| by Julian Knight Reading time ~6 min.
📖 Posts
| 📎
Development
| 🔖
MySQL, SQL
Cannot directly use the IF
function on a query that uses GROUP BY
.
This tip provides an alternative that is cross-db. It is extracted from:
http://surfnet.dl.sourceforge.net/sourceforge/souptonuts/README_mysql.txt
SQL Characteristic Functions: Do it without if
, case
, or GROUP_CONCAT
.
Yes, there is use for this … if
statements sometimes cause problems when used in combination.
The simple secret, and it’s also why they work in almost all databases, is the following functions:
sign (x) returns -1,0, +1 for values x < 0, x = 0, x > 0
respectivelyabs( sign( x) ) returns 0 if x = 0 else, 1 if x > 0 or x < 0
1-abs( sign( x) )
complement of the above, since this returns 1 only ifx = 0
Quick example:
sign(-1) = -1, abs( sign(-1) ) = 1, 1-abs(sign(-1) ) = 0
Data for full example:
CREATE TABLE exams (
pkey int(11) NOT NULL auto_increment,
name varchar(15),
exam int,
score int,
PRIMARY KEY (pkey)
);
insert into exams (name,exam,score) values ('Bob',1,75);
insert into exams (name,exam,score) values ('Bob',2,77);
insert into exams (name,exam,score) values ('Bob',3,78);
insert into exams (name,exam,score) values ('Bob',4,80);
insert into exams (name,exam,score) values ('Sue',1,90);
insert into exams (name,exam,score) values ('Sue',2,97);
insert into exams (name,exam,score) values ('Sue',3,98);
insert into exams (name,exam,score) values ('Sue',4,99);
mysql> select * from exams;
+------+------+------+-------+
| pkey | name | exam | score |
+------+------+------+-------+
| 1 | Bob | 1 | 75 |
| 2 | Bob | 2 | 77 |
| 3 | Bob | 3 | 78 |
| 4 | Bob | 4 | 80 |
| 5 | Sue | 1 | 90 |
| 6 | Sue | 2 | 97 |
| 7 | Sue | 3 | 98 |
| 8 | Sue | 4 | 99 |
+------+------+------+-------+
8 rows in set (0.00 sec)
mysql> select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4
from exams group by name;
+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob | 75 | 77 | 78 | 80 |
| Sue | 90 | 97 | 98 | 99 |
+------+-------+-------+-------+-------+
2 rows in set (0.00 sec)
Note, the above pivot table was created with one select statement You may think IF’s would be cleaner. NO, WATCH OUT!
Look the following gives INCORRECT RESULTS!
select name,
if(exam=1,score,null) as exam1,
if(exam=2,score,null) as exam2,
if(exam=3,score,null) as exam3,
if(exam=4,score,null) as exam4
from exams group by name;
+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob | 75 | NULL | NULL | NULL |
| Sue | 90 | NULL | NULL | NULL |
+------+-------+-------+-------+-------+
2 rows in set (0.00 sec)
Note the above gives indeterminate results. Paul DuBois [ paul at snake.net ] showed me the correct way to
perform this select statement. According to him
When you include a GROUP BY
clause in a query, the only values you
can select are the grouped columns or summary values calculated
from the groups. If you display additional columns, they’re not
tied to the grouped columns and the values displayed for them are
indeterminate. If you rewrite the query like this, you get the correct result:
select name,
sum(if(exam=1,score,null)) as exam1,
sum(if(exam=2,score,null)) as exam2,
sum(if(exam=3,score,null)) as exam3,
sum(if(exam=4,score,null)) as exam4
from exams group by name;
+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob | 75 | 77 | 78 | 80 |
| Sue | 90 | 97 | 98 | 99 |
+------+-------+-------+-------+-------+
mysql> select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
sum(score*(1-abs(sign(exam- 2)))) -
sum(score*(1-abs(sign(exam- 1)))) as delta_1_2,
sum(score*(1-abs(sign(exam- 3)))) -
sum(score*(1-abs(sign(exam- 2)))) as delta_2_3,
sum(score*(1-abs(sign(exam- 4)))) -
sum(score*(1-abs(sign(exam- 3)))) as delta_3_4
from exams group by name;
+------+-------+-------+-------+-------+-----------+-----------+-----------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 |delta_3_4 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+
| Bob | 75 | 77 | 78 | 80 | 2 | 1 |2 |
| Sue | 90 | 97 | 98 | 99 | 7 | 1 |1 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+
2 rows in set (0.00 sec)
Above delta_1_2 shows the difference between the first and second exams, with the numbers
being positive because both Bob and Sue improved their score with each exam. Calculating
the deltas here shows it’s possible to compare two rows, not columns which is easily done
with the standard SQL statements but rows in the original table.
mysql>select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
sum(score*(1-abs(sign(exam- 2)))) - sum(score*(1-abs(sign(exam-1)))) as delta_1_2,
sum(score*(1-abs(sign(exam- 3)))) - sum(score*(1-abs(sign(exam-2)))) as delta_2_3,
sum(score*(1-abs(sign(exam- 4)))) - sum(score*(1-abs(sign(exam-3)))) as delta_3_4,
sum(score*(1-abs(sign(exam- 2)))) - sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam- 3)))) - sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam- 4)))) - sum(score*(1-abs(sign(exam-3)))) as TotalIncPoints
from exams group by name;
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 |delta_3_4 | TotalIncPoints |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
| Bob | 75 | 77 | 78 | 80 | 2 | 1 |2 | 5 |
| Sue | 90 | 97 | 98 | 99 | 7 | 1 |1 | 9 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
2 rows in set (0.00 sec)
TotalIncPoints shows the sum of the deltas.
select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
sum(score*(1-abs(sign(exam- 2)))) - sum(score*(1-abs(sign(exam-1)))) as delta_1_2,
sum(score*(1-abs(sign(exam- 3)))) - sum(score*(1-abs(sign(exam-2)))) as delta_2_3,
sum(score*(1-abs(sign(exam- 4)))) - sum(score*(1-abs(sign(exam-3)))) as delta_3_4,
sum(score*(1-abs(sign(exam- 2)))) - sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam- 3)))) - sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam- 4)))) - sum(score*(1-abs(sign(exam-3)))) as TotalIncPoints,
(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/4 as AVG
from exams group by name;
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 |delta_3_4 | TotalIncPoints | AVG |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
| Bob | 75 | 77 | 78 | 80 | 2 | 1 |2 | 5 | 77.50 |
| Sue | 90 | 97 | 98 | 99 | 7 | 1 |1 | 9 | 96.00 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
2 rows in set (0.00 sec)
It’s possible to combine Total Increasing Point TotalIncPoints with AVG. In fact, it’s possible to combine all of the example cuts of the data into one SQL statement, which provides additional options for displaying data on your page
select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))))/2 as AVG1_2,
(sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))))/2 as AVG2_3,
(sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/2 as AVG3_4,
(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/4 as AVG
from exams group by name;
+------+-------+-------+-------+-------+--------+--------+--------+-------+
| name | exam1 | exam2 | exam3 | exam4 | AVG1_2 | AVG2_3 | AVG3_4 | AVG|
+------+-------+-------+-------+-------+--------+--------+--------+-------+
| Bob | 75 | 77 | 78 | 80 | 76.00 | 77.50 | 79.00 |77.50 |
| Sue | 90 | 97 | 98 | 99 | 93.50 | 97.50 | 98.50 |96.00 |
+------+-------+-------+-------+-------+--------+--------+--------+-------+
2 rows in set (0.00 sec)
Exam scores are listing along with moving averages…again it’s all with one select statement.