2020-01-10
|~4 min read
|627 words
I have a table that represents a many to one relationship. How, then, do I find out which of the “ones” have the most “manys”?
Postgres has a few built in utilities that when combined make this quite simple.
Consider the example of a table that tracks all media (images, videos, etc.) that are owned by a particular user.
The media table might look something like this:
| media_id | media_type | user_id | …other details |
|---|---|---|---|
| 1 | image | ‘abc123’ | |
| 2 | image | ‘abc123’ | |
| 3 | image | ‘abc123’ | |
| 4 | image | ‘abc123’ | |
| 5 | ‘abc123’ | ||
| 6 | video | ‘abc123’ | |
| … | … | … | |
| 10000001 | image | ‘xyz789’ | |
| 10000002 | video | ‘xyz789’ |
How might I figure out which user has the most images?
TutorialsPoint.com describes the “Group By” clause in the following way:
The PostgreSQL GROUP BY clause is used in collaboration with the SELECT statement to group together those rows in a table that have identical data. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups.
The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY clause.
Note: a GROUP BY clause requires an aggregate function (e.g., SUM, COUNT, etc.).
So, a simple use of GROUP BY would look like:
SELECT COUNT(*) FROM media GROUP BY user_id;The results would look something like:
| count |
|---|
| 1 |
| 72 |
| 24 |
| 13 |
| … |
| 9 |
Okay! This is a good start. We’ve aggregated all of the media entries by the user, but have failed to do some useful things like:
Let’s fix these issues now.
In my case, I want to sort the counts in a descending fashion. To do that, we’ll alias the count and then order by (note, ORDER BY follows the GROUP BY clause per the TutorialsPoint description above):
SELECT COUNT(*) as media_count FROM media GROUP BY user_id ORDER BY media_count;| media_count |
|---|
| 72 |
| 24 |
| 13 |
| 9 |
| … |
| 1 |
Better! We’re in order - and we have some useful context for what we’re counting because of the alias.
One issue down, two to go. Let’s identify the user next:
SELECT user_id, COUNT(*) as media_count FROM media GROUP BY user_id ORDER BY media_count;| user | media_count |
|---|---|
| ‘abc123’ | 72 |
| ‘def827’ | 24 |
| ‘has879’ | 13 |
| ‘zed127’ | 9 |
| … | … |
| ‘xyz789’ | 1 |
Okay! Last step, let’s limit the type of media to only be images!
SELECT user_id, COUNT(*) as image_count FROM media WHERE media_type='image' GROUP BY user_id ORDER BY image_count;| user | media_count |
|---|---|
| ‘abc123’ | 41 |
| ‘has879’ | 13 |
| ‘def827’ | 5 |
| ‘zed127’ | 5 |
| … | … |
| ‘xyz789’ | 1 |
Et voilá! I now have a sorted list grouped by user and filtered to only include the relevant records.
It’s worth noting that this is not a particularly optimized query and can take a while when the tables are big.
Hi there and thanks for reading! My name's Stephen. I live in Chicago with my wife, Kate, and dog, Finn. Want more? See about and get in touch!