Pivot tables in PostgreSQL

A common complaint with relational databases is their lack of flexibility. This ‘lack’ of flexibility has been one of the drivers behind the NoSQL movement. As it turns out, though, there are tools and tricks that can be used with existing RDBMSs that could help you leverage the maturity, tool support, and performance of these systems while still achieving the flexibility you might need with your application. One such trick is the usage of pivot tables.

Pivot tables are a querying strategy that allows you to ‘flatten’ multiple rows into multiple rows of multiple columns. For example, consider the following two table definitions:

CREATE TABLE user_account (
   id SERIAL,
   email TEXT
);

CREATE TABLE user_attribute (
   user_account_id BIGINT,
   attribute_type TEXT,
   value TEXT
);

Using these two tables, we can, at least temporarily, punt on finding an ideal table representation for all users ever. Sure, we’re assuming all user_accounts will have an email, but for the sake of this example I want an excuse to do a join, so let it be. ;)

This is a very data-driven approach, where all one has to do to extend a conceptual object (in this case a user) is to add an attribute/property to it. If you don’t need it right now, your code doesn’t even need to change, but you can store it. You could even provide non-programmers with an interface into this system for extending your schema without a single bit of SQL anywhere in sight.

Here’s a simple user:

INSERT INTO user_account (email) VALUES ('foo@example.com');
INSERT INTO user_attribute
     (user_account_id, attribute_type, value)
   VALUES
     (1,'first-name','John'),
     (1,'last-name','Doe'),
     (1,'title','Dr');

Of course, now the problem is constructing a complete ‘view’ for these users that includes these attributes. We’d like to have one row per user, with all its attributes in individual columns. A basic join won’t really get us what we want, though:

db=# select id,email,attribute_type,value from user_account inner join user_attribute on id = user_account_id where email = 'foo@example.com';
 id |      email      | attribute_type | value
----+-----------------+----------------+-------
  1 | foo@example.com | first-name     | John
  1 | foo@example.com | last-name      | Doe
  1 | foo@example.com | title          | Dr
(3 rows)

Enter pivot tables. There’s a number of ways to do them. My favorite involves the CASE statement combined with an aggregate and GROUP BY. Here’s how it would apply to our example:

SELECT -- Grab the user_account columns.
       acc.id,
       acc.email,
       -- Use max() combined with a case statement to
       -- usher individual attribute values into columns.
       max(CASE WHEN (attr.attribute_type = 'first-name')
                THEN attr.value END) AS fname,
       max(CASE WHEN (attr.attribute_type = 'last-name')
                THEN attr.value END) AS lname,
       max(CASE WHEN (attr.attribute_type = 'title')
                THEN attr.value END) AS title
   FROM user_account AS acc
   -- Join the attribute table /once/.
   LEFT JOIN user_attribute AS attr
     ON (attr.user_account_id = acc.id)
   WHERE (acc.email = 'foo@example.com')
   -- Group by the non-pivoted columns
   GROUP BY acc.id,acc.email;

Which gives us the single-row results we wanted:

 id |      email      | fname | lname | title
----+-----------------+-------+-------+-------
  1 | foo@example.com | John  | Doe   | Dr
(1 row)

And there you have it. Of course, it’s not the prettiest way to generate this kind of output. Still, it’s a lot better than doing multiple queries and combining results in your application, and it would be easy to convert these queries into super-fast materialized views if performance ever becomes an issue.

Fortunately, PostgreSQL ships with a tablefunc module with a utility that makes this sort of data transformation easy — and even converts data types for you! You can read more about the crosstab() function on the Postgres documentation page for tablefunc. That page also includes a number of compelling use cases for pivot tables.

I’ve used pivot tables in applications I’ve worked on to do things such as pivoting on a large variety of external id codes as well as to define a relatively flexible name-specification system for a game. Have you used a similar trick with SQL? Do other SQL systems have similar utilities for generating pivot tables?

This entry was posted in programming and tagged , , . Bookmark the permalink. Follow any comments here with the RSS feed for this post. Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>