Programmers: Effect of denormalizing

vendredi 9 janvier 2015

Effect of denormalizing

Here's a scenario where I'm wondering whether to denormalize a relational database (MS SQL).

Text description of the requirement

I have users (stored in a users table).

Users belong to Accounts:

An account is created before its users are created

Every user must belong to exactly one account (neither no account, nor more than one account)

Any account may contain many users

A user's assignment to an account is permanent: users are never moved from one account to another

The Users table is predefined so I can't add an "accountId" column to the Users table; so to implement accounts I have:

A Users table, with a userId key

An Accounts table, with an accountId key

A UserAccounts table, with userId and accountId columns (to identify the account associated with each user); the UserAccounts table is constrained:
- Foreign key relations to Users and Accounts table
- Uniqueness constraint to ensure that a userId appears no more than once

Because I can't modify the Users table to insert and accountId column in it, I can't ensure that every user has an accountId. And in fact I will insert into the UserAccounts table after inserting into the Users table, though probably in the same transaction.

I have a dozen other tables (e.g. Sales) which define various types of thing which belong to users

These tables have a userId column (to identify which user each row belongs to)

The problem is that I also need to know what account each thing belongs to

Graphic description of the requirement

Here's a reworded version of the same problem (thanks to MichaelT who suggested this rewording).


   +-----------+    +--------------+      +--------------+
   | Users     |    | UserAccount  |      | Account      |
   +-----------+    +--------------+      +--------------+
+--> userId pk <----+ userId (uniq)|   +--> accountId pk |
|  |           |    | accountId    +---+  |              |
|  +-----------+    +--------------+      +--------------+
|                                                       
|                                                       
|  +-----------+                                        
|  | Sales     |                                        
|  +-----------+  (and several other tables like this)  
|  | saleId pk |                                        
+--+ userId fk |                                        
   |           |                                        
   +-----------+

I can't modify the Users table for various reasons: which is why each user's (single) account is defined using the separate UserAccount table, instead of adding an accountId foreign key to the Users table.

This UserAccount table implements a many-to-one relationship between users and accounts (not a many-to-many relationship). It could be defined using either the following keys:

userId as the primary key

(userId,accountId) as the primary key plus userId as a unique key

The question

Now for the question:

When doing a query against a Sales table that needs to include the Account information, I think there are two ways to implement knowing which account each thing belongs to:

Join the table to the UserAccounts table (to select the accountId for each userId):


select S.*, UA.accountId
from Sales S
    join UserAccount UA on S.userId = UA.userId
where
    S.something = somethingElse

Denormalize the Sales table[s], by storing the accountId in it as well as the userId (the accountId can then be retrieved from Sales without a join to UserAccount):


|  +--------------+                                        
|  | Sales        |                                        
|  +--------------+  (and several other tables like this)  
|  | saleId pk    |                                        
+--+ userId fk    |                                        
   | accountId fk |                                        
   +--------------+

If I did the latter I could ensure integrity by defining the userId plus accountId pair/combination as a foreign key constraint into the UserAccounts table (to ensure that the pairing of accountId with userId matches the pairing defined in the UserAccounts table).


CONSTRAINT Sales_FK FOREIGN KEY (userId,accountId)
    REFERENCES UserAccounts (userId,accountId)

I don't have vast experience with database design: I learned it from books (by Joe Celko).

I'm inhibited against denormalizing in general, but denormalizing seems appropriate here: why an extra JOIN in every SELECT statement (to get the accountId), when the same could be obtained by storing the accountId in the table with a (compound or composite) foreign key.

Is there a compelling reason to use (or to not use) either of these two possibilities?

If not then I'm tempted to use the 2nd method:

because it's simpler to define the select statement (without a join)

because it's (presumably) slightly better performance (without a join)

because it ensures that a corresponding row (with an accountId) exists in the UserAccounts table

Programmers

vendredi 9 janvier 2015

Effect of denormalizing

Text description of the requirement

Graphic description of the requirement

The question

Aucun commentaire:

Enregistrer un commentaire