samedi 3 janvier 2015

Effect of denormalizing


Here's a scenario where I'm wondering whether to denormalize a database.


I have users (stored in a users table).


Users belong to Accounts:



  • An account is created before its users are created

  • Every user must belong to one account

  • Any account may contain many users


The Users table is predefined so I can't add an "accountId" column to the Users table; so to implement accounts I have:



  • A Users table with a userId key

  • An Accounts table with an accountId key

  • A UserAccounts table with userId and accountId columns (to identify the account associated with each user)


The UserAccounts table is constrained:



  • Foreign key relations to Users and Accounts table

  • Uniqueness constraint to ensure that a userId appears no more than once


Because I can't modify the Users table to insert and accountId column in it, I can't ensure that every user has an accountId. And in fact I will insert into the UserAccounts table after inserting into the Users table, though probably in the same transaction.


Now for the question:



  • I have a dozen other tables (e.g. Sales) which define various types of thing which belong to users

  • These tables have a userId column (to identify which user each row belongs to)

  • The problem is that I also need to know what account each thing belongs to


There are two ways to implement knowing which account each thing belongs to:



  1. Join the table to the UserAccounts table (to select the accountId for each userId)

  2. Denormalize by storing the accountId as well as the userId in each table


If I did the latter I could ensure integrity by defining the userId plus accountId pair/combination as a foreign key constraint into the UserAccounts table (to ensure that the pairing of accountId with userId matches the pairing defined in the UserAccounts table).


I don't have vast experience with database design: I learned it from books (by Joe Celko).


I'm inhibited against denormalizing in general, but denormalizing seems appropriate here: why an extra JOIN in every SELECT statement (to get the accountId), when the same could be obtained by storing the accountId in the table with a (compound or composite) foreign key.


Is there a compelling reason to use (or to not use) either of these two possibilities?


If not then I'm tempted to use the 2nd method:



  • because it's simpler to define the select statement (without a join)

  • because it's (presumably) slightly better performance (without a join)

  • because it ensures that a corresponding row (with an accountId) exists





Aucun commentaire:

Enregistrer un commentaire