vendredi 9 janvier 2015

Effect of denormalizing


Here's a scenario where I'm wondering whether to denormalize a relational database (MS SQL).


Text description of the requirement


I have users (stored in a users table).


Users belong to Accounts:



  • An account is created before its users are created

  • Every user must belong to exactly one account (neither no account, nor more than one account)

  • Any account may contain many users

  • A user's assignment to an account is permanent: users are never moved from one account to another


The Users table is predefined so I can't add an "accountId" column to the Users table; so to implement accounts I have:



  • A Users table, with a userId key

  • An Accounts table, with an accountId key


  • A UserAccounts table, with userId and accountId columns (to identify the account associated with each user); the UserAccounts table is constrained:



    • Foreign key relations to Users and Accounts table

    • Uniqueness constraint to ensure that a userId appears no more than once




Because I can't modify the Users table to insert and accountId column in it, I can't ensure that every user has an accountId. And in fact I will insert into the UserAccounts table after inserting into the Users table, though probably in the same transaction.



  • I have a dozen other tables (e.g. Sales) which define various types of thing which belong to users

  • These tables have a userId column (to identify which user each row belongs to)

  • The problem is that I also need to know what account each thing belongs to


Graphic description of the requirement


Here's a reworded version of the same problem (thanks to MichaelT who suggested this rewording).



+-----------+ +--------------+ +--------------+
| Users | | UserAccount | | Account |
+-----------+ +--------------+ +--------------+
+--> userId pk <----+ userId (uniq)| +--> accountId pk |
| | | | accountId +---+ | |
| +-----------+ +--------------+ +--------------+
|
|
| +-----------+
| | Sales |
| +-----------+ (and several other tables like this)
| | saleId pk |
+--+ userId fk |
| |
+-----------+


I can't modify the Users table for various reasons: which is why each user's (single) account is defined using the separate UserAccount table, instead of adding an accountId foreign key to the Users table.


This UserAccount table implements a many-to-one relationship between users and accounts (not a many-to-many relationship). It could be defined using either the following keys:



  • userId as the primary key

  • (userId,accountId) as the primary key plus userId as a unique key


The question


Now for the question:


When doing a query against a Sales table that needs to include the Account information, I think there are two ways to implement knowing which account each thing belongs to:




  1. Join the table to the UserAccounts table (to select the accountId for each userId):



    select S.*, UA.accountId
    from Sales S
    join UserAccount UA on S.userId = UA.userId
    where
    S.something = somethingElse



  2. Denormalize the Sales table[s], by storing the accountId in it as well as the userId (the accountId can then be retrieved from Sales without a join to UserAccount):



    | +--------------+
    | | Sales |
    | +--------------+ (and several other tables like this)
    | | saleId pk |
    +--+ userId fk |
    | accountId fk |
    +--------------+


    If I did the latter I could ensure integrity by defining the userId plus accountId pair/combination as a foreign key constraint into the UserAccounts table (to ensure that the pairing of accountId with userId matches the pairing defined in the UserAccounts table).



    CONSTRAINT Sales_FK FOREIGN KEY (userId,accountId)
    REFERENCES UserAccounts (userId,accountId)



I don't have vast experience with database design: I learned it from books (by Joe Celko).


I'm inhibited against denormalizing in general, but denormalizing seems appropriate here: why an extra JOIN in every SELECT statement (to get the accountId), when the same could be obtained by storing the accountId in the table with a (compound or composite) foreign key.


Is there a compelling reason to use (or to not use) either of these two possibilities?


If not then I'm tempted to use the 2nd method:



  • because it's simpler to define the select statement (without a join)

  • because it's (presumably) slightly better performance (without a join)

  • because it ensures that a corresponding row (with an accountId) exists in the UserAccounts table





Aucun commentaire:

Enregistrer un commentaire