Programmers: Effect of denormalizing

samedi 3 janvier 2015

Effect of denormalizing

Here's a scenario where I'm wondering whether to denormalize a database.

I have users (stored in a users table).

Users belong to Accounts:

An account is created before its users are created

Every user must belong to one account

Any account may contain many users

The Users table is predefined so I can't add an "accountId" column to the Users table; so to implement accounts I have:

A Users table with a userId key

An Accounts table with an accountId key

A UserAccounts table with userId and accountId columns (to identify the account associated with each user)

The UserAccounts table is constrained:

Foreign key relations to Users and Accounts table

Uniqueness constraint to ensure that a userId appears no more than once

Because I can't modify the Users table to insert and accountId column in it, I can't ensure that every user has an accountId. And in fact I will insert into the UserAccounts table after inserting into the Users table, though probably in the same transaction.

Now for the question:

I have a dozen other tables (e.g. Sales) which define various types of thing which belong to users

These tables have a userId column (to identify which user each row belongs to)

The problem is that I also need to know what account each thing belongs to

There are two ways to implement knowing which account each thing belongs to:

Join the table to the UserAccounts table (to select the accountId for each userId)

Denormalize by storing the accountId as well as the userId in each table

If I did the latter I could ensure integrity by defining the userId plus accountId pair/combination as a foreign key constraint into the UserAccounts table (to ensure that the pairing of accountId with userId matches the pairing defined in the UserAccounts table).

I don't have vast experience with database design: I learned it from books (by Joe Celko).

I'm inhibited against denormalizing in general, but denormalizing seems appropriate here: why an extra JOIN in every SELECT statement (to get the accountId), when the same could be obtained by storing the accountId in the table with a (compound or composite) foreign key.

Is there a compelling reason to use (or to not use) either of these two possibilities?

If not then I'm tempted to use the 2nd method:

because it's simpler to define the select statement (without a join)

because it's (presumably) slightly better performance (without a join)

because it ensures that a corresponding row (with an accountId) exists

Programmers

samedi 3 janvier 2015

Effect of denormalizing

Aucun commentaire:

Enregistrer un commentaire