Database Normalization … taming the wild horse

For any application to use database and hence data effectively it is very important that data is organized properly. The most common problem for database is redundancy in data. If table has highly redundant data, that means for every operation (like insert, update or delete) it takes more effort to find correct record. And it can be even troublesome when we simply search for record and we get redundant results which is just waste of space and time. To overcome this problem of redundancy in data, a specific procedure is developed which is called “Normalization” . As we know, in RDBMSes (like SQL Server, Oracle, MySQL) data is stored based on relation. And this relations are used in process of normalization to reduce redundancy in data and hence improving overall performance of system.

Unfortunately, normalization is the process that can be used to properly arrange data is applied during very initial phase of database designing which is called logical modeling. But, most of the time when you realize that you have screwed up and have not properly optimized database it is already late. The concept of normalization was firstly introduced by E.F. Codd, who is known to us as an inventor of relational database management system. Actually he published a paper containing 12 rules to define what is relational database actually meant. All modern DBMSes follow these rules and if we consider while designing database, it can have significant effect later.

Concepts of normalization can be explained with following example. Consider following table which shows student details including chosen subject and advisor for that subject.

Un-Normalized Data

Student#

Advisor

Adv.-Room

Class1

Class2

Class3

1022

Jones

412

101-07

143-01

159-02

4123

Smith

216

201-01

211-02

214-01

First Normalization (1NF) : It is very basic rule of normalization. It dis-allows multivalued attributes & composite attributes. So, above shown table has to be rearranged as shown below.

Student#

Advisor

Adv.-Room

Class#

1022

Jones

412

101-07

1022

Jones

412

143-01

1022

Jones

412

159-02

4123

Smith

216

201-01

4123

Smith

216

211-02

But as you can see that data is still redundant

Second Normalization (2NF) : It states that each non key values must be fully functional dependent on primary key or composite primary key (in case of not having single unique key). And data should also satisfy 1NF. And they should be placed in different table. In above table (1NF), we can see that class# is not fully dependent on primary key (which is student#), so we can create two tables Students, and Registration by separating attributes.

 

students table  

Student#

Advisor

Adv.-Room

1022

Jones

412

4123

Smith

216

 

Registration table

Student#

Class#

1022

101-07

1022

143-01

1022

159-02

4123

201-01

4123

211-02

Third Normalization Form (3NF) : It has been said that, an average human analyzes and arranges data in 3rd normalized form by nature. So, if you look at the design sometimes you may “feel” that something is wrong (but to be honest, it needs more than feeling heart_broken). Third normalization prohibits transitive dependency. Also, data should satisfy 2NF conditions too. For example, in above table, Advisor Room is not directly dependent on Primary key Students. Actually relation is something like, Advisor Room –> Advisor –> Student# , which is a transitive dependency. So, we need to separate that column and put it in different table.

So finally we will have two tables from Student Table, Students and Advisor.

Student:  

Student#

Advisor

1022

Jones

4123

Smith

 

Advisor:  

Advisor

Adv.-Room

Jones

412

Smith

216

 

An so in the end, we finally have 3 tables, Student, Advisor and Registration. And as we can see, the data which was once highly redundant, is now properly arranged and less redundant. Which naturally solved problem we discussed at the beginning of this post, like insert, update, delete and ofcourse search.

Normal forms can be informally defined as, the key, the whole key and nothing but the key. (does it remind of something ??)

Actually we don’t need to further normalize our data, but if there further normalization techniques are available, which are called, Fourth Normalized form (4NF), Fifth Normalized Form (5NF), Boyce-Codd Normal Form (BCNF) and Sixth Normalization Form (6NF). But as a note you should remember that, as data gets highly normalized number of joins to get required results gets increased and that can seriously raise response time (as joins are in general very expensive operations.), so in general we don’t see normalization beyond 3NF. Even in some cases when allowed, Database Architects prefer to denormalize (reversing normalization by joining tables) which may improve performance with introduction to redundant data.

This is enough for today.

Its Just A Thought … fingerscrossed

Gaurang Sign

Leave a Reply

Your email address will not be published. Required fields are marked *