A relational database is a digital database whose organization is based on the relational model of data, as proposed by E.F. Codd in 1970. This model organizes data into one or more tables (or "relations") of rows and columns, with a unique key for each row. Generally, each entity type described in a database has its own table, the rows representing instances of that type of entity and the columns representing values attributed to that instance. Because each row in a table has its own unique key, rows in a table can be linked to rows in other tables by storing the unique key of the row to which it should be linked (where such unique key is known as a "foreign key"). Codd showed that data relationships of arbitrary complexity can be represented using this simple set of concepts.
The definition goes on to explain the differences with hierarchical data structures, etc. Perhaps technically correct but doesn't tell the whole story. To me the relational database is used to define how logical subsets of data are related and how they defend the integrity of the business model the database supports.
Data TablesIn general the tables in a database should be "as little as possible" consisting of the least number of columns possible to define a unique record. The link to other tables defines fundamental relationships between the data - like "parent-child" for example. When constructed the entire database defines not only these relationships, but how data flows through the business process that supports it. A well designed database defines the entire business model and can accommodate changes and additions with minor modifications. This can happen when the designer spends enough time with his or her feet on the ground to understand the business process, and creates data structure that is granular - almost molecular - in it's composition. This takes the most time to create, but also creates the most flexible and long-lasting structure. There are many other considerations, but there is no substitute for the really hard work of defining the business model with the database.
A relational database is not a spreadsheet - or a collection of spreadsheets. A spreadsheet makes sense for 2-dimensional representation of data and is used primarily to inform the human eye. It works great for the human eye because we can quickly relate to the two-dimensions and peer down into the individual data pieces. But it's not efficient for a computing engine - something that is designed to find and extract pieces of information as quickly as possible. The example below shows how the eye can quickly find a measurement by triangulating between dates in the rows and the instrument in the columns. Suppose you were asked to find OW-12 on 7/20/94:
While the eye can do this in an instant, it's very inefficient form of data storage. One way
to understand why is to look at the column headers. They are unique for each instrument so they essentially require a custom data structure. If you add a new measurement, you change the table structure. Every time you search you don't know which column the result will be in. Compare that structure with the following.
The table above would be used in a relational database to store the data shown in the spreadsheet. The data has been 'normalized' to minimize data redundancy and provide an efficient search path. It consists of only three columns no matter how many instruments you have. The first two columns define a unique record - the date and the sensor name. To find the record we found in the spreadsheet we work from left to right to find the day, then sensor, and then the value. Not as easy for the eye perhaps, but easier for a database.
So how is this structure used to define a Business Model? By first defining what constitutes a unique record in a table (called Primary Keys), and then creating relationships between tables, you define how data will be used to define your business model.
Example - Customers, Contracts and Plants
Business Model 1Lets assume your database is defining a customer, the contracts you have with that customer, and the plant where the work will be done. Maybe you first consider a simple business model like:
" Plants and Contracts belong to a Customer. Multiple Projects can be grouped under Contracts"
This can be represented with a simple organization chart as follows:
Business Model 2
What if another customer then presented you with another business model scenario, like:
"Project Numbers are specific to Plants, with the possibility of multiple projects under a single contract"
You need to be able to define something like the structure shown in the figure below where Plants are related to Projects:
Business Model 3
Then another customer presents you with another business model:
"Two separate customers with their own contracts and projects are using the same plant."
Unless you want to spend all your time programming, you want your database design to be able to represent and enforce the integrity of ALL the business models you have to support. The figure below shows an actual database design that supports the above scenarios.
It's not as complicated as it looks. It almost looks like the data structure for Business Model 1. The key is that Plants is not connected directly to Customers. It is linked to both Projects (Contract_Projects) and to Customers. The link to Customers is not direct either. It is using a "relationship-table" where the relationships between plants and customers is defined. All the above relationships defined by business models 1 through 3 are supported by this structure - without data redundancy.