SPOC-Web Icon, semantic Knowledge Management

Relationale Tabellen

Relational Tables

Relational Tables and Databases were invented in the 1970s. They have a poweful and well-founded mathematical Basis and a high Maturity. They are used in every Sector of Information Technologie, from Mainframe down to Mobile Phone.

The Basics are easy to understand. Mastering them yields a better Understanding for any Software-System.

In Spoc-Web this Theory is seamlessly connected to the new Ideas of the semantic Web. This Combination of proven and novel Technologies yield a System, that is both extremely  flexible and highly performant.

The Structure of relational Tables

Tables are 2-dimensional Data Stores, consisting of Columns and Rows.

Every Row represents an Object/ Thing with their own Identity.

The Columns each contain Data / Values of the same Type with same Meaning.

Allowed Data-Types are:

  • Text
  • Date and Time
  • Numbers: whole Numbers ("integer") or Float-Point-Numbers
  • References to Rows in relational Tables: usually as integerNumbers

Especially these References allow for the Flexibility and Power of the relational Modell and contribute to its great Success.

Rows and Columns form the so-called "Cells" at their Crossings. The Cell Content should be very "atomic", for both the User and the Database can process the "Inner" of Cells only with higher Effort. This means for Example, that You should create separate Columns/Cells for first and last Name, if You want to use them separately.

Tables: Schema and Example

Schmatic Table with Rows and Columns (incl. ID Column with technical, unique Values)
ID Column 2 ... Column N
 1  Row 1 ... ...
  ...   ...
m Row m ... ...
Beispiel-Tabelle: Personendaten einer Familie

ID

FirstName

LastName

Address

City

State

ZIP

3

John

Doe

569 Peach St.

Martin

TN

38237

6

Jane

Doe

569 Peach St.

Martin

TN

38237

8

Jack

Doe

569 Peach St.

Martin

TN

38237

9

Jill

Doe

569 Peach St.

Martin

TN 38237

Unique Table-Keys

When describing Objects we explained, how important a unique Name is. The same is true for Tables: the unique Name is called "Key" there. Tables usually have at least two disjoint Keys: a technical ID (usually an integer Number counting up) and a unique Combination of Domain Columns.

ID: Technical Key

Every Row should have a single, unique Name, that allows to distinguish if from all other Rows. A specific Column is created for this Purpose: the so-called "ID" or "Identity" Column. Usually an integer Number is created for this Purpose and counted up for each new Row. It is paramount, that this ID is NEVER changed. It must also NOT be re-used for other Rows, should the original Row be deleted. This is why a Number is so well suited, because there should be no Need to whimsily change a Number. Usually these Numbers are not published either, so there is no desire to change it too.
When these Numbers are published though, like Account Numbers, these are usually started with at least six digits, so that Customers don't ask to change them.

"Natural Key"

Apart from the technical Key there is at least one unique "natural Key" consisting of one or more Columns that allow to uniquely identify a Row. Because the technical Key is usually not published, the natural Key is needed to distinguish the Rows without resorting to the ID, using only publicly available Properties. With People a common Combination is: First and Last Name together with Birth Date and Place.

The Choice of Columns fundamentally depends on the Business Domain, especially on which Information is actually available. 

 

Tables = Classes

It shows that the Correspondence between relational Columns to the Classes described here is complete and NOT accidental. The following Table compares the structural Elements of relational Tables and the semantic Conzepts of Individuals and Collections.

Table Row Cell Column Reference Reference-Column Key ID
Semantic Thing Property Predicate Connection Relation Name ID

Normalization for Speed and Uniqueness

Relational Databases always use several Tables to store their Data. This is different from simple, tabular Lists and Spreadsheet Applications that use only a single Table. References within these Tables correspond to Connections between the Row-Objects. These References allow to avoid Duplication of Data, reducing the Load, but even more important: modelling Reality better, because real- World Objects are never duplicated.

In the Example above not only the Adresses is repeated, but also the Family Name.

Introducing a new Table called "Family" eliminates repeating Data and reflects the Fact that Families usually live together at the same Address:

Normalized-Table:

ID

FirstName

FamilyId

3

John

4

6

Jane

4

8

Jack

4

9

Jill

4

The Table to the left repeats the one from avove, but in a normalized Form: Family and Address are given by the Reference to Family#4 using the FamilyId Column. 

The Data are uuniquely stored there, next to those from other Families.

Associated Family-Tabelle:

ID

LastName

Address

City

State

ZIP

...

...

...

...

...

...

4

Doe

569 Peach St.

Martin

TN

38237

... ... ... ... ... ...

As You can see, Address and Family Name are repeated only once. This has several Advantages:

  • The Database becomes smaller and faster
  • Errors in the Address or Family Name can be fixed in a single Place (SPoC Principle)
  • If the Family moves, only a single Entry must be changed. You don't have to change (or forget) it for every Family Member. There can be no Inconsistencies and the Correctness and Completeness of the Data can be easily verified.

It may seem as if this are Problems of human Operations, that should not appear with Software, but exactly this happens, when the Database-Structures don't reflect the real Situation (any more). The Liste of Examples where Software-Users are forced to repeat or re-enter Data, because of wrong or outdated Structures, is endless.

Relation Cardinalities: 1:1, 1:N, N:1 und N:M

Relation Cardinalities describe the Flexibility of the Relation. It shows, how many Rows of one Tabe can possibly be related to how many from the other Table.

As shown in the Chapter about Normalization, a Relation from Table A to Table B is created ba adding a Column to A (e.g. "B_ID"), that uses ("references") the unique Values of Column B.ID. Since the unique (1) Values from B.ID can be used in in any Number (N) of Rows of Table A, we have a N (many A) to 1(one B) Relation from A to B.

Vice versa we also have the so called "inverse" 1:N Relation from Table B to Table A, which is NOT separately maintained in Spoc-Web. This Change in Direction/Perspective corresponds to the Change between active and passive Voice: Subject and Object switch Positions and Roles, the Verb changes between active and passive Conjugation.

 

Special Cases: 1:0/1 and N:M Relations

By Default, all Relations can be traced down to 1:N Types. Despite this, the following special Cases should be considered:

  • A-1: B-0/1 Relations appear when Table B uses the Values from A.ID (like in 1:N) and additionally has a Uniqueness Constraint on its Column B.A_ID. B can now refer to each Row of A only once. On the other Side though there may be Rows in A, that are NOT referred by any Row from B. Now we have to discern between the following two Cases:
    • B.A_ID allows for 'null' Values or
    • B.A_ID must always be filled. In this Case A_ID can be chosen to be the primary Key for Table B. This Kind of Relation is often used to model Polymorphy in object-orientierted Programming.
  • A-N : B-M Relations are realized by a Pair of Relations to a narrow Table P: Between Table A and B there is a Table P with only two Columns, containing Pairs of Values: A.ID and B.ID that point to the respective IDs of Table A and B. Since the Table P has only two Columns and no own Identity, it is often skipped during High-Level Modelling and the two 1:N and M:1 Relations are displayed as a single M:N Relation.
    Because there may be arbitrary many Pairs, any Combinations of Rows from A and B can be created.
    Examples are, among others, Participation of Studenten to Courses:
    • every Student can participate in several Courses
    • in every Course there will usually be several Studenten.

Pure N:M Relations are rather rare. Very often during Development more Data appears that need to be stored for the Relationship, like e.g. the Duration of the Participation in a Course. These Values can only be stored in Table P.

As soon as these additional Data also contain Relations to the Link Record, References to Rows of Table P must be enabled. So P receives a techncal ID Column, promoting it to a full Table/Class and giving its Rows unique Identities.

The Role of Column-Constraints

Spalten have a lot of Properties. Apart from the Data Type,  especially the Constraints determine the Role of the Column. These Constraints are described in the following Sections. 

Allow Null-Values?

Relational Databases have a special Value "NULL", which can be set to demonstrate that the Cell is not filled or the Value is unknown. NULL can be used in Columns of any Data Type. NULL is unequal to any other Value including itself, i.e. NULL <> NULL, different from Programming Languages. Comparing with NULL always yields "false", because You cannot tell with unknown Values. Actually this would require ternary Logic, that allows for NULL, in addition to true and false. Instead, SQL offers special Operators: IS NULL, IS NOT NULL and the Function isNull().

Not-Null-Constraints ensure, that there always is a Value entered into this Column. This is usually the Case for Key Columns and for Fields and Relations that are mandatory due to Business Reasons.

Unique Constraints

Unique Constraints make sure, that no duplicate Values are entered into the specified Column(s).

Uniqueness should be configured for every Key: 

  • For the technical Key this ensures that References are unique.
  • For the Business Keys this prevents Duplicates. 

According to the SQL Standards, You can define Unique Constraints with Nullable Columns. Rows with NULL-Values are exempt from Uniqueness, because comparing with NULL always yields false (see above). SQL Server (Microsoft) only allows for a single Row with NULL, except You activate SET ANSI_NULL ON. Alternatively You can define a "filtered Unique Constraint" that ignores Rows with NULL Values.

Eindeutigkeit sollte für jeden Schlüssel eingestellt werden, sowohl für den technischen wie auch für alle fachlichen Schlüssel. Gemäß SQL Standard kann man Unique Constraints auch auf Nullable Spalten definieren. SQL Server (Microsoft) erlaubt allerdings nur eine Zeile mit NULL, außer man aktiviert SET ANSI_NULL ON. Alternativ kann man dort auch einen "filtered Unique Constraint" setzen.

Foreign Key Constraints

Foreign Key Constraints make sure, that References from this Column always exist. Defined on the Foreign Key Column, they only allow for NULL or Values, that exist in the referenced ID-Column.

Vice versa, Foreign Key Constraints also prevent accidental Deletion or ID-Modification of Rows while there are Cells pointing to them. Instead of preventing Deletion or Modification the followind Behaviors can be configured:

Behavior Result
CASCADE
  • with UPDATE this also updates the referencing Cells
  • with DELETE this also deletes all Rows containing Cells that refer to the deleted Rows. Cycles in the References are usually not allowed though, so recursive Deletion is usually not possible. This Option is risky, because possibly large Amounts of Data can be deleted. 
SET DEFAULT resets the referencing Cells to the Default-Values defined on their Columns
SET NULL resets the referencing Cells to NULL
  prevents Update and Deletion
 

Constraints on Foreign Key Columns

Constraints on Foreign Key Columns are very important: They essentially formen the Cardinalities of the allowed Relations.

The Tabelle to the right shows the different Combinations of Constraints for a Foreign-Key-Column and the emerging Types of Relations.

FK Unique Nullable Relation
X  X  X 0/1 : 0/1
X X  O 1 : 0/1
X O X 0/1 : N
X  O O 1 : N