DE10.p000066600003200000027000001422630547220560000127100ustar00liddlpstudent00000000000000.nr LL 6.5i
.ll 6.5i
.nr PO 1i
.po 1i
.nr PS 12
.ps 12
.nr VS 16
.vs 16
.EH ''''
.OH ''''
.EF ''%''
.OF ''%''
.EQ
delim %%
gsize 12
define subset '\(sb'
define notsubset '\(sb back 35 size +2 / fwd 10'
define subsetorequal '\(ib'
define notsubsetorequal '\(ib back 35 size +2 / fwd 10'
define semijoin '| back 30 down 2 times'
define join '| back 59 down 3 times back 67 |'
define oj '\(ci back 80 times'
define oj '\(ci back 80 times'
define loj 'C back 50 times back 38 |'
define notin '\(mo back 30 / fwd 10'
define includes '\(sp'
define includesorequal '\(ip'
define in '\(mo'
define ninter 'up 25 inter'
define nunion 'up 25 union'
define smallunion 'size 7 up 25 union'
define smallinter 'size 7 up 25 inter'
define nsum 'back 65 sum'
define square '\(sq'
define dash '\(em'
define md '- back 10 down 30 *'
define <-> '<- back 20 ->'
define notFD '-> back 38 / fwd 10'
define |= '| back 55 = back 50 ='
define ->> '-> back 20 ->'
define w->> '-> back 20 -> back 90 down 35 w'
define => 'roman {up 7 = back 30 up 7 = back 30 up 7 = back 70 >}'
define tilde-not 'size +8 down 35 "~"'
define <=> 'roman {< back 70 up 7 = back 30 up 7 = back 30 up 7 = back 30 up 7 = back 70 >}'
define |- '| back 29 - back 25 -'
define there-exists 'roman {fwd 25 | back 95 up 44 - back 52 up 9 - back 54 down 23 - back 10 ~}'
define not '\(no'
define /= '\(!='
define != '\(!='
define one-half '\(12'
define division '\(di'
define unknown '| back 81 _'
define emptyset '\(es'
define down-arrow '\(da'
define up-arrow '\(ua'
define smallcircle 'down 20 \(de'
define star '\(**'
define infinity '\(if'
define notimplies 'roman {= back 40 = back 70 / >}'
define left-co '- back 20 - back 20 ('
define right-co ') back 18 - back 20 - back 67 up 5 roman >'
define disj '{roman {back 20 \e / back 20 ~}}'
define conj '{roman {/ fwd 9 up 5 \e}}'
define line '^ down 8 \(br ^'
define for-all '{roman {\e back 3 / back 48 up 16 size -2 - back 10 ~}}'
define Nquant 'size +1 | back 34 / back 33 size +1 | back 60 ~'
.EN
.B
.ce 2
UNIFYING NORMALIZATION THEORY
UNDER A NEW DEFINITION FOR NESTED NORMAL FORM
.sp
.I
.ce 2
David W. Embley
Yiu-Kai Ng
.R
.ce
Brigham Young University
.sp
.I
.ce
Wai Yin Mok
.R
.ce
Hong Kong Polytechnic
.sp
.ce
.I
ABSTRACT
.sp
.QP
This paper gives
a new definition for nested normal form that unifies normalization
theory for both flat and nested relations. In particular,
it is shown that, under this new
definition, normalization theory for flat
relations is a special case of normalization theory for nested
relations. Moreover, this new definition also
precisely captures the notion of redundancy in the sense that a relation scheme
(nested or flat) that is consistent with a given set of functional and
multivalued dependencies has no potential
redundancy with respect to the given dependencies
if and only if it is in nested normal
form as defined here. These facts present a strong case for the adoption
of this new nested normal form.
.QP
Keywords: normalization theory, nested normal form, fourth normal form,
Boyce-Codd normal form, data redundancy, database design.
.sp
.NH
Introduction
.PP
Normalization theory for flat relations began with Codd
.[ [
Codd
.]]
and has a long history. Normalization theory for nested relations is much
more recent
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth Silberschatz
.],
.[
Roth Korth nested normal form
.]].
Their unification as a single consistent theory for normalization
is the subject of this paper.
.PP
To unify normalization concepts for flat and nested relations,
we must first settle
on definitions. This may appear to be straightforward, but is not.
We can, however, agree that a primary objective is to
remove redundancy (and the corresponding update anomalies that accompany
redundancy). Because
normalization theory has had this objective from the beginning, it is a little
unusual that none of the papers or texts, with some recent exceptions
.[ [
Vincent Srinivasan
.]],
formally address the question of redundancy.
Here, we focus exclusively on this goal and
directly address the redundancy question. Indeed, an important
contribution of this paper is
to show that redundancy, as defined here, fully characterizes
our unified normalization theory for both flat and nested relations.
.PP
When we approach normalization from this redundancy
point of view, we can settle on definitions.
A first result of approaching normalization from this
point of view is that we cannot accept the
definition for Fourth Normal Form (4NF)
that is commonly found in most text books (e.g.,
.[ [
Korth Silberschatz Database System Concepts
.],
.[
Maier
.]])
because it does not guarantee redundancy-free relations for all cases;
neither is it equivalent to the original definition given by Fagin
.[ [
Fagin
.]],
which does guarantee redundancy-free relations.
Here, we use Fagin's definition of
4NF in our unification of normalization theory.
A second result is that none of the previously suggested
definitions for Nested Normal Form (NNF)
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]]
is acceptable because none of them guarantees that relations are free of
redundancy for all cases.
We therefore provide a new one, which, in addition to unifying
normalization theory, is a major contribution of this paper.
.PP
In Section 2 we present and illustrate our definitions for nested relations
and NNF. Proofs that our NNF definition subsumes Fagin's 4NF and
Boyce-Codd Normal Form (BCNF) as special cases are in Section 3. In Section 4
we give a formal definition for redundancy and prove
that our definitions (those chosen by us, together with those
developed by us in this paper) precisely capture the idea of
redundancy-free relations. Finally, in Section 5 we state our conclusions.
We also mention in Section 5 some concurrent work on developing algorithms for
our NNF definition, especially for practical cases, and on investigating
the properties of these algorithms including time complexity,
lossless decompositions, dependency preservation,
minimality, and maximal clustering,
all of which is too extensive to be discussed in detail in this paper.
.NH
Nested Normal Form
.PP
We begin by defining nested relation schemes, nested relations, scheme trees,
and their properties. This background sets the stage for our NNF definition,
which follows.
.NH 2
Nested Relation Schemes and Nested Relations
.PP
A nested relation allows each tuple component to be either atomic or
another nested relation, which may itself be nested several levels deep.
As in
.[ [
Abiteboul Bidoit
.],
.[
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]],
we are only interested in Partition Normal Form (PNF)
.[ [
Roth Korth Silberschatz
.]].
Thus, in our nested relations, for each nesting, there can never be distinct
tuples that agree on the atomic attributes
.[ [
Atzeni DeAntonellis Relational Database Theory
.]].
.sp
.LP
\fBDefinition 1.\fR
Let %U% be a set of attributes.
A \fInested relation scheme\fR is recursively defined as follows:
.IP (1)
If %X% is a nonempty subset of %U%, then %X% is a
\fInested relation scheme\fR over the set of attributes %X%.
.IP (2)
If %X%, %X sub 1%, ..., %X sub n% are nonempty subsets of %U%,
and %R sub 1%, ..., %R sub n% are nested relation schemes over
%X sub 1%, ..., %X sub n% respectively,
such that %X%, %X sub 1%, ..., and %X sub n% are pairwise disjoint, then
%X% %(R sub 1 )*% ... %(R sub n )*% is a \fInested relation scheme\fR over
%XX sub 1% ... %X sub n%.
.sp
.LP
\fBDefinition 2.\fR
Let %R% be a nested relation scheme over a nonempty set of
attributes %Z%. Let the domain of an attribute %A~in~Z% be denoted by %dom(A)%.
A \fInested relation over R\fR is recursively defined as follows:
.IP (1)
If %R% has the form %X% where %X% is a set of attributes {%A sub 1%, ...,
%A sub n%}, %n~>=~1%, then %r% is a \fInested relation over
R\fR if \fIr\fR is a (possibly empty) set of functions
{%t sub 1%, ..., %t sub m%} where each function %t sub i%, %1~<=~i~<=~m%,
maps %A sub j% to an element in %dom(A sub j )%, %1~<=~j~<=~n%.
.IP (2)
If %R% has the form %X% %(R sub 1 )*% ... %(R sub m )*%, %m~>=~1%, where %X%
is a set of attributes {%A sub 1%, ..., %A sub n%}, %n~>=~1%, then %r% is
a \fInested relation over R\fR if
.RS
.IP (a)
\fIr\fR is a (possibly empty) set of functions {%t sub 1%, ...,
%t sub p%} where each function %t sub i%, %1~<=~i~<=~p%, maps %A sub j%
to an element in %dom(A sub j )%, %1~<=~j~<=~n%, and maps %R sub k%
to a nested relation over %R sub k%, %1~<=~k~<=~m%, and
.IP (b)
%t sub i~in~r% and %t sub j~in~r% and %t sub i (X)% = %t sub j (X)% implies
%t sub i% %=% %t sub j%, %1~<=~i,~j~<=~p%.
.RE
.LP
Each function of a nested relation %r'% over nested relation scheme %R'%
is a \fInested tuple\fR of %r'%.
.sp
.LP
\fBExample 1.\fR
Figure 1 shows a nested relation. Its scheme is %A~(B)*~(C~(D)*)*%,
and it contains two nested tuples. As in
.[ [
Abiteboul Bidoit
.]],
we draw a bucket for each nested relation except the outermost nested relation.
Each bucket also has nested tuples of its own.
The first bucket under %C~(D)*%, for example,
contains two nested tuples, <4, {5, 6}> and <7, {8}>. Observe that the values
for the atomic attribute %A% differ, and in each bucket the values for
the atomic attributes differ. The nested relation in Figure 1 is thus in
PNF.
.sp
.LP
\fBDefinition 3.\fR
Let %R% be a nested relation scheme. Let %r% be a nested relation on %R%.
The \fItotal unnesting\fR of %r% is recursively defined as follows:
.IP (1)
If %R% has the form %X%, then %r% is the total unnesting of %r%.
.IP (2)
If %R% has the form %X% %(R sub 1 )*% ... %(R sub n )*%, where %X sub i%
is the set of attributes in %R sub i%, %1~<=~i~<=~n%,
then the total unnesting of
%r% = { %t% | there exists a nested tuple %u~in~r% such that %t(X)% =
%u(X)% and %t(X sub i )% is a
tuple in the total unnesting of %u(R sub i )%, %1~<=~i~<=~n%. }
.sp
.LP
\fBDefinition 4.\fR
Let %R% be a nested relation scheme. Let %r% be a nested relation on %R%.
Let %t% be a nested tuple of %r%.
The \fItotal unnesting\fR of %t% is defined as the total unnesting of
%q%, where %q% is a nested relation containing the single nested
tuple %t%.
.sp
.LP
\fBExample 2.\fR
Figure 2 shows the total unnesting of the nested relation in Figure 1.
Observe that the first two tuples in the total unnesting contain the
total unnesting of the nested tuple <4, {5, 6}>, the first nested tuple
in the first bucket under nested relation scheme %C~(D)*% in Figure 1.
.NH 2
Scheme Trees
.PP
We can graphically represent a nested relation scheme by a tree, called a
scheme tree.
A scheme tree captures the logical structure of a nested relation scheme
and explicitly represents a set of multivalued dependencies (MVDs).
Scheme trees have been used for earlier definitions of NNF
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]].
We use them here for the same purpose.
.sp
.LP
\fBDefinition 5.\fR
A \fIscheme tree\fR %T% corresponding to a nested relation scheme
%R% is recursively defined as follows:
.IP (1)
If %R% has the form %X%, then %T% is a single node scheme tree whose
root node is the set of attributes %X%.
.IP (2)
If %R% has the form %X% %(R sub 1 )*% ... %(R sub n )*%, then
the root node of %T% is the set of attributes %X%, and
a child of the root of %T% is the root of the scheme tree %T sub i%,
where %T sub i% is the corresponding scheme tree for the nested
relation scheme %R sub i%, %1~<=~i~<=~n%.
.sp
.PP
The one-to-one correspondence between a scheme tree and a nested
relation scheme along with the definition of a nested relation
scheme impose several properties on a scheme tree. Let %T% be a scheme tree.
We denote the set of attributes in %T% by %Aset(T)%.
Observe that since Definition 1 requires nonempty sets of
attributes, every node in %T% consists of a nonempty
set of attributes. Furthermore, since the sets of attributes corresponding
to nodes are pairwise disjoint and include all the attributes of %T%,
the nodes in %T% are pairwise disjoint, and their union is %Aset(T)%.
.PP
Let %N% be a node in %T%. Notationally, %Ancestor(N)% denotes
the union of attributes in all ancestors of %N%, including %N%.
Similarly, %Descendent(N)% denotes the union of attributes in all
descendants of %N%, including %N%.
.PP
In a scheme tree %T% each edge %(V,~W)%, where %V% is the parent of %W%,
denotes an MVD %Ancestor(V)% %->>% %Descendent(W)%. Notationly, we use
%MVD(T)% to denote the union of all the MVDs represented by the edges in %T%.
By construction, each MVD in %MVD(T)% is satisfied in the total unnesting of any
nested relation for %T%. Since functional dependencies (FDs) are also of
interest, we use %FD(T)% to denote any set of FDs equivalent to all FDs %X~->~Y%
implied by a given set of FDs and MVDs over a set of attributes %U% such that
%Aset(T)~subsetorequal~U% and %XY~subsetorequal~Aset(T)%. (Note that because
a set of FDs %F% together with a set of MVDs %M% can imply FDs not implied by
%F% alone, %FD(T)%, in general,
is not equivalent to the set of FDs in %F% whose left-hand side is
a subset of %Aset(T)% and whose right-hand side is restricted to %Aset(T)%.)
.sp
.LP
\fBExample 3.\fR
Figure 3 shows the scheme tree %T% for the scheme of the nested relation
in Figure 1. Figure 3 also gives %Aset(T)% and
the set of MVDs that constitute %MVD(T)%. Observe that each of the MVDs in
%MVD(T)% is satisfied in the unnested relation in Figure 2.
.NH 2
Nested Normal Form Definition
.PP
Before giving our NNF definition, we first define what it means for an
MVD and an FD to hold for a scheme tree. Our definition for holds generalizes
the definition given by Fagin in his original paper defining 4NF
.[ [
Fagin
.]]
and is critical for our unification of normalization theory.
.sp
.LP
\fBDefinition 6.\fR
Let %U% be a set of attributes. Let %M% be a set of MVDs over %U% and
%F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)~subsetorequal~U%.
An MVD %X~->>~Y% \fIholds\fR for %T% with respect
to %M% and %F% if %X~subsetorequal~Aset(T)% and
there exists a set of attributes %Z~subsetorequal~U% such that
%Y~=~Z~smallinter~Aset(T)% and %M~smallunion~F% implies %X~->>~Z%
on %U%. An FD %X~->~Y% \fIholds\fR for %T% with respect
to %M% and %F% if %XY~subsetorequal~Aset(T)% and %M~smallunion~F% implies
%X~->~Y% on %U%.
.sp
The following lemmas provide the motivation for this definition. The
first motivates the MVD definition for holds, and
the second motivates the FD definition.
We point out that Lemma 1 below is Theorem 5 in
.[ [
Fagin
.]].
For the sake of completeness and to make our paper self contained, we provide
a proof for Fagin's theorem.
.sp
.LP
\fBLemma 1.\fR
Let %U% be a set of attributes and %R~subsetorequal~U%.
Let %M% be a given set of MVDs over %U% and
let %F% be a given set of FDs over %U%.
Let %X~subsetorequal~R%, %Z~subsetorequal~U%, and %Y% = %Z~smallinter~R%.
If %M~smallunion~F% implies %X~->>~Z% on %U%, then
%M~smallunion~F% implies %X~->>~Y% on %R%.
.LP
\fBProof.\fR
Let %r sub U% be a relation over %U% that satisfies %X~->>~Z% and let
%r sub R% = %pi sub R (r sub U )%. Let %t sub 1% and %t sub 2% be
tuples in %r sub R% such that %t sub 1 (X)% = %t sub 2 (X)%.
Since %r sub R% = %pi sub R (r sub U )%, there exist tuples
%t' sub 1% and %t' sub 2% in %r sub U% such that
%t' sub 1 (R)% = %t sub 1% and %t' sub 2 (R)% = %t sub 2%.
Since %r sub U% satisfies %X~->>~Z%,
there exists a tuple %t' sub 3% in %r sub U% such that %t' sub 3 (XZ)% =
%t' sub 1 (XZ)% and %t' sub 3 (U~-~(XZ))% = %t' sub 2 (U~-~(XZ))%.
Consider the tuple %t sub 3% in %r sub R% where %t sub 3% = %t' sub 3 (R)%.
Since %Y~subsetorequal~Z%, %t sub 3 (XY)% = %t sub 1 (XY)% and
%t sub 3 (R~-~(XY))% = %t sub 2 (R~-~(XY))%.
Hence, %r sub R% satisfies %X~->>~Y%. %square%
.sp
.LP
\fBLemma 2.\fR
Let %U% be a set of attributes and %R~subsetorequal~U%.
Let %M% be a given set of MVDs over %U% and
let %F% be a given set of FDs over %U%.
Let %XY~subsetorequal~R%. If %M~smallunion~F% implies %X~->~Y% on %U%, then
%M~smallunion~F% implies %X~->~Y% on %R%.
.LP
\fBProof.\fR
Immediate.
.sp
.PP
We now give our definition for NNF.
.sp
.LP
\fBDefinition 7.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)~subsetorequal~U%.
%T% is in \fINested Normal Form (NNF)\fR with respect to %(M~smallunion~F)%
if the following conditions are satisfied.
.IP (1)
If %D% is the set of MVDs and FDs that hold for %T%,
then %D% is equivalent to %MVD(T)% %smallunion% %FD(T)% on %Aset(T)%.
.IP (2)
For each nontrivial FD %X~->~A% that holds for %T%,
%X~->~Ancestor(N sub A )%, where %N sub A% is the node in %T% that contains %A%.
.IP (3)
For every node %N~in~T%, if %Ancestor(N)% %->% %Y% holds for %T%,
%Y% %subsetorequal% %Ancestor(N)%.
.sp
.LP
\fBExample 4.\fR
Suppose we are given %U% = %ABCDE%, %M% = {%A~->>~B%, %A~->>~CDE%, %C~->>~AB%,
%C~->>~DE%}, and %F% = {%C~->~AE%}. Under these assumptions, the scheme tree
%T% in Figure 3 is in NNF. Since %E~notin~Aset(T)%,
{%A~->>~B%, %A~->>~CD%, %C~->>~AB%, %C~->>~D%} is a set of MVDs
equivalent to the set of MVDs that
hold for %T%, and {%C~->~A%} is a set of FDs equivalent to the set of FDs
that hold for %T%. Thus, in the NNF definition, %D% is equivalent to
{%A~->>~B%, %A~->>~CD%, %C~->>~AB%, %C~->>~D%, %C~->~A%}.
For the scheme tree in Figure 3, a set of MVDs and FDs equivalent to
%MVD(T)% %smallunion% %FD(T)% on %Aset(T)%
is {%A~->>~B%, %A~->>~CD%, %AC~->>~D%, %C~->~A%},
which can easily be shown equivalent to %D% on %Aset(T)% by applying a
few derivation rules. Thus, Condition 1 is satisfied.
The only nontrivial FD that holds for %T%
is %C~->~A%. Since %Ancestor(A)% = %A%,
%C~->~Ancestor(A)%, and thus Condition 2 is satisfied.
The FDs that hold for %T% and have %Ancestor(N)% as a left-hand side
for some node %N% in %T% are all trivial, and thus, Condition 3 is satisfied.
For example, %Ancestor(C)% = %AC%, and the only FDs with left-hand side %AC%
that hold are %AC~->~A%, %AC~->~C%, %AC~->~AC%, and %AC~->~emptyset%.
.sp
.PP
We now give three examples, one that violates Condition 1 of our NNF definition,
one that violates Condition 2, and one that violates Condition 3. In these
examples we also point out what problems the violation causes.
.sp
.LP
\fBExample 5.\fR
Suppose %U% = %ABCD%, %M% = {%A~->>~B%, %B~->>~A%}, and %F% = %emptyset%.
Then the scheme of the nested relation
in Figure 4 is not in NNF. Here, %MVD(T)% %smallunion% %FD(T)% is equivalent to
{%A~->>~B%, %A~->>~CD%}, which is not equivalent to the set of dependencies
that hold for %T%. Thus, Condition 1 is not satisfied. The problem with the
nested relation in Figure 4 is that the MVD %B~->>~A% and the 2
appearing in separate buckets under %B% cause the tuples in
the buckets under %CD% to be redundant. If we delete <4, 5> from the
first bucket, we must also delete it from the second. If we add a new tuple,
say <6, 7>, to the first, we must also add it to the second.
.sp
.PP
We observe, in passing, that all the earlier NNF definitions
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]]
say that the scheme of the nested relation in Figure 4 is in NNF. We can see
this as follows. These earlier definitions of NNF are based on ``keys''
for nested relation schemes,
``partial MVDs,'' and ``transitive MVDs,'' which are defined
analogous to keys and partial and transitive dependencies for flat relation
schemes.
The differences among the three earlier definitions lies in how they handle FDs,
but when there are no FDs, as in Example 5, all three earlier NNF
definitions are equivalent. Here, all three earlier definitions declare
%A% and %B% to be ``keys,'' find that %M% implies %MVD(T)%, and observe that
since %A% and %B% are single attributes, there are no ``partial MVDs,''
and that since %A~->>~B% and %B~->>~A%, there are no ``transitive
MVDs.'' In the earlier definitions, there is also a technical point
about ``node restructuring,'' which does not apply for
our example. Thus, the nested relation scheme in Example 5 satisfies all three
earlier NNF definitions. Since the scheme satisfies these earlier NNF
definitions, but
allows redundancy as illustrated in Example 5, these earlier NNF definitions do
not guarantee redundancy-free relations for all cases.
.sp
.LP
\fBExample 6.\fR
Suppose %U% = %ABCD%, %M% = {%A~->>~BC%, %AB~->>~C%, %A~->>~D%}, and
%F% = {%C~->~B%}. Then the scheme of the nested relation in Figure 5 is not
in NNF. Here, %C~->~B% is a nontrivial FD that holds for %T%. However,
%Ancestor(B)% = %AB%, but %C~notFD~AB%. Thus, Condition 2 is not satisfied.
The problem with the nested relation in Figure 5
is that the FD %C~->~B% and the 1 appearing in separate buckets under %C%
cause the 1's under %B% to be redundant. If we alter the value of one of the
1's under %B%, we must alter the value of the other.
.sp
.LP
\fBExample 7.\fR
Suppose %U% = %ABCD%, %M% = {%A~->>~CD%}, and %F% = {%A~->~B%}. Then the
scheme of the nested relation in Figure 6 is not in NNF. Here, since
%Ancestor(A)% = %A% and %Ancestor(A)~->~B% holds for %T%,
%A~->~B% should be trivial, but is not.
Thus, there is a violation of Condition 3.
The problem with the nested relation in
Figure 6 is that all buckets under %B% \fImust\fR
contain one value (zero is also possible if we allow nulls). There is therefore
no need for the additional structure; scheme %A~B~(CD)*% provides all
the advantages without additional structure.
This requirement is in harmony with earlier NNF definitions
.[ [
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]],
which also disallow buckets that never have more than one value.
.NH
Nested Normal Form Subsumes BCNF & 4NF
.PP
In this section we
show that 4NF and BCNF are special cases of NNF as we defined it.
That is, when a scheme tree %T% is degenerate, having only a root node, and
is in NNF, then it is in Fagin's 4NF and conversely. Furthermore, if there are
only FDs, no given MVDs, then if %T% is degenerate and in NNF,
then it is in BCNF and conversely.
.PP
We use the definitions below for BCNF and 4NF. The definition for BCNF
is standard. The definition for 4NF is Fagin's
.[ [
Fagin
.]].
Unless otherwise explicitly stated, from this point on, when we say 4NF,
we mean Fagin's 4NF.
.sp
.LP
\fBDefinition 8.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
A relation scheme %R~subsetorequal~U% is in \fIBCNF\fR
with respect to %M~smallunion~F% if whenever a nontrivial FD
%X~->~Y% holds for %R%, %X~->~R% also holds.
.sp
.LP
\fBDefinition 9.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
A relation scheme %R~subsetorequal~U% is in \fI4NF\fR
with respect to %M~smallunion~F% if whenever a nontrivial MVD %X~->>~Y% holds
for %R%, %X~->~R% also holds.
.sp
.PP
Here, observe that we use Definition 6 to define holds for both BCNF and
4NF. This is the difference between Fagin's 4NF and the common text-book
definition for 4NF.
The common text-book definition only considers MVDs, given or implied,
whose attributes are subsets of the scheme under consideration. Fagin's
definition, on the other hand, is broader because it
includes not only implied or given MVDs whose attributes are
subsets of the scheme under consideration, but also those that hold by Lemma 1.
.PP
We now proceed by
showing that 4NF and BCNF are special cases of NNF as defined here.
Observe, in preparation, that
for a degenerate scheme tree %T%, %T% consists only of atomic attributes
and thus we use %T% as a flat relation scheme in the following proof.
.sp
.LP
\fBTheorem 1.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U% and
%T% consists of only one node, namely the root.
%T% is in 4NF with respect to %M~smallunion~F%
if and only if %T% is in NNF with respect to %M~smallunion~F%.
.LP
\fBProof.\fR
(only-if)
Assume that %T% is in 4NF with respect to
%M~smallunion~F%. We proceed by showing that %T% satisfies Conditions
1, 2, and 3 of our NNF definition.
.PP
(%T% satisfies Condition 1.)
We partition the set %D% into two sets
%M'% and %F'% where %M'% is the set of MVDs in %D% and %F'% is the
set of FDs in %D%. Since %T% consists of only one node, %MVD(T)%
is empty. Therefore, we only need to prove that
%FD(T)% is equivalent to %M'~smallunion~F'%.
By the definitions of %FD(T)% and %D%, %FD(T)% and %F'% are equivalent. Hence,
%M'~smallunion~F'% implies %FD(T)%.
To prove that %FD(T)% implies %M'~smallunion~F'%, since %FD(T)% and %F'%
are equivalent, we only need to prove that %FD(T)% implies %M'%.
Let %X~->>~Y% be an MVD in %M'%. Since %M'% is the set of MVDs in %D% and
every MVD in %D% holds for %T%, %XY~subsetorequal~T%.
If %X~->>~Y% is a trivial MVD on %T%, then %FD(T)% implies %X~->>~Y%.
Therefore, we assume that %X~->>~Y% is a nontrivial MVD on %T%.
Since %T% is in 4NF with respect to %M~smallunion~F%,
by the definition of 4NF, %X~->~T%. Since
%Y~subsetorequal~T%, %X~->~Y%. Since %X~->~Y% is implied by
%M~smallunion~F% and %XY~subsetorequal~T%, %X~->~Y% is implied by %FD(T)%.
Since %X~->~Y% is implied by %FD(T)% and %X~->~Y% implies %X~->>~Y%, %FD(T)%
implies %X~->>~Y%. Since %X~->>~Y% was arbitrarily chosen from %M'%,
%FD(T)% implies %M'%.
.PP
(%T% satisfies Condition 2.)
Assume not, then there exists an FD %X~->~A% that holds for %T%
such that %A~notin~X% and %X~notFD~Ancestor(N sub A )%,
where %N sub A% is the node in %T% that contains %A%. Since %T%
consists of only one node, the root, and %N sub A% is a node in %T%,
%N sub A% is the root and %Ancestor(N sub A )% = %T%.
Hence, %X~notFD~T% and thus %XA~/=~T%.
%X~->~A% implies %X~->>~A%.
Since %A~notin~X% and %XA~/=~T%, %X~->>~A% is nontrivial.
Since %X~->>~A% is nontrivial and %X~notFD~T%,
%T% is not in 4NF with respect to %M~smallunion~F% %dash%
a contradiction.
.PP
(%T% satisfies Condition 3.)
Since %T% contains a single node %N%, %N% = %T% = %Ancestor(N)%.
If %Ancestor(N)% %->% %Y% holds for %T%, then %Y~subsetorequal~T%.
Since %T% = %Ancestor(N)%, %Y~subsetorequal~Ancestor(N)%.
.PP
(if)
Assume that %T% is in NNF with respect to %M~smallunion~F%.
Let %X~->>~Y% be a nontrivial MVD that holds for %T%.
Since %T% consists of only one node, %MVD(T)% is
empty. Thus, by Condition 1,
we have %FD(T)% equivalent to %D%. Since %X~->>~Y% holds for %T%,
%X~->>~Y% %in% %D%.
Since %FD(T)% is equivalent to
%D% and %X~->>~Y% %in% %D%, %FD(T)% implies %X~->>~Y%.
Since %FD(T)% consists only of FDs and %FD(T)% implies %X~->>~Y%, it must
be that %FD(T)% implies %X~->~Y%. Since %FD(T)% implies %X~->~Y%, %FD(T)% is
equivalent to %D%, and %M~smallunion~F% implies %D%, %M~smallunion~F%
implies %X~->~Y%. Since %X~->>~Y% is nontrivial, there exists an
attribute %A% in %Y% that is not in %X%. Hence, %X~->~A% is nontrivial
and holds for %T% since %XA% %subsetorequal% %T% and %X~->~A% is implied
by %M~smallunion~F%.
Since %T% consists of only one node and %A% is in %T%, %Ancestor(N sub A )%
= %T%, where %N sub A% is the node in %T% that contains %A%.
Thus, by Condition 2, since %X~->~A% is nontrivial and holds for %T% and
%Ancestor(N sub A )% = %T%, we have %X~->~T% and thus
%T% is in 4NF with respect to %M~smallunion~F%. %square%
.sp
.LP
\fBTheorem 2.\fR
Let %U% be a set of attributes.
Let %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U% and
%T% consists of only one node, namely the root.
%T% is in BCNF with respect to %F%
if and only if %T% is in NNF with respect to %F%.
.LP
\fBProof.\fR
If we are only given FDs, Definition 8 and Definition 9 are equivalent.
Therefore, the result follows immediately from Theorem 1. %square%
.NH
Redundancy
.PP
Except in rare cases, such as
.[ [
Vincent Srinivasan
.]],
papers and text books on normalization fail to provide rigorous definitions
for redundancy and thus also fail to prove that normalization guarantees
redundancy-free relations
as expected. Offered instead, are motivating examples to illustrate redundancy
detection and removal.
Thus, in the vast body of research literature on normalization,
we mostly only have rigorous syntactic justifications for
normalization; what we are missing are rigorous semantic justifications.
Besides only providing for syntactic characterizations, a
danger of not treating redundancy formally is that
the examples may be misleading. Indeed, as we show below, the
definition for 4NF found in most text books does not detect potential
redundancy for all
cases even though some readers of these books are led to believe that it does.
.PP
We proceed in this section as follows. First we give formal definitions for
redundancy. We then present theorems giving fundamental results that
relate redundancy to BCNF, Fagin's 4NF, and our NNF.
When we are finished, we will have a precise
characterization for the heart of normalization theory in terms of redundancy.
.NH 2
Redundancy Definitions
.PP
We want our redundancy definition to be simple and intuitive, but creating a
simple and intuitive definition for redundancy is more difficult than one might
at first think. Any definition will involve a sophisticated statement,
and there are many possible approaches one might use. We have chosen to base
our notion of redundancy on the idea that an atomic value %v% in a nested or
flat relation is redundant if we can erase %v%, and then from what remains
and from a single FD or MVD that holds determine what %v% must have been.
.PP
Our definition of redundancy depends on a nested relation being valid. Thus,
before defining redundancy, we first define what it means for a nested
relation to be valid for a given set of FDs and MVDs.
.sp
.LP
\fBDefinition 10.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree (which may be degenerate) such that
%Aset(T)~subsetorequal~U%. Let %r% be a nested relation on %T%.
Nested relation %r% is \fIvalid\fR with respect to %M~smallunion~F% if in the
total unnesting of %r%, every FD and every MVD that holds for %T% with
respect to %M% and %F% is satisfied.
.sp
.LP
\fBDefinition 11.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree (which may be degenerate) such that
%Aset(T)~subsetorequal~U%. Let
%XY~subsetorequal~Aset(T)%, and let %X~md~Y% be an FD or an MVD
that holds for %T% with respect to %M% and %F% and has
an attribute %A~in~Y% and %A~notin~X%.
Let %r% be a non-empty nested relation on %T%
that is valid with respect to %M~smallunion~F%.
Let %S% be the subtree of %T% that contains %A% as an atomic attribute, and
let %s sub 1%, ..., %s sub n% be the nested relations over %S% in %r%.
Let %u sub 1% and %u sub 2% be distinct nested tuples of
%s sub i% and %s sub j% respectively, %1~<=~i,j~<=~n%,
such that %u sub 1 (A)% = %v%, %u sub 2 (A)% = %v'%, and %v% = %v'%.
(Note that %i% = %j% is possible so that %s sub i% and %s sub j% may either be
the same nested relation under %S% or may be different nested relations
under %S%.) Let %t sub 1% and %t sub 2% be distinct tuples in the
total unnesting of %r% such that %t sub 1 (Aset(S))% and %t sub 2 (Aset(S))%
are tuples in the total unnesting of %u sub 1% and %u sub 2% respectively.
.IP
FD redundancy, when %X~md~Y% is %X~->~Y%:
If %t sub 1 (X)% = %t sub 2 (X)%, then atomic value %v%
is a \fIredundant atomic value\fR in %r% caused by %X~->~Y%.
.IP
MVD redundancy, when %X~md~Y% is %X~->>~Y%:
If %t sub 1 (X)% = %t sub 2 (X)%, %t sub 1 (Y)% = %t sub 2 (Y)%,
%t sub 1 (Z)% %/=% %t sub 2 (Z)% where %Z% = %Aset(T)~-~(XY)%, then, atomic
value %v% is a \fIredundant atomic value\fR in %r% caused by %X~->>~Y%.
.sp
.LP
\fBExample 8.\fR
In Example 6, we claimed that the 1's in the B column in Figure 5
are redundant because
of the given FD %C~->~B%. We can see formally that they are redundant by
considering the total unnesting of the nested relation in Figure 5,
which is given in Figure 7.
Letting %t sub 1% be the first tuple and %t sub 2% be the last tuple in
Figure 7, we see that %t sub 1 (BC)% and %t sub 2 (BC)% come respectively
from unnesting the first nested tuple in the first nested relation under
%B~(C)*% in Figure 5 and the first and only nested tuple in the second nested
relation under %B~(C)*% in Figure 5 and that
%t sub 1 (C)% = %t sub 2 (C)%. Therefore, the first 1 under %B% in the
nested relation in Figure 5 is redundant.
Swapping %t sub 1% and %t sub 2% makes the other 1 under %B% redundant.
.sp
.LP
\fBExample 9.\fR
In Example 5, we claimed that all the values in the buckets under %CD%
in Figure 4 are
redundant because of the given MVD %B~->>~A%. We can see formally that they
are redundant by considering the total unnesting of the nested relation
in Figure 4, which is given in Figure 8. We can see, as an example,
that the 5 under %D% in the first tuple in Figure 4 is redundant as follows.
Since %B~->>~A% is given and since in Example 5 %U% = %ABCD%,
the MVD %B~->>~CD% is implied by complementation.
Letting %t sub 1% be the first tuple and %t sub 2% be the next to last tuple in
Figure 8, we see that %t sub 1% comes from the first nested tuple and %t sub 2%
from the second nested tuple in the nested relation in Figure 4. But we have
%t sub 1 (B)% = %t sub 2 (B)%, %t sub 1 (CD)% = %t sub 2 (CD)%, and
%t sub 1 (A)% %/=% %t sub 2 (A)%. Therefore, the first 5 under %D% in the
nested relation in Figure 4 is redundant.
.sp
.PP
As a final example, we consider a flat relation that has redundancy under
a given set of dependencies. The example is interesting because
Fagin's 4NF definition rejects the relation scheme as not being in 4NF,
whereas the common text-book definition for 4NF accepts the relation
scheme as satisfying 4NF.
.sp
.LP
\fBExample 10.\fR
Let %U% = %ABCDE% and let %M% = {%A~->>~BC%, %A~->>~DE%} and %F% = %emptyset%.
Now, consider the flat relation scheme %R% = %ABD%.
Observed that %R% is not in 4NF according to Fagin's definition
because %A~->>~B% is a nontrivial MVD that holds for %R%,
but %A~->~ABD% does not hold. However, it is in 4NF according to
the common text-book definition for 4NF
because no nontrivial MVD, given or implied, has both its left- and right-hand
sides covered by %ABD%.
Now, consider the sample relation in Figure 9. Here, we can see, for
example, that the 4 under %D% in the first tuple is redundant as follows.
Since %A~->>~DE% is given, %A~->>~D% holds for %R%. But now if we let
%t sub 1% be the first tuple in Figure 9 and %t sub 2% be the last, we have
%t sub 1 (A)% = %t sub 2 (A)%, %t sub 1 (D)% = %t sub 2 (D)%, and
%t sub 1 (B)% %/=% %t sub 2 (B)%. Intuitively, we can see that the first 4
under %D% in Figure 9 is redundant because we can erase it, and then, from
what is left and from %A~->>~D%, we can determine that the erased value must
have been a 4, assuming that the relation is valid.
.sp
.PP
We have defined what it means for an atomic data value to be redundant because
of an FD and because of an MVD. Before continuing, we define what it means
for a scheme to have potential redundancy.
.sp
.LP
\fBDefinition 12.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree (which may be degenerate) such that
%Aset(T)~subsetorequal~U%.
%T% is said to have \fIpotential redundancy\fR with respect to %M~smallunion~F%
if there exists a redundant atomic value in any valid nested relation
for %T% caused by either
an FD or an MVD that holds for %T% with respect to %M% and %F%.
.NH 2
Redundancy Characterization of 4NF, BCNF, and NNF
.sp
.LP
\fBTheorem 3.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %R% be a relation scheme such that %R~subsetorequal~U%.
%R% has no potential redundancy with respect to %M~smallunion~F% if and only
if %R% is in 4NF with respect to %M~smallunion~F%.
.LP
\fBProof.\fR
(if)
Assume %R% has potential redundancy with respect to %M~smallunion~F%.
Then there exists a non-empty, valid relation %r% over %R% such that
%r% has a redundant atomic value %v% caused by an FD or an MVD that holds
for %R%. Since an FD is also an MVD, without loss of generality, we
assume that an MVD %X~->>~Y% that holds for %R% causes the redundancy.
Since %r% has a redundant atomic value %v% caused by %X~->>~Y% and the
total unnesting of a flat relation is the identity mapping,
%r% must have distinct tuples %t sub 1% and %t sub 2% such that
%t sub 1 (XY)% = %t sub 2 (XY)% and %t sub 1 (Z)% %!=% %t sub 2 (Z)%,
where %Z% = %R~-~(XY)%, and there exists an attribute of %A~in~Y% and
%A~notin~X% such that %t sub 1 (A)% = %v%, %t sub 2 (A)% = %v'%, and %v% = %v'%.
Since %A~in~Y% and %A~notin~X%, %Y~notsubsetorequal~X%.
Since %t sub 1 (XY)% = %t sub 2 (XY)% and %t sub 1 (Z)% %!=% %t sub 2 (Z)%,
%R~-~(XY)% %!=% %emptyset%.
Since %Y~notsubsetorequal~X% and %R~-~(XY)% %!=% %emptyset%,
%X~->>~Y% is nontrivial.
Since %R% is in 4NF with respect to %M~smallunion~F% and
%X~->>~Y% is nontrivial, %X~->~R%.
Since %t sub 1 (X)% = %t sub 2 (X)% and %X~->~R%, %t sub 1 (R)% = %t sub 2 (R)%.
Since %t sub 1 (R)% = %t sub 2 (R)% and %Z% = %R~-~(XY)%,
%t sub 1 (Z)% %=% %t sub 2 (Z)% %dash% a contradiction. Hence,
%R% has no potential redundancy with respect to %M~smallunion~F%.
.PP
(only if)
We shall prove the contrapositive: if %R% is not in 4NF
with respect to %M~smallunion~F%, then %R% has potential redundancy with
respect to %M~smallunion~F%. Since %R% is not in 4NF with respect to
%M~smallunion~F%, there is a nontrivial MVD %X~->>~Y% that holds for
%R% such that %X~notFD~R%.
Since %X~notFD~R%, there is an attribute %A~in~R% such that %X~notFD~A%. Since
%R~subsetorequal~U%, %A~in~U%. Since %A~in~U% and %M~smallunion~F% is
a set of dependencies over %U%, there exists a %Z'% in
DEP%"" sub {M~smallunion~F}%(%X%), the dependency basis for %X% with respect to
%M~smallunion~F%, that contains %A%. Since %A~in~R% and
%A~in~Z'%, %Z'~smallinter~R% %!=% %emptyset%. Let %Z% = %Z'~smallinter~R%.
Now, consider a relation %r% on %R% that has two tuples %t sub 1% and %t sub 2%.
Let %t sub 1% be a row of all 1's, %t sub 2 (R~-~Z)% = %t sub 1 (R~-~Z)%, and
%t sub 2 (Z)% be all 2's.
Since %Z'% is in DEP%"" sub {M~smallunion~F}%(%X%), which is a
partition of %U%, and for each attribute %A'% in %X%, %A'% is in
DEP%"" sub {M~smallunion~F}%(%X%) and %Z'% contains %A%
that is not in %X%,
therefore, %Z'% and %X% are disjoint.
Since %Z~=~Z'~smallinter~R%, %X~subsetorequal~R%, and
%Z'~smallinter~X% = %emptyset%, %X~subsetorequal~(R~-~Z)%.
Since %X~subsetorequal~(R~-~Z)% and %t sub 1 (R~-~Z)% = %t sub 2 (R~-~Z)%,
%t sub 1 (X)% = %t sub 2 (X)%.
.PP
We claim that %r% is valid with respect to %M~smallunion~F%.
We prove this claim by contradiction as follows.
Since an FD is also an MVD, without loss of generality, we assume that %r% does
not satisfy an MVD %V~->>~W% on %R% implied by %M~smallunion~F%. By the
construction of %r% and since %r% does not satisfy %V~->>~W%,
%V~subsetorequal~(R~-~Z)% and both %Z~smallinter~(W~-~V)% and
%Z~smallinter~(R~-~(VW))% are nonempty proper subsets of %Z%.
Since %V~->>~W% is an MVD implied by %M~smallunion~F% on %R%,
there exists an MVD %V~->>~W'% in %(M~smallunion~F) sup +% such that %W% =
%W'~smallinter~R%.
Since %V~subsetorequal~(R~-~Z)%, %V~smallinter~Z% = %emptyset%.
Since %V~smallinter~Z% = %emptyset% and %Z~smallinter~(W~-~V)% is a nonempty
proper subset of %Z%, %Z~smallinter~W% is a nonempty proper subset of %Z%.
Since %Z~smallinter~W% is a nonempty proper subset of %Z%, %Z% =
%Z'~smallinter~R%, and %W% = %W'~smallinter~R%, %Z'~smallinter~W'%
is a nonempty proper subset of %Z'%.
Since %X~->>~Z'%, %X~->>~X(U~-~(XZ'))%.
Since %Z~=~Z'~smallinter~R% and %V~subsetorequal~(R~-~Z)%,
%V~smallinter~Z'% = %emptyset%. Since %V~smallinter~Z'% = %emptyset%,
%V~subsetorequal~X(U~-~(XZ'))%. Since %V~->>~W'% and
%V~subsetorequal~X(U~-~(XZ'))%, %X(U~-~(XZ'))~->>~W'%.
Since %X~->>~X(U~-~(XZ'))% and %X(U~-~(XZ'))~->>~W'%, %X~->>~W'~-~X(U~-~(XZ'))%.
Also, since for any given sets of attributes %X%, %W'%, %Z'%, and %U%
such that %X%, %W'%, and %Z'% are subsets of %U%, %W'~-~(X(U~-~(XZ')))% =
%(Z'~smallinter~W')~-~X%, and hence, %X~->>~(Z'~smallinter~W')~-~X%.
Since %Z'% and %X% are disjoint, %(Z'~smallinter~W')~-~X% =
%Z'~smallinter~W'% and thus
%X~->>~Z'~smallinter~W'%. However, %Z'~smallinter~W'% is a nonempty
proper subset of %Z'% which implies that %Z'% is not an element in
DEP%"" sub {M~smallunion~F}%(%X%) %dash% a contradiction. Hence, %r% is
valid with respect to %M~smallunion~F%.
.PP
Since %X~->>~Y% on %R% is implied by %M~smallunion~F%,
there exists an MVD %X~->>~Y'% in %(M~smallunion~F) sup +% such that
%Y% = %Y'~smallinter~R%. Since %X~->>~Y'% is in %(M~smallunion~F) sup +%,
%Y'% is a union of some elements in DEP%"" sub {M~smallunion~F}%(%X%).
Now we have two cases to consider, either %Z'~subsetorequal~Y'% or
%Z'~notsubsetorequal~Y'%.
.IP (1)
Suppose %Z'~subsetorequal~Y'% and consider the MVD %X~->>~U~-~(XZ')%. Since
%Z% = %Z'~smallinter~R% and %X~subsetorequal~R%, %(U~-~(XZ'))~smallinter~R% =
%R~-~(XZ)%. Thus, by Lemma 1, %X~->>~R~-~(XZ)%
is an MVD implied by %M~smallunion~F% on %R%. Since %X~->>~Y% is a nontrivial
MVD on %R%, %R~-~(XY)% is nonempty. Since %Y% = %Y'~smallinter~R%, %Z% =
%Z'~smallinter~R%, and %Z'~subsetorequal~Y'%,
%(R~-~(XY))% %subsetorequal% %(R~-~(XZ))%. Since %R~-~(XY)% %!=% %emptyset% and
%(R~-~(XY))% %subsetorequal% %(R~-~(XZ))%, %R~-~(XZ)% %!=% %emptyset%.
Since %t sub 1 (R~-~Z)% = %t sub 2 (R~-~Z)%,
%t sub 1 (R~-~(XZ))% = %t sub 2 (R~-~(XZ))%.
Since %R~-~(XZ)% is nonempty, let %B% be an attribute in %R~-~(XZ)%.
Since %t sub 1 (R~-~(XZ))% = %t sub 2 (R~-~(XZ))% and %t sub 1% is a row of all
1's, %t sub 1 (B)% = %t sub 2 (B)% = 1. Since %r% is valid,
%t sub 1 (X)% = %t sub 2 (X)%, %t sub 1 (R~-~(XZ))% = %t sub 2 (R~-~(XZ))%, and
%t sub 1 (Z)% %/=% %t sub 2 (Z)%, the 1 in %t sub 1 (B)%
is a redundant atomic value in %r% caused by %X~->>~R~-~(XZ)%.
Hence, %R% has potential redundancy with respect to %M~smallunion~F%.
.IP (2)
Suppose %Z'~notsubsetorequal~Y'%.
Since %Z'~notsubsetorequal~Y'%, %Z'% is in DEP%"" sub {M~smallunion~F}%(%X%),
and %X~->>~Y'%, %Z'~smallinter~Y'% = %emptyset%; otherwise %Z'% is not minimal.
Since %Y~subsetorequal~Y'% and %Z'~smallinter~Y'% = %emptyset%,
%Z'~smallinter~Y% = %emptyset%.
Since %Z% = %Z'~smallinter~R% and %Z'~smallinter~Y% = %emptyset%,
%Z~smallinter~Y% = %emptyset%. Thus,
since %Y~subsetorequal~R% and %Z~smallinter~Y% = %emptyset%,
%Y~subsetorequal~(R~-~Z)%.
Since %t sub 1 (R~-~Z)% = %t sub 2 (R~-~Z)% and %Y~subsetorequal~(R~-~Z)%,
%t sub 1 (Y)% = %t sub 2 (Y)%.
Since %X~->>~Y% is nontrivial, %Y~-~X% is nonempty.
Let %B% be an attribute in %Y~-~X%.
Since %Y~subsetorequal~(R~-~Z)%, %t sub 1 (R~-~Z)% = %t sub 2 (R~-~Z)%,
and %t sub 1% is a row of all 1's, %t sub 1 (B)% = %t sub 2 (B)% = 1.
Since %Z'% and %X% are disjoint and %Z~subsetorequal~Z'%,
%Z~smallinter~X% = %emptyset%.
Since %Z~smallinter~Y% = %emptyset%, %Z~smallinter~X% = %emptyset% and
%Z~subsetorequal~R%,
%Z~subsetorequal~(R~-~(XY))%. Since %t sub 1 (Z)% %/=% %t sub 2 (Z)% and
%Z~subsetorequal~(R~-~(XY))%, %t sub 1 (R~-~(XY))% %!=% %t sub 2 (R~-~(XY))%.
Since %r% is valid, %t sub 1 (X)% = %t sub 2 (X)%, %t sub 1 (Y)% =
%t sub 2 (Y)%, and %t sub 1 (R~-~(XY))% %/=% %t sub 2 (R~-~(XY))%,
the 1 in %t sub 1 (B)% is a redundant atomic value in %r% caused by %X~->>~Y%.
Hence, %R% has potential redundancy with respect to %M~smallunion~F%. %square%
.sp
.LP
\fBTheorem 4.\fR
Let %U% be a set of attributes.
Let %F% be a set of FDs over %U%.
Let %R% be a relation scheme such that %R~subsetorequal~U%.
%R% has no potential redundancy with respect to %F% if and only
if %R% is in BCNF with respect to %F%.
.LP
\fBProof.\fR
If we are only given FDs, Definition 8 and Definition 9 are equivalent.
Therefore, the result follows immediately from Theorem 3. %square%
.sp
.PP
To obtain a redundancy characterization for NNF, we must first concern
ourselves with consistent scheme trees, as we call them.
Consistency concerns do not arise for flat relation schemes because,
unlike nested relation schemes, flat relation schemes imply no nontrivial
FDs and no nontrivial MVDs.
Indeed, since nested relation schemes imply no nontrivial
FDs either, we only need to be
concerned about MVDs to ensure that we have consistent scheme trees.
.sp
.LP
\fBDefinition 13.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
Let %D% be the set of MVDs and FDs that hold for %T% with respect to %M% and
%F%. A scheme tree %T% is \fIconsistent\fR with %M% and %F% if %D% implies
%MVD(T)% on %Aset(T)%.
.sp
.PP
Intuitively, we should only be interested in consistent scheme trees.
We should not want any scheme tree that implies nontrivial MVDs
unless the implied nontrivial MVDs are given or implied by given constraints.
The next example illustrates this point and also shows that, if we allow
inconsistent scheme trees, then our definition of redundancy does not
give a precise characterization of NNF.
.sp
.LP
\fBExample 11.\fR
Let %T% be the scheme tree in Figure 3, which implies that
%MVD(T)% = {%A~->>~B%, %A~->>~CD%, %AC~->>~D%}.
Let %M% be the empty set of MVDs and %F% be the empty set of FDs.
Since both %M% and %F% are empty, the set of dependencies %D% that hold for %T%
includes only trivial dependencies. However, %MVD(T)% contains nontrivial MVDs,
and thus, %D% is not equivalent to %MVD(T)% %smallunion% %FD(T)%. Hence,
Condition 1 is not satisfied, and thus
%T% is not in NNF. However, since the only constraints that hold are trivial,
there is no potential redundancy. Thus, we have a scheme tree that has no
potential redundancy, but is not in NNF. %T%, however, is not consistent
with %M% and %F%. Indeed, %T% adds semantics that should not be there
because %MVD(T)% causes some MVDs to hold that do not hold
for %T% with respect to %M% and %F%.
.sp
.PP
If we can assume that scheme trees are consistent with a given set of
MVDs and FDs,
we are able to obtain a precise characterization of redundancy in nested
relations. For then, as Theorem 5 below shows, a
scheme tree %T% has no potential redundancy
if and only if %T% satisfies the first two conditions of our NNF definition.
.sp
.LP
\fBTheorem 5.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%, and let %T%
be consistent with %M% and %F%. %T% has no potential redundancy
with respect to %M~smallunion~F% if and only if %T% satisfies Conditions 1
and 2 of our NNF definition.
.LP
\fBProof.\fR
(Sketch; this is lengthy and can be found in full in
.[ [
Embley Mok Ng Technical Report
.]].)
.PP
(if)
If all the nontrivial FDs that hold for %T% satisfy Condition 2,
there can be no redundancy caused by FDs.
We can establish this based on an argument that
assumes the contrary and then constructs a counterexample along the lines of
Example 6. Using a similar approach for Condition 1
along the lines of Example 5, we can establish that no MVD implied by
%MVD(T)~smallunion~FD(T)% can cause redundancy.
.PP
(only if)
This is similar to the proof of the only-if part of Theorem 3.
%square%
.sp
.PP
We observe here that since
our definition of NNF implies both 4NF and BCNF, Theorems 3 and 4
are immediate consequences of Theorem 5. However, space limitations do not
allow us to proceed in this way. Thus, we have included proofs for
Theorems 3 and 4, and we leave the full proof of Theorem 5 for another paper.
.NH
Conclusions and Related/Future Work
.PP
In this paper, we defined a new nested normal form, and
we formalized the notion of redundancy for both flat and nested relations.
Moreover, we have found necessary and sufficient conditions to detect
redundancy in nested relations. Since flat relations are also nested relations,
we have found necessary and sufficient conditions to detect redundancy
in flat relations as well. Furthermore, we have proved that if a
scheme tree is degenerate, then our NNF definition is equivalent to
Fagin's definition
of 4NF for a given set of MVDs and FDs and is equivalent to BCNF for a given
set of FDs only. We have therefore generalized and unified the established
results of normalization theory.
.PP
Concurrent with our investigation into unifying normalization theory,
we are also studying algorithms for obtaining NNF schemes for a given set of
attributes and a given set of MVDs and FDs.
Since our NNF definition provides a high degree of flexibility, it naturally
leads to many different algorithms for generating nested relation
schemes in NNF. The algorithms would have different
characteristics depending on their input assumptions. For example, if we
assume that in practical cases the MVDs are generated from acyclic hypergraphs
and the FDs are embedded in hypergraph edges, we can create an algorithm
that more efficiently generates nested relation schemes in NNF than if we have
no assumptions. For practical cases, we should also be concerned about
the failure of the universal relation assumption and about application-dependent
choices for attribute clustering, for example, maximal
clustering or clustering with respect to user-chosen roots.
We have written an initial paper that begins to address the issues for
the practical cases
.[ [
Mok Embley Ng Hong Kong
.]],
and have given some thought about how to proceed when the universal
relation assumption is not satisfied
.[ [
Light in preparation
.]],
but we leave to later papers a full exploration of these ideas. These later
papers should also explore issues such as lossless-join decompositions,
dependency preservation, and heuristics for obtaining good results under
given application assumptions.
.[
$LIST$
.]
an initial paper that begins to address the issues for
the practical cases
.[ [
Mok Embley Ng Hong Kong
.]],
and have given some thought about how to proceed when the universal
relation assumption is not satisfied
.[ [
Light in preparation
.]],
but we leave to later papers a full exploration of these ideas. These later
papers shouDE10.refs000066600003200000027000000047640547220561000134140ustar00liddlpstudent00000000000000%A P. Atzeni
%A V. DeAntonellis
%T Relational Database Theory
%I The Benjamin/Cummings Publishing Company, Inc.
%C Redwood City, California
%D 1993
%A S. Abiteboul
%A N. Bidoit
%T Non-first normal form relations: an algebra allowing data restructuring
%J Journal of Computer and System Sciences
%V 33
%D 1986
%P 361-393
%A E.F. Codd
%T Further normalization of the data base relational model
%B Data Base Systems
%E R. Rustin
%I Prentice-Hall
%C Englewood Cliffs
%D 1972
%P 33-64
%A D.W. Embley
%A W.Y. Mok
%A Y.K. Ng
%T An Improved Normal Form for Nested Relations
%R Technical Report BYU-CS-93-2, Department of Computer Science, Brigham Young University
%C Provo, Utah
%D February 1992
%A R. Fagin
%T Multivalued dependencies and a new normal form for relational databases
%J ACM Transactions on Database Systems
%V 2
%N 3
%D 1977
%P 262-278
%A H.F. Korth
%A A. Silberschatz
%T Database System Concepts, Second Edition
%I McGraw-Hill, Inc.
%C New York, New York
%D 1991
%A J. Light
%T A Relational Database Design Tool for Object-Relationship Models
%R Master's Thesis, Department of Computer Science, Brigham Young University
%C Provo, Utah
%O (in preparation)
%A D. Maier
%T The Theory of Relational Databases
%I Computer Science Press
%C Rockville, Maryland
%D 1983
%A W.Y. Mok
%A Y.K. Ng
%A D.W. Embley
%T An improved nested normal form for use in object-oriented software systems
%I Proceedings of the Second International Computer Science Conference: Data and Knowledge Engineering: Theory and Applications
%C Hong Kong
%D 13-16 December 1992
%P 446-452
%A Z.M. Ozsoyoglu
%A L. Yuan
%T A new normal form for nested relations
%J ACM Transactions on Database Systems
%V 12
%N 1
%D March 1987
%P 111-136
%A Z.M. Ozsoyoglu
%A L. Yuan
%T On the normalization in nested relational databases
%J Lecture Notes in Computer Science #361
%I Springer Verlag
%D 1989
%P 243-271
%A M.A. Roth
%A H.F. Korth
%T The design of non-1NF relational databases into nested normal form
%J Proceedings of the 1987 ACM-SIGMOD Conference
%C San Francisco
%D May 1987
%P 143-159
%A M.A. Roth
%A H.F. Korth
%A A. Silberschatz
%T Extended algebra and calculus for nested relational databases
%J ACM Transactions on Database Systems
%V 13
%N 4
%D December 1988
%P 389-417
%A M.W. Vincent
%A B. Srinivasan
%T Redundancy and the justification for fourth normal form in relational databases
%J Proceedings of the Second International Computer Science Conference: Data and Knowledge Engineering: Theorey and Applications
%C Hong Kong
%D December 1992
%P 432-438
Francisco
%ise for flat relation schemes because,
unlike nested relation schemes, flat relation schemes imply no nontrivial
FDs and no nontrivial MVDs.
Indeed, since nested relation schemes imply no nontrivial
FDs either, we only need to be
concerned about MVDs to ensure that we have consistent scheme trees.
.sp
.LP
\fBDefinition 13.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
Let %D% be the set of MVDs and FDs that hold for %T% with respect to %M% and
%F%. A scheme tree %T% is \fIconsistent\fR with %M% and %F% if %D% implies
%MVD(T)% on %Aset(T)%.
.sp
.PP
Intuitively, we should only be interested in consistent scheme trees.
We should not want any scheme tree that implies nontrivial MVDs
unless the implied nontrivial MVDs are given or implied by given constraints.
The next example illustrates this point and also shows that, if we allow
inconsistent scheme trees, then our definition of redundancy does not
give a precise characterization of NNF.
.sp
.LP
\fBExample 11.\fR
Let %T% be the scheme tree in Figure 3, which implies that
%MVD(T)% = {%A~->>~B%, %A~->>~CD%, %AC~->>~D%}.
Let %M% be the empty set of MVDs and %F% be the empty set of FDs.
Since both %M% and %F% are empty, the set of dependencies %D% that hold for %T%
includes only trivial dependencies. However, %MVD(T)% contains nontrivial MVDs,
and thus, %D% is not equivalent to %MVD(T)% %smallunion% %FD(T)%. Hence,
Condition 1 is not satisfied, and thus
%T% is not in NNF. However, since the only constraints that hold are trivial,
there is no potential redundancy. Thus, we have a scheme tree that has no
potential redundancy, but is not in NNF. %T%, however, is not consistent
with %M% and %F%. Indeed, %T% adds semantics that should not be there
because %MVD(T)% causes some MVDs to hold that do not hold
for %T% with respect to %M% and %F%.
.sp
.PP
If we can assume that scheme trees are consistent with a given set of
MVDs and FDs,
we are able to obtain a precise characterization of redundancy in nested
relations. For then, as Theorem 5 below shows, a
scheme tree %T% has no potential redundancy
if and only if %T% satisfies the first two conditions of our NNF definition.
.sp
.LP
\fBTheorem 5.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%, and let %T%
be consistent with %M% and %F%. %T% has no potential redundancy
with respect to %M~smallunion~F% if and only if %T% satisfies Conditions 1
and 2 of our NNF definition.
.LP
\fBProof.\fR
(Sketch; this is lengthy and can be found in full in
.[ [
Embley Mok Ng Technical Report
.]].)
.PP
(if)
If all the nontrivial FDs that hold for %T% satisfy Condition 2,
there can be no redundancy caused by FDs.
We can establish this based on an argument that
assumes the contrary and then constructs a counterexample along the lines of
Example 6. Using a similar approach for Condition 1
along the lines of Example 5, we can establish that no MVD implied by
%MVD(T)~smallunion~FD(T)% can cause redundancy.
.PP
(only if)
This is similar to the proof of the only-if part of Theorem 3.
%square%
.sp
.PP
We observe here that since
our definition of NNF implies both 4NF and BCNF, Theorems 3 and 4
are immediate consequences of Theorem 5. However, space limitations do not
allow us to proceed in this way. Thus, we have included proofs for
Theorems 3 and 4, and we leave the full proof of Theorem 5 for another paper.
.NH
Conclusions and Related/Future Work
.PP
In this paper, we defined a new nested normal form, and
we formalized the notion of redundancy for both flat and nested relations.
Moreover, we have found necessary and sufficient conditions to detect
redundancy in nested relations. Since flat relations are also nested relations,
we have found necessary and sufficient conditions to detect redundancy
in flat relations as well. Furthermore, we have proved that if a
scheme tree is degenerate, then our NNF definition is equivalent to
Fagin's definition
of 4NF for a given set of MVDs and FDs and is equivalent to BCNF for a given
set of FDs only. We have therefore generalized and unified the established
results of normalization theory.
.PP
Concurrent with our investigation into unifying normalization theory,
we are also studying algorithms for obtaining NNF schemes for a given set of
attributes and a given set of MVDs and FDs.
Since our NNF definition provides a high degree of flexibility, it naturally
leads to many different algorithms for generating nested relation
schemes in NNF. The algorithms would have different
characteristics depending on their input assumptions. For example, if we
assume that in practical cases the MVDs are generated from acyclic hypergraphs
and the FDs are embedded in hypergraph edges, we can create an algorithm
that more efficiently generates nested relation schemes in NNF than if we have
no assumptions. For practical cases, we should also be concerned about
the failure of the universal relation assumption and about application-dependent
choices for attribute clustering, for example, maximal
clustering or clustering with respect to user-chosen roots.
We have written an initial paper that begins to address the issues for
the practical cases
.[ [
Mok Embley Ng Hong Kong
.]],
and have given some thought about how to proceed when the universal
relation assumption is not satisfied
.[ [
Light in preparation
.]],
but we leave to later papers a full exploration of these ideas. These later
papers should also explore issues such as lossless-join decompositions,
dependency preservation, and heuristics for obtaining good results under
given application assumptions.
.[
$LIST$
.]
an initial paper that begins to address the issues for
the practical cases
.[ [
Mok Embley Ng Hong Kong
.]],
and have given some thought about how to proceed when the universal
relation assumption is not satisfied
.[ [
Light in preparation
.]],
but we leave to later papers a full exploration of these ideas. These later
papers shou