database design - Relational data modeling for sub types -
i learning relational model , data modeling.
, have confusion in mind regarding sub types.
i know data modeling iterative process , there many different ways model things.
don't know how choose between different options.
example
suppose want model particles (molecule, atom, proton, neutron, electron, ...).
let's ignore quark , other particles simplicity.
since particles of same type behave same, not going model individual particles.
put in way, not going store every hydrogen atom.
instead, store hydrogen, oxygen , other atom types.
going model particle types , relationships between them.
i using word "type" carelessly.
hydrogen atom instance. hydrogen type. hydrogen type of atom.
yes, there hierarchy of types involved. , ignoring lowest level (individual particles).
approaches
i can think of several approaches model them.
1. 1 table (relation, entity) each type of things (particle types).
1.1 first approach comes mind.
proton (proton)
neutron (neutron)
electron (electron)
atom (atom)
atom_proton (atom, proton, quantity)
atom_neutron (atom, neutron, quantity)
atom_electron (atom, electron, quantity)
molecule (molecule)
molecule_atom (molecule, atom, quantity)
1.2 since there 1 kind of proton/neutron/electron, can simplify it.
atom (atom, protonquantity, neutronquantity, electronquantity)
molecule (molecule)
molecule_atom (molecule, atom, quantity)
in simplified model, facts proton lost.
2. things in 1 table, associative tables representing relationships between them.
2.1 1 associative table each relationship
particle (particle)
atom_proton(particle, particle, protonquantity)
atom_neutron(particle, particle, neutronquantity)
atom_electron(particle, particle, electronquantity)
molecule_atom (particle, particle, atomquantity)
2.2 single associative table
particle (particle)
particlecomposition (particle, particle, quantity)
this simplification doesn't lose anything. think it's better.
if there're facts specific atom_proton/atom_neutron/atom_electron, 2.1 may better.
2.3 combine 2.1 , 2.2
particle (particle)
atom_proton (particle, particle, other attributes)
atom_neutron (particle, particle, other attributes)
atom_electron (particle, particle, other attributes) molecule_atom (particle, particle, other attributes)
particlecomposition(particle, particle, quantity, other attributes)
in approach, common attributes particle composition go in particlecomposition,
while special attributes particle composition go in special tables.
3. use sub type tables.
3.1 table base type particle, , additional tables sub types (atom, molecule, ...).
particle (particle)
proton (particle, other attributes)
neutron (particle, other attributes)
electron (particle, other attributes)
atom (particle, other attributes)
molecule (particle, other attributes)
atom_proton (particle, particle, protonquantity)
atom_neutron (particle, particle, neutronquantity)
atom_electron (particle, particle, electronquantity)
molecule_atom (particle, particle, atomquantity)
3.2 can combine atom_xxxquantity tables in atom , remove pronton/neutron/electron.
particle (particle)
atom (particle, protonquantity, neutronquantity, electronquantity)
molecule (particle, other attributes)
molecule_atom (particle, particle, atomquantity)
it's simpler, information proton/neutron/electron lost in 1.2.
3.3 can change name of molecule_atom make more generic.
particle (particle)
atom (particle, protonquantity, neutronquantity, electronquantity)
molecule (particle, other attributes)
particlecomposition (particle, particle, quantity)
this looks 2.2, additional tables sub types (atom, molecule).
seems 2.2 special case of 3.3.
3.4 can combine above approaches , generic model.
particle (particle)
proton (particle, other attributes)
neutron (particle, other attributes)
electron (particle, other attributes)
atom (particle, other attributes)
molecule (particle, other attributes)
particlecomposition (particle, particle, quantity, other attributes)
atom_proton (particle, particle, other attributes)
atom_neutron (particle, particle, other attributes)
atom_electron (particle, particle, other attributes)
molecule_atom (particle, particle, other attributes)
it seems atom_proton, atom_neutron, atom_electron , molecule_atom can thought of sub types of particlecomposition.
this approach complex one, contains many tables each table has role.
questions
- does of above designs break rules of relational model?
- which approach best? depend on how think data? depend on requirements?
if depends on requirements, shall choose simplest design @ first , make more generic accommodate new requirements?
although resulting data models share lot of similarities, initial design may influence naming of tables/columns, , domains of keys different.- if choose use 1 table each type of things, choose incompatible keys atom , molecule, such atom weight atom , molecule name molecule.
- if choose use generic approach, may choose common key particles.
changing keys may have greater impact on system, may not easy evolve simple design generic one.
think?
ps: may not appropriate example , solutions may problematic, , there may more variations of approaches, can point hopefully.
if have better designs, please share me.
update 1
what data model?
initially, trying model particles because
- i think there sub-typing relationships between them, looking for.
- they well-understood (?) people.
- it example of how people understand world.
here picture in mind.
i didn't state because not clear trying model either.
firstly thought atom parent of proton/neutron/electron, , molecule parent of atom.
realized composition, not subtyping, , not type hierarchy.
types
i have been thinking types while, grouping , classification.
here quote "sql , relational theory":
so type, exactly? in essence, it’s a named, finite set of values ─ possible values of specific kind: example, possible integers, or possible character strings, or possible supplier numbers, or possible xml documents, or possible relations heading (and on).
people coined name "integer" represent set of integer values.
actually, people coined concepts , names identify things, grouping things can understand/model world.
proton set of real protons, hydrogen set of hydrogen atoms, , on.
in sense, real particles stay @ lowest level of type hierarchy.
trying model particles @ first, stuck because
- i couldn't think of appropriate key identify each real particle;
- there many of them store in database.
so decided ignore real particles , model types instead.
when "a molecule composed of atoms", means "a real h2o molecule composed of 2 real hydrogen atoms , 1 oxygen atom", means "any (type of) molecule composed of (some types of) atoms".
instead of stating every fact real particles, can state facts particle types.
benefit grouping things , coined names (types).
particle type hierarchy sets
the hierarchy can translated set definitions.
second level - types above real particles:
s_proton = { p | p satisfied definition of proton } s_neutron = { n | n satisfied definition of neutron } s_electron = { e | e satisfied definition of electron } s_hydrogen = { h | h satisfied definition of hydrogen } s_oxygen = { o | o satisfied definition of oxygen } s_h2o = { w | w satisfied definition of h2o } s_o2 = { o | o satisfied definition of o2 }
higher levels
using terminology of set theory, type subtype of b if subset of b.
i first thought define atom type as:
s_atom = s_hydrogen union s_oxygen union ...
however, sets relations , elements tuples, union doesn't work if tuples in relations incompatible.
the approaches use subtype tables solve problem , models subset relationship.
but in subtyping approach, atom still in second level.
higher level types defined sets of sets.
s_atom = { s_hydrogen, s_oxygen, ... } s_molecule = { s_h2o, s_o2, ... } s_particle = { s_proton, s_neutron, s_electron, s_atom, s_molecule }
which means particle type of atom, , atom type of hydrogen.
this way, relationships between particles can represented @ high level.
the new data model
4. treat types hierarchy of types
particletype (particletype, name)
particletypehierarchy (particletype, parenttype)
particlecomposition (partiletype, subparticletype, quantity)
sample data:
particletype | particletype | name | |--------------+----------| | particle | particle | | proton | proton | | neutron | neutron | | electron | electron | | atom | atom | | molecule | molecule | | h | hydrogen | | o | oxygen | | h2o | water | | o2 | oxygen | particletypehierarchy | particletype | parenttype | |--------------+------------| | proton | particle | | neutron | particle | | electron | particle | | atom | particle | | molecule | particle | | hydrogen | atom | | oxygen | atom | | h2o | molecule | | o2 | molecule | particlecomposition | partiletype | subparticletype | quantity | |-------------+-----------------+----------| | h | proton | 1 | | h | electron | 1 | | | proton | 2 | | | neutron | 2 | | | electron | 2 | | h2o | h | 2 | | h2o | h | 2 | | h2o | o | 1 | | co2 | c | 1 | | co2 | o | 2 |
for comparison, sample data subtype table approach.
particle | particleid | particlename | |------------+----------------| | h | hydrogen | | | helium | | li | lithium | | | beryllium | | h2o | water | | o2 | oxygen | | co2 | carbon dioxide | molecule | moleculeid | some_attribute | |------------+----------------| | h2o | ... | | o2 | ... | | co2 | ... | atom | atomid | protonquantity | neutronquantity | electronquantity | |--------+----------------+-----------------+------------------| | h | 1 | 0 | 1 | | | 2 | 2 | 2 | | li | 3 | 4 | 3 | | | 4 | 5 | 4 | particlecomposition | particleid | componentid | quantity | |------------+-------------+----------| | h2o | h | 2 | | h2o | o | 1 | | co2 | c | 1 | | co2 | o | 2 | | o2 | o | 2 |
sub-atom
these particle types defined people , people keep defining new concepts model new aspects of reality.
can define "sub-atom" , hierarchy like:
approach 4 can accommodate type hierarchy change more easily.
update 2
the facts record
- there different types of particles in world: protons, neutrons, electrons, atoms, molecules.
- atoms composed of protons, neutrons, , electrons.
- molecules composed of atoms.
- there many different types of atoms: hydrogen, oxygen, ....
- there many different types of molecules: h2o, o2, ....
- a hydrogen atom composed of 1 proton , 1 electron; ...
- a h2o molecule composed of 2 hydrogen atoms , 1 oxygen atom; ...
- different types of particles may have special properties, e.g. atom has atom weight, etc.
- ...
preliminary
good question, thoughtful learner. think after discussion, in order obtain clarity, , data modelling exercise.
i understand progression , including 3.3. what, how, 3.4 (after step-wise progression 3.3) ? me, combine above not equal generic.
rather following progression, , erecting model each step, let me respond trd relevant steps, per discussion.
trd tables, identified keys, , relationships relevant @ stage, think aware of attributes, if any, , keys deployed with. after achieve stable trd, can expand full dm.
after erecting model progression previous one, , upon evaluation, if clear loses information, can safely discarded. there value contemplating such models, step not incorrect. continued discussion of waste. believe demonstrated in previous question.
consider set of table relation diagrams.
1.x
from perspective, a first first reasonable trd worth contemplation.
i don't see how or why proton/neutron/electron independent tables. not exist on own, weights; etc fixed. exist in context of atom.
since every atom comprises @ least 1 proton/neutron/electron, proton/neutron/electron columns can deployed in atom. not drawn. later.
2.x
your progression fine, except 1 glaring error.
common attributes particle composition go in particlecomposition, while special attributes particle composition go in special tables.
no. common attributes particle go in particle. attributes specific relationship (ie. not common) go in particlecomposition. , there no "special attributes particle composition", no "special tables".
3.x
consider b subtype. [3.1] correct, except for:
i don't see how particle has children such proton/neutron/electron. atom has that.
i don't see how particles related other particles (ie. ?). data discussed, molecule made of atoms; atom made of proton/neutron/electron; , particle either molecule xor atom (exclusive subtype).
please correct me if not correct.
refer subtype document full details on subject.
that can c reduced, have stated. holds notion proton/neutron/electron information fixed per atom: there 1 entry each. eg. each shell/energy level not differentiated; 0 acceptable neutrons (instead of null).
- i have discussed great value of predicates previously. main point here is, model identifies predicates. , predicates verify model; great feedback loop. have given predicates, can evaluate them yourself, , check validity of model.
3.3
if d normalised: atom has @ least proton entry; neutron entry optional; , each shell/energy level differentiated.
note difference in predicates.
note although reduction valid technique, not equate normalisation.
3.4
that appears sum total of everything, laid out flat, or flattened view (derived relations, perspective, result set). such fine, understanding. if proposed set of tables, horribly incorrect, due various normalisation errors. which, if corrected, progress [3.3] , [d normalised].
question
does of above designs break rules of relational model?
all of them except [3.3] break number of rules. in category of normalisation errors. there associated identification errors, if have given full model, or create table statements.
but not matter if context data modelling exercise, understanding. if exercise serious, paragraph above stands.
this section presented in accordance guidelines, specifically: correct misinformation whenever see it. did comment on subject post, keep disappearing. have placed here.
erwin smout:
when cut down bare essence, relational model of data has no more 1 "rule" : information in database must represented values of attributes in tuples in relations.
that 1 of rules, yes, enclosing statement patently false.
first, there many essential or first-order rules in relational model. memory, forty.
second, there many second-order rules, ones logically implied first-order rules.
people have technical qualifications , experience, can understand rm, , follow spirit , intent, follow of them.
others may not recognise of first-order rules, or of implied rules.
and there are, evidenced books allege rm, yet others, people actively subvert , diminish rm. ignore second-order rules, , worse, use pharisaic "logic" undermine first-order rules.
here, erwin, well-known efforts regarding rm on comp.databases.theory , ttm, reduces rm 1 pithy rule, , undermines full set of rules, , rm itself. in answer question, if not response, lead readers believe rm makes out be: 1 rule, everything, relational non-relation, "satisfies".
the relational model freely available, can read yourself. let me know if copy. caveat is, terminology out-dated, , needs explained.
second, if 1 boil down 1 rule (impossible, reductionist) or important rule (possible, demeaning), rule not it. 1 of forty or first-order rules, not ranked close top.
however, grant other people may have different ranking, suit own purposes.
what people understand rm do discuss, main difference (not rule) between rm , predecessors, this:
it first have complete mathematical definition (which forms basis, , in flows that).
whereas predecessors related records using physical record ids, rm demands (a) logical keys, made data, , (b) relating rows (not records) logical keys.
it must mentioned, basis upon systems characterised record ids in every file, declared "primary key", non-relational, regression pre-1970 isam record filing systems, thing rm made obsolete. notice also, how primitive systems can made appear "relational", because schizophrenic "logic", "satisfy" 1 quoted rule. honest logic destroys such nonsense.
such record id based systems have become noram in lower end of industry precisely due misinformation. hence willingness correct it.
end misinformation correction section.
which approach best?
formal data modelling, including relational normalisation. method, science, principle, not fragments of nf definitions.
i not perceive proposals different approaches, rather laying out all thoughts in 1 single modelling exercise. , point model starts take serious, feasible shape [3.3].
does depend on how think data?
of course. marriage succeed or fail based on perception have wife, because perception seat of actions. model succeed or fail based on perception of data.
one of great things relational model teaches view (perceive, think about, model) data, data, , nothing data. 1 thing, forms logical key concept.
does depend on requirements?
the first answer is, no, should not depend on requirements. should consider data, scope of limited enterprise (requirement, yes, not functional requirement), , data.
and of course, reasons have detailed elsewhere, data model should match real world, should not limited the functional need agaist data.
the massive error, common reason, failure in oo/orm model, perceives data tiny lens of oo/orm model. fails separate data vs process, , treats data mere "persistence" slave of objects. there many other errors in model, not enumerate here, point is, start position of requirement, , ignore data.
the second answer is, project not commissioned until requirement set, reality if funds requirement-based. mature project leader makes sure requirement contains enough justification analyse , model data, data, separate functions.
if depends on requirements, shall choose simplest design @ first , make more generic accommodate new requirements?
you could, cost awful lot. mature sequence data model right, possible.
if data model matches real world, when changes , additions come up, easy extend. conversely, if data model minimum functional requirement, or if not match real world, changes difficult , costly.
although resulting data models share lot of similarities, initial design may influence naming of tables/columns, , domains of keys different.
of course.
if choose use 1 table each type of things, choose incompatible keys atom , molecule, such atom weight atom , molecule name molecule.
that horrible error. never place in container not match label. never place 2 different things in 1 container (which has 1 label). correct method use common identifier name (which atom- or molecule- or particle-name), , use subtypes.
if choose use generic approach, may choose common key particles.
only if there one. , if there not, stands sign entities not same, generic model cannot used.
changing keys may have greater impact on system, may not easy evolve simple design generic one.
well, idea choose data items stable (not static) form key. , yes, key design important aspect of modelling exercise. if follow relational model, keys form logical structure of database. domain important (i think realise that). , yes, costly change.
which brings main point. precisely reason why keys have modelled , chosen correctly, each table, children.
update 1 & 2
i noticed 2 updates now. not full response, short 1 now.
- up now, understood particle set of atoms plus set of molecules. modelled in d normalised. both have name, common key. subtyped.
but now, given hierarchy diagrams, , sample data (thank you), realise thought meant, , meant, 2 different things. consider updated trd & hierarchy:
your particle set of molecules plus set of atoms plus set of subatomic particles.
that incorrect
there hierarchy, yes, far, exists in sequence of tables, not hierarchy within 1 table.
stated otherwise, 2 sets (atoms, molecules) discrete, each has own set of components, different. there no set includes (except theoretical universal set).
the updated table relation model e normalised • update 2. subtypes have been removed, along particle. supplies requirements stated in update 2. note updated predicates.
your hierarchy diagram incorrect.
your error, have combined hierarchy of classifiers (the structure, container) data (the instances of classifiers; content). can't that. need 2 separate diagrams, 1 container, , second content.
this typical error of oo/orm mindset. failure observe scientific principle separate data vs process. exact same error detailed in response hidders, in previous question. results complex objects, never work.
so hierarchy diagram illegal, 2 different diagrams combined one.
f hierarchy (classification) depicts that, , that.
g hierarchy (sample data) illustrates that, , that.
there difference in style between way depict hierarchies (organisation chart) , way depict them (explorer). 1 ends being wide, other more compact. think can figure out
you had clarity @ end of previous question. novel notion of type in poisonous book has got confused. problem, these issues, have nothing type.
more words called for, respond more time permits.
Comments
Post a Comment