Data modeling

The process model, through the use of DFDs, showed the flow of data within a process, but it told us few specifics about the nature of the data itself.

Typically, we supplement the process model with a data model - something geared to very precisely describe all the data elements in a system and the relationships between them.

For modeling the data, we use an entity relation diagram, or ERD. As with the DFDs, there is generally a logical model, produced during the analysis phase, and a physical model produced during the design phase.

In creating an ERD, we first group all the information in the system into logical units. For example, if the system was for booking appointments at an auto repair shop, we might decide that the logical units are

In doing this, we are creating an entity name for the logical group of data, and a list of attributes associated with the entity.

We can then show relationships between different entities by drawing labelled lines between them.

Entity
An entity represents some thing in our system, and is shown using an (uppercase) identifier within a rectangle.

We usually subdivide the entities into three subgroups: the associative entities, independent entities, and dependent entities.

Independent entities can exist on their own: e.g. a client entity can exist independent of the other entities we create.

Associative entities typically exist to store information about many-to-many relationships between two other entities. For example, suppose we have an entity to represent the actual problem in a car, another entity to represent a symptom, and we wish to also represent the severity of a symptom. The severity can't really be made an attribute of symptoms, since it's different depending on which problem is causing the symptom. It also can't be made an attribute of the problem, since then all the problem symptoms would have the same severity ... The solution is to make the severity an entity on its own, one which is "attached" to the relationship between the problem and the symptom:

            +--------+
            |SEVERITY|
            +--------+
                |
    +-------+   |     +-------+
    |PROBLEM|---+-----|SYMPTOM|
    +-------+         +-------+
Dependent entities cannot exist without certain other entities. Many associative entities fal into this category: the idea of severity makes no sense if not related to a symptom.

Attribute
An attribute is a property of an entity. The key point with attributes is to only break them down to their "most useful level of detail" - i.e. the finest grain at which their values actually make a difference to system behaviour.

The attribues are usually listed within the entity block, e.g.

     +-----------+
     |APPOINTMENT|
     +-----------+
     | start time|
     | duration  |
     | location  |
     +-----------+

Relationship
A relationship shows the association between two entities, and is indicated using a line between two entities, with a label to indicate the nature of the relationship.

For example, suppose we have a car dealership, and we include one entity called VENDOR and another called CAR:

    +------+
    |VENDOR|
    +------+
       |
       | sells
       |
     +---+
     |CAR|
     +---+

Relationships have cardinality, which may be one-to-one, one-to-many, or many-to-many, indicating how many different types of entity may hold the relationship to other entities. For example, one mother may have many children, but each child has only one mother. The one-to-many relationship is indicated as shown below:

         +------+
         |mother|
         +------+
            |       
            |
           /|\
         +-----+
         |child|
         +-----+
Observe that a single line is used to indicate one, while a "fork" is used to indicate many.

Relationships also have modality, indicating the minimum number of relationships that must exist for the data to be valid. For example, in our auto-repair case we don't include people who have never had an appointment with our shop, i.e. to be a valid client you must have had at least one appointment -- therefore the modality is 1. Typically we are most interested in tracking whether the modality is 0 or 1 (i.e. does the relationship have to hold in at least one case for the data to be valid?). The modality is usually represented by annotating the relationship with a 0 or 1.

Data Design Questions

Some of the questions you may ask yourself when trying to create a data model include:

  1. What types of people, places, things, materials, events, etc. does your business need to maintain information about?

  2. For each type of thing you identified in question 1, what specific items of information does your business need to keep track of? (For instance, you may need to keep track of your employees' S.I.N. numbers, but you probably don't need to keep track of non-work-related facts about them.)

  3. When your business needs to keep track of information on different items of the same type (e.g. different customers, different salespeople, etc), what unique characteristic or combination of characteristics allows you to identify them individually (e.g. an employee number, a customer id, a customer's name+address, etc)?

  4. Where/when is the data created (from the point of view of your business)?

  5. When/where/how is the data used?

  6. How are the various types of item related to one another (from the perspective of your business)?

  7. Are there special cases with any of the data items (either in the treatment, generation, or use of the data)?

(Note: I've expressed these questions in the context of a business system, but similar ideas hold for whatever kind of system you are trying to model.)

Building ERDs

The ERDs are typically established by going through the process model to identify all the relevant entities.

Once the entities have been established we can go through and attach appropriate attributes, and identify relationships between the entities.

This is very much an iterative process: refining our description of exactly what the data elements are, what properties they have, and how they are related.

Some common guidelines when building your ERDs include

The Data Dictionary

Once again, the ERD is a tool to clarify the structure and relationships of data within the system, but the diagram by itself does not convey everything about the data.

We need to include supporting text (often using standard forms for each entity and/or relationship) describing all the relevant details.

It is common practice to supplement your ERDs with a full data dictionary, which is essentially a lookup table with precise information on each data item in the system: its name, where it is produced, where it is used, relationships to other data items, a clear and intuitive text description, and a full set of cross references to appropriate DFDs and ERDs.


Example ERD

The (very simple) ERD below considers the logical data necessary in a prescription billing system. It assumes there are five logical data entities we are interested in: doctors, patients, insurers, treatments, and prescriptions.

In the example shown, each of the relationships is one-to-many (indicated by the 1 or N on the connections representing relationships).

Note from Dave:

the fields below are pretty sparsely described - this might be the ERD after a "first pass". On subsequent passes the descriptions would become much more detailed.

In terms of supporting text for such an ERD, we might include something like:

Overview:

Our system, designed to maintain prescription billing information, models the interaction of patients with their doctors and insurers, and the associated treatments and prescriptions.

The diagram above provides an intuitive view of the model, with further details provided in the descriptions below. Additional information on the data involved can be found in the Data Dictionary (include page number or link).

Entities:

Our system includes five data entities, representing the doctors, their patients, the insurers of the patients, the patients' treatments, and the associated prescriptions. Each entity is described below:

Relationships:

Our system includes four relationships: a doctor treats a patient, a patient is insured by an insurer, a treatment is given to a patient, and a treatment includes a prescription. Each relationship is described below:


Example ERD

As a more complex example, suppose we were to consider an ERD for campus timetabling and room booking along the following lines:

The system we plan on developing will take care of timetabling all the courses and selecting all the rooms each year after a list of course offerings, sections, and instructors is determined.

For the ERD, we are simply concerned with how to effectively model all the information listed above, so the rest of our system can use it efficiently.