CSCI 375 Spring 2014 Final exam: Question 6

Question 6. Data validation

Suppose you are working on the user interface for a system that needs to capture Canadian postal codes as part of the information entered by the user. Unfortunately, your boss is unwilling to pay for Canada Post's validation services (they provide lists of valid/active postal codes as well as code validation software), and wants you to handle this "in house" instead.

Describe how you would recommend handling the reading and validation of user postal code data given that the basic format rules are as follows:

The code contains six alphanumeric characters plus optional whitespace between the third and fourth characters.
The first character is alphabetic, and then it alternates between digits and alphabetic characters (except for the optional whitespace mentioned above).
The letters W and Z are not used in the first position, and the letters D, F, I, O, Q, and U are never used.

For example X3A 4B7 or M2C3L5 would each be valid under the code format rules.

SAMPLE SOLUTION

The key considerations are that you need to find a way to
capture the user input and a way to validate it, plus
ways to ensure the user understands what to enter and
understands what they have done wrong if they make a
mistake.  Furthermore, you need to justify why your choices
are the best (or at least good) ones.
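
As a rough illustration of telling the user what they did
wrong, the sketch below reports the first rule an entry breaks
instead of a bare "invalid".  The function name, the message
wording, and the choice to accept lowercase input are all
assumptions made for this example, not part of the question.

```python
from typing import Optional

# Letters allowed by the stated rules: D, F, I, O, Q, U are never
# used, and W, Z are additionally excluded from the first position.
FIRST = "ABCEGHJKLMNPRSTVXY"      # 18 letters
OTHER = "ABCEGHJKLMNPRSTVWXYZ"    # 20 letters


def describe_problem(text: str) -> Optional[str]:
    """Return None if the entry fits the format, else a hint for the user."""
    # Assumption: surrounding blanks and lowercase letters are tolerated.
    code = text.strip().upper()
    # Whitespace is only permitted between the 3rd and 4th characters.
    compact = code[:3] + code[3:].lstrip()
    if len(compact) != 6:
        return "A postal code has six characters, e.g. X3A 4B7."
    for pos, ch in enumerate(compact):
        if pos % 2 == 0:
            # Positions 1, 3 and 5 hold letters.
            allowed = FIRST if pos == 0 else OTHER
            if not ch.isalpha():
                return f"Character {pos + 1} must be a letter."
            if ch not in allowed:
                return f"The letter {ch} is not used in position {pos + 1}."
        elif ch not in "0123456789":
            # Positions 2, 4 and 6 hold digits.
            return f"Character {pos + 1} must be a digit."
    return None
```

Reporting only the first problem keeps the message short; a
real interface might instead highlight every offending
character at once.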

Of course, how you choose to capture it will have a huge
impact on how clear the options are to the user and on
how you can best validate it - some capture formats
and techniques will be much more difficult to validate.

Perhaps one of the first steps should be to carry out some
research to see if there are existing algorithms/techniques
commonly used to do this, or additional information that can
be used.  For instance, such investigation would reveal that
the first letter identifies a broad geographic region, which
could be validated against the rest of the address information
entered: A is Newfoundland, B is Nova Scotia, ..., V is B.C.,
X is Nunavut & NWT, Y is Yukon, etc.  Furthermore, you would 
find you can submit postal codes to the Google Maps API, and
it will either return a map location (which you could possibly
check against the rest of the address information) or will flag
the postal code as invalid (doing the checking for you!).
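
The first-letter check could be sketched as a simple lookup.
The table below extends the examples above with the remaining
letters from general knowledge of the Canada Post scheme, so it
should be confirmed against current Canada Post documentation
before use; the function name and the two-letter province
abbreviations are assumptions made for this example.

```python
# First letter of the postal code -> provinces/territories it can belong to.
# Assumption: mapping taken from general knowledge, not from the question.
REGIONS_BY_FIRST_LETTER = {
    "A": {"NL"}, "B": {"NS"}, "C": {"PE"}, "E": {"NB"},
    "G": {"QC"}, "H": {"QC"}, "J": {"QC"},
    "K": {"ON"}, "L": {"ON"}, "M": {"ON"}, "N": {"ON"}, "P": {"ON"},
    "R": {"MB"}, "S": {"SK"}, "T": {"AB"}, "V": {"BC"},
    "X": {"NU", "NT"}, "Y": {"YT"},
}


def matches_province(postal_code: str, province: str) -> bool:
    """Check that the code's first letter is consistent with the province."""
    code = postal_code.strip().upper()
    if not code:
        return False
    # Unknown first letters map to an empty set, so they never match.
    allowed = REGIONS_BY_FIRST_LETTER.get(code[0], set())
    return province.strip().upper() in allowed
```

A mismatch here is better treated as a prompt to the user
("this code does not look like an Ontario code") than as a hard
rejection, since the province field may be the one in error.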

There are many possible capture mechanisms, each with pros and
cons - ideally your answer should reflect why you regard the one
you chose as "superior".

E.g. capture mechanism         Pros                  Cons
(a) Variable-length text  Easy to capture       Widest potential for invalid input
                          Fast typed entry      (punctuation, misplaced alpha vs digit, etc)
                          Can be validated
                          with modest regex
(b) 6-char text           Ensures fixed format  Wrong if user includes space
                                                Not much easier than (a) for validation
(c) 2x3-char text         Clarifies input       Still allows wrong char types in input
                          Uses simpler (3-char)
                          regexes for validation
(d) 6 single-char fields  Simplest regexes      Still allows invalid char types
                          per field
(e) Field-selection       Ensures valid input   Selection is slower than
    (1 of 18 char,                              typing for users in a hurry
     digit,
     1 of 20 char,
     digit,
     1 of 20 char,
     digit)
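
To make the "modest regex" of option (a) concrete, here is one
possible sketch covering the three format rules from the
question.  The names used, the upper-casing of the input, and
the trimming of surrounding blanks are assumptions made for
this example; it checks the format only and cannot tell whether
a code is actually in use.

```python
import re

# D, F, I, O, Q, U are never used; W and Z are also barred from
# the first position.
FIRST = "ABCEGHJKLMNPRSTVXY"      # 18 letters
OTHER = "ABCEGHJKLMNPRSTVWXYZ"    # 20 letters

# letter digit letter, optional whitespace, digit letter digit
POSTAL_CODE_RE = re.compile(
    rf"[{FIRST}][0-9][{OTHER}]\s*[0-9][{OTHER}][0-9]"
)


def is_valid_format(text: str) -> bool:
    """True if the entry follows the stated postal code format rules."""
    # Assumption: lowercase entries are accepted and normalized.
    candidate = text.strip().upper()
    # fullmatch rejects any extra leading or trailing characters.
    return POSTAL_CODE_RE.fullmatch(candidate) is not None
```

With this sketch, X3A 4B7 and M2C3L5 are accepted, while
entries such as W3A 4B7 (W in the first position) or X3A-4B7
(punctuation) are rejected.  For option (c), the same pattern
could be split at the whitespace into two three-character
patterns, one per field.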