CSCI 260 Fall 2022: Lab 1: Union Find ADT Part 1



Due: Friday Sept 22


Goal:
Task:
  1. Implement the basic (no heuristic), and then the union-by-height implementation of Union-Find ADT (description below. It is also called the Disjoint Sets ADT).
  2. Link it to the testapp program provided.
  3. Test it on data generated by the provided perl program, and design your own input for testing.
  4. Use "git" to submit the tested, debugged code for grading. You need to submit only the file unionfind1.cpp
You are provided with
  1. A testapp program
  2. A C++ file with a stub for the find operation
  3. A C++ header file
  4. A Perl program to generate random test data.
  5. A sample data file for input to the testapp


The Union-Find ADT In this ADT, you have a set of $n$ elements, each (in this case) being an integer. Initially, each element is in its own set, by itself. The operations on the collection of sets are: "Find(i)" must return the "name" of the set containing i. We name a set with some member in the set -- but that name must be the same for any element in that set at that time (the name may change later if the set gets unioned with another set).

Suppose you start with 10 elements. Then doing "find" on 6, say, will return just 6; "find 7" will return 7, etc.

Now suppose you execute a "union 6 7" command, which merges the set containing 6 with the set containing 7, then whichever name you give the set (either 6 or 7), it must be the case that, forever after within the lifetime of this union-find data structure, find(6)=find(7). There is no way to break sets apart once they have been merged (unioned). Convention: For the purposes of this lab, you must always name the set as the first named in the merge command, unless union-by-height is used, and then the first named is used when there is a tie.

The code you have been provided with:

The "find(p)" operation returns an integer name of the set containing item p. Your testapp, however, is not required to print it out.

The "merge(p,q)" operation achieves the following change to the data: if x is in a set named after element p, and y is in a set named afer element q before the call, then after the call, x and y are together in the same -- indeed, all elements in p's and q's sets are now in the same set, and it is named either p or q (whichever is smaller); elements that are not in either p's or q's sets are unchanged. In other words, the sets containing p and q are merged. "merge" rather than "union" is used to preserve the reserve word "union".

The test application ("testapp") will process a number of commands of the form, "u 16 3" and "f 4". Note that in this first part of the lab, the effect of performing the "f 4" or find(4) operation is nil -- i.e., it has no effect. In the next lab, there will be side effects of the find operation.

The operaton "debugprint" prints out the contents of the array -- it is useful for checking correctness and debugging.

Tasks

  1. Generate inputs

    The testapp program will: First, let us find a way to create input files of this time at random.

    Here is an example of an input file, called in.11.8
    11
    u 5 7
    f 7
    f 5
    u 9 10
    f 2
    u 1 10
    u 2 8
    f 1
    d
    The first number is the number of elements. "u 5 7" means execute a "merge(5,7)" operation. "f 7" means execute a "find(7)" operation. The little perl script "gen" provided in your git repository will help you generate input files for testing. Remember to change the mode of the file so it is executable:
    chmod a+rx gen

    To invoke the perl script, use the command
    ./gen

    Do the following on the command line:
    ./gen 20 6
    ./gen 6 20

    What is the difference? Edit the gen file. Note that the hash sign comments out some lines of code (except the first line -- that line is necessary so that the file may be understood as a perl file -- don't change it!) You can uncomment the lines that seem to prompt the user for input, then run the instructions again. What is the first parameter in a call "gen 20 6"? What is the second parameter?

    To direct the output to a file, use ./gen > filenameHere You will want to comment out the user prompts for input (for easy outputting to a file) but it does require two inputs, the first is the number of elements, the second is the number of operations. Note that all the files end in "d", for debugprint.

  2. Implement the Union-Find data structures and algorithms



  3. Challenge: Refine your output

    This is not necessary for Part 1, but if you have time, try to figure out how it can be done.

    The "d" option dumps the array to the screen. What would be more interesting is to see the collection of sets. For example,

    1: 1 3 5 6 8 10 11 12 47 48 53 56 63 68 74 96
    2: 2 4 19 20 21 23 24 25 26 27 29 30 32 37 40 41 44 45 60 64 65 66 70 73 76 92 93
    7: 7 67 75 86 89 91
    14: 14 15 16 17 18 22 33 34 38 39 49
    28: 28
    36: 36
    52: 52
    58: 58
    84: 84
    90: 90


    To time an execution of testappUF on input and put the output to a file called out.11.8, use the command "time testappUF > out.11.8"

Submit your results:

Use "git" to submit your program. The deadline is midnight Sept 29, 2022. Git will not accept late submissions.

Explore Experimental Results:




Use the "time" option on linux to capture the experimental run-time of your algorithm, on various sizes of inputs. The sizes to report are 100 elements, 100 commands; 100 elements 1000 commands; 1000 elements 1000 commands; and 1000 elements 1000000 commands. NOTE: you should remove the "d" from the bottom of each input file before you time your program on it, as the "debugprint" function adds too much to the running time.

Give estimates of the big-O running times that are suggested by these results. Predict running times for other file sizes and check the actual running time. Disable the "union by height" function, and test on the same files for benchmarking. Does "union by height" yeild better results? Give Big-O estimates for both implmentation: union by height, and union with no height heuristic. Remember to enable your union-by-height before you submit. Report these results in comments at the top of your code.