CSCI 260 Fall 2022: Assignement 2

Assignment 2: Union Find Implementation



Due: Thursday October 6 noon


Goal:
Task:
  1. Implement the Union-Find ADT (description below. It is also called the Disjoint Sets ADT). Use the array implementation, with path-compression and union-by-rank.
  2. Link it to the testapp program provided.
  3. Test it on data generated by the provided perl program, and design your own input for testing.
  4. Use "git" to submit the tested, debugged code for grading. You need to submit only the file unionfind.cpp
You are provided with the following in your Lab1. Use Lab1 to develop and submit your Assignment 2. You do not need to create a new repository or a new directory; using git, you will submit to Lab1
  1. A testapp program
  2. A C++ file with a stub for the find operation
  3. A C++ header file
  4. A Perl program to generate random test data.
  5. A sample data file for input to the testapp


The Union-Find ADT In this ADT, you have a set of n elements, each (in this case) being an integer. Initially, each element is in its own set, by itself. The operations on the collection of sets are: "Find(i)" must return the "name" of the set containing i. We name a set with some member in the set -- but that name must be the same for any element in that set at that time (the name may change later if the set gets unioned with another set).

Suppose you start with 10 elements. Then doing "find" on 6, say, will return just 6; "find 7" will return 7, etc.

Now suppose you execute a "union 6 7" command, which merges the set containing 6 with the set containing 7, then whichever name you give the set (either 6 or 7), it must be the case that, forever after within the lifetime of this union-find data structure, find(6)=find(7). There is no way to break sets apart once they have been merged (unioned). Convention: For the purposes of this lab, unioning sets of the same rank will always make the first-named set provide the root.

The code you have been provided with:

The "find(p)" operation returns an integer name of the set containing item p. Your testapp, however, is not required to print it out.

The "merge(p,q)" operation achieves the following change to the data: if x is in a set named after element p, and y is in a set named afer element q before the call, then after the call, x and y are together in the same -- indeed, all elements in p's and q's sets are now in the same set, and it is named either p or q (whichever is smaller); elements that are not in either p's or q's sets are unchanged. In other words, the sets containing p and q are merged. "merge" rather than "union" is used to preserve the reserve word "union".

The test application ("testapp") will process a number of commands of the form, "u 16 3" and "f 4". Note that in this first part of the lab, the effect of performing the "f 4" or find(4) operation is nil -- i.e., it has no effect. In the next lab, there will be side effects of the find operation.

The operaton "debugprint" prints out the contents of the array -- it is useful for checking correctness and debugging. You are required to alter debugprint so that it prints out each set in turn; see below.

Tasks

  1. Generate inputs

    The testapp program will: First, let us find a way to create input files of this time at random.

    Here is an example of an input file, called in.11.8
    11
    u 5 7
    f 7
    f 5
    u 9 10
    f 2
    u 1 10
    u 2 8
    f 1
    d
    The first number is the number of elements. "u 5 7" means execute a "merge(5,7)" operation. "f 7" means execute a "find(7)" operation. The little perl script "gen" provided in your git repository will help you generate input files for testing. Remember to change the mode of the file so it is executable:
    chmod a+rx gen

    To invoke the perl script, use the command
    ./gen

    Do the following on the command line:
    ./gen 20 6
    ./gen 6 20

    What is the difference? Edit the gen file. Note that the hash sign comments out some lines of code (except the first line -- that line is necessary so that the file may be understood as a perl file -- don't change it!) You can uncomment the lines that seem to prompt the user for input, then run the instructions again. What is the first parameter in a call "gen 20 6"? What is the second parameter?

    To direct the output to a file, use ./gen > filenameHere You will want to comment out the user prompts for input (for easy outputting to a file) but it does require two inputs, the first is the number of elements, the second is the number of operations. Note that all the files end in "d", for debugprint.

  2. Implement the Union-Find data structures and algorithms



  3. Required: Refine your output



    The "d" option dumps the array to the screen. What would be more interesting is to see the collection of sets. For example,

    1: 1 3 5 6 8 10 11 12 47 48 53 56 63 68 74 96
    2: 2 4 19 20 21 23 24 25 26 27 29 30 32 37 40 41 44 45 60 64 65 66 70 73 76 92 93
    7: 7 67 75 86 89 91
    14: 14 15 16 17 18 22 33 34 38 39 49
    28: 28
    36: 36
    52: 52
    58: 58
    84: 84
    90: 90


    The elements within a set do not need to be in sorted order. To time an execution of testappUF on input and put the output to a file called out.11.8, use the command "time testappUF > out.11.8"

    Explore Experimental Results:


    Use the "time" option on linux to capture the experimental run-time of your algorithm, on various sizes of inputs. The sizes to report are 100 elements, 100 operations; 100 elements 1000 operations; 1000 elements 1000 operations; and 1000 elements 1000000 operations. NOTE: you should remove the "d" from the bottom of each input file before you time your program on it, as the "debugprint" function adds too much to the running time.

    Give estimates of the Big-O running times that are suggested by these results. Predict running times for other file sizes and check the actual running time. Disable the "union by height" function, and test on the same files for benchmarking. Does "union by height" yeild better results? Give Big-O estimates for both implementation: union by height, and union with no height heuristic. Remember to enable your union-by-height before you submit. Report these results in comments at the top of your code.

    Submit your results:

    Use "git" to submit your program. Git not accept late submissions.

Grading:
out of 20.
10 Correctness:
-5 if does not compile and link with testappUF.o
10/10 if compiles and outputs are correct, and implementation meets requirements (e.g., uses array implementation with path compression and union-by-rank)
4 Experimental Results
6 Style, readability, commenting, name at top, appropriate use of whitespace. See the CSCI Coding Philosophy