CSCI 260 Fall 2022: Assignment 2

CSCI 260 Fall 2022: Assignement 2

Assignment 2: Union Find Implementation

Due: Thursday October 6 noon

Goal:

To use git correctly. (FAQ and a short history of the minimal set of instructions you need).
to implement an efficient data structure and algorithm;
to explore the Union-Find ADT (a.k.a., Disjoint Set);
to compile and test a C++ program under Linux;
to use a small perl program for randomized data generation.

Task:

Implement the Union-Find ADT (description below. It is also called the Disjoint Sets ADT). Use the array implementation, with path-compression and union-by-rank.
Link it to the testapp program provided.
Test it on data generated by the provided perl program, and design your own input for testing.
Use "git" to submit the tested, debugged code for grading. You need to submit only the file unionfind.cpp

You are provided with the following in your Lab1. Use Lab1 to develop and submit your Assignment 2. You do not need to create a new repository or a new directory; using git, you will submit to Lab1

A testapp program
A C++ file with a stub for the find operation
A C++ header file
A Perl program to generate random test data.
A sample data file for input to the testapp

The Union-Find ADT In this ADT, you have a set of n elements, each (in this case) being an integer. Initially, each element is in its own set, by itself. The operations on the collection of sets are:

union two sets
find the set containing a given element.

"Find(i)" must return the "name" of the set containing i. We name a set with some member in the set -- but that name must be the same for any element in that set at that time (the name may change later if the set gets unioned with another set).

Suppose you start with 10 elements. Then doing "find" on 6, say, will return just 6; "find 7" will return 7, etc.

Now suppose you execute a "union 6 7" command, which merges the set containing 6 with the set containing 7, then whichever name you give the set (either 6 or 7), it must be the case that, forever after within the lifetime of this union-find data structure, find(6)=find(7). There is no way to break sets apart once they have been merged (unioned). Convention: For the purposes of this lab, unioning sets of the same rank will always make the first-named set provide the root.

The code you have been provided with:

The "find(p)" operation returns an integer name of the set containing item p. Your testapp, however, is not required to print it out.

The "merge(p,q)" operation achieves the following change to the data: if x is in a set named after element p, and y is in a set named afer element q before the call, then after the call, x and y are together in the same -- indeed, all elements in p's and q's sets are now in the same set, and it is named either p or q (whichever is smaller); elements that are not in either p's or q's sets are unchanged. In other words, the sets containing p and q are merged. "merge" rather than "union" is used to preserve the reserve word "union".

The test application ("testapp") will process a number of commands of the form, "u 16 3" and "f 4". Note that in this first part of the lab, the effect of performing the "f 4" or find(4) operation is nil -- i.e., it has no effect. In the next lab, there will be side effects of the find operation.

The operaton "debugprint" prints out the contents of the array -- it is useful for checking correctness and debugging. You are required to alter debugprint so that it prints out each set in turn; see below.

Tasks

Generate inputs
The testapp program will:
- prompt for a file name
- open the file and read the number of elements, followed by a bunch of commands of the form "f 5" and "u 7 3". As it reads each command, it will perform the indicated "find" or "merge" (union) on the union-find data structure that your program creates.
First, let us find a way to create input files of this time at random.

Here is an example of an input file, called in.11.8
11 u 5 7 f 7 f 5 u 9 10 f 2 u 1 10 u 2 8 f 1 d The first number is the number of elements. "u 5 7" means execute a "merge(5,7)" operation. "f 7" means execute a "find(7)" operation. The little perl script "gen" provided in your git repository will help you generate input files for testing. Remember to change the mode of the file so it is executable: chmod a+rx gen
To invoke the perl script, use the command
./gen
Do the following on the command line:
./gen 20 6 ./gen 6 20
What is the difference? Edit the gen file. Note that the hash sign comments out some lines of code (except the first line -- that line is necessary so that the file may be understood as a perl file -- don't change it!) You can uncomment the lines that seem to prompt the user for input, then run the instructions again. What is the first parameter in a call "gen 20 6"? What is the second parameter?

To direct the output to a file, use ./gen > filenameHere You will want to comment out the user prompts for input (for easy outputting to a file) but it does require two inputs, the first is the number of elements, the second is the number of operations. Note that all the files end in "d", for debugprint.
Implement the Union-Find data structures and algorithms
- Use an array data structure. The interface is given in the header file.
- A test application is provided for you. The test application prompts the user for a file name; it then opens that file and reads the input and execute the operations given in the file.
- Implement union-by-rank and path-compression.
  
  Initializing
  
  The first thing you will have to do is dynamically allocate an array of size n, once you have read your value n. Remember to do this by the instructions:
  parent = new int[k];
  
  In this implementation, the "parent" of an element i, parent[i], is the parent of i in the forest of union-find trees.
  for (int i=0; i<n; i++) parent[i]=i;
  
  Initialize "rank" to be 0 for each element at initialization, and update it as necessary during a union operation, which we have implemented as ``merge" so as not to overlap with reserve words in C++.
- find i
  
  find should return the name of the set i belongs to. TestappUF receives this information but does nothing with it; it does not even print it out. That's okay. When you are debugging, of course you can have your testapp print it out for debugging purposes, but remove such statements for your final version. Even if we don't print out the results of the find, Find operations can change the shape of our trees, so they need to be executed.
- union i j
  
  "union i j" should first "find(i)" and "find(j)". If the ranks of the roots are equal, always make the root of the tree of the first argument be the new root. Then perform the union.
Required: Refine your output

The "d" option dumps the array to the screen. What would be more interesting is to see the collection of sets. For example,

1: 1 3 5 6 8 10 11 12 47 48 53 56 63 68 74 96
2: 2 4 19 20 21 23 24 25 26 27 29 30 32 37 40 41 44 45 60 64 65 66 70 73 76 92 93
7: 7 67 75 86 89 91
14: 14 15 16 17 18 22 33 34 38 39 49
28: 28
36: 36
52: 52
58: 58
84: 84
90: 90

The elements within a set do not need to be in sorted order. To time an execution of testappUF on input and put the output to a file called out.11.8, use the command "time testappUF > out.11.8"

Explore Experimental Results:

Use the "time" option on linux to capture the experimental run-time of your algorithm, on various sizes of inputs. The sizes to report are 100 elements, 100 operations; 100 elements 1000 operations; 1000 elements 1000 operations; and 1000 elements 1000000 operations. NOTE: you should remove the "d" from the bottom of each input file before you time your program on it, as the "debugprint" function adds too much to the running time.

Give estimates of the Big-O running times that are suggested by these results. Predict running times for other file sizes and check the actual running time. Disable the "union by height" function, and test on the same files for benchmarking. Does "union by height" yeild better results? Give Big-O estimates for both implementation: union by height, and union with no height heuristic. Remember to enable your union-by-height before you submit. Report these results in comments at the top of your code.
Submit your results:
Use "git" to submit your program. Git not accept late submissions.

Grading:
out of 20.
10 Correctness:
-5 if does not compile and link with testappUF.o
10/10 if compiles and outputs are correct, and implementation meets requirements (e.g., uses array implementation with path compression and union-by-rank)
4 Experimental Results
6 Style, readability, commenting, name at top, appropriate use of whitespace. See the CSCI Coding Philosophy

CSCI 260 Fall 2022: Assignement 2

Assignment 2: Union Find Implementation

The code you have been provided with:

Tasks

Generate inputs

Implement the Union-Find data structures and algorithms

Required: Refine your output

Explore Experimental Results:

Submit your results: