CSCI 260 Fall 2022: Assignement 2
Assignment 2: Union Find Implementation
Due: Thursday October 6 noon
Goal:
- To use git correctly.
(FAQ
and a short history of the minimal set of instructions you need).
- to implement an efficient data structure and algorithm;
- to explore the
Union-Find ADT (a.k.a., Disjoint Set);
- to compile and test a C++ program under Linux;
- to use a small perl program for
randomized data generation.
Task:
-
Implement the Union-Find ADT (description below. It is also called the Disjoint Sets ADT). Use the array implementation, with path-compression and union-by-rank.
-
Link it to the testapp program provided.
-
Test it on data generated by the provided perl program, and design your own input for testing.
-
Use "git" to submit the tested, debugged code for grading. You need to submit only the file
unionfind.cpp
You are provided with the following in your Lab1. Use Lab1 to develop
and submit your Assignment 2. You do not need to create a new repository or
a new directory; using git, you will submit to Lab1
-
A testapp program
-
A C++ file with a stub for the find operation
-
A C++ header file
-
A Perl program to generate random test data.
-
A sample data file for input to the testapp
The Union-Find ADT
In this ADT,
you have a set of n elements, each (in this case) being an integer.
Initially, each element is in its own set, by itself.
The operations on the collection of sets are:
- union two sets
- find the set containing a given element.
"Find(i)" must return the "name" of the set containing i. We name a
set with some member in the set -- but that name must be the same for
any element in that set at that time (the name may change later if the
set gets unioned with another set).
Suppose you start with 10 elements. Then doing "find" on 6, say, will return
just 6; "find 7" will return 7, etc.
Now suppose you execute a "union 6 7" command, which merges the set containing 6
with the set containing 7, then whichever name you give the set (either 6 or 7),
it must be the case that, forever after within the lifetime of this union-find
data structure, find(6)=find(7). There is no way to break sets apart once they
have been merged (unioned).
Convention: For the purposes of this lab,
unioning sets of the same rank will always make the first-named set
provide the root.
The code you have been provided with:
The "find(p)" operation
returns an integer name of the set containing item p.
Your testapp, however, is not required to print it out.
The "merge(p,q)" operation achieves the following change to the data:
if x is in a set named after element p, and y is in a set named afer element q
before the call, then after the call, x and y are
together in the same -- indeed, all elements in p's and q's sets are
now in the same set, and it is named either p or q (whichever is smaller); elements that are not
in either p's or q's sets are unchanged. In other words, the sets containing
p and q are merged. "merge" rather than "union" is used to preserve the
reserve word "union".
The test application ("testapp")
will process a number of commands of the form, "u 16 3" and "f 4".
Note that in this first part of the lab, the effect of performing the
"f 4" or find(4) operation is nil -- i.e., it has no effect. In the next
lab, there will be side effects of the find operation.
The operaton "debugprint"
prints out the contents of the array -- it is
useful for checking correctness and debugging.
You are required to alter debugprint so that it prints out each
set in turn; see below.
Tasks
-
Generate inputs
The testapp program will:
-
prompt for a file name
-
open the file and read the number of elements, followed by a bunch of commands
of the form "f 5" and "u 7 3".
As it reads each command, it will perform the indicated "find" or "merge" (union) on the union-find data structure that your program creates.
First, let us find a way to create input files of this time at random.
Here is an example of an input file, called in.11.8
11
u 5 7
f 7
f 5
u 9 10
f 2
u 1 10
u 2 8
f 1
d
The first number is the number of elements. "u 5 7" means execute a
"merge(5,7)" operation. "f 7" means execute a "find(7)" operation.
The little perl script "gen" provided in your git repository
will help you generate input files for testing. Remember to change the
mode of the file so it is executable:
chmod a+rx gen
To invoke the perl script,
use the command
./gen
Do the following on the command line:
./gen 20 6
./gen 6 20
What is the difference? Edit the gen file. Note that the hash sign comments out
some lines of code (except the first line -- that line is necessary so that the
file may be understood as a perl file -- don't change it!) You can uncomment the
lines that seem to prompt the user for input, then run the instructions again. What
is the first parameter in a call "gen 20 6"? What is the second parameter?
To direct the output to a file, use
./gen > filenameHere
You will want to comment out the user prompts for input (for easy outputting to a file) but it does require two inputs,
the first is the number of elements, the second is the number of operations.
Note that all the files end in "d", for debugprint.
-
Implement the Union-Find data structures and algorithms
-
Use an array data structure. The interface is given in the header file.
-
A test application is provided for you. The test application prompts the user for a file name; it
then opens that file and reads the input and execute the operations given in the
file.
-
Implement union-by-rank and path-compression.
Initializing
The first thing you will have to do is dynamically allocate an array of
size n, once you have read your value n. Remember to do this
by the instructions:
parent = new int[k];
In this implementation, the "parent" of an element i, parent[i],
is the parent of i in the forest of union-find trees.
for (int i=0; i<n; i++) parent[i]=i;
Initialize "rank"
to be 0
for each element
at initialization, and update it as necessary during a union operation,
which we have implemented as ``merge" so as not to overlap with reserve words
in C++.
-
find i
find should return the name of the set i belongs to. TestappUF receives
this information but does nothing with it; it does not even print it out.
That's
okay. When you are debugging, of course you can have your testapp print it out for
debugging purposes, but remove such statements for your final version.
Even if we don't print out the results of the find, Find operations
can change the shape of our trees, so they need to be executed.
-
union i j
"union i j" should first "find(i)" and "find(j)".
If the ranks of the roots are equal, always make the root of
the tree of the first argument be the new root.
Then perform the union.
-
Required: Refine your output
The "d" option dumps the array to the screen. What would
be more interesting is to see the collection of sets. For example,
1: 1 3 5 6 8 10 11 12 47 48 53 56 63 68 74 96
2: 2 4 19 20 21 23 24 25 26 27 29 30 32 37 40 41 44 45 60 64 65 66 70 73 76 92 93
7: 7 67 75 86 89 91
14: 14 15 16 17 18 22 33 34 38 39 49
28: 28
36: 36
52: 52
58: 58
84: 84
90: 90
The elements within a set do not need to be in sorted order.
To time an execution of testappUF on input and put the output to
a file called out.11.8, use the command "time testappUF > out.11.8"
Explore Experimental Results:
Use the "time" option on linux to capture the experimental run-time of
your algorithm, on various sizes of inputs. The sizes
to report are 100 elements, 100 operations; 100 elements 1000 operations;
1000 elements 1000 operations; and 1000 elements 1000000 operations.
NOTE: you should remove the "d" from the bottom of each input file before you
time your program on it, as
the "debugprint" function adds too much to the running time.
Give estimates of the Big-O running times that are suggested by these
results. Predict running times for other file sizes and check the actual
running time. Disable the "union by height" function, and test on the same
files for benchmarking. Does "union by height" yeild better results? Give
Big-O estimates for both implementation: union by height, and union with no
height heuristic. Remember to enable your union-by-height before you submit.
Report these results in comments at the top of your code.
Submit your results:
Use "git" to submit your program. Git not accept late submissions.
Grading:
out of 20.
10 Correctness:
-5 if does not compile and link with testappUF.o
10/10 if compiles and outputs are correct, and implementation meets requirements (e.g., uses array implementation with path compression and union-by-rank)
4 Experimental Results
6 Style, readability, commenting, name at top, appropriate use of whitespace.
See the CSCI Coding Philosophy