CSCI 260 Fall 2022: Labs

CSCI 260 Fall 2022: Lab 1: Union Find ADT Part 1

Due: Friday Sept 22

Goal:

To use git correctly. (FAQ and a short history of the minimal set of instructions you need).
to implement a simple data structure and algorithm;
to explore the Union-Find ADT (a.k.a., Disjoint Set);
to compile and test a C++ program under Linux;
to use a small perl program for randomized data generation.

Task:

Implement the basic (no heuristic), and then the union-by-height implementation of Union-Find ADT (description below. It is also called the Disjoint Sets ADT).
Link it to the testapp program provided.
Test it on data generated by the provided perl program, and design your own input for testing.
Use "git" to submit the tested, debugged code for grading. You need to submit only the file unionfind1.cpp

You are provided with

A testapp program
A C++ file with a stub for the find operation
A C++ header file
A Perl program to generate random test data.
A sample data file for input to the testapp

The Union-Find ADT In this ADT, you have a set of $n$ elements, each (in this case) being an integer. Initially, each element is in its own set, by itself. The operations on the collection of sets are:

union two sets
find the set containing a given element.

"Find(i)" must return the "name" of the set containing i. We name a set with some member in the set -- but that name must be the same for any element in that set at that time (the name may change later if the set gets unioned with another set).

Suppose you start with 10 elements. Then doing "find" on 6, say, will return just 6; "find 7" will return 7, etc.

Now suppose you execute a "union 6 7" command, which merges the set containing 6 with the set containing 7, then whichever name you give the set (either 6 or 7), it must be the case that, forever after within the lifetime of this union-find data structure, find(6)=find(7). There is no way to break sets apart once they have been merged (unioned). Convention: For the purposes of this lab, you must always name the set as the first named in the merge command, unless union-by-height is used, and then the first named is used when there is a tie.

The code you have been provided with:

The "find(p)" operation returns an integer name of the set containing item p. Your testapp, however, is not required to print it out.

The "merge(p,q)" operation achieves the following change to the data: if x is in a set named after element p, and y is in a set named afer element q before the call, then after the call, x and y are together in the same -- indeed, all elements in p's and q's sets are now in the same set, and it is named either p or q (whichever is smaller); elements that are not in either p's or q's sets are unchanged. In other words, the sets containing p and q are merged. "merge" rather than "union" is used to preserve the reserve word "union".

The test application ("testapp") will process a number of commands of the form, "u 16 3" and "f 4". Note that in this first part of the lab, the effect of performing the "f 4" or find(4) operation is nil -- i.e., it has no effect. In the next lab, there will be side effects of the find operation.

The operaton "debugprint" prints out the contents of the array -- it is useful for checking correctness and debugging.

Tasks

Generate inputs
The testapp program will:
- prompt for a file name
- open the file and read the number of elements, followed by a bunch of commands of the form "f 5" and "u 7 3". As it reads each command, it will perform the indicated "find" or "merge" (union) on the union-find data structure that your program creates.
First, let us find a way to create input files of this time at random.

Here is an example of an input file, called in.11.8
11 u 5 7 f 7 f 5 u 9 10 f 2 u 1 10 u 2 8 f 1 d The first number is the number of elements. "u 5 7" means execute a "merge(5,7)" operation. "f 7" means execute a "find(7)" operation. The little perl script "gen" provided in your git repository will help you generate input files for testing. Remember to change the mode of the file so it is executable: chmod a+rx gen
To invoke the perl script, use the command
./gen
Do the following on the command line:
./gen 20 6 ./gen 6 20
What is the difference? Edit the gen file. Note that the hash sign comments out some lines of code (except the first line -- that line is necessary so that the file may be understood as a perl file -- don't change it!) You can uncomment the lines that seem to prompt the user for input, then run the instructions again. What is the first parameter in a call "gen 20 6"? What is the second parameter?

To direct the output to a file, use ./gen > filenameHere You will want to comment out the user prompts for input (for easy outputting to a file) but it does require two inputs, the first is the number of elements, the second is the number of operations. Note that all the files end in "d", for debugprint.
Implement the Union-Find data structures and algorithms
- Use an array data structure. The interface is given in the header file.
- A test application is provided for you. The test application prompts the user for a file name; it then opens that file and reads the input and execute the operations given in the file.
- Initializing
  
  The first thing you will have to do is dynamically allocate an array of size n, once you have read your value n. Remember to do this by the instructions:
  parent = new int[k];
  
  In this implementation, the "parent" of an element i, parent[i], should always contain the name of the set that i is in. Therefore, at initialization, parent[i]=i for all i. Make it so!
  for (int i=0; i<n; i++) parent[i]=i;
  
  ... or something like that (depending on your variable names).
  
  In week one, you should also initialize ``rank" (for Lab 1, we will use that field to represent the height of the tree rooted at this node, if the node is a root -- i.e., is it's own parent). It should be 0 for each element at initialization, and update it so that it always reflects the
- find i
  
  find should return the name of the set i belongs to. Your application receives this information but does nothing with it; it does not even print it out. That's okay. When you are debugging, of course you can have your testapp print it out for debugging purposes, but remove such statements for your final version.
- union i j
  
  "union i j" should, in this first implementation, decide which of them will be the new representative, using the one with the greatest rank; if they are equal, always choose the first named element. Then perform the union.
Challenge: Refine your output
This is not necessary for Part 1, but if you have time, try to figure out how it can be done.

The "d" option dumps the array to the screen. What would be more interesting is to see the collection of sets. For example,

1: 1 3 5 6 8 10 11 12 47 48 53 56 63 68 74 96
2: 2 4 19 20 21 23 24 25 26 27 29 30 32 37 40 41 44 45 60 64 65 66 70 73 76 92 93
7: 7 67 75 86 89 91
14: 14 15 16 17 18 22 33 34 38 39 49
28: 28
36: 36
52: 52
58: 58
84: 84
90: 90

To time an execution of testappUF on input and put the output to a file called out.11.8, use the command "time testappUF > out.11.8"

Submit your results:

Use "git" to submit your program. The deadline is midnight Sept 29, 2022. Git will not accept late submissions.

Explore Experimental Results:

Use the "time" option on linux to capture the experimental run-time of your algorithm, on various sizes of inputs. The sizes to report are 100 elements, 100 commands; 100 elements 1000 commands; 1000 elements 1000 commands; and 1000 elements 1000000 commands. NOTE: you should remove the "d" from the bottom of each input file before you time your program on it, as the "debugprint" function adds too much to the running time.

Give estimates of the big-O running times that are suggested by these results. Predict running times for other file sizes and check the actual running time. Disable the "union by height" function, and test on the same files for benchmarking. Does "union by height" yeild better results? Give Big-O estimates for both implmentation: union by height, and union with no height heuristic. Remember to enable your union-by-height before you submit. Report these results in comments at the top of your code.

CSCI 260 Fall 2022: Lab 1: Union Find ADT Part 1

The code you have been provided with:

Tasks

Generate inputs

Implement the Union-Find data structures and algorithms

Challenge: Refine your output

Submit your results:

Explore Experimental Results: