CS201: Data Structures and Algorithms

Assignment 3

Rough Draft (expect changes)

Printable Version

## Introduction

Your task is to implement Kruskal's minimum spanning tree algorithm for undirected graphs, using a disjoint set data structure.

## Input

Your program will process a file containing a description of a graph. A graph description contains an arbitrary number of edge descriptions. An edge description consists of two vertices (optionally followed by a weight) followed by a semicolon. All tokens will be separated by whitespace. A vertex is simply a non-negative integer. If a weight is omitted, a weight of 1 should be assumed. A weight is a positive integer.

The file should be free format; whitespace may appear anywhere. Here a sample graph description:
```    1 5 ;
2
10
23 ;

214 33 1
;
```
which is equivalent to:
```    1 5 ;
2 10 23 ;
214 33 1 ;
```
In this example there are six vertices, named 1, 2, 5, 10, 33, and 214, and three edges, 1 to 5, 2 to 10 and 214 to 33, with weights 1, 23, and 1, respectively.

The name of your executable must be kruskal and the name of the file describing the graph will be passed to your program as command line arguments, as in:
```    \$ cat g1
1 2 1 ;
2 3 2 ;
3 1 3 ;
\$ kruskal g1
0: 1
1: 2(1)1
2: 3(2)2
total weight: 3
----
\$
```
Your program should interpret the graph description as an undirected graph and should report a level-order traversal, starting with the smallest vertex in the spanning tree. For each level, you should print the vertices from smallest to largest vertex. When printing a vertex, you should print its number, its parent's number (in parentheses), and the edge weight from the vertex and its parent. After printing the traversal, you should print the weight of the tree, followed by a line of four ASCII hyphens.

## Execution

In order for your program to run on a randomly created graph, if an edge is given more than once, replace the edge weight of the vertices if the subsequent edge weight is smaller. Note: a $u,v$ edge is the same edge as a $v,u$ edge.

If more than one vertex is eligible to be added to the spanning tree at any given point, chose the vertex with the smaller vertex number.

## Output

The output of your program should be a minimum spanning forest, with each tree displayed as a breadth-first traversal. Trees are presented in the order determined by the smallest vertex number in the tree. Here is an example with multiple spanning trees:
```    \$ cat g2
10 0 9 ;
6 7 11 ;
5 9 ;
4 8 7 ; 10 8 6 ;
11 12 2 ;
1 6 5 ;
0 6 10 ; 0 5 3 ; 0 4 8 ; 0 3 2 ; 0 1 4 ;
\$ kruskal g2
0 : 0
1 : 1(0)4 3(0)2 4(0)8 5(0)3
2 : 6(1)5 8(4)7 9(5)1
3 : 7(6)11 10(8)6
total weight: 47
----
0 : 11
1 : 12(11)2
total weight: 2
----
\$
```
The multiple trees arise from having a disconnected graph. Note that a single vertex may appear as a spanning tree.

A single newline must immediately follow the last printable character on a line of output. You must follow the format exactly as diff will be used to assess your output.

## The disjoint set module

You must implement your disjoint set as a separate module named set.c and set.h. Here is a conforming header file:
```    #ifndef __SET_INCLUDED__
#define __SET_INCLUDED__

#include <stdio.h>

typedef struct set SET;

extern SET *newSET(void (*d)(FILE *,void *));
extern int makeSET(SET *d,void *value);
extern int findSET(SET *d,int index);
extern int unionSET(SET *d,int index1,int index2);
extern int countSET(SET *d);
extern int displaySET(FILE *,SET *d);

#endif
```
Note that values placed in the structure via makeSET need to be wrapped in a node structure that has a parent pointer (with perhaps other additional fields).

### Method behavior

Here are some of the behaviors your methods should have. This listing is not exhaustive; you are expected, as a computer scientist, to complete the implementation in the best possible and most logical manner. The dynamic array uses zero-based indexing.
• newSET - The constructor is passed a function that knows how to display the generic values stored in the structure.
• makeSET - The makeSET method inserts a wrapped value into the underlying dynamic array. It returns the index of the inserted value. Presumably, the caller of makeSET saves the return value so that findSET and unionSET can be called in the future.
• findSET - The findSET method returns the index of the representative of the value stored at the given index.
• unionSET - The union method runs findSET on each of the arguments. If the arguments have different representatives, the two disjoint sets are unioned and a value of one is returned. If they have the same representatives, then a zero is returned. If the two differing representatives have the same rank, then the representative with the lower index becomes the parent of the other.
• countSET - The count method returns current number of representatives in the set.
• displaySET - The method should print each value in the underlying dynamic array. See the example below.
The constructor newSET takes in a display function, which allows the data structure to display itself. If one were to execute the following operations:
```    SET *d = newSET(displayINTEGER);
int u = makeSET(d,newINTEGER(4)); int v = makeSET(d,newINTEGER(8));
int w = makeSET(d,newINTEGER(3)); int x = makeSET(d,newINTEGER(1));
int y = makeSET(d,newINTEGER(7)); int z = makeSET(d,newINTEGER(9));
unionSET(d,u,v);
unionSET(d,w,x);
unionSET(d,v,x);
displaySET(stdout,d);
```
The output would be:
```    0: 4
1: 8 4
2: 3 4
3: 1 3 4
4: 7
5: 9
```
where the number before the colon refers to the array index and the number(s) following the colon represents the chain of parent pointers from the value at that index to its representative.

## Other details

You must implement your program in C99. Only the most foolish student would not recompile and thoroughly test the implementation on a Linux system. You must use the test3 dropbox to make sure the individual testing of your modules goes as desired.

## Makefile

You must provide a makefile which responds properly to the commands make, make test, and make clean. The make command builds the kruskal executable. Compilation must proceed with no errors or warnings and it must compile with the highest level of error checking (the -Wall and -Wextra options). The make test command should run your program through some test files of your choosing. The make clean command should remove all intermediate files, including the executable kruskal. A call to make should never result in unnecessary compilation.

## Restrictions

You cannot use any fixed-sized data structures for storing the graph (unless you preprocess the input to determine its extent) or any other data whose size changes depending upon the input. You may not use built-in data structures, other that C arrays and structs. All data structure methods must run as efficiently as commonly expected.

## Documentation

All code you hand in should be attributed to its author. Comment sparingly but well. Do explain the purpose of your program. Do not explain obvious code. If code is not obvious, consider rewriting the code rather than explaining what is going on through comments.

## Handing in results

For preliminary testing, delete all intermediate files and executables ( `make clean`). Then, send me all the files in your directory by running the command:
```    submit cs201 lusth test3
```
For your final submission, use the command:
```    submit cs201 lusth assign3
```
Again, your implementation may be developed on other hardware and operating systems, but it must also compile and run cleanly and correctly on a Linux system. You may submit as many times as you like, up to the deadline.