CS201: Data Structures and Algorithms
Rough Draft (expect changes)
Your task is to implement Kruskal's minimum spanning tree
algorithm for undirected graphs,
using a disjoint set data structure.
Your program will process a file containing a description of a graph. A
graph description contains an arbitrary number of edge descriptions. An
edge description consists of two vertices (optionally followed
by a weight) followed by a semicolon. All tokens will be separated
A vertex is simply a
If a weight is omitted, a weight of 1 should be assumed.
A weight is a positive integer.
The file should be free format; whitespace may appear
Here a sample graph description:
1 5 ;
214 33 1
which is equivalent to:
1 5 ;
2 10 23 ;
214 33 1 ;
In this example
there are six vertices, named 1, 2, 5, 10, 33, and 214, and three
edges, 1 to 5, 2 to 10 and 214 to 33, with weights 1, 23, and 1,
The name of your executable must be kruskal
the name of the file describing the graph
will be passed to your program as
command line arguments,
$ cat g1
1 2 1 ;
2 3 2 ;
3 1 3 ;
$ kruskal g1
total weight: 3
Your program should interpret the graph description
as an undirected graph and should report
a level-order traversal, starting with the smallest
vertex in the spanning tree.
For each level,
you should print the vertices from smallest to largest vertex.
When printing a vertex, you should print its number, its parent's number
(in parentheses), and the edge weight from the vertex and its parent.
After printing the traversal, you should print the
weight of the tree, followed by a line of four ASCII hyphens.
In order for your program to run on a randomly created graph, if an edge
is given more than once, replace the edge weight of the vertices if the
subsequent edge weight is smaller.
Note: a edge is the same edge as a edge.
If more than one vertex is eligible to be added to the spanning
tree at any given point,
chose the vertex with the smaller vertex number.
The output of your program
should be a minimum spanning forest, with each tree
displayed as a breadth-first traversal.
Trees are presented in the order determined by the
smallest vertex number in the tree.
Here is an example with multiple spanning trees:
$ cat g2
10 0 9 ;
6 7 11 ;
5 9 ;
4 8 7 ; 10 8 6 ;
11 12 2 ;
1 6 5 ;
0 6 10 ; 0 5 3 ; 0 4 8 ; 0 3 2 ; 0 1 4 ;
$ kruskal g2
0 : 0
1 : 1(0)4 3(0)2 4(0)8 5(0)3
2 : 6(1)5 8(4)7 9(5)1
3 : 7(6)11 10(8)6
total weight: 47
0 : 11
1 : 12(11)2
total weight: 2
The multiple trees arise from having a disconnected graph. Note that
a single vertex may appear as a spanning tree.
A single newline must immediately follow the last printable character on
a line of output.
You must follow the format exactly as diff will be
used to assess your output.
The disjoint set module
You must implement your disjoint set as a separate module named
set.c and set.h.
Here is a conforming
typedef struct set SET;
extern SET *newSET(void (*d)(FILE *,void *));
extern int makeSET(SET *d,void *value);
extern int findSET(SET *d,int index);
extern int unionSET(SET *d,int index1,int index2);
extern int countSET(SET *d);
extern int displaySET(FILE *,SET *d);
Note that values placed in the structure via makeSet need to be
wrapped in a node structure that has a parent pointer (with perhaps
other additional fields).
Here are some of the behaviors your methods should have. This listing
is not exhaustive; you are expected, as a computer scientist, to
complete the implementation in the best possible and most logical manner.
The dynamic array uses zero-based indexing.
newDISJOINT takes in a display function, which allows the
to display itself.
If one were to execute the following operations:
The constructor is passed a function that knows how to
display the generic values stored in the structure.
The makeSET method inserts a wrapped value into the
underlying dynamic array. It returns the index of the
inserted value. Presumably, the caller of makeSet
saves the return value so that findSet and unionSET
can be called in the future.
The findSET method returns the index of the representative
of the value stored at the given index.
The union method runs findSet on each of the arguments.
If the arguments have different representatives, the two
disjoint sets are unioned and a value of one is returned.
If they have the same representatives, then a zero is returned.
If the two differing representatives have the same rank,
then the representative with the lower index becomes the
parent of the other.
The count method returns current number of representatives
in the set.
If the given index is equal to the size, the value is inserted via
the insert method.
The method returns the replaced value or the null pointer
if no value was replaced.
It runs in constant time if the array does not need to grow and
amortized constant time if it does.
SET *d = newSET(displayINTEGER);
int u = makeSET(d,newINTEGER(4)); int v = makeSET(d,newINTEGER(8));
int w = makeSET(d,newINTEGER(3)); int x = makeSET(d,newINTEGER(1));
int y = makeSET(d,newINTEGER(7)); int z = makeSET(d,newINTEGER(9));
The output would be:
1: 8 4
2: 3 4
3: 1 3 4
where the number before the colon refers to the array index
and the number(s) following the colon represents the chain of
parent pointers from the value at that index to its representative.
You must implement your program in C99.
Only the most foolish student
would not recompile and thoroughly test the implementation on a Linux
system. You must use the test3 dropbox to make sure the individual
your modules goes as desired.
You must provide
a makefile which responds properly to the commands make,
make test, and make clean. The make
command builds the kruskal executable.
must proceed with no errors or warnings and it must
compile with the highest level of error checking (the -Wall and
The make test command should run your program through some test
files of your choosing. The make clean command should remove
all intermediate files, including the executable kruskal.
A call to make should never result in unnecessary compilation.
You cannot use any fixed-sized data structures for storing the
graph (unless you preprocess the input
to determine its extent) or any other data whose size changes
depending upon the input. You may not use built-in data structures,
other that C arrays and structs.
All data structure methods must run as efficiently as commonly expected.
All code you hand in should be attributed to its author. Comment
sparingly but well. Do explain the purpose of your program. Do not
explain obvious code. If code is not obvious, consider rewriting the
code rather than explaining what is going on through comments.
Handing in results
For preliminary testing,
delete all intermediate files and executables (
Then, send me all the files in your directory by running the command:
submit cs201 lusth test3
For your final submission, use the command:
submit cs201 lusth assign3
Again, your implementation may be developed on other hardware and
operating systems, but it must also compile and run cleanly and
correctly on a Linux system. You may submit as many times as
you like, up to the deadline.