Implementation of Data Migration

The data migration problem is a generalization of the gossiping and broadcasting problems. In the gossiping problem, there are n people, each person knows one distinct piece of gossip at the beginning, and every person needs to know every piece of gossip at the end. In the broadcasting problem, only one person knows a piece of gossip at the beginning, and every person needs to know it at the end. There are two generalizations. In the first case, each piece of gossip needs to be communicated to a subset of people, rather than to all n people. In the second case, a person may know several pieces of gossip at the beginning, rather than just one piece. In all of these cases, communication happens in rounds. In one round, a person may talk to at most one other person, and either send or receive only one piece of gossip. The goal is to complete the communication in as few rounds as possible. This is the data migration problem, in which each person represents a disk and each piece of gossip represents a file that needs to be transferred from one disk to another.

In the paper "On Generalized Gossiping and Broadcasting", Samir Khuller, Yoo-Ah Kim, and Yung-Chun (Justin) Wan give a high-level description of algorithms for the general data migration problem and for the subproblems. The number of rounds used by the general data migration algorithm is upper bounded by a constant factor approximation, and the algorithm usually needs fewer rounds than the worst case. To measure the performance of this algorithm, we implement the algorithm in C. The implementation of the algorithm involves an implementation of the graph edge coloring algorithm given by Berge and Fournier in "A Short Proof for a Generalization of Vizing's Theorem" and an implementation of the scheduling algorithm given by Shmoys and Tardos in "Scheduling Unrelated Machines with Costs". Then we create instances of the data migration problem, run the data migration algorithm on these input instances, and observe how many rounds it takes to complete the migration. For comparsion, we use several simple heuristics. In the first heuristic, at each round, a person who knows a particular piece of gossip pairs with someone who needs to know that piece of gossip, using weighted matching. In the second heuristic, we create a graph in which each person is represented by a vertex, with edges from a person who initially knows a piece of gossip to some of the people who need to know that piece of gossip. We color this graph, and each color determines a round. Comparing how many rounds the heuristic takes to complete the migration on the same input instance of the problem, we expect that the data migration algorithm should be faster than the heuristics, and the difference in performance will vary depending on the kind of problem instance. There may be extreme problem instances on which the data migration algorithm performs especially well or especially badly, both in comparison with the worst case bound and in comparison with the heuristic.

At this point, we are still in the process of debugging the implementation and generating test instances. The source code and the results of the testing will appear on this page soon. Meanwhile, here are summary explanations of the edge-coloring and the shmoys-tardos rounding algorithms, along with the corresponding papers.

problem summary explanation original paper
edge coloring berge.pdf
shmoys-tardos rounding shmoys.pdf

David B. Shmoys and Eva Tardos. Scheduling unrelated machines with costs. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 448-454, Austin, Texas, 25-27 January 1993.

Berge, C., Fournier, JC, A short proof for a generalization of Vizing's theorem, J. Graph. Theory 15, 1991, 333-336.