The 1998 paper by Deelman and Szymanski, Dynamic Load Balancing in Parallel Discrete Event Simulation for Spatially Explicit Problems, contains much of interest.
It discusses pretty much exactly the problem we are working on here: a fixed-size landscape, with "mobile objects" which are the cause of load, and comes to some of the same conclusions we already have. Check this out:
However, their solutions are not appropriate to ours, because they are working with several different constraints:
An interesting paper, nonetheless.
It discusses pretty much exactly the problem we are working on here: a fixed-size landscape, with "mobile objects" which are the cause of load, and comes to some of the same conclusions we already have. Check this out:
- They partition a landscape into rectangles ("logical processes" or LPs, corresponding to our nodes) and load-balance by reallocating the coverage of these LPs on the landscape. They object to the irregular shapes resulting from two degrees of freedom in reshaping and thus restrict reallocation to a single dimension, resulting in rectangles that can grow or shrink only horizontally... sound familiar? :)
- A theoretical metric they use to attempt to quantify the amount of communication between LPs is the average size of the LPs' perimeters. I think this is reasonable and it suggests that extreme distortions of nodes that result in a very high perimeter should be avoided when possible, and tend to be reduced. We can implement that preference in our rules for selecting appropriate action -- include a heuristic to prefer, of two otherwise more-or-less equal choices, the one which reduces the perimeter of the space more.
- They note that "sometimes it is impossible to perfectly balance the load of a system in one round of communications." We are not even striving for that. This comment, however, does make it seem worthwhile to mention that fact explicitly in the final writeup.
However, their solutions are not appropriate to ours, because they are working with several different constraints:
- Their load-balancing solution includes centralized measures of load across the entire landscape, which (once calculated at each timestep) is broadcast to all nodes.
- Their system assumes foreknowledge of the load demands which are going to be placed on individual objects, and therefore also the loads in each individual LP.
- They make assumptions about synchronization of LPs which will not hold in our landscape (namely, that work will continue to be executed if an LP gets ahead of its neighbors but that a rollback mechanism be executed if the work done turns out to be invalidated.)
An interesting paper, nonetheless.
Leave a comment
