Home

Advertisement

Customize
pdcs_ttg
03 May 2007 @ 10:50 pm
The 1998 paper by Deelman and Szymanski, Dynamic Load Balancing in Parallel Discrete Event Simulation for Spatially Explicit Problems, contains much of interest.

It discusses pretty much exactly the problem we are working on here: a fixed-size landscape, with "mobile objects" which are the cause of load, and comes to some of the same conclusions we already have. Check this out:

  1. They partition a landscape into rectangles ("logical processes" or LPs, corresponding to our nodes) and load-balance by reallocating the coverage of these LPs on the landscape. They object to the irregular shapes resulting from two degrees of freedom in reshaping and thus restrict reallocation to a single dimension, resulting in rectangles that can grow or shrink only horizontally... sound familiar? :)

  2. A theoretical metric they use to attempt to quantify the amount of communication between LPs is the average size of the LPs' perimeters. I think this is reasonable and it suggests that extreme distortions of nodes that result in a very high perimeter should be avoided when possible, and tend to be reduced. We can implement that preference in our rules for selecting appropriate action -- include a heuristic to prefer, of two otherwise more-or-less equal choices, the one which reduces the perimeter of the space more.

  3. They note that "sometimes it is impossible to perfectly balance the load of a system in one round of communications." We are not even striving for that. This comment, however, does make it seem worthwhile to mention that fact explicitly in the final writeup.


However, their solutions are not appropriate to ours, because they are working with several different constraints:

  • Their load-balancing solution includes centralized measures of load across the entire landscape, which (once calculated at each timestep) is broadcast to all nodes.

  • Their system assumes foreknowledge of the load demands which are going to be placed on individual objects, and therefore also the loads in each individual LP.

  • They make assumptions about synchronization of LPs which will not hold in our landscape (namely, that work will continue to be executed if an LP gets ahead of its neighbors but that a rollback mechanism be executed if the work done turns out to be invalidated.)


An interesting paper, nonetheless.
 
 
pdcs_ttg
03 May 2007 @ 01:55 am
Another potential issue:

Suppose the nodes get so distorted that the boundary lines are very far diagonal and the joints anchoring them are very close together. It is possible that the calculations currently envisioned for the identification of the contents of a node will, in that case, cause two different nodes both to assume one landscape location is part of their jurisdiction.

Enumeration of the landscape gridpoints must therefore take into account the gridpoints of each nearest neighbor, to identify whether there will be overlaps...

ow.
 
 
pdcs_ttg
01 May 2007 @ 04:53 pm
From the prior post:

Consider permitting individual joints to move up and down, i.e. biaxial movement of any non-edge joint (edge joints would of course still be constrained to remain on their edges.)


Two side implications:

  1. Would joints then be able to move diagonally? I think there is no need for this. This would vastly complicate the "bitmap" or whatever solution is supposed to identify the correct load-balancing operation for a node. I don't see that diagonal movement adds anything -- and in fact it flies in the face of the KISS dictum Thilo advocated in the meeting; a diagonal movement is the same as two axial movements on different axes. On the other hand, simple vertical movement actually simplifies the bitmap slightly as rotations can be true rotations and not constrained by the asymmetry along different axes.

  2. Permitting the vertical movement of joints means that there is no longer a vertically-non-balancing problem. Does this mean that nodes no longer need to be transferred between computers?

    That would be a wonderful outcome. I have never been happy with the idea of transferring nodes, both in terms of implementation and simply the idea that it is necessary, as it seems like a kludgey solution -- which in turn suggests that the algorithm that requires such a solution isn't very good.

    So why didn't we do this in the first place, and what's holding us back? Let's revisit. There were two considerations in this decision: the prevention of negative-area-containing nodes, and the concern about the calculation of what would constitute the content of a node under some sets of node boundaries. Rethinking these:

    1. Prevention of negative-area nodes. Negative area nodes can be generated by solely horizontal sliding as well, if the joint on one side slides so far inward that it passes the joint on the other side. We planned to guard against this eventuality by validating any movement of nodes adjacent and in the same stripe.

      "But wait!", I hear you cry. "Any node which moves a joint of B's will know B's joint locations already and can make the decision locally on B's behalf!" Not so fast. If B's geometry has been altered by another node farther away, A may not have gotten the message yet -- and a move which A thinks is OK may instead cause a problem. So there has to be communication, as B is the only official arbiter of B's own jurisdiction. But that's not so bad -- since if there is movement, there will need to be communication anyway. We can implement an "optimistic" algorithm for the movement, only backtracking in the case that the move gets vetoed during the operation -- and logging the event, so we can see whether this is an efficient choice or not!

      So any movement was already going to have to be validated by direct communication. The difference is that the movements which will require communication are no longer limited to those made by nodes in the same stripe; nodes in the vertical stripe must also be given veto power -- but only if the movement is vertical, in which case the horizontal vetos need not be checked! As long as the movement is along one axis or the other, no additional vetos (by count) need be granted -- and the communication count remains the same, as the landscape movements always require synchronization of all of the up-to-four nodes that share that joint. Aside from programming concerns, this is a non-issue.

    2. Calculation of contents of a node. Worrying about this was a silly mistake on my part. It is easy to determine the contents of a node by using graphics techniques even if the node is concave along any of the four axes -- it simply requires a test of the relative locations of the four joints in order to determine whether to enumerate the points in an X-major or Y-major fashion. Not an issue.

    So with these things worked out I am inclined to permit the vertical movement in place of the transferring of nodes between computers for load-balancing purposes.

    Is there any other reason to transfer nodes among computers? The only issue I can think of is the zero-area node propagation -- and that's a performance thing more than any other. I think with the exposure of the cost of zero-area node propagation (see prior post, point 10) this is not as much of an issue anymore -- and transferring a zero-area node between computers is likely to require less code than transferring any possibly-populated node.


Therefore, I propose: that movement along a vertical axis be permitted, and that load-balancing by transfer of nodes between computers be eliminated.

Comments?
 
 
pdcs_ttg
29 April 2007 @ 01:59 am
In the meeting with Thilo a few days ago the following points came up. (I documented the points at the time, but took more time to make this entry as several of them required research or thought for completeness.)

  1. The concept of using a "bitmap" to make a determination of what action to take must use trinary logic, i.e. there must be three values including "roughly equal". Response: Of course, it was silly not to document it this way before.

  2. Consider the possibility that nodes only make landscape increasing moves, or only make landscape-decreasing moves. The purpose of this would be to simplify the logic used to determine correct action. There is too much focus on "getting the balance right immediately" and not enough acceptance that a dynamic system, properly driven, will work itself out in time. Response: I'm not sure of the value of this. I understand the concern about the focus but that is a fuzzy line to draw. However, there is definitely something to be said for reduction of the system to simpler atomic operations (hence the closely-related original emphasis in the Overview document on "push-only operations" v1.1 p.3), and it could certainly be experimented with easily if the bitmaps are coded correctly. I will take as an action item the need to code these rules in such a way that they are easily replaceable and, if there is time, do some experiments to see whether the more limited model is as effective.

  3. Consider the possibility of having entire lines of joints move up and down, i.e. dynamically changing the height of individual stripes. Response: This introduces serious problems with locking if the movements need to be synchronous, and data collisions if they are asynchronous. While the goal of vertical movement to permit better load balancing is good, this seems to be a messy solution. I prefer to reconsider the next point instead.

  4. Consider permitting individual joints to move up and down, i.e. biaxial movement of any non-edge joint (edge joints would of course still be constrained to remain on their edges.) This is a large issue and will be discussed in its own blog entry.

  5. Identify the state of the art in Discrete Event Simulation load balancing. Research in progress.

  6. Consider a graphical representation for diagnostic purposes (based on logs, i.e. asynchronous.) Draw node boundaries as polygons and color-code load levels. Response: Makes sense to me. Will do as diagnostic tool if there is time.

  7. Note that the canonical rule of thumb for balancing work units between threads is that the number of work units should be at least an order of magnitude greater than the number of threads. "Work units" in this context maps to nodes. Response: Noted.

  8. In the experimentation phase, run two end-of-spectrum node sizes: one node per computer vs. one gridpoint per node. Response: Already in the plans, although not documented.

  9. Thilo thinks the Phase II project may not be necessary, that it's borrowing trouble and that the basic algorithm, if the rules are properly done, should be able to manage the load balancing without adaptation. The suggestion is to choose Phase III instead of II if the basic algorithm works well enough -- if the basic algorithm works, break it by implementing churn; otherwise, fix it by implementing adaptation. Response: It seems like a reasonable solution to wait until Phase I is complete to determine which following Phase is preferable.

  10. Thilo also notes that the propagation of zero-area nodes is not as cheap as we had originally thought. Even assuming a zero-area node moves to the location of the node it is being moved toward, much of the data in that node is being transmitted to two other nodes -- the ones in stripes above and below the zero-area node -- which are likely to be on separate machines. Response: Reasonable point. The amount of data transmitted to nodes above and below the target node will probably be around 2/3rds of the data transferred. This makes zero-area node propagation more expensive than normal load-balancing operations, not less, as a typical balance op will transfer less than two-thirds of a full node's worth of data and will require all the same synchronization headaches. This suggests that the propagation operation should be executed as few times as possible.

 
 
pdcs_ttg
... via email.   He would like to "spontaneously discuss" the project next week so I sent him an updated copy of the overview.

Also today I received an email from Bart requesting a change in our scheduled meeting dates.  While I was in Scotland last week we agreed on dates for two more trips to Edinburgh:  May 3-4, at the switchover from design to coding, and one of the last three weeks in June, prior to my returning to the U.S.  It turns out that Bart is unavailable May 3-4 and offered any two days from the following week.

I am working to get some company in Edinburgh for May 10-11 (-12-13 weekend) so have not yet bought plane tickets, but that is the current plan.  Will confirm with Bart when plans are more definite.
 
 
pdcs_ttg
Last week (April 10-13) I was in Scotland meeting with Bart Craenen and Ben Paechter to develop the idea of this thesis. An overview has been approved by Bart and Ben and was emailed to Gusz Eiben on Sunday.

Today I spoke with Professor Eiben on the phone. He has reviewed the overview document and we discussed the following:

  • I will need a second reader from within the PDCS group.  Gusz suggests Martin van Steen or Thilo Kielmann.  Since Gusz says that van Steen is overbooked and since I have already had a class with Thilo, I sent an email to Thilo describing the project and asking if he would be interested.
  • Gusz strongly suggests I design in the output features up front, in order to be able to measure the behavior of the system rather than just providing "it runs / it doesn't run" and then having to retrofit for analysis.  Already intended to do it this way.
  • A few other less significant comments and clarifications.
Gusz gave his approval of this as a Masters project.  (He said he was "impressed"!  I am pleased.)  As he was also solicited for his preference between Phase 2 and Phase 3, he reported that Phase 2 was likely to be of more use (as the number of computers in the simulation was likely to remain stable throughout the simulation, under the current circumstances and expectations) and thus was more interesting to him.

There was discussion of the actual timeline for completing the project.  I relayed the plan Bart and I had discussed: eight weeks of coding, to include as the first two weeks a detailed design; the coding would then be followed by approximately one month of experimentation and thesis writeup.  He found this satisfactory.

I am starting on the design and awaiting a response from Thilo about his being a second reader.
 
 
pdcs_ttg
17 April 2007 @ 03:29 pm
Hi.  
This journal is intended to be documentation of developments associated with my Masters Thesis.
 
 
 
 

Advertisement

Customize