Enable Parallel Communication Through MDI #6

Open
taylor-a-barnes opened this issue Sep 13, 2018 · 1 comment

@taylor-a-barnes
Collaborator

If possible, provide a means for two codes to communicate in parallel through the MDI interface. This will be a challenging project and will require considerable planning. One of the main complications is that the two codes might lay out their data structures in entirely different ways. One possible solution is to define a standard data structure layout for MDI. Each production code can then define a mapping of its own data structure onto the MDI layout for each relevant send or receive command. For example, a production code might define a mapping for the COORD data, which is used whenever it receives the COORD command. Once it knows the mappings, MDI is responsible for shuttling the data correctly between the production code and the driver.
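
To make the mapping idea concrete, here is a minimal sketch in C. It assumes the standard MDI layout is a flat global array (for COORD, 3 values per atom in atom order) and that a code's map is just a list giving the global index of each locally owned element. The names `mdi_map_sketch`, `map_to_standard`, and `map_from_standard` are hypothetical and not part of MDI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical map: local element i corresponds to element
   global_index[i] of the standard (assumed flat) MDI layout. */
typedef struct {
    size_t nlocal;              /* number of elements owned locally */
    const size_t *global_index; /* array of length nlocal */
} mdi_map_sketch;

/* Copy local values into their slots in the standard layout. */
void map_to_standard(const mdi_map_sketch *map, const double *local,
                     double *standard, size_t nglobal)
{
    for (size_t i = 0; i < map->nlocal; i++) {
        assert(map->global_index[i] < nglobal);
        standard[map->global_index[i]] = local[i];
    }
}

/* Gather the locally owned values back out of the standard layout. */
void map_from_standard(const mdi_map_sketch *map, const double *standard,
                       double *local, size_t nglobal)
{
    for (size_t i = 0; i < map->nlocal; i++) {
        assert(map->global_index[i] < nglobal);
        local[i] = standard[map->global_index[i]];
    }
}
```

In a real parallel setting the standard array would itself be distributed, so the copy would become a communication step; the sketch only shows the index bookkeeping a code would have to supply.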

@taylor-a-barnes taylor-a-barnes modified the milestones: 1.1, 2.0 Jan 3, 2019
@taylor-a-barnes
Collaborator Author

One possible approach is described here.

Every message sent through MDI is prefaced with an integer, recv_dist, which is a flag that indicates whether a map is going to be sent before the remainder of the message.
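
One way to picture this framing is the sketch below. The wire format is an assumption made for illustration only: every field is an `int`, and when recv_dist is set the flag is followed by a map length (which may be zero) and then the map entries. `frame_message` and `parse_header` are hypothetical helpers, not MDI functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed framing: [recv_dist][nmap][map ...][payload ...].
   The nmap field and map entries are present only when recv_dist != 0. */
size_t frame_message(int *out, size_t out_cap, int recv_dist,
                     const int *map, size_t nmap,
                     const int *payload, size_t npayload)
{
    size_t need = 1 + (recv_dist ? 1 + nmap : 0) + npayload;
    assert(need <= out_cap);

    size_t pos = 0;
    out[pos++] = recv_dist ? 1 : 0;
    if (recv_dist) {
        out[pos++] = (int)nmap;
        if (nmap > 0)
            memcpy(out + pos, map, nmap * sizeof(int));
        pos += nmap;
    }
    if (npayload > 0)
        memcpy(out + pos, payload, npayload * sizeof(int));
    return pos + npayload; /* total number of ints written */
}

/* Read the flag and, if it is set, locate the map.
   Returns the offset at which the payload starts. */
size_t parse_header(const int *msg, size_t msg_len,
                    const int **map_out, size_t *nmap_out)
{
    assert(msg_len >= 1);
    *map_out = NULL;
    *nmap_out = 0;
    if (msg[0] == 0)
        return 1;

    assert(msg_len >= 2 && msg[1] >= 0);
    size_t nmap = (size_t)msg[1];
    assert(msg_len >= 2 + nmap);
    if (nmap > 0)
        *map_out = msg + 2;
    *nmap_out = nmap;
    return 2 + nmap;
}
```

Allowing a zero-length map lets the same header cover a message that is flagged as distributed but carries no map, because the receiver already holds one.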

In order for a driver or an engine to communicate in a distributed manner, one or both of them must first call:

MDI_Create_Map(&my_elements, &map_descriptor)
MDI_Set_Map("COMMAND", &map_descriptor, comm)

When a DRIVER does the above, it sends a special >MAP command through comm, which provides the engine with the driver's map for that command. When an ENGINE does the above, MDI stores the map but does not send anything to the driver.

  • When the driver sends a command that requests information (a command starting with "<"): On the engine side, MDI sets the recv_dist flag in its message to the driver and prepends the engine's map to the remainder of the message. On the driver side, MDI sees that the recv_dist flag has been set and reads in the engine's map before the rest of the message. At that point MDI has enough information on both the engine side and the driver side to determine the sender and receiver of each element of the message, and can call MPI_Alltoallw to communicate the data.

  • When the driver sends a command that gives information to the engine (a command starting with ">"): On the driver side, MDI sets the recv_dist flag, but does not prepend the driver's map to the remainder of the message (the engine already has this information). The driver-side MDI knows how many MPI ranks are associated with the engine, and sends equal chunks of the message to each rank of the engine in a standardized distribution. On the engine side, MDI sees that the recv_dist flag has been set and receives the message in the standardized distribution through an MPI_Alltoallw. It then uses the map previously set by the engine code to perform a second MPI_Alltoallw to convert from the standardized distribution to the engine's local data distribution.
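
The bookkeeping behind the two cases above can be sketched without any MPI calls. The code below assumes that the "standardized distribution" means contiguous blocks, with the first `n % nranks` ranks holding one extra element, and that the engine's map can be expressed as an `owner` array giving the destination rank of each global element. Both are assumptions for illustration, and `block_range` and `redistribution_counts` are hypothetical names. The resulting per-rank counts are the kind of input a collective such as MPI_Alltoallw would be given.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed standardized distribution: contiguous blocks, with the
   first (n % nranks) ranks holding one extra element. */
void block_range(size_t n, int nranks, int rank,
                 size_t *start, size_t *count)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    size_t base = n / (size_t)nranks;
    size_t extra = n % (size_t)nranks;
    size_t r = (size_t)rank;
    *count = base + (r < extra ? 1 : 0);
    *start = r * base + (r < extra ? r : extra);
}

/* For the block held by `rank`, count how many elements must go to
   each destination rank, where owner[g] is the rank that owns global
   element g in the engine's local distribution. */
void redistribution_counts(const int *owner, size_t n, int nranks,
                           int rank, int *sendcounts)
{
    size_t start, count;
    block_range(n, nranks, rank, &start, &count);

    for (int r = 0; r < nranks; r++)
        sendcounts[r] = 0;

    for (size_t g = start; g < start + count; g++) {
        assert(owner[g] >= 0 && owner[g] < nranks);
        sendcounts[owner[g]]++;
    }
}
```

For 10 elements on 3 ranks, this assigns elements 0-3, 4-6, and 7-9 to ranks 0, 1, and 2. Each rank then calls `redistribution_counts` on its own block to learn how much it sends to every other rank in the second exchange.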

In order to avoid the need for this second MPI_Alltoallw, it might be possible to have an MDI_Update_Maps() function that synchronizes the maps of all of the codes. The engine-side MDI would then need to keep track of whether the driver-side MDI has the most up-to-date map. If not, the engine-side MDI receives the data in a manner consistent with the most recent map sent to the driver, and then does a second MPI_Alltoallw to convert from the old map to the new map.
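
The tracking described here could be as simple as a pair of version counters on the engine side. The sketch below is one hypothetical way to do it; `map_sync_state` and its helper functions are illustrative names and not part of MDI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical engine-side record of map versions for one command. */
typedef struct {
    int current_version; /* bumped each time the engine sets a new map */
    int synced_version;  /* version the driver last received; -1 if none */
} map_sync_state;

void map_state_init(map_sync_state *s)
{
    assert(s != NULL);
    s->current_version = 0;
    s->synced_version = -1;
}

/* Called when the engine code sets a new map. */
void map_state_on_set_map(map_sync_state *s)
{
    assert(s != NULL);
    s->current_version++;
}

/* Called when maps are synchronized with the driver. */
void map_state_on_update_maps(map_sync_state *s)
{
    assert(s != NULL);
    s->synced_version = s->current_version;
}

/* True if the driver's copy is stale, so incoming data arrives in an
   older layout and needs a second redistribution step. */
bool needs_second_exchange(const map_sync_state *s)
{
    assert(s != NULL);
    return s->synced_version != s->current_version;
}
```

The engine-side MDI would check `needs_second_exchange` before each receive: if it returns false, the data already arrives in the engine's local distribution and the second MPI_Alltoallw can be skipped.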
