New collcommobj #329

Open · wants to merge 21 commits into master

Conversation

mikesoehner

Description

Implemented a persistent communication class. This class can also be used to replace most of the collective communication already present in the code.
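
For readers who have not used them, the raw MPI 4.0 persistent-collective pattern that such a class wraps looks roughly like this (a minimal sketch with illustrative names, not the actual class API):

#include <mpi.h>

void persistentAllreduceSketch(MPI_Comm comm) {
    double localValue = 1.0;
    MPI_Request request;
    // Set up the collective once (requires an MPI 4.0 implementation).
    MPI_Allreduce_init(MPI_IN_PLACE, &localValue, 1, MPI_DOUBLE, MPI_SUM,
                       comm, MPI_INFO_NULL, &request);
    // Reuse the same request every iteration instead of calling
    // MPI_Allreduce again and again.
    for (int step = 0; step < 100; ++step) {
        MPI_Start(&request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);
    }
    MPI_Request_free(&request);
}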

Related Pull Requests

  • #PR

Resolved Issues

  • #Issue

How Has This Been Tested?

Tested with Spinodal Decomposition example and unit tests.

mikesoehner and others added 21 commits May 29, 2024 13:35
Where possible, an ifdef was added so that the code is compiled differently depending on the ENABLE_PERSISTENT cmake flag.
…onfig

Move cmake configuration for persistent collectives into the mpi module
This updated wrapper has an improved interface
and is more compact due to the environment class
being internal.
This function does not clear the forces in the SoA
structure.
Member

@FG-TUM FG-TUM left a comment

Impressive template magic :D

Can you explain in the main text of the PR why all of this is "just" behind an ifdef and why we can't simply get rid of the old communicator class?
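
For context, the commit messages describe the switch via the ENABLE_PERSISTENT cmake flag; the guard presumably takes roughly this shape (a sketch with placeholder call sites, not the actual diff):

#ifdef ENABLE_PERSISTENT
    // new path: execute the persistent request that was initialized once
    collCommObj.persistent();   // hypothetical call site
#else
    // old path: the existing collective communication as before
    collComm.allreduce();       // hypothetical call site
#endif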

Comment on lines +143 to +144
class Coll_Comm_Obj
{
Member

(most of) the rest of the code places the opening { not on a new line. As soon as #342 is finalized this will be reformatted, so we might as well do it now when introducing new code.

Comment on lines +142 to +143
template<int tag, int Fn, typename Op, typename Root, typename Comm, typename... Ts>
class Coll_Comm_Obj
Member

Please add doxygen documentation in javadoc style for all classes, functions, and members.

e.g. here:

Suggested change
template<int tag, int Fn, typename Op, typename Root, typename Comm, typename... Ts>
class Coll_Comm_Obj
/**
* Some brief description
*
* Some more verbose extensive description. Maybe just the large block above.
*
* @tparam tag explanation
* @tparam Fn explanation
* ...
*/
template<int tag, int Fn, typename Op, typename Root, typename Comm, typename... Ts>
class Coll_Comm_Obj

Comment on lines +189 to +195
// this function requires that the MPI_Comm and MPI_Op have been setup in the constructor
template<typename T = Comm, typename U = Op>
auto allreduce()
-> std::enable_if_t<std::is_same_v<T, MPI_Comm> && (std::is_same_v<U, add_struct> || std::is_same_v<U, max_struct> || std::is_same_v<U, min_struct>)>
{
MPI_Allreduce( MPI_IN_PLACE, _buffer.data(), 1, _mpi_members->get_type(), _mpi_members->get_op(), _mpi_comm );
}
Member

Just a reminder to add javadoc to all functions as described above.

Also, please explain a bit of the technicalities of the template magic. There are mostly non-computer scientists working on this code ;)
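
For the non-template-experts: the enable_if pattern above removes a member function from overload resolution unless the class's template parameters satisfy a condition. A stripped-down sketch of the same technique (illustrative names, no MPI):

#include <type_traits>

template <typename Comm>
class Sketch {
public:
    // The indirection through the defaulted parameter T = Comm is what makes
    // SFINAE apply at the call site instead of failing when the class itself
    // is instantiated.
    template <typename T = Comm>
    auto onlyForInt() -> std::enable_if_t<std::is_same_v<T, int>> {
        // int-specific work
    }
};

// Sketch<int>{}.onlyForInt();     // compiles
// Sketch<double>{}.onlyForInt();  // does not compile: the overload vanishes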

Comment on lines +20 to +63
class MPI_Env_Wrapper
{
public:
static auto init_environment(int* argc, char*** argv)
{
static MPI_Environment mpi_env(argc, argv);
_mpi_env = &mpi_env;
}
static auto init_thread_environment(int* argc, char*** argv, int required)
{
static MPI_Environment mpi_env(argc, argv, required);
_mpi_env = &mpi_env;
}
private:
// class to manage MPI_Init and MPI_Finalize
class MPI_Environment
{
public:
MPI_Environment() = default;

MPI_Environment(int* argc, char*** argv)
{
MPI_Init(argc, argv);
}
MPI_Environment(int* argc, char*** argv, int required)
{
int provided;
MPI_Init_thread(argc, argv, required, &provided);
if (provided < required)
{
std::cerr << "Cannot provide requested level of MPI thread support." << std::endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}

~MPI_Environment()
{
MPI_Finalize();
}
};

// storing the class until the end of the program
static inline MPI_Environment* _mpi_env;
};
Member

Is this used anywhere? If not I'd rather not add it to the master branch.
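
If it is kept, the intended usage would presumably look like this (sketch; MPI_Finalize is then triggered by the static MPI_Environment destructor at program exit):

int main(int argc, char** argv) {
    MPI_Env_Wrapper::init_environment(&argc, &argv);
    // ... run the simulation; no explicit MPI_Finalize() call needed ...
    return 0;
}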

Comment on lines +1 to +2
#ifndef COLLECTIVECOMMUNICATION_PERSISTENT_HELPER_H_
#define COLLECTIVECOMMUNICATION_PERSISTENT_HELPER_H_
Member

I prefer #pragma once but ls1 isn't consistent with this so 🤷
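
For reference, the two styles side by side (both prevent double inclusion; #pragma once is non-standard but universally supported and avoids guard-name collisions):

// traditional include guard, as in the diff:
#ifndef COLLECTIVECOMMUNICATION_PERSISTENT_HELPER_H_
#define COLLECTIVECOMMUNICATION_PERSISTENT_HELPER_H_
// ... header contents ...
#endif  // COLLECTIVECOMMUNICATION_PERSISTENT_HELPER_H_

// alternative:
#pragma once
// ... header contents ...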

// check if we were given a communicator
if constexpr( std::is_same_v<Comm, MPI_Comm> )
_mpi_comm = comm;
// check if we were can cerate a persistent communication request
Member

Suggested change
// check if we were can cerate a persistent communication request
// check if we should create a persistent communication request

Comment on lines +157 to +160
std::array<int, sizeof...(Ts)> b;
b.fill(1);
std::array<MPI_Aint, sizeof...(Ts)> d;
std::array<MPI_Datatype, sizeof...(Ts)> t;
Member

please use more expressive variable names ;)
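
Assuming these arrays end up as the block-length, displacement, and datatype arguments of MPI_Type_create_struct, a sketch of what more expressive names could look like:

std::array<int, sizeof...(Ts)> blockLengths;
blockLengths.fill(1);
std::array<MPI_Aint, sizeof...(Ts)> displacements;
std::array<MPI_Datatype, sizeof...(Ts)> memberTypes;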

Comment on lines +374 to +376
static inline std::vector<MPI_Request*> _allocated_requests;
static inline std::vector<MPI_Op*> _allocated_ops;
static inline std::vector<MPI_Datatype*> _allocated_types;
Member

So if I understand this correctly, these vectors accumulate all MPI requests, ops, and types created throughout the simulation. Doesn't this mean that in a very long-running simulation, this can accumulate a lot of trash if the persistent communication isn't reusing all of this perfectly? Could this lead to problems? Can we somehow build safeguards / checks against this?
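
One possible safeguard (a sketch of a hypothetical static member, not part of the PR): free every recorded handle once it is no longer needed, e.g. from a cleanup hook at the end of the simulation, so that long runs cannot accumulate MPI objects without bound:

static void freeAllocatedMpiObjects() {
    for (MPI_Request* request : _allocated_requests) {
        if (*request != MPI_REQUEST_NULL) {
            MPI_Request_free(request);
        }
    }
    _allocated_requests.clear();
    for (MPI_Op* op : _allocated_ops) {
        MPI_Op_free(op);
    }
    _allocated_ops.clear();
    for (MPI_Datatype* datatype : _allocated_types) {
        MPI_Type_free(datatype);
    }
    _allocated_types.clear();
}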

Comment on lines +233 to +239
template<int T = Fn>
auto persistent()
-> std::enable_if_t< T != 0 >
{
MPI_Start( &_mpi_members->get_request() );
MPI_Wait( &_mpi_members->get_request(), MPI_STATUS_IGNORE );
}
Member

Not a fan of this name. It sounds like it makes the CollCom object itself persistent (whatever that would mean). Maybe something along the lines of executePersistentRequests, or shorter.

For consistency, the next two functions then might need renaming as well.

Comment on lines +66 to +70
unsigned long nNumMolsGlobalEnergyGlobal;
double UkinGlobal;
double UkinTransGlobal;
double UkinRotGlobal;
collComm.get(nNumMolsGlobalEnergyGlobal, UkinGlobal, UkinTransGlobal, UkinRotGlobal);
Member

I think we already talked about this but it would be really cool if we could do something like this:

Suggested change
unsigned long nNumMolsGlobalEnergyGlobal;
double UkinGlobal;
double UkinTransGlobal;
double UkinRotGlobal;
collComm.get(nNumMolsGlobalEnergyGlobal, UkinGlobal, UkinTransGlobal, UkinRotGlobal);
auto [nNumMolsGlobalEnergyGlobal, UkinGlobal, UkinTransGlobal, UkinRotGlobal] = collComm.get();

Although, looking at the implementation behind get(), I'm not sure how to do that easily 😬. So if it would be too complicated, leave it be.
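
For reference, the structured binding would work if get() returned a std::tuple<Ts...> by value; a sketch of the pattern, detached from the actual implementation:

#include <tuple>

template <typename... Ts>
class CollCommSketch {
public:
    std::tuple<Ts...> get() const { return _values; }
private:
    std::tuple<Ts...> _values{};
};

// caller:
// auto [nNumMols, Ukin, UkinTrans, UkinRot] = collComm.get();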
