-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variorium Connector: kernel granularity and JSON #265
Conversation
Please explain the behavior of the connector tool including options. What is the minimum variorum version required? We should check for that when trying to find |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide sample output.
#include <inttypes.h> | ||
#include <chrono> | ||
#include <cstdlib> | ||
#include <cstring> | ||
#include <vector> | ||
#include <unordered_set> | ||
#include <string> | ||
#include <regex> | ||
#include <ctime> | ||
#include <cxxabi.h> | ||
#include <dlfcn.h> | ||
#include <ctime> | ||
#include <chrono> | ||
#include <iostream> | ||
#include <fstream> | ||
|
||
#include <inttypes.h> | ||
#include <iostream> | ||
#include <regex> | ||
#include <stdio.h> | ||
#include <string> | ||
#include <unordered_set> | ||
#include <vector> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which of these header files are actually required? Please make sure to only use what you need.
int power = -1; | ||
int power1 = 0; | ||
int power2 = 0; | ||
int filemake = -1; | ||
long long time1 = 0; | ||
long long time2 = 0; | ||
uint32_t gdevID = -1; | ||
std::vector<float> gpu_powers; | ||
std::vector<float> gpu_powers2; | ||
std::string output = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are all these global variables used for? Please add comments in the code explaining their usage.
int type_of_profiling = | ||
0; // 0 is for both print power & json, 1 is for print power, 2 is for json | ||
bool usingMPI = false; | ||
bool verbosePrint = false; | ||
bool mpiOutPut = false; | ||
0; // 0 is for both print power & json, 1 is for print power, 2 is for json | ||
bool usingMPI = false; | ||
bool verbosePrint = true; | ||
bool mpiOutPut = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all of these still used/necessary?
/*void printFile() { | ||
|
||
std::ofstream file("variorumoutput.txt", std::ios::app); // Open in append | ||
mode if (!file) { std::cerr << "Error creating the file!" << std::endl; } else { | ||
std::cout << "File created or opened successfully." << std::endl; | ||
} | ||
file.close(); // Close the file | ||
i}*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove unused code.
file.close(); // Close the file | ||
i}*/ | ||
void printFile() { | ||
const std::string filename = "variorumoutput.txt"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this configurable via an environment variable.
// std::cout << "Number of Sockets: " << num_sockets << | ||
// std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// std::cout << "Number of Sockets: " << num_sockets << | |
// std::endl; |
char *num_sockets_pos = strstr(s, "\"num_gpus_per_socket\":"); | ||
if (num_sockets_pos != nullptr) { | ||
num_sockets_pos += strlen("\"num_gpus_per_socket\":"); | ||
|
||
char *num_sockets_end_pos = strchr(num_sockets_pos, ','); | ||
if (num_sockets_end_pos == nullptr) { | ||
num_sockets_end_pos = strchr(num_sockets_pos, '}'); | ||
} | ||
if (num_sockets_end_pos != nullptr) { | ||
|
||
std::string num_sockets_str(num_sockets_pos, | ||
num_sockets_end_pos - num_sockets_pos); | ||
|
||
num_sockets = std::stoll(num_sockets_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please try using jansson here as well (and below).
std::cout << " Gpu start power: " << gpu_powers2[gdevID] | ||
<< " Gpu end power: " << gpu_powers[gdevID] | ||
<< " Energy Estimation: " | ||
<< ((gpu_powers[gdevID] + gpu_powers2[0]) / 2) * | ||
((temp - time1) * .001) | ||
<< std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just not print anything to stdout but only to the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me.
/* std::cout << "kokkos library call\n" << std::endl; | ||
if (usingMPI) { | ||
variorum_call_mpi(); | ||
} else { | ||
variorum_call(); | ||
} | ||
time_t total_time; | ||
time_t end_time; | ||
time(&end_time); | ||
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | ||
<< "\n"; | ||
total_time = end_time - start_time; | ||
|
||
std::cout << "The kokkos library was alive for " << total_time << " | ||
seconds." | ||
<< std::endl;*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* std::cout << "kokkos library call\n" << std::endl; | |
if (usingMPI) { | |
variorum_call_mpi(); | |
} else { | |
variorum_call(); | |
} | |
time_t total_time; | |
time_t end_time; | |
time(&end_time); | |
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | |
<< "\n"; | |
total_time = end_time - start_time; | |
std::cout << "The kokkos library was alive for " << total_time << " | |
seconds." | |
<< std::endl;*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twilk10 Is this suggestion OK with you? If so, can you commit it?
It is good with me.
std::ostream &operator<<( | ||
std::ostream &os, | ||
const Kokkos::Tools::Experimental::ExecutionSpaceIdentifier &identifier) { | ||
|
||
os << " Type: " << identifier.type << ""; | ||
os << " Device ID: " << identifier.device_id << ""; | ||
gdevID = identifier.device_id; | ||
os << " Instance ID: " << identifier.instance_id; | ||
// output += " Device ID: " + std::to_string(identifier.device_id) + | ||
// " Instance ID: " + | ||
// std::to_string(identifier.instance_id); | ||
writeToFile("variorumoutput.txt", output); | ||
return os; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't overload operator<<
but just write to the output file directly.
Please use |
+1 @twilk10 Thanks for submitting this PR! To provide an explanation of this connector tool’s options and a high-level behavior from the Kokkos user perspective: maybe you can add |
Thank you I will work on getting that information up as soon as possible. |
Thanks! Be sure to run clang-format-8 on each code file that you changed in this PR, as @masterleinad mentions in one of his comments. This is a prereq for your PR to be merged. Note that this is the CI check that doesn't pass. |
I ran clang-format 8 on it I will have to evaluate what went wrong and try it again. |
@twilk10 OK. You want to do something like:
You can diff in between to see whether and/or how the file has indeed changed and then If that doesn't work, can you send the error output with the flag |
@twilk10 Can you look at @masterleinad 's comments particularly on printing output and parsing json with jansson? I think they are good suggestions and maybe you can commit those before committing other changes with clang-format. Also, make sure to note to not include any unneeded headers. |
I will take a look and try updating withthe help of these suggestions. |
I am having trouble getting access to clang-format-8 but I can get other versions. |
I have solved this issue. |
Great! All checks have passed for clang-format-8. Can you address the comments by @masterleinad ? (You can commit suggestions if you agree). I also updated the title to capture the changes you made in your code better. @masterleinad Are there any other things to resolve here? |
I am currently working on updating the connector to address the comments made by @masterleinad |
@@ -1,4 +1,5 @@ | |||
//@HEADER | |||
//@HEADER140 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//@HEADER140 | |
//@HEADER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still have HEADER140
here.
std::ostream &operator<<( | ||
std::ostream &os, | ||
const Kokkos::Tools::Experimental::DeviceType &deviceType) { | ||
switch (deviceType) { | ||
case Kokkos::Tools::Experimental::DeviceType::Serial: | ||
os << "CPU"; | ||
output += "Type: CPU"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::OpenMP: | ||
os << "OpenMP"; | ||
output += "Type: OpenMP"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::Cuda: | ||
os << "cuda"; | ||
output += "Type: CUDA"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::HIP: | ||
os << "hip"; | ||
output += "Type: HIP"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::OpenMPTarget: | ||
os << "openmptarget"; | ||
output += "Type: openmptarget"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::HPX: | ||
os << "hpx"; | ||
output += "Type: hpx"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::Threads: | ||
os << "threads"; | ||
output += "Type: threads"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::SYCL: | ||
os << "sycl"; | ||
output += "Type: SYCL"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::OpenACC: | ||
os << "openacc"; | ||
output += "Type: OPENACC"; | ||
break; | ||
|
||
default: | ||
os << "Unknown Device Type"; | ||
output += "Type: Uknown Device Type"; | ||
break; | ||
} | ||
time_t total_time; | ||
time_t end_time; | ||
time(&end_time); | ||
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | ||
<< "\n"; | ||
total_time = end_time - start_time; | ||
|
||
std::cout << "The kokkos library was alive for " << total_time << " seconds." | ||
<< std::endl; | ||
return os; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define a function such as
kokkos-tools/profiling/space-time-stack/kp_space_time_stack.cpp
Lines 54 to 67 in 5e895a3
Space get_space(SpaceHandle const& handle) { | |
// check that name starts with "Cuda" | |
if (strncmp(handle.name, "Cuda", 4) == 0) return SPACE_CUDA; | |
// check that name starts with "SYCL" | |
if (strncmp(handle.name, "SYCL", 4) == 0) return SPACE_SYCL; | |
// check that name starts with "OpenMPTarget" | |
if (strncmp(handle.name, "OpenMPTarget", 12) == 0) return SPACE_OMPT; | |
// check that name starts with "HIP" | |
if (strncmp(handle.name, "HIP", 3) == 0) return SPACE_HIP; | |
if (strcmp(handle.name, "Host") == 0) return SPACE_HOST; | |
abort(); | |
return SPACE_HOST; | |
} |
Kokkos::Tools::Experimental::DeviceType
implicitly printable.
This is an update to the Kokkos-tool variorum connector and can be used to get the power data an Compatible with variorum version 6.0 |
You pushed your build directory. You need to remove it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file.
@@ -1,4 +1,5 @@ | |||
//@HEADER | |||
//@HEADER140 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
//#include <cxxabi.h> | ||
//#include <dlfcn.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//#include <cxxabi.h> | |
//#include <dlfcn.h> |
If you don't need it, remove it.
if (gdevID == 0) { | ||
switch (gdevID) { | ||
case 1: { | ||
json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_1"); | |
json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_" + std::to_string(gdevID)); |
and remove all the switch cases.
output = " Energy Estimation " + std::to_string(((power1 + power2) / 2) * | ||
((temp - time1) * .001)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the unit?
} | ||
} catch (int e) { | ||
std::cout << "No MPI Option provided, not using per rank output" | ||
<< std::endl; | ||
usingMPI = false; | ||
// usingMPI = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// usingMPI = false; |
std::ostream& operator<<( | ||
std::ostream& os, | ||
const Kokkos::Tools::Experimental::ExecutionSpaceIdentifier& identifier) { | ||
gdevID = identifier.device_id; | ||
|
||
output += | ||
" Device ID: " + std::to_string(identifier.device_id) + | ||
" Instance ID: " + std::to_string(identifier.instance_id) + | ||
" DeviceType: " + | ||
deviceTypeToString(static_cast<Kokkos::Tools::Experimental::DeviceType>( | ||
identifier.device_id)); | ||
|
||
writeToFile(filename, output); | ||
return os; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't introduce an overload for operator<<
but use a stringstream
or write to the file immediately instead.
writeToFile(filename, "name: " + std::string(name)); | ||
gdevID = devID; | ||
auto result = Kokkos::Tools::Experimental::identifier_from_devid(devID); | ||
std::cout << result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't print to cout
anywhere but only to the output file.
Please provide output from the incremental tests or so. |
// Initial and final power values for a kernel | ||
double global_power[2] = {0, 0}; | ||
// Initial and final time value for a kernel | ||
long long global_time[2] = {0, 0}; | ||
uint32_t global_device_id = -1; | ||
uint32_t global_instance_id = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obvisously, this tool is not thread-safe.
uint32_t global_instance_id = -1; | ||
std::string global_filename; | ||
std::string global_kernel_name; | ||
std::string global_device_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need so many global variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to record a bunch of information in the "begin kernel" calls that we are using in the "end kernel" calls. We can't pass that through the API interfaces directly.
Might be a good idea to move forward with this PR |
A review would help. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replying to #265 (comment)
I will defer to you.
variorum_call(); | ||
} | ||
auto result = Kokkos::Tools::Experimental::identifier_from_devid(devID); | ||
global_kernel_name = std::string(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global_kernel_name = std::string(name); | |
global_kernel_name = name; |
Same comment everywhere
json_t* my_power_obj = nullptr; | ||
my_power_obj = json_object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json_t* my_power_obj = nullptr; | |
my_power_obj = json_object(); | |
json_t* my_power_obj = json_object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fragment from previous version, remove my_power_obj
. We're using root
now instead.
if (!root) { | ||
fprintf(stderr, "Error parsing JSON: %s\n", error.text); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistency: use std::cerr
Thanks @masterleinad for adding the CI testing. Seems to compile fine with Variorum for me. |
I am ok with this if CI with Variorum succeeds. |
@@ -88,6 +104,7 @@ jobs: | |||
cmake --install build-with-${{ matrix.preset }} --prefix=${Kokkos_ROOT} | |||
- name: Build Kokkos Tools, enabling examples | |||
run: | | |||
export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH};${VARIORUM_ROOT} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the environment variable already adds it to the search path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_package(Variorum QUIET) |
Apparently it is spelled with an uppercase
V
.Please adjust your env variable accordingly and remove the line I highlighted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous commit had Variorum_ROOT
as environemnt variable and didn't find variorum
.
kokkos-tools/cmake/configure_variorum.cmake
Lines 8 to 10 in dd08b98
if(DEFINED ENV{VARIORUM_ROOT}) | |
set(Variorum_ROOT $ENV{VARIORUM_ROOT}) | |
set(MSG_NOTFOUND "check VARIORUM_ROOT environment variable ($ENV{VARIORUM_ROOT})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That configure cmake file logic is absurd and need fixing.
(On my cell can't navigate easily but in short it should not write to Variorum_ROOT that is nonsensical)
Updates to the current connector with updates such as json updates, energy estimation, device information. It has also been altered to get power information at the kernel level.