This repository has been archived by the owner on Apr 23, 2021. It is now read-only.

Add TTI pass initialization to pass managers. #49

Closed
wants to merge 10 commits

Conversation

dcaballe
Contributor

Hi!

This PR is mostly to ask for feedback :). We are using mlir::ExecutionEngine to JIT-compile and execute nGraph ops on CPUs and noticed that the LLVM optimizer wasn't getting the proper target information because the TTI implementation was initialized with NoTTIImpl. This happens because TargetTransformInfoWrapperPass is not added to the pass managers.

I think there are a few ways to fix this:

  1. MLIR repo: something like the code attached, which adds TargetTransformInfoWrapperPass, initialized with the proper TargetMachine, to the pass managers (see the sketch after this list). I tested it by passing a TargetMachine with all the default host sub-target features and a specific CPU name (skylake-avx512). However, I must be missing something, since I'm getting skylake-avx512 assembly with this approach even though the target machine used by OrcJit doesn't have any sub-target features/CPU name set: https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp#L150
  2. nGraph repo: using mlir::makeLLVMPassesTransformer (https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/OptUtils.cpp#L103) in nGraph to add TargetTransformInfoWrapperPass to the pass managers.
  3. Any other idea?
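
To give an idea, option 1 boils down to something like the sketch below. The helper name addTargetTransformInfoPass is just illustrative and not the exact code in the attached patch:

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Target/TargetMachine.h"

static void addTargetTransformInfoPass(llvm::legacy::PassManager &modulePM,
                                       llvm::legacy::FunctionPassManager &funcPM,
                                       llvm::TargetMachine *targetMachine) {
  if (!targetMachine)
    return;
  // Seed both pass managers with a TTI wrapper built from the target's IR
  // analysis, so later passes query real target info instead of NoTTIImpl.
  modulePM.add(llvm::createTargetTransformInfoWrapperPass(
      targetMachine->getTargetIRAnalysis()));
  funcPM.add(llvm::createTargetTransformInfoWrapperPass(
      targetMachine->getTargetIRAnalysis()));
}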

I see value in adding TargetTransformInfoWrapperPass in the MLIR repo so that other users don't hit the same issue or have to replicate the same code. I also see value in having a utility in the MLIR repo that creates a TargetMachine with the default host CPU parameters (CPU name, sub-target features, etc.) and reusing it every time it is needed (for example, in the OrcJIT code I pointed to before and also here: https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp#L150).
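
Such a utility could look roughly like the sketch below. createHostTargetMachine is just a hypothetical name, and it assumes the native target has already been initialized (e.g. via llvm::InitializeNativeTarget()):

#include "llvm/ADT/StringMap.h"
#include "llvm/MC/SubtargetFeature.h"
#include "llvm/Support/Host.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Target/TargetMachine.h"
#include <memory>

static std::unique_ptr<llvm::TargetMachine> createHostTargetMachine() {
  // Look up the target matching the process triple.
  std::string triple = llvm::sys::getProcessTriple();
  std::string error;
  const llvm::Target *target = llvm::TargetRegistry::lookupTarget(triple, error);
  if (!target)
    return nullptr;

  // Collect the host CPU name and its sub-target features (e.g. the AVX-512
  // ones on a skylake-avx512 machine).
  llvm::SubtargetFeatures features;
  llvm::StringMap<bool> hostFeatures;
  if (llvm::sys::getHostCPUFeatures(hostFeatures))
    for (const auto &feature : hostFeatures)
      features.AddFeature(feature.first(), feature.second);

  return std::unique_ptr<llvm::TargetMachine>(target->createTargetMachine(
      triple, llvm::sys::getHostCPUName(), features.getString(),
      llvm::TargetOptions(), /*RM=*/llvm::None));
}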

Please, let me know what you think. I should be able to help with this.

Thanks!
Diego

std::function<llvm::Error(llvm::Module *)>
makeOptimizingTransformer(unsigned optLevel, unsigned sizeLevel);
makeOptimizingTransformer(unsigned optLevel, unsigned sizeLevel,
llvm::TargetMachine *targetMachine = nullptr);
Contributor

I would not even have a default nullptr value here to avoid forgetting it at call-sites.

@joker-eph
Contributor

Thanks! I had a similar patch locally that I didn't complete.

Ultimately my take was that most of the code here could be moved directly into LLVM, as there isn't much specific to MLIR in the JIT abstraction we have in MLIR. In the meantime, this PR seems like a good thing to do.

@dcaballe
Contributor Author

Ultimately my take was that most of the code here could be moved directly into LLVM, as there isn't much specific to MLIR in the JIT abstraction we have in MLIR.

Yeah, that would be great.

I also see value in having a utility in the MLIR repo that creates a TargetMachine with the default host CPU parameters (CPU name, sub-target features, etc.)

Do you think adding a utility for this ^ in a separate commit is also a good idea?

This is a suggestion for the squashed commit message. Sorry, I didn't add it to the first commit:
Add a TargetMachine parameter to the makeOptimizingTransformer and populatePassManagers APIs, initialize TargetTransformInfoWrapperPass with the right target information, and add it to the pass managers.

Thanks, Mehdi!
Diego

@dcaballe dcaballe changed the title [WIP] Add TTI pass initialization to pass managers. Add TTI pass initialization to pass managers. Jul 19, 2019
dcaballe added a commit to dcaballe/mlir that referenced this pull request Jul 22, 2019
PR#49 (tensorflow#49) adds the TTI
initialization pass to the pass managers for a given target machine.
However, this may cause an inconsistency if the target machine used by
the JIT compiler is not the same as the one used by TTI. This PR allows
passing a reference target machine to be used by the JIT compiler. If no
reference target machine is provided, a default host target machine is
used, including all its sub-target features.
@dcaballe
Contributor Author

I also see value in having a utility in the MLIR repo that creates a TargetMachine with the default host CPU parameters (CPU name, sub-target features, etc.)

Do you think adding a utility for this ^ in a separate commit is also a good idea?

dcaballe#1 is a stepping stone towards ^. Please, let me know what you think.

Thanks!
Diego

@joker-eph
Contributor

You have conflicts now apparently, can you rebase?

dcaballe#1 is a stepping stone towards ^. Please, let me know what you think.

Thanks! It seems a bit strange to me to manually unwrap a TM to build a TM-builder; I'm not sure why ORCJIT does not just take a std::function<std::unique_ptr<TargetMachine>()> callback instead of this class?
Most of the code you're adding is putting this whole utility even more in LLVM land; at this point I'd rather invest time into moving this into LLVM itself than adding so much "low-level" glue code in MLIR.

@dcaballe
Contributor Author

You have conflicts now apparently, can you rebase?

Done

I'm not sure why ORCJIT does not just take a std::function<std::unique_ptr<TargetMachine>()> callback instead of this class?

Yeah, it would make more sense, since the TM seems to be needed before ORCJIT is created. There is sort of a chicken-and-egg problem, as we need the TM to properly initialize the TTI pass in the transformer function that we pass when creating ORCJIT (see the sketch below).
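
Roughly, the ordering constraint looks like this (reusing the hypothetical createHostTargetMachine() helper from my first comment and the new makeOptimizingTransformer overload from this PR; module stands for the MLIR module being JIT-compiled):

// The TargetMachine has to exist first, because the transformer captures it
// to set up the TTI pass...
std::unique_ptr<llvm::TargetMachine> tm = createHostTargetMachine();
auto transformer = mlir::makeOptimizingTransformer(
    /*optLevel=*/3, /*sizeLevel=*/0, /*targetMachine=*/tm.get());

// ...and is only then handed over when the ORC JIT gets created. Ideally the
// JIT would be configured with the very same target description.
auto expectedEngine = mlir::ExecutionEngine::create(module, transformer);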

at this point I'd rather invest time into moving this into LLVM itself than adding so much "low-level" glue code in MLIR.

Sure. Please, keep me posted and let me know if I can help.

Diego

@dcaballe
Contributor Author

Hi Mehdi,

Do you think we could proceed with this PR in the meantime until things are moved to LLVM? That would be very helpful.

Thanks,
Diego

@@ -69,7 +71,8 @@ void mlir::initializeLLVMPasses() {
 // This behaves similarly to LLVM opt.
 static void populatePassManagers(llvm::legacy::PassManager &modulePM,
                                  llvm::legacy::FunctionPassManager &funcPM,
-                                 unsigned optLevel, unsigned sizeLevel) {
+                                 unsigned optLevel, unsigned sizeLevel,
+                                 llvm::TargetMachine *targetMachine = nullptr) {
Contributor

There is a single call site, so you should be able to skip the default value here.

-/// levels (e.g. -O2 or -Os).
+/// levels (e.g. -O2 or -Os). If not null, \p targetMachine is used to
+/// initialize passes that provide target-specific information to the LLVM
+/// optimizer.
Contributor

Can you mention that the provided TM is expected to outlive the returned std::function?

@joker-eph
Contributor

Sorry, I didn't mean to stall the current PR, of course. If you don't mind updating it with the two nits I just commented on, this seems OK to merge.

Contributor

@joker-eph joker-eph left a comment

(Sorry I wrote this comment the other day and didn't hit "submit review")

@@ -127,7 +127,8 @@ std::function<llvm::Error(llvm::Module *)> mlir::makeLLVMPassesTransformer(
       continue;

     if (insertOptPasses && optPassesInsertPos == i) {
-      populatePassManagers(modulePM, funcPM, mbOptLevel.getValue(), 0);
+      populatePassManagers(modulePM, funcPM, mbOptLevel.getValue(), 0,
+                           nullptr /*TTI*/);
Contributor

Shouldn't this function (mlir::makeLLVMPassesTransformer) also take a TM as argument?

Nit: the "standard" way of naming arguments is to prepend with the argument name /* targetMachine=*/ nullptr (The nice thing is that clang-tidy would then warn when the name in the comment does not match the function declaration name for the parameter)

Contributor Author

Thanks, it makes sense. I didn't know that about clang-tidy. Good to know!

-  auto transformer =
-      mlir::makeLLVMPassesTransformer(passes, optLevel, optPosition);
+  auto transformer = mlir::makeLLVMPassesTransformer(
+      passes, optLevel, /*targetMachine=*/nullptr, optPosition);
Contributor

@joker-eph joker-eph Aug 2, 2019

Can (and should?) we do something like this here:

auto tmBuilderOrError = llvm::orc::JITTargetMachineBuilder::detectHost();
if (!tmBuilderOrError) {
  llvm::errs() << "Failed to detect the host: "
               << llvm::toString(tmBuilderOrError.takeError()) << "\n";
  return EXIT_FAILURE;
}
auto tmOrError = tmBuilderOrError->createTargetMachine();
if (!tmOrError) {
  llvm::errs() << "Failed to create a TargetMachine for the host: "
               << llvm::toString(tmOrError.takeError()) << "\n";
  return EXIT_FAILURE;
}
auto transformer = mlir::makeLLVMPassesTransformer(
    passes, optLevel, /*targetMachine=*/tmOrError->get(), optPosition);

Contributor Author

No strong opinion here, but maybe we should add this in a separate commit since it changes the runner's behavior. Also, I'm not sure how much we are getting from detectHost().createTargetMachine(), since it doesn't add sub-target features. We would need something more in the direction of what I pointed out (dcaballe#1), but I understood you preferred that code to live in LLVM. Let me know if we can reuse part of that code; I could prepare another commit with it. WDYT?

Contributor

Isn't most of the TLI independent of sub-target features?
In any case adding sub-target features seems like the kind of thing that detectHost().createTargetMachine() should take care of, don't you think?

Contributor Author

Vectorization is sub-target dependent, for example.

I agree with you, but I feel there must be a reason why it's not there already :). Let me try to add it to LLVM, and I'll create another PR to update the JIT runner if you don't beat me to it.
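
In the meantime, a rough sketch of what adding the host sub-target features on top of detectHost() could look like, using JITTargetMachineBuilder's setCPU()/getFeatures() accessors (this assumes the JITTargetMachineBuilder and Host headers are available and that the surrounding function returns llvm::Error or llvm::Expected):

auto tmBuilderOrError = llvm::orc::JITTargetMachineBuilder::detectHost();
if (!tmBuilderOrError)
  return tmBuilderOrError.takeError();

// Pin the host CPU and its feature set explicitly; this is what sub-target
// dependent passes (e.g. the vectorizers) need to see through TTI.
tmBuilderOrError->setCPU(llvm::sys::getHostCPUName().str());
llvm::StringMap<bool> hostFeatures;
if (llvm::sys::getHostCPUFeatures(hostFeatures))
  for (const auto &feature : hostFeatures)
    tmBuilderOrError->getFeatures().AddFeature(feature.first(), feature.second);

auto tmOrError = tmBuilderOrError->createTargetMachine();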

@ftynse ftynse self-requested a review August 2, 2019 13:49
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Aug 6, 2019
Many LLVM transformations benefit from knowing the target. This enables optimizations,
especially in a JIT context where the target is (generally) well known.

Closes #49

COPYBARA_INTEGRATE_REVIEW=tensorflow/mlir#49 from dcaballe:dcaballe/tti ab02f72eb326f660945696e5dadeeb983cf263b3
PiperOrigin-RevId: 261840617