Add cholesky and gradients #1492
base: master
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.
@ethanluoyc Have You looked at PR#1356? It already contains a Cholesky decomposition. Your implementation is a lot cleaner and uses TFJS methods as building blocks, which is a big upside. But as of yet, PR#1356 seems to be significantly faster. The benchmark is done on the CPU backend, since PR#1492 runs faster there than on WebGL on my machine. Feel free to double-check the benchmark, since I'm known to make mistakes and mix things up. P.S.: I hope it's okay that I included Your code in the benchmark. I specifically mentioned the source and exempted it from the license.
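To make the "TFJS methods as building blocks" point concrete, here is a minimal sketch of what an unblocked Cholesky assembled from stock tfjs ops can look like. It is an illustration only, not the code from PR#1492 or PR#1356; the name `choleskySketch` and its structure are made up here.

```ts
import * as tf from '@tensorflow/tfjs';

// Unblocked Cholesky–Banachiewicz built from stock tfjs ops. Returns the
// lower-triangular L with A = L · Lᵀ. No symmetry or positive-definiteness
// checks; illustration only.
function choleskySketch(a: tf.Tensor2D): tf.Tensor2D {
  return tf.tidy(() => {
    const n = a.shape[0];
    const rows: tf.Tensor[] = [];                    // finished rows of L, padded to length n
    for (let i = 0; i < n; i++) {
      const entries: tf.Tensor[] = [];               // entries L[i, 0], …, L[i, i]
      for (let j = 0; j <= i; j++) {
        const aij = a.slice([i, j], [1, 1]).reshape([]);
        let s: tf.Tensor = tf.scalar(0);             // s = Σ_{k<j} L[i,k] · L[j,k]
        if (j > 0) {
          const li = tf.stack(entries);                           // L[i, 0..j-1]
          const lj = i === j ? li : rows[j].slice([0], [j]);      // L[j, 0..j-1]
          s = li.mul(lj).sum();
        }
        const v = aij.sub(s);
        entries.push(i === j
            ? v.sqrt()                                            // diagonal entry
            : v.div(rows[j].slice([j], [1]).reshape([])));        // divide by L[j,j]
      }
      const row = tf.stack(entries);
      rows.push(i < n - 1 ? tf.concat([row, tf.zeros([n - 1 - i])]) : row);
    }
    return tf.stack(rows) as tf.Tensor2D;
  });
}
```

Each entry costs several small kernel launches (slice, mul, sum, …), which is the usual reason such a composition trails a flat nested-loop kernel in CPU benchmarks like the one above.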
Hi @DirkToewe, thanks for letting me know about your PR! I checked the issues page but didn't see any mention of cholesky, so I thought no one had tackled this. The version I currently have here is WIP. One downside is indeed the memory issue; I think mine has not been optimized for that yet. Maybe we can work together on implementing cholesky and merge it into the codebase. BTW, do you have the gradients implemented? I have a preliminary version which I think is correct (more testing still needed). Another thing I haven't implemented is the blocked version of the algorithm, and the same goes for the gradients. I need the differentiable version because I use it as part of my models (Gaussian processes). Updates
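For reference on the gradient being discussed: with `L = chol(A)` and an upstream gradient `dL`, the symbolic reverse-mode rule used e.g. by TensorFlow's Python `CholeskyGrad` is `dA = ½ · (C + Cᵀ)` with `C = L⁻ᵀ · Φ(Lᵀ · dL) · L⁻¹`, where `Φ(M)` keeps the lower triangle of `M` and halves its diagonal. Evaluating it takes matrix multiplies plus triangular solves, which is where the new operations mentioned below come in.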
CLAs look good, thanks!
Yes, gradients are implemented symbolically, see here. But it requires quite a few new operations (…).
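To make the "new operations" point concrete, here is a rough sketch (not the code from either PR) of the reverse-mode rule quoted above, written with stock tfjs ops plus a hypothetical `triangularSolve`; tfjs has no built-in triangular solve, which is exactly the kind of op such a symbolic gradient needs. Hooking it up to autodiff would then go through `tf.customGrad` or a registered kernel gradient.

```ts
import * as tf from '@tensorflow/tfjs';

// HYPOTHETICAL op: solve L · X = B for lower-triangular L. Not part of tfjs;
// it stands in for one of the new operations the gradient requires.
declare function triangularSolve(l: tf.Tensor2D, b: tf.Tensor2D): tf.Tensor2D;

// Given L = chol(A) and the upstream gradient dL, compute dA following the
// rule dA = ½ (C + Cᵀ), C = L⁻ᵀ · Φ(Lᵀ · dL) · L⁻¹. Sketch only.
function choleskyGradSketch(l: tf.Tensor2D, dl: tf.Tensor2D): tf.Tensor2D {
  return tf.tidy(() => {
    const n = l.shape[0];
    const lInv = triangularSolve(l, tf.eye(n));                  // L⁻¹ via the hypothetical solve
    const ltDl = tf.matMul(l, dl, true, false);                  // Lᵀ · dL
    // Φ(M): keep the lower triangle of M and halve its diagonal.
    const p = tf.linalg.bandPart(ltDl, -1, 0)
        .sub(tf.linalg.bandPart(ltDl, 0, 0).mul(0.5));
    const c = tf.matMul(tf.matMul(lInv, p, true, false), lInv);  // L⁻ᵀ · Φ(·) · L⁻¹
    return c.add(c.transpose()).mul(0.5) as tf.Tensor2D;         // symmetrized dA
  });
}
```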
That's correct. It was the implementation of choice for me because:
But there's no doubt there's room for performance improvements. If You want to do blocked operations, You're going to have to work with nested loops. As an example of that, take a look at the CPU MatMul implementation. But a slight word of warning: from my experience, micro-optimizations will not yield as much performance as You may hope for.
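For comparison, the nested-loop style referred to above, written over a flat row-major Float32Array the way CPU backend kernels are, could look like the unblocked sketch below; a blocked variant would then process the matrix in cache-sized tiles, analogous to the blocking in the CPU MatMul kernel.

```ts
// Unblocked, in-place Cholesky–Banachiewicz over a flat row-major buffer.
// Overwrites `a` with L (strict upper triangle zeroed). Sketch only: no
// symmetry or positive-definiteness checks.
function choleskyInPlace(a: Float32Array, n: number): void {
  for (let i = 0; i < n; i++) {
    for (let j = 0; j <= i; j++) {
      // s = A[i,j] − Σ_{k<j} L[i,k] · L[j,k]
      let s = a[i * n + j];
      for (let k = 0; k < j; k++) {
        s -= a[i * n + k] * a[j * n + k];
      }
      a[i * n + j] = i === j ? Math.sqrt(s) : s / a[j * n + j];
    }
    for (let j = i + 1; j < n; j++) {
      a[i * n + j] = 0;  // zero the strict upper triangle
    }
  }
}

// Usage sketch: factor a small SPD matrix in place.
const a = new Float32Array([4, 2, 2, 3]);  // [[4, 2], [2, 3]]
choleskyInPlace(a, 2);                     // a now holds [[2, 0], [1, √2]] row-major
```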