This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

"Low-Level" QR Implementation #1366

Open
wants to merge 18 commits into
base: master
from

Conversation

DirkToewe
Contributor

@DirkToewe DirkToewe commented Oct 31, 2018

Description

As discussed, this is the first split of PR #1356: a faster QR Decomposition using a direct implementation of the Givens method, including support for backpropagation.

bandPart and triangularSolve are prerequisites for symbolic backpropagation of the QR Decomposition. Since both methods are frequently used in symbolic backpropagation, I believe they both deserve a backend implementation. matrixTriangularSolve and matrixBandParts are kernels in the Python/C/C++ TensorFlow implementation as well.

bandPart is currently implemented purely using TFJS methods. In a quick performance trial, bandPart was only 8x slower without a backend implementation. Since it is only an O(m*n) operation in the first place, that is not too worrisome. The memory overhead, however, may be more of an issue, depending on how broadcasting is implemented.
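For reference, the band selection itself can be sketched on plain nested arrays. This is not the TFJS-based code from this PR: `bandPartRef` is a hypothetical name, and the sketch assumes the same semantics as TensorFlow's matrix_band_part, where a negative count keeps the entire lower or upper triangle.

```typescript
// Hypothetical plain-array reference, not the TFJS implementation from this PR.
// Entry (i, j) is kept if it lies within `numLower` sub-diagonals and
// `numUpper` super-diagonals; a negative count keeps that whole triangle.
function bandPartRef(a: number[][], numLower: number, numUpper: number): number[][] {
  return a.map((row, i) =>
    row.map((value, j) => {
      const inLower = numLower < 0 || i - j <= numLower;
      const inUpper = numUpper < 0 || j - i <= numUpper;
      return inLower && inUpper ? value : 0;
    })
  );
}
```

Under these assumptions, `bandPartRef(a, 0, -1)` keeps the upper triangle and `bandPartRef(a, 0, 0)` keeps only the diagonal. Each entry is visited once, which matches the O(m*n) cost mentioned above.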

triangularSolve will allow solving systems of linear equations in TFJS, which is a requested feature.
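To illustrate what a triangular solve does, here is a minimal back-substitution sketch for a square upper triangular system R x = b on plain arrays. The name `solveUpperTriangular` and its signature are assumptions made for this example; they are not the API added by this PR.

```typescript
// Hypothetical sketch of back substitution; not the triangularSolve kernel from this PR.
// Solves r * x = b, where r is a square upper triangular matrix.
function solveUpperTriangular(r: number[][], b: number[]): number[] {
  const n = b.length;
  const x = new Array<number>(n).fill(0);
  // Work from the last row upwards: row i only depends on x[i+1..n-1].
  for (let i = n - 1; i >= 0; i--) {
    let sum = b[i];
    for (let j = i + 1; j < n; j++) {
      sum -= r[i][j] * x[j];
    }
    if (r[i][i] === 0) {
      throw new Error("matrix is singular: zero on the diagonal");
    }
    x[i] = sum / r[i][i];
  }
  return x;
}
```

Combined with a QR Decomposition A = QR, a square system A x = b could then be solved as R x = Qᵀ b.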

All linting errors are now fixed, but I'm afraid the code might have become (even) less readable. Suggestions as to how to improve this are welcome. In my defense: There was method to my code-formatting-madness. Low level linear algebra is always hard to read (at least to me). So in NDJS I tried my best to format the code in a way that things that belong together are aligned, reducing distractions and making bugs easier to spot.

As I said before, it should not be too hard to implement qr and triangularSolve in WebGL1. In order to do that, I will however need some guidance and introduction to the TFJS WebGL backend.

The randomized gradients test fails by a small margin roughly once in 100,000 tests. With the old implementation (after fixing some reshape and disposal issues), it fails roughly 76 times in 50,000, sometimes by a large margin (possible explanation: the Householder implementation ignores sign changes in the input, causing abrupt changes in the gradients). Sadly, tf.randomUniform does not seem to have a seed parameter, which would make the tests reproducible.
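One possible workaround, sketched independently of TFJS, is to generate the test inputs with a small seeded generator and convert them to tensors afterwards. The helper names `seededRandom` and `seededMatrix` are hypothetical, and the generator is a mulberry32-style 32-bit PRNG that is only meant for test data.

```typescript
// Hypothetical test helper, not part of TFJS: mulberry32-style seeded PRNG.
// Returns a function that yields floats in [0, 1).
function seededRandom(seed: number): () => number {
  let state = seed | 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), state | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Builds a rows x cols matrix with entries in [lo, hi), reproducible per seed.
function seededMatrix(rows: number, cols: number, seed: number, lo = -1, hi = 1): number[][] {
  const next = seededRandom(seed);
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => lo + (hi - lo) * next())
  );
}
```

Logging the seed of a failing run would then be enough to replay exactly the same input.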

If there are any questions about implementation details, I'm more than happy to answer them.

Quick Overview

The qr implementation is two-fold: Whenever the resulting R is a square matrix, the economic QR Decomposition qrEcoDecompKernel() is computed. For the gradients, the same symbolic backpropagation as in Python/C/C++ TensorFlow is used.

In all other cases, the full QR Decomposition is computed using qrFullDecompKernel(). Backpropagation is computed via qrFullBackpropKernel() using the Givens rotations' sin and cos values that were recorded by qrFullDecompKernel(). Higher order derivatives are not (yet) supported/implemented.
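The idea of a full decomposition that records its rotations can be sketched as follows. This is a simplified plain-array illustration, not the code of qrFullDecompKernel(): the function name `givensQr`, the `GivensRotation` record, and the elimination order are assumptions made for this example.

```typescript
// Hypothetical record of one rotation: it zeroes entry (row, col) using the pivot row `col`.
interface GivensRotation { row: number; col: number; c: number; s: number; }

// Hypothetical sketch: full QR of an m x n matrix via Givens rotations,
// returning Q (m x m), R (m x n) and the recorded cos/sin values.
function givensQr(a: number[][]): { q: number[][]; r: number[][]; rotations: GivensRotation[] } {
  const m = a.length;
  const n = m > 0 ? a[0].length : 0;
  const r = a.map(row => row.slice());
  // qT accumulates the rotations, so that qT * a = r and Q = transpose(qT).
  const qT: number[][] = Array.from({ length: m }, (_, i) =>
    Array.from({ length: m }, (_, j) => (i === j ? 1 : 0)));
  const rotations: GivensRotation[] = [];
  for (let j = 0; j < Math.min(m, n); j++) {
    for (let i = j + 1; i < m; i++) {
      const x = r[j][j];
      const y = r[i][j];
      if (y === 0) continue; // entry is already zero, nothing to eliminate
      const norm = Math.hypot(x, y);
      const c = x / norm;
      const s = y / norm;
      // Apply the rotation to rows j and i of both r and qT.
      for (const mat of [r, qT]) {
        for (let k = 0; k < mat[j].length; k++) {
          const top = mat[j][k];
          const bottom = mat[i][k];
          mat[j][k] = c * top + s * bottom;
          mat[i][k] = c * bottom - s * top;
        }
      }
      r[i][j] = 0; // set the eliminated entry to an exact zero
      rotations.push({ row: i, col: j, c, s });
    }
  }
  const q = Array.from({ length: m }, (_, i) =>
    Array.from({ length: m }, (_, j) => qT[j][i]));
  return { q, r, rotations };
}
```

In this sketch the recorded `c` and `s` values are everything a backward pass would need to replay the rotations in reverse order. The backward pass itself is not shown here.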

Why Givens Rotations?

The Householder method is the de facto standard for QR Decomposition, so I feel like I have to explain why I chose Givens Rotations over it:

  • For NDJS, I did some performance trials and could not see a significant performance difference
    between Givens and Householder in JS. My guess is that the JS overhead outweighs the difference
    in FLOPs.
  • With Givens Rotations, I was able to implement an economic QR Decomposition that requires only O(m*n) memory instead of O(m²). For Householder, I could not find such an implementation.
  • The Givens method is easier to implement in a numerically stable way (e.g. no underflow-safe norm
    is required).
  • The Givens method is easier to backpropagate.
  • The Givens method guarantees det(Q) = +1, which is somewhat more canonical.
  • The Givens method parallelizes better.
  • Givens rotations seem to be smoother under perturbation (Householder: do you reflect column c to +‖c‖e or -‖c‖e?). This should result in smoother gradients.
  • If the input is already close to upper triangular, a lot of operations can easily be skipped which may reduce the computation cost from O(m²n) all the way down to O(m*n) for upper triangular inputs.
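The det(Q) = +1 point from the list above can be checked on the smallest possible case. The sketch below builds the 2x2 Givens rotation and the 2x2 Householder reflection that both eliminate the second entry of a column (a, b). The helper names are hypothetical, and the Householder sign convention (reflecting away from the sign of a) is one common choice, assumed here for illustration.

```typescript
type Mat2 = [[number, number], [number, number]];

// Givens rotation mapping the column (a, b) onto (hypot(a, b), 0).
// Assumes (a, b) is not the zero vector.
function givens2(a: number, b: number): Mat2 {
  const norm = Math.hypot(a, b);
  const c = a / norm;
  const s = b / norm;
  return [[c, s], [-s, c]];
}

// Householder reflection H = I - 2 v vᵀ / (vᵀ v) mapping (a, b) onto (alpha, 0),
// with alpha = -sign(a) * hypot(a, b). Assumes (a, b) is not the zero vector.
function householder2(a: number, b: number): Mat2 {
  const norm = Math.hypot(a, b);
  const alpha = a >= 0 ? -norm : norm;
  const v0 = a - alpha;
  const v1 = b;
  const vv = v0 * v0 + v1 * v1;
  return [
    [1 - (2 * v0 * v0) / vv, (-2 * v0 * v1) / vv],
    [(-2 * v0 * v1) / vv, 1 - (2 * v1 * v1) / vv],
  ];
}

// Determinant of a 2x2 matrix.
function det2(m: Mat2): number {
  return m[0][0] * m[1][1] - m[0][1] * m[1][0];
}
```

A rotation always has determinant c² + s² = +1, while a single reflection has determinant -1. A product of Givens rotations therefore always has det(Q) = +1, whereas for Householder the sign depends on the number of reflections applied.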

For repository owners only:

Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY

For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md



bileschi and others added 17 commits August 28, 2018 08:02
The QR Decomposition implementation uses Givens Rotations for
elimination. There are a few reasons why the Givens Rotation was
chosen over the Householder method:
  * During a quick trial, there seemed to be no practical
    performance difference in JS between Householder and Givens.
    JS itself seems to introduce enough overhead to make the
    few extra FLOPs irrelevant.
  * Givens Method is easier to implement in a numerically stable
    way (e.g. there is no underflow-safe norm necessary).
  * Givens Method is easier to backpropagate.
  * Givens Method ensures det(Q) = 1.
  * Givens Rotations seem to be smoother when it comes to perturbation,
    resulting in smoother gradients.
  * Givens Method is easier to parallelize.

As long as R is non-singular there is always a way to produce a
canonical representation from a QR Decomposition (e.g. make the
diagonal of R all positive). That also means that there is no
compatibility issue with whichever QR implementation TensorFlow
for Python/C/C++ chooses.

Both `bandPart` and `triangularSolve` were necessary to implement
the symbolic backpropagation of the QR Decomposition.
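
The canonical representation mentioned in the commit message above (making the diagonal of R all positive) can be sketched on plain arrays. `canonicalizeQr` is a hypothetical helper and not part of this PR. It flips the sign of every row of R with a negative diagonal entry together with the matching column of Q, which leaves the product Q*R unchanged.

```typescript
// Hypothetical helper, not part of this PR: makes the diagonal of R non-negative.
// Expects q with as many columns as r has rows.
function canonicalizeQr(q: number[][], r: number[][]): { q: number[][]; r: number[][] } {
  // signs[i] is -1 where the i-th diagonal entry of r is negative, else +1.
  const signs = r.map((row, i) => (i < row.length && row[i] < 0 ? -1 : 1));
  return {
    // Scale column j of q by signs[j] ...
    q: q.map(row => row.map((value, j) => signs[j] * value)),
    // ... and row i of r by signs[i], so the two sign flips cancel in q * r.
    r: r.map((row, i) => row.map(value => signs[i] * value)),
  };
}
```

Note that a sign flip on an odd number of columns changes det(Q) from +1 to -1, so this normalization trades the det(Q) = +1 property for a positive diagonal.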
…to linalg_qr

I have no idea why that branch was ahead of the local one...
…to linalg_qr

Merging (Merge from upstream into origin) into local.
@DirkToewe
Contributor Author

Okay so the PR is in a reviewable state (I hope). No more commits will be made until further changes are requested and/or suggested. This is a huge PR, so I understand that review will take some time.
