Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the GTPSA.jl backend #329

Draft
wants to merge 39 commits into
base: main
Choose a base branch
from

Conversation

mattsignorelli
Copy link

@gdalle
Copy link
Member

gdalle commented Jun 25, 2024

@mattsignorelli thank you for the contribution!
I fixed a few things and defined AutoGTPSA in the main package for now. Types should not be defined in package extensions because it makes them hard to import.
The main part missing is the pushforward operator. If you don't have a dedicated mechanism for that, you can always deduce it from derivative, gradient or jacobian depending on the function type (array/scalar -> array/scalar). It will be slow but it will work.

@gdalle gdalle linked an issue Jun 25, 2024 that may be closed by this pull request
@gdalle gdalle added the backend Related to one or more autodiff backends label Jun 25, 2024
@codecov-commenter
Copy link

codecov-commenter commented Jun 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 416 lines in your changes missing coverage. Please review.

Project coverage is 54.75%. Comparing base (d51fc0a) to head (77e49d6).

Files with missing lines Patch % Lines
...ace/ext/DifferentiationInterfaceGTPSAExt/onearg.jl 0.00% 287 Missing ⚠️
...ace/ext/DifferentiationInterfaceGTPSAExt/twoarg.jl 0.00% 119 Missing ⚠️
...face/ext/DifferentiationInterfaceGTPSAExt/utils.jl 0.00% 9 Missing ⚠️
...erfaceGTPSAExt/DifferentiationInterfaceGTPSAExt.jl 0.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (d51fc0a) and HEAD (77e49d6). Click for more details.

HEAD has 38 uploads less than BASE
Flag BASE (d51fc0a) HEAD (77e49d6)
DI 41 11
DIT 10 2
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #329       +/-   ##
===========================================
- Coverage   98.51%   54.75%   -43.76%     
===========================================
  Files         108      103        -5     
  Lines        4297     4597      +300     
===========================================
- Hits         4233     2517     -1716     
- Misses         64     2080     +2016     
Flag Coverage Δ
DI 52.44% <0.00%> (-46.14%) ⬇️
DIT 60.01% <ø> (-38.37%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gdalle gdalle marked this pull request as draft June 25, 2024 09:05
@gdalle gdalle marked this pull request as ready for review July 8, 2024 04:41
@gdalle gdalle marked this pull request as draft July 30, 2024 14:05
@mattsignorelli
Copy link
Author

Sorry for the delay on this, I've had my hands tied with all different projects. Two weeks ago I pushed major updates/fixes and finishing touches to GTPSA.jl, which is now it's in first stable release. I'll get back to this shortly

@gdalle
Copy link
Member

gdalle commented Aug 7, 2024

The structure of the tests has changed a little since the last time you checked, can you merge main into this PR?

@gdalle
Copy link
Member

gdalle commented Aug 9, 2024

@mattsignorelli before I review, I updated the tests so that we now have full code coverage information. Can you take a look and see if you are able to increase the coverage?

One thing you need to know is that in-place operators like pushforward!(f, dy, backend, x, dx) are not tested on scenarios for which the target dy is not mutable. So the branch if y isa Number in this operator is not needed.

Another thing you may want to exploit to reduce code duplication is the redundancy between operators: derivative is just pushforward with dx = 1, gradient is just pullback with dy = 1. They all fall back onto one another inside DI, so that it is theoretically sufficient to just implement value_and_pushforward. If custom implementations of other operators yield a speed boost, then let's keep them, but if all you do in derivative is essentially call pushforward, you can get rid of it (for example).

@mattsignorelli
Copy link
Author

mattsignorelli commented Aug 9, 2024

Thanks for looking this over!

For second order, because GTPSA does not allow nesting, separate operators must be written. For the first order operators gradient, jacobian, there is a large slowdown/significantly higher number of allocations when using only pushforward. This is because, in order to construct the jacobian or gradient, several pushforwards with different seeds have to be propagated through the function. The TPS type is mutable, and so it is much slower to repeatedly evaluate the function for different seeds, than it is to evaluate it once treating the TPSs as variables and then extract the first order derivatives. However derivative I have removed as using pushforward for that is equivalent.

I have also removed inaccessible branches (where y is not mutable) for the in-place operators, following the code coverage report, and simplified the HVP code (since the best GTPSA can do really is construct the enter hessian and multiply the vector).

The rest of the uncovered code is because the user can optionally specify a custom Descriptor, which lives in the AutoGTPSA type - this allows users to choose custom truncation orders for different variables (e.g. calculate hessian of f(x,y) but assume no $x^2$ terms in the resulting truncated power series so $\partial^2 f/\partial x^2$ is zero). This can significantly improve speed at the cost of lower accuracy. This feature is only applicable in highly specific use cases and I don't expect it to be used very often, but it should be available since GTPSA provides it. As for the code coverage in this case, the number of input variables and truncation order(s) have to be specified in the Descriptor for each tested function, and I'm not sure how to do this for every test case?

@gdalle gdalle added this to the v0.6 milestone Sep 14, 2024
@gdalle gdalle marked this pull request as draft September 18, 2024 08:37
@gdalle
Copy link
Member

gdalle commented Sep 18, 2024

Sorry for taking so long to get back to you @mattsignorelli! I have brought your PR up to speed with the rest of the package, and taken into account the new version of ADTypes which includes AutoGTPSA.
The last thing I'd like to do before merging is clean up the code a little bit, by removing duplication in favor of clear auxiliary functions. For example, my last commit cuts a lot of code from pushforward thanks to

function initialize!(xt::TPS, x::Number, dx::Number)
    xt[0] = x
    xt[1] = dx
    return xt
end

function initialize!(xt::AbstractArray{<:TPS}, x::AbstractArray, dx::AbstractArray)
    for i in eachindex(xt, x, dx)
        initialize!(xt[i], x[i], dx[i])
    end
    return xt
end

Can you maybe create similar functions to shorten the other operators? Just like TPS initialization, extraction from TPS into numbers and arrays could also be streamlined.

@mattsignorelli
Copy link
Author

mattsignorelli commented Sep 18, 2024

Thanks for your work on this!

function initialize!(xt::TPS, x::Number, dx::Number)
    xt[0] = x
    xt[1] = dx
    return xt
end

I think we need to be careful here. This would only apply for single-variable functions: xt[1] = dx basically sets the dual part of the first variable to dx. However, if there are two variables, then to set the dual part of the second variable, you would need to do xt[2] = dx. E.g.

julia> d = Descriptor(2,1); # two variables to first order

julia> f = TPS(use=d);

julia> f[0] = 1; f[1] = 2; f[2] = 3;

julia> print(f) # corresponds to f(dx1,dx2) = 1 + dx1 + dx2
TPS64:
 Coefficient                Order   Exponent
  1.0000000000000000e+00      0      0   0
  2.0000000000000000e+00      1      1   0
  3.0000000000000000e+00      1      0   1

Also, because the TPS type is mutable, we really only want to evaluate the function once. I'll make some changes to the updates and try to simplify/remove code duplication using utility functions, and push for review

@mattsignorelli
Copy link
Author

I think I understand what Tangents are now. Just to check: to take the pushforward of a 3 variable function with seed [dx1, dx2, dx3], your Tangents object will be Tangents{1, Vector{Float64}}(([dx1, dx2, dx3])) ? And so Tangents is there to store multiple seeds?

@gdalle
Copy link
Member

gdalle commented Sep 18, 2024

Tangents is there to store multiple seeds?

The expression "multiple seeds" is a bit unclear. For instance, for a function f(x::AbstractVector), DI considers dx::AbstractVector to be "a single seed". It used to be possible to call pushforward(f, backend, x, dx) but that is no longer the case.
From v0.6 you will need to use the wrapper pushforward(f, backend, x, Tangents(dx)), and recover a Tangents{1} as the result. On the other hand, if you have several seeds dxa, dxb, dxc (perhaps clearer than numbers?) inside Tangents(dxa, dxb, dxc), then your batch size is 3 and you will get a Tangents{3} as the result.
Sorry if this is not well documented, you're the first outside person to ever contribute an extension so I didn't plan for that case ^^ But if there are unclear aspects I should definitely improve the docs.

@gdalle
Copy link
Member

gdalle commented Sep 18, 2024

I think we need to be careful here. This would only apply for single-variable functions

That's the setting considered in DI. Even with all the latest changes, there will only ever be one active variable x, either in f(x) or in f!(y, x). Of course that variable can itself be an array.

@mattsignorelli
Copy link
Author

Ah ok, I think I understand better. I'm admittedly not super familiar with the usual conventions used by most AD packages in Julia, only that in GTPSA, so I'm trying to connect the dots here. Let me explain how it works in GTPSA, and then maybe you can help better see how to fit it into DI:

Basically with GTPSA, it calculates truncated power series in the variables, assuming the variables are small. So for example to calculate a Jacobian of a function $F: \mathbb{R}^n -&gt; \mathbb{R}^m$, you would define a Descriptor(n, 1) for $n$ variables to order 1, and then use x = vars(d) which gives you a vector which in the usual AD language is actually $[\Delta x_1, \Delta x_2, ... \Delta x_n]$ ("variables" in GTPSA are small, so actually this might be better understood as dx = vars(d)). Then, to calculate the multivariable Taylor expansion of the function $F$ around some point $\vec{a}$, you would just do y = F(a.+x). y here is now the multivariable Taylor expansion. You can then extract the Jacobian from the monomial coefficients of the Taylor expansion.

Alternatively, one could do this one with variable, e.g. calculate $F([a_1+\Delta x_1, a_2, ..., a_n])$, then do $F([a_1, a_2+\Delta x_1, ..., a_n])$ etc and extract the first order part of the TPS result for each component for each calculation. However, this will be slower than instead propagating all of $F(\vec{a}+[\Delta x_1, \Delta x_2, ... \Delta x_n])$, because TPSs are mutable and we only want to calculate the function once.

@gdalle
Copy link
Member

gdalle commented Sep 19, 2024

That makes sense, thank you. But then I think the initialize! function I wrote earlier is fine? At least it passes the tests ^^
Also, this makes me wonder if we could compute HVPs by considering both x and dx as variables somehow. With higher-level differentiation utilities like those in GTPSA, there must be something better to do than computing the full Hessian first.

@gdalle gdalle closed this Sep 19, 2024
@gdalle gdalle reopened this Sep 19, 2024
@gdalle gdalle removed this from the v0.6 milestone Sep 23, 2024
@gdalle gdalle mentioned this pull request Sep 25, 2024
@mattsignorelli
Copy link
Author

OK sorry for the big delay, I have been completely consumed by my other project but I'm ready to get back to this now.

Also, this makes me wonder if we could compute HVPs by considering both x and dx as variables somehow. With higher-level differentiation utilities like those in GTPSA, there must be something better to do than computing the full Hessian first.

That's a great idea. I'll look into this and see if I can come up with anything.

I'll take a look at the latest code and dev guide to reorient myself, and be in touch soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Related to one or more autodiff backends
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding GTPSA.jl to the interface
3 participants