Map type annotations to source text #345

Closed
wants to merge 49 commits into from
49 commits
- `6282f16` RFC/WIP: map type annotations to source text (timholy, Feb 5, 2023)
- `453b5ef` Handle more complicated examples (timholy, Feb 6, 2023)
- `9048ffa` Improve robustness (timholy, Feb 8, 2023)
- `50b7c9b` Apply suggestions from code review (timholy, Feb 9, 2023)
- `4a92541` handle static parameter (aviatesk, Feb 10, 2023)
- `ef29b44` use :cyan for annotating stable type (aviatesk, Feb 10, 2023)
- `51da034` handle prefix op call nicely (aviatesk, Feb 10, 2023)
- `0304f1a` Add TypedSyntax subdir package (timholy, Feb 17, 2023)
- `85a8726` README tweaks (timholy, Feb 17, 2023)
- `fa54151` Prevent matching in `->` and `do` blocks (timholy, Feb 19, 2023)
- `f5a265e` Delete sourcetext.jl from Cthulhu (timholy, Feb 19, 2023)
- `8136e6a` Ambiguity test: assign outside of inner fcn (timholy, Feb 19, 2023)
- `661ba33` Support `[ref]` nodes (timholy, Feb 19, 2023)
- `81618ab` WIP refactor to use args (timholy, Feb 21, 2023)
- `e5d4233` Finish refactor (timholy, Feb 22, 2023)
- `4d152b8` Improve matching, support tuple-destructuring (timholy, Feb 22, 2023)
- `3e3cd7c` Add duplication test (timholy, Feb 22, 2023)
- `12e4b5d` Don't error on kwargs (timholy, Feb 22, 2023)
- `fefeb0e` Support mcrocall & arg::T in funcdefs (timholy, Feb 23, 2023)
- `d79a50b` Fix `+=`, partial fix for literals (timholy, Feb 23, 2023)
- `db1feaf` Modernize & test `printstyled` (timholy, Feb 23, 2023)
- `08e4346` Wire new framework into Cthulhu (timholy, Feb 23, 2023)
- `dda4ed6` Support `where`, unnamed arguments (timholy, Feb 23, 2023)
- `b4b8a03` Pass settings down during descend (timholy, Feb 23, 2023)
- `f1780c3` Improve `return`, ambiguous nodes (timholy, Feb 24, 2023)
- `c5bfc38` Show callsites with source (timholy, Feb 24, 2023)
- `9cafb61` Sub-menu: print source-text (timholy, Feb 24, 2023)
- `6a14767` Truncate body when filling keywords (timholy, Feb 26, 2023)
- `d8a54c2` Implement toggling, make source-view default (timholy, Feb 26, 2023)
- `847b4fd` Update TypedSyntax README (timholy, Feb 26, 2023)
- `12ed91b` Describe new source-mapping in README (timholy, Feb 26, 2023)
- `7f3758c` Handle duplicate slotnames in call (timholy, Feb 27, 2023)
- `dc9e6ca` Respect Cthulhu's `type_annotations` (timholy, Feb 27, 2023)
- `92d1b30` Fix several sources of test failures (timholy, Feb 27, 2023)
- `c0bebeb` Support not showing type-annotations (timholy, Feb 27, 2023)
- `be00100` Simplify toggles (timholy, Feb 27, 2023)
- `821c3c2` Update terminal tests (timholy, Feb 27, 2023)
- `7dafee4` Require JuliaSyntax 0.3.2 (timholy, Feb 27, 2023)
- `b338996` Print type-annotation with [ref] nodes (timholy, Feb 27, 2023)
- `9828003` Update images for simpler toggles menu (timholy, Feb 27, 2023)
- `cf9d003` README: indicate that more help is coming (timholy, Feb 27, 2023)
- `2989933` Better fallbacks for failure to retrieve source (timholy, Feb 28, 2023)
- `0eb8178` Handle varargs (timholy, Feb 28, 2023)
- `0d587ab` Apply suggestions from code review (timholy, Feb 28, 2023)
- `f91cd1b` Simplify getting src & mappings (timholy, Feb 28, 2023)
- `072d83e` Update src/Cthulhu.jl (timholy, Feb 28, 2023)
- `28a9b92` Update TypedSyntax/src/show.jl (timholy, Feb 28, 2023)
- `708b8b2` Merge branch 'master' into teh/sourcetext (timholy, Feb 28, 2023)
- `73fc650` Printing improvements (timholy, Feb 28, 2023)
1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@ Manifest.toml
*.jl.cov
*.jl.*.cov
*.jl.mem
LocalPreferences.toml
3 changes: 3 additions & 0 deletions Project.toml
@@ -7,15 +7,18 @@ version = "2.7.9"
CodeTracking = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
FoldingTrees = "1eca21be-9b9b-4ed8-839a-6d8ae26b1781"
InteractiveUtils = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
JuliaSyntax = "70703baa-626e-46a2-a12c-08ffd08c73b4"
Preferences = "21216c6a-2e73-6563-6e65-726566657250"
REPL = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
SnoopPrecompile = "66db9d55-30c0-4569-8b51-7e840670fc0c"
TypedSyntax = "d265eb64-f81a-44ad-a842-4247ee1503de"
UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
Unicode = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[compat]
CodeTracking = "0.5, 1"
FoldingTrees = "1"
JuliaSyntax = "0.3.2"
Preferences = "1"
SnoopPrecompile = "1"
julia = "1.7"
141 changes: 86 additions & 55 deletions README.md
@@ -8,35 +8,27 @@
:warning: The latest stable version is only compatible with Julia v1.7 and higher.

Cthulhu can help you debug type inference issues by recursively showing the
`code_typed` output until you find the exact point where inference gave up,
messed up, or did something unexpected. Using the Cthulhu interface you can
type-inferred code until you find the exact point where inference gave up,
messed up, or did something unexpected. Using the Cthulhu interface, you can
debug type inference problems faster.

Looking at type-inferred code can be a bit daunting initially, but you grow more
comfortable with practice. Consider starting with a [tutorial on "lowered" representation](https://juliadebug.github.io/JuliaInterpreter.jl/stable/ast/),
which introduces most of the new concepts. Type-inferred code differs mostly
by having additional type annotations and (depending on whether you're looking
at optimized or non-optimized code) may incorporate inlining and other fairly
significant transformations of the original code as written by the programmer.

Cthulhu's main tool, `descend`, can be invoked like this:

```julia
descend(f, tt) # function and argument types
descend(f, tt) # function `f` and Tuple `tt` of argument types
@descend f(args) # normal call
```
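
Both forms take the method to inspect as a function plus argument types. `Base`'s reflection functions use the same convention, so the following Base-only sketch (not Cthulhu code; `g` and `args` are made-up names) shows how a concrete call corresponds to an argument-type `Tuple` like `tt`:

```julia
# Made-up example function and arguments
g(x, y) = x * y
args = (2.0, 3)

# Build the argument-type tuple from the values of a concrete call;
# this is roughly the bookkeeping that a macro form like `@descend g(2.0, 3)` spares you
tt = Tuple{map(typeof, args)...}

# Base reflection accepts the same (function, argument-type tuple) pair
rt = only(Base.return_types(g, tt))
println(tt, " => ", rt)
```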

`descend` allows you to interactively explore the output of
`code_typed` by descending into `invoke` and `call` statements. (`invoke`
statements correspond to static dispatch, whereas `call` statements correspond
to dynamic dispatch.) Press enter to select an `invoke` or `call` to descend
into, select ↩ to ascend, and press q or control-c to quit.

### JuliaCon 2019 Talk and Demo
[Watch on YouTube](https://www.youtube.com/watch?v=qf9oA09wxXY)
[![Click to watch video](https://img.youtube.com/vi/qf9oA09wxXY/0.jpg)](https://www.youtube.com/watch?v=qf9oA09wxXY)

The version of Cthulhu in the demo is a little outdated, without the newest features, but largely it has not changed too much.
`descend` allows you to interactively explore the type-annotated source
code by descending into the callees of `f`.
Press enter to select a call to descend into, select ↩ to ascend,
and press q or control-c to quit.
You can also toggle various aspects of the view, for example to suppress
"type-stable" (concretely inferred) annotations or view non-concrete
types in red.
Currently active options are highlighted with color; press the corresponding
key to toggle these options. Below we walk through a simple example of
these interactive features.

## Usage: descend

@@ -46,10 +38,45 @@ function foo()
sum(rand(T, 100))
end

descend(foo, Tuple{})
@descend foo()
descend(foo, Tuple{}) # option 1: specify by function name and argument types
@descend foo() # option 2: apply `@descend` to a working execution of the function
```

If you do this, you'll see quite a bit of text output. Let's break it down and
go through it section by section. At the top, you may see something like this:

![source-section-all](images_readme/descend_source_show_all.png)

This shows your original source code, together with its line numbers (here, `foo` was defined in the REPL).
The cyan annotations are the types of the variables: `Union{Float64, Int64}` means "either a `Float64`
or an `Int64`".
Small *concrete* unions (where all the possibilities are known exactly) are generally not a problem
for type inference, unless there are so many that Julia stops trying to work
out all the different combinations (see [this blog post](https://julialang.org/blog/2018/08/union-splitting/)
for more information).
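
To check such a union outside Cthulhu, `Base.return_types` is enough. In this sketch the function `pick` is made up; inference can only conclude that its result is an `Int` or a `Float64`, a small union whose members are all concrete:

```julia
# Made-up function whose result type depends on a runtime value
pick(flag::Bool) = flag ? 1 : 1.0

rt = only(Base.return_types(pick, (Bool,)))
println(rt)                  # a Union of Float64 and Int (Int64 on 64-bit systems)
println(isconcretetype(rt))  # false: not a single concrete type
println(rt isa Union)        # true: but each member of the union is concrete
```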

In the next section you may see something like

![toggles](images_readme/descend_toggles.png)

This section shows you some interactive options you have for controlling the display.
Normal text inside `[]` generally indicates "off", and color is used for "on" or specific options.
For example, if you hit `w` to turn on warnings, now you should see something like this:

![warn](images_readme/descend_source_toggles_warn.png)

Now you can see small concrete unions in yellow, and concretely inferred code in cyan.
Serious forms of poor inferrability are colored in red (of which there are none in this example);
these generally hurt runtime performance and may make compiled code more vulnerable to being invalidated.

In the final section, you see:

![calls](images_readme/descend_calls.png)

This is a menu of calls that you can further descend into. Move the dot `•` with the up and down
arrow keys, and hit Enter to descend into a particular call.


## Methods: descend

- `@descend_code_typed`
@@ -117,7 +144,7 @@ The calls that appear on the same line separated by `=>` represent inlined metho
you enter at the final (topmost) call on that line.

By default,
- `descend` views optimized code without "warn" coloration of types
- `descend` views non-optimized code without "warn" coloration of types
- `ascend` views non-optimized code with "warn" coloration

You can toggle between these with `o` and `w`.
@@ -152,29 +179,19 @@ Then invoke:
Cthulhu.@descend foo(5)
```

Now, descend:

```
%22 = call bar(::Union{Float64, Int64},::Union{Float64, Int64},::Union{Float64, Int64})::String
```

which shows (after typing `w`)
Now, descend into `bar`: move the cursor down (or wrap around by hitting the up arrow) until
the dot is next to the `bar` call:

```
∘ ── %0 = invoke bar(::Union{Float64, Int64},::Union{Float64, Int64},::Union{Float64, Int64})::String
Variables
#self#::Core.Const(bar)
x::Union{Float64, Int64}
y::Union{Float64, Int64}
z::Union{Float64, Int64}
[...]
4 (4.5 * n::Int64)::Float64
• 6 bar(x, y, z)
```

The text of `Union{Float64, Int64}` will be colored in red, indicating type instabilities,
but they are unlikely to be a problem in actual execution,
because `bar` here serves as a ["function barrier"](https://docs.julialang.org/en/v1/manual/performance-tips/#kernel-functions) and
`bar` will be called with fully concrete runtime types via dynamic dispatch.
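
The function-barrier idea can be sketched with `Base.return_types`. `kernel` is a hypothetical stand-in for `bar`, whose body is not shown in this diff: inferred with `Union` arguments its result is also a `Union`, but each call made at run time dispatches to a specialization compiled for the concrete argument types.

```julia
# Hypothetical stand-in for `bar`
kernel(x, y) = x * y + 1

U = Union{Float64, Int}
rt_union    = only(Base.return_types(kernel, (U, U)))          # what a type-unstable caller sees
rt_concrete = only(Base.return_types(kernel, (Float64, Int)))  # what one runtime call sees

# At run time, dynamic dispatch selects the specialization for the actual types
x = 2.5   # pretend this value came from type-unstable code
y = 3
result = kernel(x, y)
println(rt_union, " vs ", rt_concrete, "; result = ", result)
```
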
and then hit Enter. Then you will see the code for `bar` with its type annotations.

Notice that many variables are annotated as `Union`.
To give Cthulhu more complete type information, we have to actually run some Julia code. There are many ways to do this. In this example, we use [`Infiltrator.jl`](https://github.com/JuliaDebug/Infiltrator.jl).

Add an `@infiltrate`:
@@ -203,21 +220,35 @@ Infiltrating foo(n::Int64) at ex.jl:10:
infil>
```

Enter `@descend bar(x, y, z)` and type `w`:
After entering `@descend bar(x, y, z)`, you can see that, for `foo(4)`, the types within `bar` are fully inferred.

```
infil> @descend bar(x, y, z)

∘ ── %0 = invoke bar(::Float64,::Float64,::Int64)::String
Variables
#self#::Core.Const(bar)
x::Float64
y::Float64
z::Int64
[...]
```
## Viewing the internal representation of Julia code

Anyone using Cthulhu to investigate the behavior of Julia's compiler will
prefer to examine the
While Cthulhu tries to place type annotations on the source code, doing so obscures some
detail and can occasionally go awry (see details [here](TypedSyntax/README.md)).
For anyone who needs more direct insight, it can be better to look directly at Julia's
internal representations of type-inferred code.
Looking at type-inferred code can be a bit daunting initially, but you grow more
comfortable with practice. Consider starting with a
[tutorial on "lowered" representation](https://juliadebug.github.io/JuliaInterpreter.jl/stable/ast/),
which introduces most of the new concepts. Type-inferred code differs from the
lowered representation by having additional type annotations.
Moreover, `call` statements that can be inferred are converted to `invoke`s
(these correspond to static dispatch), whereas dynamic dispatch is indicated by the
remaining `call` statements.
Depending on whether you're looking at optimized or non-optimized code,
it may also incorporate inlining and other fairly significant transformations
of the original code as written by the programmer.
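
The `invoke`/`call` distinction can also be seen in the output of `Base.code_typed`. This sketch inspects the `Expr` heads of optimized IR, a compiler internal that may change between Julia versions; `h`, `stable`, `unstable`, `refers_to_h`, and `calls_h` are made-up names. Giving `h` four methods assumes Julia's default inference limit of 3 matching methods per call, so that the call with an unknown argument type stays a plain dynamic `call`.

```julia
# Made-up callee with several methods
@noinline h(x::Int) = x + 1
@noinline h(x::Float64) = x + 1.0
@noinline h(x::Int32) = x + Int32(1)
@noinline h(x::UInt8) = x + 0x01

stable(x::Int) = h(x)               # argument type known: static dispatch
unstable(v::Vector{Any}) = h(v[1])  # element type unknown until run time

# Does an argument of an IR statement refer to the function `h`?
refers_to_h(a) = a === h || (a isa GlobalRef && a.name === :h) ||
                 (a isa QuoteNode && a.value === h)
# Does IR statement `st` call `h` through an `Expr` with the given head?
calls_h(st, head::Symbol) = Meta.isexpr(st, head) && any(refers_to_h, st.args)

ci_stable, _   = only(code_typed(stable, (Int,)))
ci_unstable, _ = only(code_typed(unstable, (Vector{Any},)))

println("stable calls h via invoke: ", any(st -> calls_h(st, :invoke), ci_stable.code))
println("unstable calls h via call: ", any(st -> calls_h(st, :call), ci_unstable.code))
```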

This video demonstrates Cthulhu for viewing "raw" type-inferred code:
[Watch on YouTube](https://www.youtube.com/watch?v=qf9oA09wxXY)
[![Click to watch video](https://img.youtube.com/vi/qf9oA09wxXY/0.jpg)](https://www.youtube.com/watch?v=qf9oA09wxXY)

The version of Cthulhu in the demo is a little outdated, without the newest features,
but may still be relevant for users who want to view code at this level of detail.

You can see that, for `foo(4)`, the types within `bar` are fully inferred.

## Customization

21 changes: 21 additions & 0 deletions TypedSyntax/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Tim Holy <[email protected]> and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
19 changes: 19 additions & 0 deletions TypedSyntax/Project.toml
@@ -0,0 +1,19 @@
name = "TypedSyntax"
uuid = "d265eb64-f81a-44ad-a842-4247ee1503de"
authors = ["Tim Holy <[email protected]> and contributors"]
version = "0.1.0"

[deps]
CodeTracking = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
JuliaSyntax = "70703baa-626e-46a2-a12c-08ffd08c73b4"

[compat]
CodeTracking = "1"
JuliaSyntax = "0.3.2"
julia = "1"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test"]
133 changes: 133 additions & 0 deletions TypedSyntax/README.md
@@ -0,0 +1,133 @@
# TypedSyntax

This package aims to map types, as determined via type-inference, back to the source code as written by the developer. It can be used to understand program behavior and identify causes of "type instability" (inference failures) without the need to read [intermediate representations](https://docs.julialang.org/en/v1/devdocs/ast/) of Julia code.

This package is built on [JuliaSyntax](https://github.com/JuliaLang/JuliaSyntax.jl) and extends it by attaching type annotations to the nodes of its syntax trees. Here's a demo:

```julia
julia> using TypedSyntax

julia> f(x, y, z) = x + y * z;

julia> node = TypedSyntaxNode(f, (Float64, Int, Float32))
line:col│ tree │ type
1:1 │[=] │Float64
1:1 │ [call]
1:1 │ f
1:3 │ x │Float64
1:6 │ y │Int64
1:9 │ z │Float32
1:13 │ [call-i] │Float64
1:14 │ x │Float64
1:16 │ +
1:17 │ [call-i] │Float32
1:18 │ y │Int64
1:20 │ *
1:22 │ z │Float32
```

The right-hand column is the new information added by `TypedSyntaxNode`: the type inferred for each variable or function call.
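
For comparison, here is a Base-only sketch of the raw material that type inference provides (an illustration, not TypedSyntax's implementation): `code_typed` reports the inferred return type and the type of each slot (arguments and local variables), while intermediate results such as `y * z` appear only as numbered statements rather than as source expressions.

```julia
f(x, y, z) = x + y * z

# Unoptimized type-inferred code for one specialization
ci, rt = only(code_typed(f, (Float64, Int, Float32); optimize=false))

println(rt)                   # inferred return type of the whole call
for (name, T) in zip(ci.slotnames, ci.slottypes)
    println(name, " :: ", T)  # `#self#` first, then the arguments
end
```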

You can also display this in a form closer to the original source code, but with type-annotations:

```julia
julia> printstyled(stdout, node; hide_type_stable=false)
f(x::Float64, y::Int64, z::Float32)::Float64 = (x::Float64 + (y::Int64 * z::Float32)::Float32)::Float64
```

`hide_type_stable=true` (which is the default) will suppress printing of concrete types, so you need to set it to `false` if you want to see all the types.
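
The rule behind `hide_type_stable` amounts to "annotate only what is not concrete". A minimal sketch, where `annotate` is a hypothetical helper and not part of TypedSyntax's API:

```julia
# Hypothetical helper: append `::T` to a piece of source text, unless
# `T` is concrete and concrete types are being hidden
function annotate(src::AbstractString, T; hide_type_stable::Bool=true)
    hide_type_stable && isconcretetype(T) && return String(src)
    return string(src, "::", T)
end

println(annotate("x", Float64))                          # hidden: concrete
println(annotate("z", Real))                             # shown: abstract
println(annotate("x", Float64; hide_type_stable=false))  # shown on request
```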

The default is aimed at identifying sources of "type instability" (poor inferrability):

```julia
julia> printstyled(stdout, TypedSyntaxNode(f, (Float64, Int, Real)))
```

which produces

<code>f(x, y, z::<b>Real</b>)::<b>Any</b> = (x + (y * z::<b>Real</b>)::<b>Any</b>)::<b>Any</b></code>

The boldfaced text above is typically printed in color in the REPL:

- red indicates non-concrete types
- yellow indicates a "small union" of concrete types. These usually pose no issues, unless there are too many combinations of such unions.

Printing with color can be suppressed with the keyword argument `iswarn=false`.
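
The color rule can be approximated with `isconcretetype` and a walk over a `Union`'s members. In this hypothetical sketch, `annotation_color` and its cutoff of 4 union members are assumptions rather than TypedSyntax's actual logic, and cyan for concrete types follows the description of the "warn" view in Cthulhu's README:

```julia
# Collect the members of a (possibly nested) Union
union_members(T) = T isa Union ? vcat(union_members(T.a), union_members(T.b)) : Any[T]

# Hypothetical classifier; the cutoff `maxsmall = 4` is an assumption
function annotation_color(T; maxsmall::Int = 4)
    isconcretetype(T) && return :cyan        # concretely inferred
    if T isa Union
        members = union_members(T)
        if length(members) <= maxsmall && all(isconcretetype, members)
            return :yellow                   # small union of concrete types
        end
    end
    return :red                              # non-concrete
end

for T in (Float64, Union{Float64, Int}, Real, Any)
    println(rpad(string(T), 24), annotation_color(T))
end
```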

## Caveats

TypedSyntax aims for accuracy, but there are a number of factors that pose challenges.
First, anonymous and internal functions appear as part of the source text, but internally Julia handles these as separate type-inferred methods, and these are hidden from the annotator.
Therefore, in

```julia
julia> sumfirst(c) = sum(x -> first(x), c); # better to use `sum(first, c)` but this is just an illustration

julia> printstyled(stdout, TypedSyntaxNode(sumfirst, (Vector{Any},)))
sumfirst(c)::Any = sum(x -> first(x), c)::Any
```

`x` and `first(x)` both have type `Any`, but they are not annotated as such because they are hidden inside the anonymous function.
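
One way to see that the anonymous function is a separate, separately inferred method is to query it with `Base.return_types`. In this sketch an equivalent anonymous function is bound to the made-up name `getfirst` so it can be passed to the reflection call:

```julia
sumfirst(c) = sum(x -> first(x), c)
getfirst = x -> first(x)   # the same kind of anonymous function, bound to a name

# The outer method only sees the anonymous function's result type
rt_outer = only(Base.return_types(sumfirst, (Vector{Any},)))

# The anonymous function is inferred on its own, once per argument type
rt_any   = only(Base.return_types(getfirst, (Any,)))
rt_tuple = only(Base.return_types(getfirst, (Tuple{Int, Float64},)))
println(rt_outer, ", ", rt_any, ", ", rt_tuple)
```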

Second, this package works by attempting to "reconstruct history": starting from the type-inferred code, it tries to map calls back to the source. It would be much safer to instead keep track of the source during inference, but at present this is not possible (see [this Julia issue](https://github.com/JuliaLang/julia/issues/31162)). There are cases where this mapping fails: for example, with

```julia
julia> function summer(list)
s = 0
for x in list
s += x
end
return s
end;
```

> **Review comment (Member):** It seems like we need to remove this semicolon otherwise we can't construct TypedSyntaxNode:
>
>     julia> function summer(list)
>                s = 0
>                for x in list
>                    s += x
>                end
>                return s
>            end;
>
>     julia> TypedSyntaxNode(summer, (Vector{Float64},))
>     ERROR: MethodError: no method matching iterate(::Nothing)
>
>     Closest candidates are:
>       iterate(::Union{LinRange, StepRangeLen})
>        @ Base range.jl:887
>       iterate(::Union{LinRange, StepRangeLen}, ::Integer)
>        @ Base range.jl:887
>       iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}}
>        @ Base dict.jl:716
>       ...
>
>     Stacktrace:
>      [1] indexed_iterate(I::Nothing, i::Int64)
>        @ Base ./tuple.jl:93
>      [2] JuliaSyntax.TreeNode{TypedSyntax.TypedSyntaxData}(f::Any, t::Any; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
>        @ TypedSyntax ~/julia/packages/Cthulhu/TypedSyntax/src/node.jl:20
>      [3] JuliaSyntax.TreeNode{TypedSyntax.TypedSyntaxData}(f::Any, t::Any)
>        @ TypedSyntax ~/julia/packages/Cthulhu/TypedSyntax/src/node.jl:18
>      [4] top-level scope
>        @ REPL[33]:1
>
> **Reply (Member Author):** Update your CodeTracking

Then, on Julia 1.9:

```julia
julia> tsn, mappings = TypedSyntax.tsn_and_mappings(summer, (Vector{Float64},));

julia> hcat(tsn.typedsource.code, mappings)
16×2 Matrix{Any}:
:(_4 = 0) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_2) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[list]
:(_3 = Base.iterate(%2)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[(= x list)]
:(_3 === nothing) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(Base.not_int(%4)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(goto %16 if not %5) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_3::Tuple{Float64, Int64}) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_5 = Core.getfield(%7, 1)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(Core.getfield(%7, 2)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_4 = _4 + _5) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[(+= s x)]
:(_3 = Base.iterate(%2, %9)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_3 === nothing) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(Base.not_int(%12)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(goto %16 if not %13) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(goto %7) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(return _4) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[s]
```
The left column contains the statements of the type-inferred code, the right column the mappings back to the source.
You can see that the majority of these mappings are empty, indicating either no good match or that there were multiple possible matches. This is because lowering changes the implementation so significantly that there are few calls that relate directly to the source.
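
How much lowering changes a `for` loop can be checked with `Base.code_lowered`, which returns the untyped lowered code that inference starts from. This sketch prints the statements and counts those that mention `iterate`, which is what the loop syntax turns into:

```julia
function summer(list)
    s = 0
    for x in list
        s += x
    end
    return s
end

ci = only(code_lowered(summer, (Vector{Float64},)))
stmts = string.(ci.code)
foreach(println, stmts)

# The `for` loop has become explicit `iterate` calls plus `goto`s
n_iterate = count(st -> occursin("iterate", st), stmts)
println(length(stmts), " statements, ", n_iterate, " mentioning iterate")
```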

Nevertheless, many statements in the source can be annotated:

```julia
julia> tsn
line:col│ tree │ type
1:1 │[function] │Union{Float64, Int64}
1:10 │ [call]
1:10 │ summer
1:17 │ list │Vector{Float64}
1:22 │ [block]
2:5 │ [=]
2:5 │ s
2:9 │ 0
3:5 │ [for]
3:8 │ [=] │Union{Nothing, Tuple{Float64, Int64}}
3:9 │ x
3:14 │ list │Vector{Float64}
3:18 │ [block]
4:9 │ [+=] │Float64
4:9 │ s │Float64
4:14 │ x │Float64
6:5 │ [return] │Union{Float64, Int64}
6:12 │ s │Union{Float64, Int64}
```
This is largely because the named variables alone provide considerable information.
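
Named variables are informative because inference records one type per local-variable slot. In this Base-only sketch, `slottype` and `summer_stable` are made-up names: the slot for `s` in `summer` is a union because it starts as an `Int` and later holds `Float64` values, while starting from `zero(eltype(list))` (one common remedy, shown for illustration) makes it concrete.

```julia
function summer(list)
    s = 0
    for x in list
        s += x
    end
    return s
end

function summer_stable(list)
    s = zero(eltype(list))   # start from the element type instead of an Int
    for x in list
        s += x
    end
    return s
end

# Inferred type of the local variable `name` in unoptimized typed code
function slottype(f, tt, name::Symbol)
    ci, _ = only(code_typed(f, tt; optimize=false))
    return ci.slottypes[findfirst(==(name), ci.slotnames)]
end

println(slottype(summer, (Vector{Float64},), :s))
println(slottype(summer_stable, (Vector{Float64},), :s))
```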