Add TypedSyntax package (#353)
This package attempts to map statements in type-inferred code back
to the source code as written by the programmer.

The intention is to use this in Cthulhu to present the results of
inference in an easier-to-digest form. There are, of course, potential
additional applications of this source-mapping, which is why it is
developed as a semi-independent package.

Co-authored-by: Shuhei Kadowaki <[email protected]>
timholy and aviatesk authored Mar 2, 2023
1 parent 70a600f commit ed51dac
Showing 8 changed files with 1,100 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/CI.yml
@@ -43,6 +43,9 @@ jobs:
        with:
          check_bounds: 'auto'
          coverage: 'false'
      - name: TypedSyntax # run the tests of TypedSyntax (a subdir package)
        if: ${{ matrix.os == 'ubuntu-latest' }}
        run: julia --project=TypedSyntax -e 'using Pkg; Pkg.test(coverage=true)'
      # - name: Coverage off # `empty_func` test doesn't work as intended with `coverage=true`
      #   if: ${{ matrix.os == 'ubuntu-latest' }}
      #   run: julia --project -e 'using Pkg; Pkg.test("Cthulhu"; coverage=false)'
21 changes: 21 additions & 0 deletions TypedSyntax/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Tim Holy <[email protected]> and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
19 changes: 19 additions & 0 deletions TypedSyntax/Project.toml
@@ -0,0 +1,19 @@
name = "TypedSyntax"
uuid = "d265eb64-f81a-44ad-a842-4247ee1503de"
authors = ["Tim Holy <[email protected]> and contributors"]
version = "1.0.0"

[deps]
CodeTracking = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
JuliaSyntax = "70703baa-626e-46a2-a12c-08ffd08c73b4"

[compat]
CodeTracking = "1"
JuliaSyntax = "0.3.2"
julia = "1"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test"]
133 changes: 133 additions & 0 deletions TypedSyntax/README.md
@@ -0,0 +1,133 @@
# TypedSyntax

This package aims to map types, as determined via type-inference, back to the source code as written by the developer. It can be used to understand program behavior and identify causes of "type instability" (inference failures) without the need to read [intermediate representations](https://docs.julialang.org/en/v1/devdocs/ast/) of Julia code.
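
For comparison, the type-inferred intermediate representation that TypedSyntax is meant to spare you from reading can be printed with Base's `code_typed`. This is a plain-Julia sketch (TypedSyntax is not needed); it reuses the one-line function `f` from the demo below and simply prints the inferred statements and their overall return type.

```julia
# Plain Julia (no TypedSyntax): inspect the type-inferred IR directly.
f(x, y, z) = x + y * z

# `code_typed` returns one `CodeInfo => return type` pair per matching method;
# for this concrete signature there is exactly one.
ci, rettype = only(code_typed(f, (Float64, Int, Float32)))

println(ci)       # statement-level IR, the form TypedSyntax maps back to the source
println(rettype)  # the inferred return type of the whole call
```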

This package is built on [JuliaSyntax](https://github.com/JuliaLang/JuliaSyntax.jl) and extends it by attaching type annotations to the nodes of its syntax trees. Here's a demo:

```julia
julia> using TypedSyntax

julia> f(x, y, z) = x + y * z;

julia> node = TypedSyntaxNode(f, (Float64, Int, Float32))
line:col│ tree                 │ type
   1:1  │[=]                   │Float64
   1:1  │  [call]
   1:1  │    f
   1:3  │    x                 │Float64
   1:6  │    y                 │Int64
   1:9  │    z                 │Float32
   1:13 │  [call-i]            │Float64
   1:14 │    x                 │Float64
   1:16 │    +
   1:17 │    [call-i]          │Float32
   1:18 │      y               │Int64
   1:20 │      *
   1:22 │      z               │Float32
```

The right-hand column is the new information added by `TypedSyntaxNode`: it gives the type assigned to each variable or function call.
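
Each annotation on a `[call-i]` row is the inferred return type of that call. As a sanity check that relies only on Base (not on TypedSyntax), the two call annotations in the table above can be reproduced with `Base.return_types`:

```julia
# Reproduce the two [call-i] annotations using Base inference only.
inner = only(Base.return_types(*, (Int, Float32)))      # y * z       (Int64 * Float32)
outer = only(Base.return_types(+, (Float64, Float32)))  # x + (y * z) (Float64 + Float32)

println((inner, outer))
```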

You can also display this in a form closer to the original source code, but with type-annotations:

```julia
julia> printstyled(stdout, node; hide_type_stable=false)
f(x::Float64, y::Int64, z::Float32)::Float64 = (x::Float64 + (y::Int64 * z::Float32)::Float32)::Float64
```

`hide_type_stable=true` (which is the default) will suppress printing of concrete types, so you need to set it to `false` if you want to see all the types.

The default is aimed at identifying sources of "type instability" (poor inferrability):

```julia
julia> printstyled(stdout, TypedSyntaxNode(f, (Float64, Int, Real)))
```

which produces

<code>f(x, y, z::<b>Real</b>)::<b>Any</b> = (x + (y * z::<b>Real</b>)::<b>Any</b>)::<b>Any</b></code>

The boldfaced text above is typically printed in color in the REPL:

- red indicates non-concrete types
- yellow indicates a "small union" of concrete types. These usually pose no issues, unless there are too many combinations of such unions.
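
The color rule above can be approximated in plain Julia. The helper `color_category` below is hypothetical: it is not TypedSyntax's implementation, it relies on the internal Base function `Base.uniontypes`, and it ignores how many members a union has, whereas TypedSyntax's notion of a "small" union may impose a limit.

```julia
# Hypothetical helper (not part of TypedSyntax) approximating the color rule:
# concrete types are not highlighted, unions whose members are all concrete
# are "yellow", and everything else (abstract types, Any, ...) is "red".
function color_category(@nospecialize(T))
    isconcretetype(T) && return :concrete
    if T isa Union && all(isconcretetype, Base.uniontypes(T))
        return :yellow
    end
    return :red
end

f(x, y, z) = x + y * z
# Inferred return type when one argument has the abstract type Real.
rt = only(Base.return_types(f, (Float64, Int, Real)))

for T in (Float64, Union{Float64, Int64}, Union{Nothing, Tuple{Float64, Int64}}, Real, rt)
    println(rpad(string(T), 40), color_category(T))
end
```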

Printing with color can be suppressed with the keyword argument `iswarn=false`.

## Caveats

TypedSyntax aims for accuracy, but there are a number of factors that pose challenges.
First, anonymous and internal functions appear as part of the source text, but internally Julia handles these as separate type-inferred methods, and these are hidden from the annotator.
Therefore, in

```julia
julia> sumfirst(c) = sum(x -> first(x), c); # better to use `sum(first, c)` but this is just an illustration

julia> printstyled(stdout, TypedSyntaxNode(sumfirst, (Vector{Any},)))
sumfirst(c)::Any = sum(x -> first(x), c)::Any
```

`x` and `first(x)` both have type `Any`, but they are not annotated as such because they are hidden inside the anonymous function.
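
The closure's body goes unannotated because an anonymous function is its own function object, with its own method that is inferred separately from `sumfirst`. A small plain-Julia illustration (the input vector `c` is made up for the example):

```julia
sumfirst(c) = sum(x -> first(x), c)   # the closure is compiled and inferred as its own method
sumfirst2(c) = sum(first, c)          # the alternative suggested in the comment above

# An anonymous function is a distinct function object with its own method table.
g = x -> first(x)
println(typeof(g) <: Function)
println(length(methods(g)))

c = Any[[1, 2], [3, 4]]
println(sumfirst(c), " ", sumfirst2(c))
```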

Second, this package works by attempting to "reconstruct history": starting from the type-inferred code, it tries to map calls back to the source. It would be much safer to instead keep track of the source during inference, but at present this is not possible (see [this Julia issue](https://github.com/JuliaLang/julia/issues/31162)). There are cases where this mapping fails: for example, with

```julia
julia> function summer(list)
           s = 0
           for x in list
               s += x
           end
           return s
       end;
```
then (on Julia 1.9)
```julia
julia> tsn, mappings = TypedSyntax.tsn_and_mappings(summer, (Vector{Float64},));

julia> hcat(tsn.typedsource.code, mappings)
16×2 Matrix{Any}:
:(_4 = 0) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_2) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[list]
:(_3 = Base.iterate(%2)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[(= x list)]
:(_3 === nothing) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(Base.not_int(%4)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(goto %16 if not %5) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_3::Tuple{Float64, Int64}) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_5 = Core.getfield(%7, 1)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(Core.getfield(%7, 2)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_4 = _4 + _5) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[(+= s x)]
:(_3 = Base.iterate(%2, %9)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(_3 === nothing) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(Base.not_int(%12)) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(goto %16 if not %13) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(goto %7) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[]
:(return _4) Union{TreeNode{SyntaxData}, TreeNode{TypedSyntaxData}}[s]
```
The left column contains the statements of the type-inferred code, the right column the mappings back to the source.
You can see that the majority of these mappings are empty, indicating either no good match or that there were multiple possible matches. This is because lowering changes the implementation so significantly that there are few calls that relate directly to the source.
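
To see how much lowering rewrites the loop, the lowered (pre-inference) code of `summer` can be inspected with Base's `code_lowered`. This sketch only prints it and checks that the `for` loop has become calls to the iteration protocol (`Base.iterate`), which is why so few statements map cleanly back to the source:

```julia
function summer(list)
    s = 0
    for x in list
        s += x
    end
    return s
end

# Lowering replaces the `for` loop with explicit `Base.iterate` calls,
# tuple field accesses and gotos.
ci = only(code_lowered(summer, (Vector{Float64},)))
lowered = sprint(show, ci)
println(lowered)
```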

Nevertheless, many statements in the source can be annotated:

```julia
julia> tsn
line:col│ tree                 │ type
   1:1  │[function]            │Union{Float64, Int64}
   1:10 │  [call]
   1:10 │    summer
   1:17 │    list              │Vector{Float64}
   1:22 │  [block]
   2:5  │    [=]
   2:5  │      s
   2:9  │      0
   3:5  │    [for]
   3:8  │      [=]             │Union{Nothing, Tuple{Float64, Int64}}
   3:9  │        x
   3:14 │        list          │Vector{Float64}
   3:18 │      [block]
   4:9  │        [+=]          │Float64
   4:9  │          s           │Float64
   4:14 │          x           │Float64
   6:5  │    [return]          │Union{Float64, Int64}
   6:12 │      s               │Union{Float64, Int64}
```
This is largely because the named variables alone provide considerable information.
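
The `Union{Float64, Int64}` annotation on `[function]` and on the returned `s` reflects that `s` starts as the integer literal `0` and becomes a `Float64` only once the loop body runs; for an empty input the function returns the `Int64` `0`. The sketch below confirms this with `Base.return_types`; the variant `summer_stable`, which initializes `s` with `zero(eltype(list))`, is a hypothetical illustration and not part of the README.

```julia
function summer(list)
    s = 0          # starts as an Int64...
    for x in list
        s += x     # ...but becomes a Float64 after the first iteration
    end
    return s       # so the result may be either type
end

# Hypothetical variant: start from the element type's zero instead of the literal 0.
function summer_stable(list)
    s = zero(eltype(list))
    for x in list
        s += x
    end
    return s
end

rt_summer = only(Base.return_types(summer, (Vector{Float64},)))
rt_stable = only(Base.return_types(summer_stable, (Vector{Float64},)))
println(rt_summer)
println(rt_stable)
```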
16 changes: 16 additions & 0 deletions TypedSyntax/src/TypedSyntax.jl
@@ -0,0 +1,16 @@
module TypedSyntax

using Core: CodeInfo, MethodInstance, SlotNumber, SSAValue
using Core.Compiler: TypedSlot
using JuliaSyntax: JuliaSyntax, TreeNode, AbstractSyntaxData, SyntaxData, SyntaxNode, GreenNode, AbstractSyntaxNode, SyntaxHead, SourceFile,
    head, kind, child, children, haschildren, untokenize, first_byte, last_byte, source_line, source_location,
    sourcetext, @K_str, @KSet_str, is_infix_op_call, is_prefix_op_call, is_prec_assignment, is_operator, is_literal
using Base.Meta: isexpr
using CodeTracking

export TypedSyntaxNode

include("node.jl")
include("show.jl")

end

2 comments on commit ed51dac

@timholy timholy commented on ed51dac Mar 2, 2023

@JuliaRegistrator register subdir=TypedSyntax

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/78815

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a TypedSyntax-v1.0.0 -m "<description of version>" ed51dac1e08723aa5afbc2ab7987270e0d45232d
git push origin TypedSyntax-v1.0.0
