improve type safety of std.zig.Ast #22397
Force-pushed from d842369 to 80c0304.
Did you try having an offset helper, something like this?
fn offset(tok: TokenIndex, off: i32) TokenIndex {
    const new_tok = @as(i64, @intFromEnum(tok)) + off;
    return @enumFromInt(new_tok);
}
Regardless: this PR seems like a fantastic change. My main concern just from the PR description is that final performance data point. Strictly speaking, this change shouldn't impact performance at all. Do you have any idea why the last performance number is consistently so bad?
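The helper suggested above can be tried in isolation. The following is a minimal sketch, assuming a hypothetical distinct `TokenIndex` enum (in the PR as merged, `TokenIndex` stayed a plain integer). The explicit `@intCast` is an addition of this sketch to make the narrowing back to `u32` visible; it is not part of the original suggestion.

```zig
const std = @import("std");

/// Hypothetical distinct token index, for illustration only.
const TokenIndex = enum(u32) {
    _,

    /// Returns the token `off` positions away from `tok`.
    /// Widening to i64 keeps the addition from overflowing; @intCast
    /// then safety-checks that the result fits back into u32.
    fn offset(tok: TokenIndex, off: i32) TokenIndex {
        const new_tok = @as(i64, @intFromEnum(tok)) + off;
        return @enumFromInt(@as(u32, @intCast(new_tok)));
    }
};

test "offset moves forwards and backwards" {
    const tok: TokenIndex = @enumFromInt(10);
    try std.testing.expectEqual(@as(u32, 13), @intFromEnum(tok.offset(3)));
    try std.testing.expectEqual(@as(u32, 8), @intFromEnum(tok.offset(-2)));
}
```

With a helper like this, relative token arithmetic stays possible without falling back to raw integers at every call site.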
Kind of. I had
Here are some of the possible reasons:
Removing the
If I remember correctly, wasn't null for optional indexes represented as zero? The PR writeup shows
All of these points seem like a perfectly valid use case for an
It might be worth seeing whether marking functions However, I don't see a 10% regression in Debug performance for an already fairly fast subcommand as a huge deal, so this PR is fine as-is if you don't feel like looking into that.
Force-pushed from 80c0304 to cb91fda.
Unfortunately, marking the functions as
Force-pushed from 2b23652 to a8d3767.
Force-pushed from a8d3767 to baffeb8.
Thanks for doing this.
The types aren't quite right though. To avoid 2 breakages rather than 1 let's get these right.
lhs: Item,
rhs: Item,

/// The active field is determined by looking at the node tag.
pub const Item = union {
    node: Index,
    opt_node: OptionalIndex,
    token: TokenIndex,
    opt_token: OptionalTokenIndex,
    extra_index: ExtraIndex,
    @"for": For,
};
Instead, Data should be the untagged union.
What exactly would you want that to look like? There are two possibilities I came up with:
Approach 1
This is what I initially experimented with before quickly realizing that the second solution is probably the better one.
Here I changed Data to encode every possible type combination of lhs and rhs. It's not really viable to give each field a descriptive name that aligns well with all the possible tags. That would require me to create around 50 fields for all the unique encodings, at which point I might as well have one field for every tag like in the second solution.
pub const Data = union {
    node: Index,
    token: TokenIndex,
    node_and_node: struct { Index, Index },
    opt_node_and_opt_node: struct { OptionalIndex, OptionalIndex },
    node_and_opt_node: struct { Index, OptionalIndex },
    opt_node_and_node: struct { OptionalIndex, Index },
    node_and_extra: struct { Index, ExtraIndex },
    extra_and_node: struct { ExtraIndex, Index },
    extra_and_opt_node: struct { ExtraIndex, OptionalIndex },
    node_and_token: struct { Index, TokenIndex },
    token_and_node: struct { TokenIndex, Index },
    token_and_token: struct { TokenIndex, TokenIndex },
    opt_token_and_opt_token: struct { OptionalTokenIndex, OptionalTokenIndex },
    opt_index_and_token: struct { OptionalIndex, TokenIndex },
    @"for": struct { Index, For },
    extra_range: struct { ExtraIndex, ExtraIndex },
};
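To illustrate how a consumer would read such a union, here is a small sketch with a trimmed-down `Data` and a hypothetical `firstOperand` helper (neither the helper nor the three-tag `Tag` enum comes from the PR). The node tag decides which field is active; in safe build modes, reading the wrong field of a bare `union` is caught at runtime.

```zig
const std = @import("std");

const Index = enum(u32) { _ };
const TokenIndex = u32;

// Hypothetical three-tag subset, for illustration only.
const Tag = enum { negation, add, field_access };

// Trimmed-down version of the Approach 1 union.
const Data = union {
    node: Index,
    node_and_node: struct { Index, Index },
    node_and_token: struct { Index, TokenIndex },
};

/// Returns the left-hand (or only) operand of a node.
/// The tag tells us which union field is the active one.
fn firstOperand(tag: Tag, data: Data) Index {
    return switch (tag) {
        .negation => data.node,
        .add => data.node_and_node[0],
        .field_access => data.node_and_token[0],
    };
}

test "tag selects the active field" {
    const lhs: Index = @enumFromInt(1);
    const rhs: Index = @enumFromInt(2);
    const data: Data = .{ .node_and_node = .{ lhs, rhs } };
    try std.testing.expectEqual(lhs, firstOperand(.add, data));
}
```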
Approach 2
In this approach, the tag and data fields form a tagged union where the union tag and union data are stored separately. This is like std.MultiArrayList(Unwrapped) except that there is also the main_token field. The problem is going to be the code generation; see #22784.
The reason I used an extern union was to keep Data from getting a safety tag (see #22785). If this approach is chosen, I will also have to look into giving most field types names instead of using anonymous structs everywhere.
pub const NodeList = std.MultiArrayList(Node);
pub const Node = struct {
    tag: Tag,
    main_token: TokenIndex,
    data: Data,
    pub const Unwrapped = union(enum) {
        root: @compileError("TODO"),
        @"usingnamespace": Index,
        test_decl: packed struct { name_token: OptionalTokenIndex, block: Index },
        global_var_decl: packed struct { extra: ExtraIndex, init: OptionalIndex },
        local_var_decl: packed struct { extra: ExtraIndex, init: OptionalIndex },
        simple_var_decl: packed struct { ty: OptionalIndex, init: OptionalIndex },
        aligned_var_decl: packed struct { alignment: Index, init: OptionalIndex },
        @"errdefer": packed struct { payload: OptionalTokenIndex, expr: Index },
        @"defer": Index,
        @"catch": packed struct { lhs: Index, rhs: Index },
        field_access: packed struct { lhs: Index, field_name: TokenIndex },
        unwrap_optional: packed struct { lhs: Index, question_mark: TokenIndex },
        equal_equal: packed struct { lhs: Index, rhs: Index },
        bang_equal: packed struct { lhs: Index, rhs: Index },
        less_than: packed struct { lhs: Index, rhs: Index },
        greater_than: packed struct { lhs: Index, rhs: Index },
        less_or_equal: packed struct { lhs: Index, rhs: Index },
        greater_or_equal: packed struct { lhs: Index, rhs: Index },
        assign_mul: packed struct { lhs: Index, rhs: Index },
        assign_div: packed struct { lhs: Index, rhs: Index },
        assign_mod: packed struct { lhs: Index, rhs: Index },
        assign_add: packed struct { lhs: Index, rhs: Index },
        assign_sub: packed struct { lhs: Index, rhs: Index },
        assign_shl: packed struct { lhs: Index, rhs: Index },
        assign_shl_sat: packed struct { lhs: Index, rhs: Index },
        assign_shr: packed struct { lhs: Index, rhs: Index },
        assign_bit_and: packed struct { lhs: Index, rhs: Index },
        assign_bit_xor: packed struct { lhs: Index, rhs: Index },
        assign_bit_or: packed struct { lhs: Index, rhs: Index },
        assign_mul_wrap: packed struct { lhs: Index, rhs: Index },
        assign_add_wrap: packed struct { lhs: Index, rhs: Index },
        assign_sub_wrap: packed struct { lhs: Index, rhs: Index },
        assign_mul_sat: packed struct { lhs: Index, rhs: Index },
        assign_add_sat: packed struct { lhs: Index, rhs: Index },
        assign_sub_sat: packed struct { lhs: Index, rhs: Index },
        assign: packed struct { lhs: Index, rhs: Index },
        assign_destructure: packed struct { extra: ExtraIndex, init: OptionalIndex },
        merge_error_sets: packed struct { lhs: Index, rhs: Index },
        mul: packed struct { lhs: Index, rhs: Index },
        div: packed struct { lhs: Index, rhs: Index },
        mod: packed struct { lhs: Index, rhs: Index },
        array_mult: packed struct { lhs: Index, rhs: Index },
        mul_wrap: packed struct { lhs: Index, rhs: Index },
        mul_sat: packed struct { lhs: Index, rhs: Index },
        add: packed struct { lhs: Index, rhs: Index },
        sub: packed struct { lhs: Index, rhs: Index },
        array_cat: packed struct { lhs: Index, rhs: Index },
        add_wrap: packed struct { lhs: Index, rhs: Index },
        sub_wrap: packed struct { lhs: Index, rhs: Index },
        add_sat: packed struct { lhs: Index, rhs: Index },
        sub_sat: packed struct { lhs: Index, rhs: Index },
        shl: packed struct { lhs: Index, rhs: Index },
        shl_sat: packed struct { lhs: Index, rhs: Index },
        shr: packed struct { lhs: Index, rhs: Index },
        bit_and: packed struct { lhs: Index, rhs: Index },
        bit_xor: packed struct { lhs: Index, rhs: Index },
        bit_or: packed struct { lhs: Index, rhs: Index },
        @"orelse": packed struct { lhs: Index, rhs: Index },
        bool_and: packed struct { lhs: Index, rhs: Index },
        bool_or: packed struct { lhs: Index, rhs: Index },
        bool_not: Index,
        negation: Index,
        bit_not: Index,
        negation_wrap: Index,
        address_of: Index,
        @"try": Index,
        @"await": Index,
        optional_type: Index,
        array_type: packed struct { lhs: Index, rhs: Index },
        array_type_sentinel,
        ptr_type_aligned: packed struct { alignment: OptionalIndex, elem_type: Index },
        ptr_type_sentinel: packed struct { sentinel: OptionalIndex, elem_type: Index },
        ptr_type: packed struct { extra: ExtraIndex, elem_type: Index },
        ptr_type_bit_range: packed struct { extra: ExtraIndex, elem_type: Index },
        slice_open: packed struct { lhs: Index, rhs: Index },
        slice: packed struct { lhs: Index, extra: ExtraIndex },
        slice_sentinel: packed struct { lhs: Index, extra: ExtraIndex },
        deref: Index,
        array_access: packed struct { lhs: Index, rhs: Index },
        array_init_one: packed struct { lhs: Index, rhs: Index },
        array_init_one_comma: packed struct { lhs: Index, rhs: Index },
        array_init_dot_two: [2]OptionalIndex,
        array_init_dot_two_comma: [2]OptionalIndex,
        array_init_dot: packed struct { start: ExtraIndex, end: ExtraIndex },
        array_init_dot_comma: packed struct { start: ExtraIndex, end: ExtraIndex },
        array_init: packed struct { lhs: Index, extra: ExtraIndex },
        array_init_comma: packed struct { lhs: Index, extra: ExtraIndex },
        struct_init_one: packed struct { lhs: Index, first_field_init: OptionalIndex },
        struct_init_one_comma: packed struct { lhs: Index, first_field_init: OptionalIndex },
        struct_init_dot_two: [2]OptionalIndex,
        struct_init_dot_two_comma: [2]OptionalIndex,
        struct_init_dot: packed struct { start: ExtraIndex, end: ExtraIndex },
        struct_init_dot_comma: packed struct { start: ExtraIndex, end: ExtraIndex },
        struct_init: packed struct { lhs: Index, extra: ExtraIndex },
        struct_init_comma: packed struct { lhs: Index, extra: ExtraIndex },
        call_one: packed struct { lhs: Index, first_arg: OptionalIndex },
        call_one_comma: packed struct { lhs: Index, first_arg: OptionalIndex },
        async_call_one: packed struct { lhs: Index, first_arg: OptionalIndex },
        async_call_one_comma: packed struct { lhs: Index, first_arg: OptionalIndex },
        call: packed struct { lhs: Index, extra: ExtraIndex },
        call_comma: packed struct { lhs: Index, extra: ExtraIndex },
        async_call: packed struct { lhs: Index, extra: ExtraIndex },
        async_call_comma: packed struct { lhs: Index, extra: ExtraIndex },
        @"switch": packed struct { lhs: Index, extra: ExtraIndex },
        switch_comma: packed struct { lhs: Index, extra: ExtraIndex },
        switch_case_one: packed struct { lhs: Index, rhs: OptionalIndex },
        switch_case_inline_one: packed struct { lhs: Index, rhs: OptionalIndex },
        switch_case: packed struct { extra: ExtraIndex, rhs: Index },
        switch_case_inline: packed struct { extra: ExtraIndex, rhs: Index },
        switch_range: packed struct { lhs: Index, rhs: Index },
        while_simple: packed struct { lhs: Index, rhs: Index },
        while_cont: packed struct { lhs: Index, extra: ExtraIndex },
        @"while": packed struct { lhs: Index, rhs: Index },
        for_simple: packed struct { lhs: Index, rhs: Index },
        @"for": packed struct { lhs: ExtraIndex, data: For },
        for_range: packed struct { lhs: Index, rhs: Index },
        if_simple: packed struct { lhs: Index, rhs: Index },
        @"if": packed struct { lhs: Index, rhs: Index },
        @"suspend": Index,
        @"resume": Index,
        @"continue": packed struct { label: OptionalTokenIndex, expr: OptionalIndex },
        @"break": packed struct { label: OptionalTokenIndex, expr: OptionalIndex },
        @"return": Node.OptionalIndex,
        fn_proto_simple: packed struct { first_param_ty: OptionalIndex, return_type: OptionalIndex },
        fn_proto_multi: packed struct { extra: ExtraIndex, return_type: OptionalIndex },
        fn_proto_one: packed struct { extra: ExtraIndex, return_type: OptionalIndex },
        fn_proto: packed struct { extra: ExtraIndex, return_type: OptionalIndex },
        fn_decl: packed struct { fn_prot: Index, body: Index },
        anyframe_type: packed struct { right_arrow: TokenIndex, elem_ty: Index },
        anyframe_literal,
        char_literal,
        number_literal,
        unreachable_literal,
        identifier,
        enum_literal: TokenIndex,
        string_literal,
        multiline_string_literal: packed struct { start: TokenIndex, end: TokenIndex },
        grouped_expression: packed struct { expr: Index, r_paren: TokenIndex },
        builtin_call_two: [2]OptionalIndex,
        builtin_call_two_comma: [2]OptionalIndex,
        builtin_call: packed struct { start: ExtraIndex, end: ExtraIndex },
        builtin_call_comma: packed struct { start: ExtraIndex, end: ExtraIndex },
        error_set_decl: TokenIndex,
        container_decl: packed struct { start: ExtraIndex, end: ExtraIndex },
        container_decl_trailing: packed struct { start: ExtraIndex, end: ExtraIndex },
        container_decl_two: [2]OptionalIndex,
        container_decl_two_trailing: [2]OptionalIndex,
        container_decl_arg: packed struct { lhs: Index, extra: ExtraIndex },
        container_decl_arg_trailing: packed struct { lhs: Index, extra: ExtraIndex },
        tagged_union: packed struct { start: ExtraIndex, end: ExtraIndex },
        tagged_union_trailing: packed struct { start: ExtraIndex, end: ExtraIndex },
        tagged_union_two: [2]OptionalIndex,
        tagged_union_two_trailing: [2]OptionalIndex,
        tagged_union_enum_tag: packed struct { lhs: Index, extra: ExtraIndex },
        tagged_union_enum_tag_trailing: packed struct { lhs: Index, extra: ExtraIndex },
        container_field_init: packed struct { ty: Index, default_init: OptionalIndex },
        container_field_align: packed struct { ty: Index, alignment: Index },
        container_field: packed struct { ty: Index, extra: ExtraIndex },
        @"comptime": Index,
        @"nosuspend": Index,
        block_two: [2]Node.OptionalIndex,
        block_two_semicolon: [2]Node.OptionalIndex,
        block: packed struct { start: ExtraIndex, end: ExtraIndex },
        block_semicolon: packed struct { start: ExtraIndex, end: ExtraIndex },
        asm_simple: packed struct { lhs: Index, r_paren: TokenIndex },
        @"asm": packed struct { lhs: Index, extra: ExtraIndex },
        asm_output: packed struct { lhs: OptionalIndex, r_paren: TokenIndex },
        asm_input: packed struct { lhs: Index, r_paren: TokenIndex },
        error_value: packed struct { error_token: OptionalTokenIndex, name: OptionalTokenIndex },
        error_union: packed struct { lhs: Index, rhs: Index },
    };
    const Tag = std.meta.Tag(Unwrapped);
    const Data = @Type(.{
        .@"union" = .{
            .layout = .@"extern", // avoid storing the safety tag twice
            .tag_type = null,
            .fields = std.meta.fields(Unwrapped),
            .decls = &.{},
        },
    });
};
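The storage layout described for Approach 2 can be seen in miniature below. This is only a sketch under stated assumptions: the two-tag `Node` is hypothetical, the `Data` union is written out by hand instead of being derived with `@Type`, and the list is initialized with `.{}` as in Zig 0.14-era std. It shows that `std.MultiArrayList` keeps the tags and the untagged payloads in separate arrays, and that an `extern union` carries no hidden safety tag.

```zig
const std = @import("std");

// Hypothetical miniature of the Approach 2 node layout.
const Node = struct {
    tag: Tag,
    main_token: u32,
    data: Data,

    const Tag = enum { negation, add };

    // extern union: no hidden safety tag, so it stays at 8 bytes.
    const Data = extern union {
        node: u32,
        pair: extern struct { lhs: u32, rhs: u32 },
    };
};

test "tags and data live in separate arrays" {
    const gpa = std.testing.allocator;
    var nodes: std.MultiArrayList(Node) = .{};
    defer nodes.deinit(gpa);

    try nodes.append(gpa, .{ .tag = .negation, .main_token = 0, .data = .{ .node = 7 } });
    try nodes.append(gpa, .{ .tag = .add, .main_token = 2, .data = .{ .pair = .{ .lhs = 1, .rhs = 3 } } });

    // Each field of Node is stored as its own contiguous slice.
    const tags = nodes.items(.tag);
    const datas = nodes.items(.data);
    try std.testing.expectEqual(Node.Tag.add, tags[1]);
    try std.testing.expectEqual(@as(u32, 3), datas[1].pair.rhs);
    try std.testing.expectEqual(@as(usize, 8), @sizeOf(Node.Data));
}
```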
Force-pushed from d92a074 to 0330052.
This function checks for various possibilities that are never produced by the parser. Given that lastToken is unsafe to call on an Ast with errors, I also removed code paths that would be reachable on an Ast with errors.
AstGen doesn't seem to report enough information to figure out if a func decl has a calling convention ast node. A calling convention could be present without it by marking the function as inline for example.
This commit adds the following distinct integer types to std.zig.Ast:
- OptionalTokenIndex
- TokenOffset
- OptionalTokenOffset
- Node.OptionalIndex
- Node.Offset
- Node.OptionalOffset
The `Node.Index` type has also been converted to a distinct type while TokenIndex remains unchanged. `Ast.Node.Data.Item` has also been changed to an (untagged) union to provide safety checks.
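As an illustration of what a distinct index and its optional counterpart can look like, here is a minimal sketch. The method names `toOptional` and `unwrap` and the choice of `maxInt(u32)` as the "none" sentinel are assumptions of this sketch, not something the commit message states.

```zig
const std = @import("std");

/// Distinct node index: cannot be mixed up with a plain u32 or a token index.
const Index = enum(u32) {
    root = 0,
    _,

    fn toOptional(i: Index) OptionalIndex {
        const result: OptionalIndex = @enumFromInt(@intFromEnum(i));
        // The sentinel value must never be a real index.
        std.debug.assert(result != .none);
        return result;
    }
};

/// Same size as Index; one value is reserved to mean "no node".
const OptionalIndex = enum(u32) {
    root = 0,
    none = std.math.maxInt(u32), // assumed sentinel
    _,

    fn unwrap(oi: OptionalIndex) ?Index {
        return if (oi == .none) null else @enumFromInt(@intFromEnum(oi));
    }
};

test "optional index round trip" {
    const idx: Index = @enumFromInt(5);
    try std.testing.expectEqual(@as(?Index, idx), idx.toOptional().unwrap());
    try std.testing.expectEqual(@as(?Index, null), OptionalIndex.none.unwrap());
    try std.testing.expectEqual(@as(usize, 4), @sizeOf(OptionalIndex));
}
```

Compared with `?u32`, the sentinel encoding keeps the optional at four bytes, and callers have to go through `unwrap` before they can use the value as an index.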
The existing comments are incomplete, outdated and sometimes incorrect.
Force-pushed from 0330052 to 7657993.
This PR introduces named/distinct integer types to std.zig.Ast, improving type safety and aligning it more closely with other compiler data structures like ZIR.
At its core, the changes can be summarized by the following:
Before
After
I initially changed TokenIndex to a named/distinct integer type as well, but it made dealing with token indices unergonomic. This was especially noticeable while porting lib/std/zig/render.zig because it frequently relies on relative token indices.
The offset types (Node.Offset, TokenOffset, etc.) have been added for ZIR because it stores relative AST indices as opposed to absolute indices.
zig ast-check src/Sema.zig (ReleaseFast)
zig fmt --check src/Sema.zig (ReleaseFast)
zig ast-check src/Sema.zig (Debug)
zig fmt --check src/Sema.zig (Debug)
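Returning to the offset types mentioned in the description: the difference between a relative and an absolute index can be sketched as follows. The names `toOffset` and `toAbsolute` and the `i32` backing type are assumptions chosen for this illustration.

```zig
const std = @import("std");

const Index = enum(u32) {
    _,

    /// Encodes `destination` relative to `base`.
    fn toOffset(base: Index, destination: Index) Offset {
        const base_i64: i64 = @intFromEnum(base);
        const dest_i64: i64 = @intFromEnum(destination);
        return @enumFromInt(@as(i32, @intCast(dest_i64 - base_i64)));
    }
};

/// Relative node index: only meaningful together with a base node.
const Offset = enum(i32) {
    zero = 0,
    _,

    /// Resolves the offset against `base` to get an absolute index again.
    fn toAbsolute(offset: Offset, base: Index) Index {
        const abs = @as(i64, @intFromEnum(base)) + @intFromEnum(offset);
        return @enumFromInt(@as(u32, @intCast(abs)));
    }
};

test "offset round trip" {
    const base: Index = @enumFromInt(100);
    const target: Index = @enumFromInt(97);
    const off = base.toOffset(target);
    try std.testing.expectEqual(@as(i32, -3), @intFromEnum(off));
    try std.testing.expectEqual(target, off.toAbsolute(base));
}
```

Because `Offset` and `Index` are different types, passing a relative value where an absolute one is expected becomes a compile error instead of a silent bug.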