Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: rewrite the scanner in C #54

Merged
merged 3 commits into from
Jun 25, 2023
Merged

Conversation

amaanq
Copy link
Contributor

@amaanq amaanq commented Jun 20, 2023

Summary

I reewrote the scanner in C for portability & support for projects using a pure C toolchain.

Requirements (place an x in each [ ])

  • I've added tests for any new code and ran npm run test-corpus to make sure all tests pass.
  • I've read and understood the Contributing Guidelines and have done my best effort to follow them.
  • I've read and agree to the Code of Conduct.

@salesforce-cla
Copy link

Thanks for the contribution! Before we can merge this, we need @amaanq to sign the Salesforce Inc. Contributor License Agreement.

@amaanq amaanq force-pushed the rewrite-it-in-c branch 2 times, most recently from 8503dc7 to 86f679d Compare June 20, 2023 22:05
Copy link
Collaborator

@4e554c4c 4e554c4c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't gone through everything yet, but I thought a few comments would be helpful.

src/scanner.c Outdated Show resolved Hide resolved
src/scanner.c Outdated Show resolved Hide resolved
src/scanner.c Outdated Show resolved Hide resolved
src/scanner.c Show resolved Hide resolved
@amaanq amaanq force-pushed the rewrite-it-in-c branch 2 times, most recently from d69b8c3 to 934e983 Compare June 20, 2023 22:53
src/scanner.c Outdated Show resolved Hide resolved
src/scanner.c Outdated Show resolved Hide resolved
@4e554c4c
Copy link
Collaborator

let me work on testing

@amaanq
Copy link
Contributor Author

amaanq commented Jun 21, 2023

You can fuzz the parser with this script:

#!/usr/bin/env sh

set -eu

ROOT_DIR="fuzzer"

# XXX: ensure shift below is consistent with number of args here
LANG=$1
TIMEOUT=$2
MAX_TOTAL_TIME=$3
CPP=$4

shift 4

# if scanner = scanner.cc then XFLAG = c++ else XFLAG = c
if [ "$CPP" = "cpp" ]; then
	COMPILER="clang++"
	SCANNER="scanner.cc"
	XFLAG="c++"
else
	COMPILER="clang"
	SCANNER="scanner.c"
	XFLAG="c"
fi

export CFLAGS="$(pkg-config --cflags --libs tree-sitter) -O0 -g -w"

JQ_FILTER='.. | if .type? == "STRING" or (.type? == "ALIAS" and .named? == false) then .value else null end'

build_dict() {
	jq "$JQ_FILTER" <src/grammar.json |
		grep -v "\\\\" | grep -v null |
		iconv -c -f UTF-8 -t ASCII//TRANSLIT |
		awk '!/^""$/' >"$ROOT_DIR/dict"
}

build_fuzzer() {
	cat <<END | $COMPILER -fsanitize=fuzzer,address,undefined $CFLAGS -lstdc++ -g -x $XFLAG - src/$SCANNER src/parser.c $@ -o $ROOT_DIR/fuzzer
#include <stdio.h>
#include <stdlib.h>
#include <tree_sitter/api.h>

#ifdef __cplusplus
extern "C"
#endif
TSLanguage *tree_sitter_$LANG();

#ifdef __cplusplus
extern "C"
#endif
int LLVMFuzzerTestOneInput(const uint8_t * data, const size_t len) {
  // Create a parser.
  TSParser *parser = ts_parser_new();

  // Set the parser's language.
  ts_parser_set_language(parser, tree_sitter_$LANG());

  // Build a syntax tree based on source code stored in a string.
  TSTree *tree = ts_parser_parse_string(
    parser,
    NULL,
    (const char *)data,
    len
  );
  // Free all of the heap-allocated memory.
  ts_tree_delete(tree);
  ts_parser_delete(parser);
  return 0;
}
END
}

generate_fuzzer() {
	tree-sitter generate
}

makedirs() {
	rm -rf "$ROOT_DIR"
	mkdir -p "$ROOT_DIR"
	mkdir -p "$ROOT_DIR/out"
}

makedirs
generate_fuzzer

build_dict
build_fuzzer $@
cd "$ROOT_DIR"
./fuzzer -dict=dict -timeout=$TIMEOUT -max_total_time=$MAX_TOTAL_TIME out/

and run ./fuzz.sh hack 1 10 c

@amaanq amaanq force-pushed the rewrite-it-in-c branch from 934e983 to 4e195ed Compare June 21, 2023 16:58
@amaanq amaanq force-pushed the rewrite-it-in-c branch from 4e195ed to 8612f79 Compare June 24, 2023 00:52
@4e554c4c 4e554c4c self-assigned this Jun 24, 2023
4e554c4c
4e554c4c previously approved these changes Jun 24, 2023
Copy link
Collaborator

@4e554c4c 4e554c4c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! thanks :)

@4e554c4c
Copy link
Collaborator

CI says that you need to run npm build

@amaanq
Copy link
Contributor Author

amaanq commented Jun 24, 2023

Not sure I follow - nothing of significance changed with npm run build, but I did update the generated parser (which bumped the abi from 13 to 14)

@4e554c4c
Copy link
Collaborator

good catch, this is probably failing due to a stale previous commit. should be good now.

@4e554c4c 4e554c4c merged commit b1c41e4 into slackhq:main Jun 25, 2023
@mjraha592
Copy link

Looks great! thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants