[EXPERIMENT] Create a recursive descent version of KAS' parser #1746
base: main
Conversation
npm Snapshot: Published

Good news! We've packaged up the latest commit from this PR (21a716f) and published it to npm.

Example: `yarn add @khanacademy/perseus@PR1746`

If you are working in Khan Academy's webapp, you can run: `./dev/tools/bump_perseus_version.sh -t PR1746`
Size Change: +14 B (0%) Total Size: 867 kB
```ts
type PreparedRule = [RegExp, Token | ((match: string) => Token)];

const preparedRules: PreparedRule[] = rules.map((rule) => {
    const regex = new RegExp("^(?:" + rule[0] + ")");
```
So each rule becomes a regex that's anchored to the beginning of the input string (`^`) and is non-capturing (`(?:...)`). Am I reading this correctly?
Yes, that's correct. It's a simplified version of https://github.com/zaach/jison-lex/blob/master/regexp-lexer.js#L10-L68, which is what `jison` uses internally for its lexer.
I don't think the non-capturing group part is really necessary since we only ever look at the whole string that's matched. 🤷‍♂️
FWIW: I think it's useful to mark them as non-capturing. It is a form of stating that the match is not used.
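To make the thread above concrete, here's a minimal, self-contained sketch of how the anchoring and non-capturing grouping interact. The rule set and `Token` shape are hypothetical stand-ins, not KAS' actual rules:

```typescript
// Hypothetical token and rule shapes, for illustration only.
type Token = {kind: string; value: string};
type Rule = [string, (match: string) => Token];

const rules: Rule[] = [
    ["\\d+", (m) => ({kind: "FLOAT", value: m})],
    ["[+\\-]", (m) => ({kind: "SIGN", value: m})],
];

// Wrapping each pattern in ^(?:...) anchors it to the start of the
// remaining input. The group is non-capturing, which signals that we
// only ever use the whole match (m[0]), never a sub-group.
const preparedRules = rules.map(
    ([pattern, toToken]) =>
        [new RegExp("^(?:" + pattern + ")"), toToken] as const,
);

function matchStart(input: string): Token | null {
    for (const [regex, toToken] of preparedRules) {
        const m = regex.exec(input);
        if (m) {
            return toToken(m[0]);
        }
    }
    return null;
}
```

Because of the `^` anchor, `matchStart` only ever matches a prefix of the input; a rule that would match later in the string is ignored.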
…, add a comment with a brief description of how the parser works
…n is being consumed
Force-pushed from 4e016f1 to b5abf08.
This PR resulted in a minimal (+14 B) change in the bundle size because the new parser isn't being included in the bundle.
I have a few thoughts. I've also asked @jaredly to take a look as I get out of my depth when we get to the parser and how it handles the token stream. :)
```ts
    throw new Error(`No match for ${input}`);
}

return [token, input.slice(currentMatch.length)];
```
This is likely extremely nitpicky, but slicing the input every time we match means we create quite a few strings while parsing. Would it be a lot more complex to just maintain an index that marks where we've processed to?
I don't think it would add that much additional complexity. I did it this way because that's what jison-lex does; see https://github.com/zaach/jison-lex/blob/master/regexp-lexer.js#L330. I'll try keeping track of the current index instead, but to do so I think restructuring the code to use a class for the lexer will help.
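One way the index-based version could look, sketched with a hypothetical rule set: sticky (`y`-flag) regexes match exactly at `lastIndex`, so the lexer can advance a cursor without ever slicing the input string.

```typescript
type Token = {kind: string; value: string};

// Sticky (y-flag) regexes match exactly at lastIndex, so no slicing
// of the input string is needed. Rule set is illustrative only.
const stickyRules: Array<[RegExp, string]> = [
    [/\d+/y, "FLOAT"],
    [/[+\-]/y, "SIGN"],
    [/\s+/y, "WS"],
];

class Lexer {
    private index = 0;
    constructor(private input: string) {}

    next(): Token | null {
        if (this.index >= this.input.length) {
            return null; // EOF
        }
        for (const [regex, kind] of stickyRules) {
            regex.lastIndex = this.index;
            const m = regex.exec(this.input);
            if (m) {
                this.index = regex.lastIndex; // advance the cursor
                return {kind, value: m[0]};
            }
        }
        throw new Error(`No match at index ${this.index}`);
    }
}
```

With this shape, the error path can also report the exact offset of the failure instead of the remaining string.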
```ts
//
// This method can only be called with `tokenKind`s for tokens that
// have a `value` property. See lexer.ts.
expectValue<Kind extends "FUNC" | "VAR" | "TRIG">(tokenKind: Kind): string {
```
Is there a way in TypeScript to model the last statement of the doc comment instead of saying `Kind extends "FUNC" | "VAR" | "TRIG"`? Those aren't the only tokens that carry a `value` (see `CONST`, `SIGN`, `FLOAT`, etc.).
```ts
// This method can only be called with `tokenKind`s for tokens that
// have a `value` property. See lexer.ts.
```
There's probably a fancy way to compute this type from the `Token` type. I'll give that a try.
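For example, if `Token` is a discriminated union, the built-in `Extract` utility type can compute the kinds that carry a `value`. The union below is a hypothetical stand-in; KAS' actual `Token` type lives in lexer.ts:

```typescript
// Hypothetical Token union for illustration; the real one is larger.
type Token =
    | {kind: "FUNC"; value: string}
    | {kind: "VAR"; value: string}
    | {kind: "CONST"; value: string}
    | {kind: "LPAREN"}
    | {kind: "EOF"};

// Extract keeps only the union members that have a `value` property,
// so ValueKind stays in sync with Token automatically.
type ValueToken = Extract<Token, {value: string}>;
type ValueKind = ValueToken["kind"]; // "FUNC" | "VAR" | "CONST"

function expectValue<K extends ValueKind>(token: Token, kind: K): string {
    if (token.kind !== kind) {
        throw new Error(`Expected ${kind}, got ${token.kind}`);
    }
    return (token as ValueToken).value;
}
```

Adding a new value-carrying token to `Token` then widens `ValueKind` without any edit to `expectValue`'s signature.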
```ts
equation(): Expr {
    if (this.peek().kind === "EOF") {
        return new Add([]);
```
I'm confused why we place an empty `Add` operation here when we see `EOF`.
I don't know why we do this either, but this is what was in the original code.
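One plausible reading, though this is an assumption rather than something the original code confirms: an `Add` with no terms is the identity element for addition, so an empty input parses to a node that evaluates to 0. A toy illustration:

```typescript
// Toy Add node, NOT KAS' actual implementation: illustrates that an
// empty sum naturally evaluates to the additive identity.
class Add {
    constructor(public terms: number[]) {}
    eval(): number {
        // reduce over zero terms returns the initial value, 0.
        return this.terms.reduce((sum, t) => sum + t, 0);
    }
}
```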
Summary:
The reason for creating this alternate version of the KAS parser is that I want to create a parser that outputs a MathBlocks AST for my hackathon project. This project is heavily based on MathBlocks, which is a suite of node modules I've developed as a side project to facilitate the development of interactive math applications on the web. I thought that it would be easier to create the parser for my hackathon project if I first had a recursive descent parser (as opposed to the current JISON-based parser). It's much harder to see what's going on with the JISON parser because the parts of code that make up the parser are spread across different files. Also, the JISON parser doesn't produce type-safe code.
In the future, we may want to replace KAS' current parser with this one. I think it'll make maintenance a bit easier since it uses plain old TypeScript code, which means that maintainers don't have to understand JISON. The parser is implemented as a recursive descent parser, which is one of the simplest types of parsers to write. Because it uses TypeScript, it's also type-safe, which makes it easier to work on. The parser itself is also shorter: 577 lines (shipped) vs. 814 (shipped) + 214 (dev-only).
If we do want to adopt this parser, we'll likely also want to convert the unit parser to a recursive descent parser as well. This would allow us to remove JISON from the codebase completely (JISON is a dev dependency, so removing it wouldn't have much impact on bundle size). Additionally, converting nodes.js to TypeScript and converting all of the node types to ES6 classes would also help with maintainability, but neither is a requirement for adopting the new parser(s).
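For readers unfamiliar with the technique: a recursive descent parser maps each grammar rule to a plain function that consumes tokens and calls the functions for its sub-rules. A minimal sketch for a toy grammar (not KAS' actual grammar or node types):

```typescript
// Toy AST and grammar:
//   expr := term (("+" | "-") term)*
//   term := FLOAT
type Expr =
    | {type: "num"; value: number}
    | {type: "add"; left: Expr; right: Expr}
    | {type: "sub"; left: Expr; right: Expr};

function parse(input: string): Expr {
    const tokens = input.match(/\d+|[+\-]/g) ?? [];
    let pos = 0;

    // Each grammar rule becomes a function; recursion in the grammar
    // becomes recursion (or iteration) in the code.
    function term(): Expr {
        const tok = tokens[pos++];
        if (!tok || !/^\d+$/.test(tok)) {
            throw new Error(`Expected number, got ${tok}`);
        }
        return {type: "num", value: Number(tok)};
    }

    function expr(): Expr {
        let left = term();
        while (tokens[pos] === "+" || tokens[pos] === "-") {
            const op = tokens[pos++];
            const right = term();
            left =
                op === "+"
                    ? {type: "add", left, right}
                    : {type: "sub", left, right};
        }
        return left;
    }

    return expr();
}
```

Because the call structure mirrors the grammar, the whole parser lives in one file and TypeScript can type-check the AST it builds, which is the maintainability argument made above.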
Issue: None
TODO:
Test plan: