This document covers some of the internals of Biome, and how they are used inside the project.
Parser and CST
The CST (Concrete Syntax Tree) is a data structure, very similar to an AST (Abstract Syntax Tree), that keeps track of all the information of a program, trivia included.
Trivia are all the information that is not needed for a program to run: whitespace (spaces, tabs, line breaks) and comments.
Trivia are attached to tokens. A token can have leading trivia and trailing trivia: reading the code from left to right, leading trivia appear before a token (for example, a keyword), and trailing trivia appear after it.
Leading trivia and trailing trivia are categorized as follows:
- All trivia up to the token/keyword (including line breaks), except what already belongs to the trailing trivia of the previous token, is the leading trivia;
- Everything after the token until the next line break (but not including it) is the trailing trivia;
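The two rules above can be sketched as a small lexer that attaches trivia to tokens while scanning. This is a simplified illustration, not Biome's lexer: it assumes a toy token set (words, double-quoted strings, and single punctuation characters), supports only // comments, and uses hypothetical names (Trivia, Token, lexWithTrivia).

```typescript
// Hypothetical, simplified model of trivia attachment (not Biome's lexer).
type TriviaKind = "Whitespace" | "Newline" | "Comment";

interface Trivia {
  kind: TriviaKind;
  text: string;
}

interface Token {
  text: string;
  leading: Trivia[];
  trailing: Trivia[];
}

// Returns the trivia starting at `i`, or null if a real token (or EOF) starts there.
function readTrivia(src: string, i: number): Trivia | null {
  const ch = src.charAt(i);
  if (ch === "\n") return { kind: "Newline", text: "\n" };
  if (ch === " " || ch === "\t") {
    let j = i;
    while (src.charAt(j) === " " || src.charAt(j) === "\t") j++;
    return { kind: "Whitespace", text: src.slice(i, j) };
  }
  if (src.slice(i, i + 2) === "//") {
    let j = i;
    while (j < src.length && src.charAt(j) !== "\n") j++;
    return { kind: "Comment", text: src.slice(i, j) };
  }
  return null;
}

// Reads one real token: a word/number, a double-quoted string, or one punctuation character.
function readToken(src: string, i: number): string {
  const word = /^[A-Za-z0-9_$]+/.exec(src.slice(i));
  if (word) return word[0];
  if (src.charAt(i) === '"') {
    const end = src.indexOf('"', i + 1);
    return src.slice(i, end === -1 ? src.length : end + 1);
  }
  return src.charAt(i);
}

function lexWithTrivia(src: string): { tokens: Token[]; eofTrivia: Trivia[] } {
  const tokens: Token[] = [];
  let pending: Trivia[] = []; // trivia waiting to become leading trivia of the next token
  let i = 0;
  while (i < src.length) {
    const trivia = readTrivia(src, i);
    if (trivia) {
      pending.push(trivia);
      i += trivia.text.length;
      continue;
    }
    const token: Token = { text: readToken(src, i), leading: pending, trailing: [] };
    pending = [];
    i += token.text.length;
    // Trailing trivia: everything after the token up to (not including) the next line break.
    let t = readTrivia(src, i);
    while (t !== null && t.kind !== "Newline") {
      token.trailing.push(t);
      i += t.text.length;
      t = readTrivia(src, i);
    }
    tokens.push(token);
  }
  // Trivia left over after the last token belongs to the end of file.
  return { tokens, eofTrivia: pending };
}

const example = lexWithTrivia("let x = 1; // comment 1\n// comment 2\nconst y = 2;");
console.log(example.tokens[4].trailing); // trailing trivia of ";"
console.log(example.tokens[5].leading); // leading trivia of "const"
```

For this input, the space and // comment 1 end up in the trailing trivia of the first ;, while the line break, // comment 2, and the following line break become the leading trivia of const.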
In the following snippet, // comment 1 is a trailing trivia of the token that precedes it on the same line, and // comment 2 is a leading trivia of the keyword const. Below is a minimized version of the CST represented by Biome:
By design, the CST is never directly accessible: a developer reads its information through the Red tree, using a number of APIs that are autogenerated from the grammar of the language.
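The Red tree idea can be sketched as two layers. In this simplified model, green nodes are immutable and only store their kind and children (tokens carry the text), while red nodes are created on demand and add what green nodes lack: a parent pointer and an absolute offset. A typed wrapper then exposes named accessors, similar in spirit to APIs generated from a grammar. All names here (GreenNode, RedNode, WhileStatement, and the kind strings) are hypothetical; Biome's actual implementation differs.

```typescript
// Hypothetical sketch of the Green/Red tree idea; names do not match Biome's API.

// Green layer: immutable and position-independent, so identical subtrees can be shared.
interface GreenToken {
  kind: string;
  text: string;
}

interface GreenNode {
  kind: string;
  children: (GreenNode | GreenToken)[];
}

function isToken(el: GreenNode | GreenToken): el is GreenToken {
  return "text" in el;
}

function textLen(el: GreenNode | GreenToken): number {
  return isToken(el) ? el.text.length : el.children.reduce((sum, c) => sum + textLen(c), 0);
}

// Red layer: a cheap view created on demand, adding a parent pointer and an absolute offset.
class RedNode {
  readonly green: GreenNode;
  readonly parent: RedNode | null;
  readonly offset: number;

  constructor(green: GreenNode, parent: RedNode | null, offset: number) {
    this.green = green;
    this.parent = parent;
    this.offset = offset;
  }

  get kind(): string {
    return this.green.kind;
  }

  // Child nodes are wrapped lazily; each one gets its absolute start offset.
  childNodes(): RedNode[] {
    const result: RedNode[] = [];
    let pos = this.offset;
    for (const child of this.green.children) {
      if (!isToken(child)) result.push(new RedNode(child, this, pos));
      pos += textLen(child);
    }
    return result;
  }

  // First direct child token of the given kind, with its absolute start offset.
  token(kind: string): { text: string; start: number } | undefined {
    let pos = this.offset;
    for (const child of this.green.children) {
      if (isToken(child) && child.kind === kind) return { text: child.text, start: pos };
      pos += textLen(child);
    }
    return undefined;
  }
}

// Typed wrapper of the kind a code generator could emit for a hypothetical rule:
// WhileStatement = 'while' '(' test ')' body
class WhileStatement {
  readonly syntax: RedNode;
  constructor(syntax: RedNode) {
    this.syntax = syntax;
  }
  whileToken() {
    return this.syntax.token("WHILE_KW");
  }
  lParenToken() {
    return this.syntax.token("L_PAREN");
  }
  body(): RedNode | undefined {
    for (const n of this.syntax.childNodes()) if (n.kind === "BLOCK_STATEMENT") return n;
    return undefined;
  }
}

// Green tree for `while (x) {}`; whitespace is folded into token text for brevity
// (a real CST would store it as trivia).
const green: GreenNode = {
  kind: "WHILE_STATEMENT",
  children: [
    { kind: "WHILE_KW", text: "while " },
    { kind: "L_PAREN", text: "(" },
    { kind: "IDENT_EXPR", children: [{ kind: "IDENT", text: "x" }] },
    { kind: "R_PAREN", text: ") " },
    { kind: "BLOCK_STATEMENT", children: [{ kind: "L_CURLY", text: "{" }, { kind: "R_CURLY", text: "}" }] },
  ],
};
const stmt = new WhileStatement(new RedNode(green, null, 0));
console.log(stmt.lParenToken()?.start); // 6
console.log(stmt.body()?.offset); // 10
```

Because green nodes carry no positions, they can be cached and reused, while red nodes stay cheap to create each time a consumer walks the tree.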
Resilient and recoverable parser
In order to construct a CST, a parser needs to be error resilient and recoverable:
- resilient: a parser that is able to resume parsing after encountering a syntax error of the language;
- recoverable: a parser that is able to understand where an error occurred, and to resume parsing by creating correct information;
The recoverable part of the parser is not an exact science, and there aren’t any rules set in stone. This means that, depending on what the parser was parsing and where the error occurred, the parser may or may not be able to recover in the expected way.
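One possible strategy for the recoverable part is to give every node a fixed set of slots and to mark a slot as missing when the expected token is absent, instead of giving up. The sketch below assumes a toy parser that only handles while statements over a pre-tokenized input; the WhileNode shape, the slot names, and the error messages are hypothetical and are not Biome's diagnostics.

```typescript
// Hypothetical node for `while (test) body`: every slot always exists,
// and `null` marks a slot the parser could not fill ("missing").
interface WhileNode {
  whileToken: string;
  lParen: string | null;
  test: string | null;
  rParen: string | null;
  body: string | null;
}

function parseWhile(tokens: string[]): { node: WhileNode; errors: string[] } {
  const errors: string[] = [];
  let pos = 0;
  const peek = (): string | undefined => tokens[pos];
  // Consumes `expected` if it comes next; otherwise records an error, consumes
  // nothing, and returns null so the slot is marked as missing.
  const expectToken = (expected: string): string | null => {
    if (peek() === expected) return tokens[pos++];
    errors.push(`expected ${expected} but found ${peek() ?? "end of file"}`);
    return null;
  };

  const whileToken = tokens[pos++]; // the caller guarantees this is "while"
  const lParen = expectToken("(");
  // Only consume a condition if the next token can't close the parens or start the body.
  let test: string | null = null;
  const next = peek();
  if (next !== undefined && next !== ")" && next !== "{") {
    test = tokens[pos++];
  } else {
    errors.push("expected a condition");
  }
  const rParen = expectToken(")");
  // Body: a flat block `{ ... }` (nested blocks are out of scope for this sketch).
  let body: string | null = null;
  if (peek() === "{") {
    const start = pos;
    while (pos < tokens.length && tokens[pos] !== "}") pos++;
    expectToken("}");
    body = tokens.slice(start, pos).join(" ");
  } else {
    errors.push("expected a block");
  }
  return { node: { whileToken, lParen, test, rParen, body }, errors };
}

// `while {}`: the parentheses and the condition are missing, the block is still parsed.
const { node, errors } = parseWhile(["while", "{", "}"]);
console.log(node);
console.log(errors);
```

For while {} the parentheses and the condition come back as null while the block is still parsed; for while ( x ) { } every slot is filled and no error is reported. Keeping the slots even when they are empty gives consumers a stable shape to read.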
To protect consumers from incorrect syntax, the parser also uses Bogus nodes. These nodes decorate the broken code caused by a syntax error.
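One common way to produce such nodes is panic-mode recovery: when a statement cannot be parsed, the parser skips tokens up to a synchronization token and wraps everything it skipped in a bogus node, so the following statements are still parsed normally. The sketch below assumes a toy grammar with a single statement form (let NAME = NUMBER;), ; and } as synchronization tokens, and hypothetical names (parseProgram, BOGUS_STATEMENT); it is not Biome's recovery logic.

```typescript
// Hypothetical toy grammar with one statement form: let NAME = NUMBER ;
// Anything else is wrapped in a BOGUS_STATEMENT node instead of aborting the parse.
type Stmt =
  | { kind: "LET_STATEMENT"; name: string; value: number }
  | { kind: "BOGUS_STATEMENT"; tokens: string[] };

function tokenize(src: string): string[] {
  return src.match(/[A-Za-z_]\w*|\d+|\S/g) ?? [];
}

function parseProgram(src: string): { stmts: Stmt[]; errors: string[] } {
  const tokens = tokenize(src);
  const stmts: Stmt[] = [];
  const errors: string[] = [];
  let pos = 0;
  while (pos < tokens.length) {
    const t = tokens.slice(pos, pos + 5);
    const isLet =
      t.length === 5 && t[0] === "let" && /^[A-Za-z_]\w*$/.test(t[1]) &&
      t[2] === "=" && /^\d+$/.test(t[3]) && t[4] === ";";
    if (isLet) {
      stmts.push({ kind: "LET_STATEMENT", name: t[1], value: Number(t[3]) });
      pos += 5;
      continue;
    }
    // Recovery: skip up to and including the next synchronization token (";" or "}").
    const start = pos;
    while (pos < tokens.length && tokens[pos] !== ";" && tokens[pos] !== "}") pos++;
    if (pos < tokens.length) pos++;
    const skipped = tokens.slice(start, pos);
    errors.push("unexpected syntax: " + skipped.join(" "));
    stmts.push({ kind: "BOGUS_STATEMENT", tokens: skipped });
  }
  return { stmts, errors };
}

const result = parseProgram("let a = 1; let = ; let b = 2;");
console.log(result.stmts.map((s) => s.kind).join(", ")); // LET_STATEMENT, BOGUS_STATEMENT, LET_STATEMENT
console.log(result.errors);
```

The broken statement is preserved (so no source text is lost), but it is tagged as bogus, so a consumer such as a linter can skip it instead of analyzing incorrect syntax.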
In the following example, the parentheses of the while statement are missing. However, the parser is able to recover well and to represent the code with a decent CST: the parentheses and the condition of the loop are marked as missing, and the code block is parsed correctly:
This is the error emitted during parsing:
The same can’t be said for the following snippet. The parser can’t properly understand the syntax during the recovery phase, so it needs to rely on bogus nodes to mark some syntax as erroneous. Notice the bogus node in the resulting CST:
This is the error we get from the parsing phase:
Daemon
Biome uses a server-client architecture to run its tasks.
A daemon is a long-running server that Biome spawns in the background and uses to process requests from the editor and CLI.
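The benefit of a long-running process is that state, such as parsed files, survives between requests, whichever client sends them. The sketch below is an in-process illustration only: the Daemon class, the request shapes, and the word-splitting stand-in for a parser are hypothetical, and the sketch does not model how the real daemon communicates with the editor and the CLI.

```typescript
// Hypothetical sketch: a long-running "daemon" object that keeps state between
// requests coming from different clients. Not Biome's API or protocol.
type DaemonRequest =
  | { method: "openFile"; path: string; content: string }
  | { method: "tokenCount"; path: string };

// Stand-in for a real parser: splits the content on whitespace.
function parse(content: string): string[] {
  return content.split(/\s+/).filter((t) => t.length > 0);
}

class Daemon {
  // Parsed files survive between requests; a short-lived process would re-parse every time.
  private trees: Record<string, string[] | undefined> = Object.create(null);
  parseCount = 0;

  handle(req: DaemonRequest): string {
    switch (req.method) {
      case "openFile":
        this.trees[req.path] = parse(req.content);
        this.parseCount++;
        return "ok";
      case "tokenCount": {
        const tree = this.trees[req.path];
        if (tree === undefined) return "error: " + req.path + " is not open";
        return String(tree.length); // answered from the cache, no re-parse
      }
    }
  }
}

const daemon = new Daemon();
daemon.handle({ method: "openFile", path: "a.js", content: "let x = 1;" }); // e.g. sent by the editor
daemon.handle({ method: "tokenCount", path: "a.js" }); // e.g. sent by the CLI
daemon.handle({ method: "tokenCount", path: "a.js" }); // e.g. sent by the editor again
console.log(daemon.parseCount); // 1
```

The file is parsed once, and every later request, from any client, reuses the cached result.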