Architecture
This document covers some of the internals of Biome, and how they are used inside the project.
Scanner
Biome has a scanner that is responsible for crawling the file system to extract important metadata about projects. Specifically, there are three ways in which the scanner is used:
- To discover nested biome.json/biome.jsonc files in monorepos.
- To discover .gitignore files if the vcs.useIgnoreFile setting is enabled.
- To index all JavaScript/TypeScript files in a project if any rules from the project domain are enabled.
Scanner targeting
If project rules are not enabled, the scanner automatically targets only the folders that are relevant for a given session.
This means that if you have a large monorepo, and you run biome check from inside the packages/foo/ folder, that folder will be “targeted”. This means the following folders get scanned for configuration files and/or .gitignore files:
- The root folder of the repository.
- The packages/ folder.
- The packages/foo/ folder.
- Any folders that exist under packages/foo/, except node_modules/ or those that are excluded by your configuration (see below).
Other folders that may be adjacent to either packages/ or packages/foo/ will be automatically skipped.
Similarly, if you run biome format packages/bar/src/index.ts from the root of the repository, the scanner will target the packages/bar/src/ folder.
If project rules are enabled, these optimisations don’t apply.
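The targeting rules above can be sketched as a small path computation. This is a minimal illustration, not Biome's implementation: ancestorsOfTarget and isScanned are hypothetical helpers, paths are assumed to be POSIX-style and relative to the repository root, and exclusions coming from your configuration are not modelled.

```typescript
// Hypothetical helper (not part of Biome): lists the folders visited on the
// way from the repository root down to the targeted folder.
function ancestorsOfTarget(target: string): string[] {
  const segments = target.split("/").filter((s) => s.length > 0);
  const folders = ["."]; // the repository root is always scanned
  let current = "";
  for (const segment of segments) {
    current = current === "" ? segment : `${current}/${segment}`;
    folders.push(`${current}/`);
  }
  return folders;
}

// Hypothetical predicate (not part of Biome): would `folder` be scanned when
// `target` is the targeted folder? Configuration-based exclusions are ignored.
function isScanned(folder: string, target: string): boolean {
  const split = (p: string) => p.split("/").filter((s) => s.length > 0);
  const f = split(folder);
  const t = split(target);
  const shared = Math.min(f.length, t.length);
  for (let i = 0; i < shared; i++) {
    if (f[i] !== t[i]) return false; // adjacent branch: skipped
  }
  // Ancestors of the target, and the target itself, are scanned.
  if (f.length <= t.length) return true;
  // Descendants are scanned unless they pass through node_modules.
  return !f.slice(t.length).includes("node_modules");
}
```

With packages/foo/ as the target, ancestorsOfTarget returns the root, packages/ and packages/foo/, while isScanned rejects an adjacent folder such as packages/bar/ and anything below packages/foo/node_modules/.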
Configuring the scanner
The scanner mostly respects the files.includes setting, but there are some exceptions to be aware of:
- If the project domain or one of its rules is enabled, .d.ts files and package.json manifests inside node_modules are still read, even if they’re ignored in files.includes. This is because they may be a valuable source of type information.
- The scanner will not ignore individual files if they are in a non-ignored directory. This is because files that are co-located with regular (non-ignored) files are likely to be a valuable source of type information, and they are also more likely to form import cycles with other files.
This means that if you use files.includes to ignore folders with generated code, type information from such folders is inaccessible to the scanner. If you do wish to have such type information available, we advise against ignoring these folders through files.includes. Instead, ignore them using only tool-specific settings such as linter.includes and formatter.includes.
Finally, if you really insist on preventing the scanner from accessing a given file or folder, you can use the files.experimentalScannerIgnores setting for that.
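As a rough illustration of the advice above, the object below mirrors the shape of a biome.json file. Only the setting names come from this document; the glob patterns and the folder name some-vendored-folder are assumptions made up for the example, so check them against the configuration reference before reusing them.

```typescript
// Illustrative sketch only: the object mirrors the shape of biome.json.
// The glob patterns and the folder name are assumptions for this example.
const biomeConfig = {
  files: {
    // Generated code is NOT excluded here, so the scanner can still read
    // type information from it.
    includes: ["**"],
    // Last resort: keep the scanner away from a path entirely.
    experimentalScannerIgnores: ["some-vendored-folder"],
  },
  linter: {
    // Tool-specific exclusion: generated code is not linted.
    includes: ["**", "!**/generated/**"],
  },
  formatter: {
    // Tool-specific exclusion: generated code is not formatted.
    includes: ["**", "!**/generated/**"],
  },
};

// Serialized form, as it could be written to a biome.json file.
const biomeJson = JSON.stringify(biomeConfig, null, 2);
```

The point of the split is that files.includes controls what the scanner may see, while linter.includes and formatter.includes only control which files each tool processes.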
Parser and CST
The parser is built on an internal fork of rowan, a library that implements the Green and Red tree pattern.
The CST (Concrete Syntax Tree) is a data structure very similar to an AST (Abstract Syntax Tree) that keeps track of all the information of a program, trivia included.
Trivia is all the information that is not important for a program to run:
- spaces
- tabs
- comments
Trivia is attached to a node. A node can have leading trivia and trailing trivia. If you read code from left to right, leading trivia appears before a keyword, and trailing trivia appears after a keyword.
Leading trivia and trailing trivia are categorized as follows:
- Every trivia up to the token/keyword (including line breaks) will be the leading trivia;
- Everything until the next linebreak (but not including it) will be the trailing trivia;
Given the following JavaScript snippet, // comment 1 is a trailing trivia of the token ;, and // comment 2 is a leading trivia to the keyword const. Below is a minimized version of the CST represented by Biome:
const a = "foo"; // comment 1
// comment 2
const b = "bar";

0: JS_MODULE@0..55
    ...
      1: SEMICOLON@15..27 ";" [] [Whitespace(" "), Comments("// comment 1")]
    1: JS_VARIABLE_STATEMENT@27..55
        ...
        1: CONST_KW@27..45 "const" [Newline("\n"), Comments("// comment 2"), Newline("\n")] [Whitespace(" ")]
  3: EOF@55..55 "" [] []
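The two attachment rules can be sketched with a toy lexer. This is a simplified illustration rather than Biome's lexer: the Trivia and Token types and the tokenize function are hypothetical, and only spaces, tabs, line breaks, line comments, identifiers, numbers, double-quoted strings and single-character punctuation are recognised.

```typescript
type Trivia = { kind: "Whitespace" | "Newline" | "Comments"; text: string };
type Token = { text: string; leading: Trivia[]; trailing: Trivia[] };

// Reads one piece of trivia at `pos`, or returns undefined if a token starts there.
function readTrivia(src: string, pos: number): Trivia | undefined {
  const rest = src.slice(pos);
  const newline = /^\r?\n/.exec(rest);
  if (newline) return { kind: "Newline", text: newline[0] };
  const whitespace = /^[ \t]+/.exec(rest);
  if (whitespace) return { kind: "Whitespace", text: whitespace[0] };
  const comment = /^\/\/[^\r\n]*/.exec(rest);
  if (comment) return { kind: "Comments", text: comment[0] };
  return undefined;
}

// Reads the text of one token: identifier, string, number, or a single character.
function readTokenText(src: string, pos: number): string {
  const rest = src.slice(pos);
  const match = /^(?:[A-Za-z_$][\w$]*|"[^"\n]*"|\d+)/.exec(rest);
  return match ? match[0] : rest.charAt(0);
}

function tokenize(src: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (true) {
    // Leading trivia: every piece of trivia up to the token, line breaks included.
    const leading: Trivia[] = [];
    for (let t = readTrivia(src, pos); t; t = readTrivia(src, pos)) {
      leading.push(t);
      pos += t.text.length;
    }
    if (pos >= src.length) {
      // End of file: an empty token that owns any remaining trivia.
      tokens.push({ text: "", leading, trailing: [] });
      return tokens;
    }
    const text = readTokenText(src, pos);
    pos += text.length;
    // Trailing trivia: everything until the next line break, which is excluded.
    const trailing: Trivia[] = [];
    for (let t = readTrivia(src, pos); t && t.kind !== "Newline"; t = readTrivia(src, pos)) {
      trailing.push(t);
      pos += t.text.length;
    }
    tokens.push({ text, leading, trailing });
  }
}
```

Run on the snippet above, the first ; token ends up with a Whitespace and a Comments piece as trailing trivia, and the second const token gets Newline, Comments, Newline as leading trivia, which is the same split the CST dump shows.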
The CST is never directly accessible by design; a developer can read its information using the Red tree, using a number of APIs that are autogenerated from the grammar of the language.
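To give an intuition for the Green and Red tree pattern, here is a generic sketch. It is not rowan's or Biome's API: the GreenToken, GreenNode and RedNode names are assumptions for this example. The green layer stores only kinds and text, with no positions or parent links, and the red layer is a view created on demand that adds the parent and the absolute offset.

```typescript
// Green layer: immutable and position-independent, with no parent pointers.
type GreenToken = { kind: string; text: string };
type GreenNode = { kind: string; children: GreenElement[] };
type GreenElement = GreenToken | GreenNode;

function isToken(element: GreenElement): element is GreenToken {
  return "text" in element;
}

// Length of the source text covered by a green element.
function textLength(element: GreenElement): number {
  if (isToken(element)) return element.text.length;
  return element.children.reduce((sum, child) => sum + textLength(child), 0);
}

// Red layer: a view over a green element that knows its parent and offset.
class RedNode {
  readonly green: GreenElement;
  readonly parent: RedNode | null;
  readonly offset: number;

  constructor(green: GreenElement, parent: RedNode | null, offset: number) {
    this.green = green;
    this.parent = parent;
    this.offset = offset;
  }

  get kind(): string {
    return this.green.kind;
  }

  // Absolute [start, end) range in the source text.
  get range(): [number, number] {
    return [this.offset, this.offset + textLength(this.green)];
  }

  // Red children are created lazily, accumulating offsets left to right.
  children(): RedNode[] {
    if (isToken(this.green)) return [];
    let offset = this.offset;
    return this.green.children.map((child) => {
      const red = new RedNode(child, this, offset);
      offset += textLength(child);
      return red;
    });
  }
}
```

Because green elements carry no positions, identical subtrees can be shared and reused, while consumers only ever navigate through red nodes.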
Resilient and recoverable parser
In order to construct a CST, a parser needs to be error-resilient and recoverable:
- resilient: a parser that is able to resume parsing after encountering syntax errors that belong to the language;
- recoverable: a parser that is able to understand where an error occurred and to resume parsing by creating correct information;
The recoverable part of the parser is not an exact science, and no rules are set in stone. This means that depending on what the parser was parsing and where an error occurred, the parser may or may not be able to recover in the expected way.
The parser also uses “Bogus” nodes to protect consumers from consuming incorrect syntax. These nodes are used to decorate the broken code caused by a syntax error.
In the following example, the parentheses of the while statement are missing, yet the parser recovers well and can represent the code with a decent CST. The parentheses and condition of the loop are marked as missing, and the code block is correctly parsed:
while {}
JsModule {
  interpreter_token: missing (optional),
  directives: JsDirectiveList [],
  items: JsModuleItemList [
    JsWhileStatement {
      while_token: WHILE_KW@0..6 "while" [] [Whitespace(" ")],
      l_paren_token: missing (required),
      test: missing (required),
      r_paren_token: missing (required),
      body: JsBlockStatement {
        l_curly_token: L_CURLY@6..7 "{" [] [],
        statements: JsStatementList [],
        r_curly_token: R_CURLY@7..8 "}" [] [],
      },
    },
  ],
  eof_token: EOF@8..8 "" [] [],
}
This is an error emitted during parsing:
main.tsx:1:7 parse ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✖ expected `(` but instead found `{`
> 1 │ while {} │ ^
ℹ Remove {
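The idea of recovering with missing slots and Bogus nodes can be sketched with a toy parser. This is not Biome's parser: the Slot and Statement types, the parseProgram function and the tiny grammar (only while statements with an identifier condition and an empty block) are assumptions made for the example. Required tokens that are absent become missing slots, only the first error of a statement is reported, and tokens that cannot start a statement are wrapped in a bogus statement.

```typescript
type Slot = { token: string } | { missing: "required" };
type Statement =
  | { kind: "WhileStatement"; lParen: Slot; test: Slot; rParen: Slot; body: Slot[] }
  | { kind: "BogusStatement"; items: string[] };

function parseProgram(tokens: string[]): { statements: Statement[]; diagnostics: string[] } {
  const statements: Statement[] = [];
  const diagnostics: string[] = [];
  let pos = 0;
  let statementHasError = false;

  // Report only the first error of a statement to avoid cascading diagnostics.
  const report = (message: string): void => {
    if (!statementHasError) diagnostics.push(message);
    statementHasError = true;
  };

  // Consume `expected` if it is next; otherwise leave a missing slot behind.
  const expect = (expected: string): Slot => {
    if (tokens[pos] === expected) {
      pos++;
      return { token: expected };
    }
    report(`expected \`${expected}\` but instead found \`${tokens[pos] ?? "EOF"}\``);
    return { missing: "required" };
  };

  while (pos < tokens.length) {
    statementHasError = false;
    const current = tokens[pos] as string;
    if (current === "while") {
      pos++;
      const lParen = expect("(");
      let test: Slot = { missing: "required" };
      const next = tokens[pos];
      if (next !== undefined && /^[A-Za-z_]\w*$/.test(next)) {
        test = { token: next };
        pos++;
      } else {
        report("expected a condition");
      }
      const rParen = expect(")");
      const body = [expect("{"), expect("}")];
      statements.push({ kind: "WhileStatement", lParen, test, rParen, body });
    } else {
      // No statement can start here: wrap the token in a bogus statement so
      // that consumers never mistake it for valid syntax.
      report(`unexpected token \`${current}\``);
      statements.push({ kind: "BogusStatement", items: [current] });
      pos++;
    }
  }
  return { statements, diagnostics };
}
```

For the token list of while {}, this sketch yields one while statement whose parentheses and condition are missing, a correctly parsed block, and a single diagnostic. A stray } after a valid statement ends up in a bogus statement instead.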
The parser cannot recover as cleanly from the following snippet. It can’t properly understand the syntax during the recovery phase, so it needs to rely on the bogus nodes to mark some syntax as erroneous. Notice the JsBogusStatement:
function}
JsModule {
  interpreter_token: missing (optional),
  directives: JsDirectiveList [],
  items: JsModuleItemList [
    TsDeclareFunctionDeclaration {
      async_token: missing (optional),
      function_token: FUNCTION_KW@0..8 "function" [] [],
      id: missing (required),
      type_parameters: missing (optional),
      parameters: missing (required),
      return_type_annotation: missing (optional),
      semicolon_token: missing (optional),
    },
    JsBogusStatement {
      items: [
        R_CURLY@8..9 "}" [] [],
      ],
    },
  ],
  eof_token: EOF@9..9 "" [] [],
}
This is the error we get from the parsing phase:
main.tsx:1:9 parse ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✖ expected a name for the function in a function declaration, but found none
  > 1 │ function}
      │         ^
Formatter
Linter
Daemon
Biome uses a server-client architecture to run its tasks.
A daemon is a long-running server that Biome spawns in the background and uses to process requests from the editor and CLI.