This is a rather large refactor that moves most of the code for loading, fetching, and building grammars into a new helix-loader module. This works well with the [[grammars]] syntax for languages.toml defined earlier: we only have to depend on the types for GrammarConfiguration in helix-loader and can leave all the [[language]] entries for helix-core.
5.1 KiB
Adding languages
Language configuration
To add a new language, you need to add a language
entry to the
languages.toml
found in the root of the repository;
this languages.toml
file is included at compilation time, and is
distinct from the languages.toml
file in the user's configuration
directory.
[[language]]
name = "mylang"
scope = "scope.mylang"
injection-regex = "^mylang$"
file-types = ["mylang", "myl"]
comment-token = "#"
indent = { tab-width = 2, unit = " " }
These are the available keys and descriptions for the file.
Key | Description |
---|---|
name |
The name of the language |
scope |
A string like source.js that identifies the language. Currently, we strive to match the scope names used by popular TextMate grammars and by the Linguist library. Usually source.<name> or text.<name> in case of markup languages |
injection-regex |
regex pattern that will be tested against a language name in order to determine whether this language should be used for a potential language injection site. |
file-types |
The filetypes of the language, for example ["yml", "yaml"] . Extensions and full file names are supported. |
shebangs |
The interpreters from the shebang line, for example ["sh", "bash"] |
roots |
A set of marker files to look for when trying to find the workspace root. For example Cargo.lock , yarn.lock |
auto-format |
Whether to autoformat this language when saving |
diagnostic-severity |
Minimal severity of diagnostic for it to be displayed. (Allowed values: Error , Warning , Info , Hint ) |
comment-token |
The token to use as a comment-token |
indent |
The indent to use. Has sub keys tab-width and unit |
config |
Language server configuration |
grammar |
The tree-sitter grammar to use (defaults to the value of name ) |
Grammar configuration
If a tree-sitter grammar is available for the language, add a new grammar
entry to languages.toml
.
[[grammar]]
name = "mylang"
source = { git = "https://github.com/example/mylang", rev = "a250c4582510ff34767ec3b7dcdd3c24e8c8aa68" }
Grammar configuration takes these keys:
Key | Description |
---|---|
name |
The name of the tree-sitter grammar |
source |
The method of fetching the grammar - a table with a schema defined below |
Where source
is a table with either these keys when using a grammar from a
git repository:
Key | Description |
---|---|
git |
A git remote URL from which the grammar should be cloned |
rev |
The revision (commit hash or tag) which should be fetched |
subpath |
A path within the grammar directory which should be built. Some grammar repositories host multiple grammars (for example tree-sitter-typescript and tree-sitter-ocaml ) in subdirectories. This key is used to point hx --build-grammars to the correct path for compilation. When omitted, the root of repository is used |
Or a path
key with an absolute path to a locally available grammar directory.
Queries
For a language to have syntax-highlighting and indentation among
other things, you have to add queries. Add a directory for your
language with the path runtime/queries/<name>/
. The tree-sitter
website
gives more info on how to write queries.
NOTE: When evaluating queries, the first matching query takes precedence, which is different from other editors like neovim where the last matching query supersedes the ones before it. See this issue for an example.
Common Issues
-
If you get errors when running after switching branches, you may have to update the tree-sitter grammars. Run
hx --fetch-grammars
to fetch the grammars andhx --build-grammars
to build any out-of-date grammars. -
If a parser is segfaulting or you want to remove the parser, make sure to remove the compiled parser in
runtime/grammar/<name>.so
-
The indents query is
indents.toml
, notindents.scm
. See this issue for more information.