mirror of
https://github.com/helix-editor/helix.git
synced 2025-01-05 14:49:40 +04:00
4fc991fdec
This is a rather large refactor that moves most of the code for loading, fetching, and building grammars into a new helix-loader module. This works well with the [[grammars]] syntax for languages.toml defined earlier: we only have to depend on the types for GrammarConfiguration in helix-loader and can leave all the [[language]] entries for helix-core.
91 lines
5.1 KiB
Markdown
91 lines
5.1 KiB
Markdown
# Adding languages
|
|
|
|
## Language configuration
|
|
|
|
To add a new language, you need to add a `language` entry to the
|
|
[`languages.toml`][languages.toml] found in the root of the repository;
|
|
this `languages.toml` file is included at compilation time, and is
|
|
distinct from the `languages.toml` file in the user's [configuration
|
|
directory](../configuration.md).
|
|
|
|
```toml
|
|
[[language]]
|
|
name = "mylang"
|
|
scope = "scope.mylang"
|
|
injection-regex = "^mylang$"
|
|
file-types = ["mylang", "myl"]
|
|
comment-token = "#"
|
|
indent = { tab-width = 2, unit = " " }
|
|
```
|
|
|
|
These are the available keys and descriptions for the file.
|
|
|
|
| Key | Description |
|
|
| ---- | ----------- |
|
|
| `name` | The name of the language |
|
|
| `scope` | A string like `source.js` that identifies the language. Currently, we strive to match the scope names used by popular TextMate grammars and by the Linguist library. Usually `source.<name>` or `text.<name>` in case of markup languages |
|
|
| `injection-regex` | regex pattern that will be tested against a language name in order to determine whether this language should be used for a potential [language injection][treesitter-language-injection] site. |
|
|
| `file-types` | The filetypes of the language, for example `["yml", "yaml"]`. Extensions and full file names are supported. |
|
|
| `shebangs` | The interpreters from the shebang line, for example `["sh", "bash"]` |
|
|
| `roots` | A set of marker files to look for when trying to find the workspace root. For example `Cargo.lock`, `yarn.lock` |
|
|
| `auto-format` | Whether to autoformat this language when saving |
|
|
| `diagnostic-severity` | Minimal severity of diagnostic for it to be displayed. (Allowed values: `Error`, `Warning`, `Info`, `Hint`) |
|
|
| `comment-token` | The token to use as a comment-token |
|
|
| `indent` | The indent to use. Has sub keys `tab-width` and `unit` |
|
|
| `config` | Language server configuration |
|
|
| `grammar` | The tree-sitter grammar to use (defaults to the value of `name`) |
|
|
|
|
## Grammar configuration
|
|
|
|
If a tree-sitter grammar is available for the language, add a new `grammar`
|
|
entry to `languages.toml`.
|
|
|
|
```toml
|
|
[[grammar]]
|
|
name = "mylang"
|
|
source = { git = "https://github.com/example/mylang", rev = "a250c4582510ff34767ec3b7dcdd3c24e8c8aa68" }
|
|
```
|
|
|
|
Grammar configuration takes these keys:
|
|
|
|
| Key | Description |
|
|
| --- | ----------- |
|
|
| `name` | The name of the tree-sitter grammar |
|
|
| `source` | The method of fetching the grammar - a table with a schema defined below |
|
|
|
|
Where `source` is a table with either these keys when using a grammar from a
|
|
git repository:
|
|
|
|
| Key | Description |
|
|
| --- | ----------- |
|
|
| `git` | A git remote URL from which the grammar should be cloned |
|
|
| `rev` | The revision (commit hash or tag) which should be fetched |
|
|
| `subpath` | A path within the grammar directory which should be built. Some grammar repositories host multiple grammars (for example `tree-sitter-typescript` and `tree-sitter-ocaml`) in subdirectories. This key is used to point `hx --build-grammars` to the correct path for compilation. When omitted, the root of repository is used |
|
|
|
|
Or a `path` key with an absolute path to a locally available grammar directory.
|
|
|
|
## Queries
|
|
|
|
For a language to have syntax-highlighting and indentation among
|
|
other things, you have to add queries. Add a directory for your
|
|
language with the path `runtime/queries/<name>/`. The tree-sitter
|
|
[website](https://tree-sitter.github.io/tree-sitter/syntax-highlighting#queries)
|
|
gives more info on how to write queries.
|
|
|
|
> NOTE: When evaluating queries, the first matching query takes
|
|
precedence, which is different from other editors like neovim where
|
|
the last matching query supersedes the ones before it. See
|
|
[this issue][neovim-query-precedence] for an example.
|
|
|
|
## Common Issues
|
|
|
|
- If you get errors when running after switching branches, you may have to update the tree-sitter grammars. Run `hx --fetch-grammars` to fetch the grammars and `hx --build-grammars` to build any out-of-date grammars.
|
|
|
|
- If a parser is segfaulting or you want to remove the parser, make sure to remove the compiled parser in `runtime/grammar/<name>.so`
|
|
|
|
- The indents query is `indents.toml`, *not* `indents.scm`. See [this](https://github.com/helix-editor/helix/issues/114) issue for more information.
|
|
|
|
[treesitter-language-injection]: https://tree-sitter.github.io/tree-sitter/syntax-highlighting#language-injection
|
|
[languages.toml]: https://github.com/helix-editor/helix/blob/master/languages.toml
|
|
[neovim-query-precedence]: https://github.com/helix-editor/helix/pull/1170#issuecomment-997294090
|