helix-mirror/book/src/languages.md
Michael Davis 0ad10ce6f7
rewrite language configuration docs (#2838)
This change moves the configuration tables from the Adding
Languages guide into the overall Languages section. It also
adds more detailed documentation on the `language-server`
configuration key and fixes a typo in the "mylang" example
(the scope was `scope.mylang` instead of `source.mylang`).
2022-06-21 01:15:50 +02:00


Languages

Language-specific settings and settings for language servers are configured in languages.toml files.

languages.toml files

There are three possible languages.toml files. The first is compiled into Helix and lives in the Helix repository. This provides the default configurations for languages and language servers.

You may define a languages.toml in your configuration directory which overrides values from the built-in language configuration. For example, to disable automatic LSP formatting for Rust:

# in <config_dir>/helix/languages.toml

[[language]]
name = "rust"
auto-format = false

Language configuration may also be overridden for an individual project by creating a languages.toml file under a .helix directory. Its settings are merged with the language configuration in the configuration directory and with the built-in configuration.
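As an illustration of a project-local override, the sketch below assumes a project that wants two-space indentation for Rust. The `<project_root>` placeholder and the chosen values are assumptions made for this example. Only the keys listed here are overridden; the remaining Rust settings are expected to come from the merged user-level and built-in configuration.

```toml
# in <project_root>/.helix/languages.toml (hypothetical project-local file)

[[language]]
name = "rust"
# Override only the indentation for this project (illustrative values)
indent = { tab-width = 2, unit = "  " }
```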

Language configuration

Each language is configured by adding a [[language]] section to a languages.toml file. For example:

[[language]]
name = "mylang"
scope = "source.mylang"
injection-regex = "^mylang$"
file-types = ["mylang", "myl"]
comment-token = "#"
indent = { tab-width = 2, unit = "  " }
language-server = { command = "mylang-lsp", args = ["--stdio"] }

These configuration keys are available:

| Key | Description |
| --- | --- |
| `name` | The name of the language |
| `scope` | A string like `source.js` that identifies the language. Currently, we strive to match the scope names used by popular TextMate grammars and by the Linguist library. Usually `source.<name>`, or `text.<name>` in the case of markup languages |
| `injection-regex` | A regex pattern that is tested against a language name to determine whether this language should be used for a potential language injection site |
| `file-types` | The file types of the language, for example `["yml", "yaml"]`. Extensions and full file names are supported |
| `shebangs` | The interpreters from the shebang line, for example `["sh", "bash"]` |
| `roots` | A set of marker files to look for when trying to find the workspace root, for example `Cargo.lock` or `yarn.lock` |
| `auto-format` | Whether to autoformat this language when saving |
| `diagnostic-severity` | Minimum severity a diagnostic must have to be displayed. (Allowed values: `Error`, `Warning`, `Info`, `Hint`) |
| `comment-token` | The token to use as a comment token |
| `indent` | The indent to use. Has sub-keys `tab-width` and `unit` |
| `language-server` | The Language Server to run. See the Language Server configuration section below |
| `config` | Language Server configuration |
| `grammar` | The tree-sitter grammar to use (defaults to the value of `name`) |
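The "mylang" example above does not use every key. As a rough sketch, the remaining keys could be filled in as follows. All values here, including the `mylang.lock` marker file and the interpreter name, are hypothetical and chosen only for illustration.

```toml
[[language]]
name = "mylang"
scope = "source.mylang"
file-types = ["mylang", "myl"]
# Match scripts that start with e.g. "#!/usr/bin/env mylang" (hypothetical interpreter)
shebangs = ["mylang"]
# Marker files used to locate the workspace root (hypothetical file name)
roots = ["mylang.lock"]
auto-format = true
# Hide diagnostics below this severity
diagnostic-severity = "Warning"
# Redundant here, since grammar defaults to the value of name
grammar = "mylang"
```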

Language Server configuration

The language-server field takes the following keys:

| Key | Description |
| --- | --- |
| `command` | The name of the language server binary to execute. Binaries must be in `$PATH` |
| `args` | A list of arguments to pass to the language server binary |
| `timeout` | The maximum time a request to the language server may take, in seconds. Defaults to `20` |
| `language-id` | The language name to pass to the language server. Some language servers support multiple languages and use this field to determine which one is being served in a buffer |
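A minimal sketch that sets all four keys together, reusing the hypothetical `mylang-lsp` server from the earlier example. The timeout value and the `language-id` string are assumptions chosen for illustration.

```toml
[[language]]
name = "mylang"

[language.language-server]
command = "mylang-lsp"   # must be found in $PATH
args = ["--stdio"]
timeout = 30             # seconds; raised from the default of 20
language-id = "mylang"   # identifier sent to the server for buffers of this language
```

The sub-table form `[language.language-server]` is equivalent to the inline table used earlier. It applies to the most recently declared `[[language]]` entry and can be easier to read once several keys are set.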

The top-level config field is used to configure the LSP initialization options. A format sub-table within config can be used to pass extra formatting options to Document Formatting Requests. For example, with TypeScript:

[[language]]
name = "typescript"
auto-format = true
# pass format options according to https://github.com/typescript-language-server/typescript-language-server#workspacedidchangeconfiguration omitting the "[language].format." prefix.
config = { format = { "semicolons" = "insert", "insertSpaceBeforeFunctionParenthesis" = true } }

Tree-sitter grammar configuration

The source for a language's tree-sitter grammar is specified in a [[grammar]] section in languages.toml. For example:

[[grammar]]
name = "mylang"
source = { git = "https://github.com/example/mylang", rev = "a250c4582510ff34767ec3b7dcdd3c24e8c8aa68" }

Grammar configuration takes these keys:

| Key | Description |
| --- | --- |
| `name` | The name of the tree-sitter grammar |
| `source` | The method of fetching the grammar: a table with a schema defined below |

When using a grammar from a git repository, source is a table with these keys:

| Key | Description |
| --- | --- |
| `git` | A git remote URL from which the grammar should be cloned |
| `rev` | The revision (commit hash or tag) which should be fetched |
| `subpath` | A path within the grammar directory which should be built. Some grammar repositories host multiple grammars (for example `tree-sitter-typescript` and `tree-sitter-ocaml`) in subdirectories. This key is used to point `hx --grammar build` to the correct path for compilation. When omitted, the root of the repository is used |
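For a repository that hosts several grammars, the subpath key might be used as in the sketch below. The repository URL is the placeholder from the earlier example, and the `mylang` subdirectory is a hypothetical layout assumed for illustration.

```toml
[[grammar]]
name = "mylang"
# Build the grammar found in the "mylang" subdirectory rather than the repository root
source = { git = "https://github.com/example/mylang", rev = "a250c4582510ff34767ec3b7dcdd3c24e8c8aa68", subpath = "mylang" }
```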

Choosing grammars

You may use a top-level use-grammars key to control which grammars are fetched and built when using hx --grammar fetch and hx --grammar build.

# Note: this key must come **before** the [[language]] and [[grammar]] sections
use-grammars = { only = [ "rust", "c", "cpp" ] }
# or, alternatively (define use-grammars only once per file)
use-grammars = { except = [ "yaml", "json" ] }

When omitted, all grammars are fetched and built.
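The ordering requirement follows from TOML itself: any key written after a `[[language]]` or `[[grammar]]` header belongs to that table, so a top-level key has to appear before the first header. A rough sketch of a complete file layout, reusing the hypothetical "mylang" entries from the earlier examples:

```toml
# Top-level keys come first
use-grammars = { only = ["mylang"] }

[[language]]
name = "mylang"
scope = "source.mylang"
file-types = ["mylang", "myl"]

[[grammar]]
name = "mylang"
source = { git = "https://github.com/example/mylang", rev = "a250c4582510ff34767ec3b7dcdd3c24e8c8aa68" }
```

With a file like this, running `hx --grammar fetch` followed by `hx --grammar build` would be expected to process only the "mylang" grammar.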