Commit Graph

120 Commits

Author SHA1 Message Date
Lewis Russell
032df963bb refactor(treesitter): language loading 2024-04-21 14:09:27 +01:00
Lewis Russell
47388614cb refactor(treesitter): handle coverity warnings better 2024-03-20 12:22:54 +00:00
Lewis Russell
0f85aeb478 fix(treesitter): treecursor regression
- Also address some coverity warnings

Fixes #27942
2024-03-20 10:56:16 +00:00
Lewis Russell
597d4c63bd refactor(treesitter): reorder functions 2024-03-19 18:40:08 +00:00
Lewis Russell
aca6c93002 refactor(treesitter): simplify argument checks for userdata 2024-03-19 16:16:54 +00:00
Lewis Russell
aca2048bcd refactor(treesitter): redesign query iterating
Problem:

  `TSNode:_rawquery()` is complicated, has known issues and the Lua and
  C code is awkwardly coupled (see logic with `active`).

Solution:

  - Add `TSQueryCursor` and `TSQueryMatch` bindings.
  - Replace `TSNode:_rawquery()` with `TSQueryCursor:next_capture()` and `TSQueryCursor:next_match()`
  - Do more stuff in Lua
  - API for `Query:iter_captures()` and `Query:iter_matches()` remains the same.
  - `treesitter.c` no longer contains any logic related to predicates.
  - Add `match_limit` option to `iter_matches()`. Default is still 256.
2024-03-19 14:24:59 +00:00
zeertzjq
ac8cd5368d refactor: use ml_get_buf_len() in API code (#27825) 2024-03-12 10:44:53 +08:00
Thomas Vigouroux
bd5008de07 fix(treesitter): correctly handle query quantifiers (#24738)
Query patterns can contain quantifiers (e.g. (foo)+ @bar), so a single
capture can map to multiple nodes. The iter_matches API can not handle
this situation because the match table incorrectly maps capture indices
to a single node instead of to an array of nodes.

The match table should be updated to map capture indices to an array of
nodes. However, this is a massively breaking change, so must be done
with a proper deprecation period.

`iter_matches`, `add_predicate` and `add_directive` must opt-in to the
correct behavior for backward compatibility. This is done with a new
"all" option. This option will become the default and removed after the
0.10 release.

Co-authored-by: Christian Clason <c.clason@uni-graz.at>
Co-authored-by: MDeiml <matthias@deiml.net>
Co-authored-by: Gregory Anders <greg@gpanders.com>
2024-02-16 11:54:47 -06:00
Jongwook Choi
800134ea5e refactor(treesitter): typing for Query, TSQuery, and TSQueryInfo
- `TSQuery`: userdata object for parsed query.

- `vim.treesitter.Query`: renamed from `Query`.
  - Add a new field `lang`.

- `TSQueryInfo`:
  - Move to `vim/treesitter/_meta.lua`, because C code owns it.
  - Correct typing for `patterns`, should be a map from `integer`
    (pattern_id) to `(integer|string)[][]` (list of predicates or
    directives).

- `vim.treesitter.QueryInfo` is added.
  - This currently has the same structure as `TSQueryInfo` (exported
    from C code).
  - Document the fields (see `TSQuery:inspect`).

- Add typing for `vim._ts_parse_query()`.
2024-02-08 12:40:16 +00:00
Jongwook Choi
5b1b765610 docs: enforce "treesitter" spelling #27110
It's the "tree-sitter" project, but "treesitter" in our code and docs.
2024-01-28 17:53:14 -08:00
Christian Clason
83b51b36aa fixup: raise TS min version 2024-01-25 23:39:25 +01:00
dundargoc
79b6ff28ad refactor: fix headers with IWYU 2023-11-28 22:23:56 +01:00
dundargoc
6c14ae6bfa refactor: rename types.h to types_defs.h 2023-11-27 21:57:51 +01:00
dundargoc
f4aedbae4c build(IWYU): fix includes for undo_defs.h 2023-11-27 19:33:17 +01:00
zeertzjq
574d25642f refactor: move Arena and ArenaMem to memory_defs.h (#26240) 2023-11-27 17:21:58 +08:00
dundargoc
353a4be7e8 build: remove PVS
We already have an extensive suite of static analysis tools we use,
which causes a fair bit of redundancy as we get duplicate warnings. PVS
is also prone to give false warnings which creates a lot of work to
identify and disable.
2023-11-12 21:26:39 +01:00
dundargoc
5f03a1eaab build(lint): remove unnecessary clint.py rules
Uncrustify is the source of truth where possible.
Remove any redundant checks from clint.py.
2023-10-23 20:06:21 +02:00
dundargoc
8e932480f6 refactor: the long goodbye
long is 32 bits on windows, while it is 64 bits on other architectures.
This makes the type suboptimal for a codebase meant to be
cross-platform. Replace it with more appropriate integer types.
2023-10-09 11:45:46 +02:00
zeertzjq
cf8b2c0e74 build(iwyu): add a few more _defs.h mappings (#25435) 2023-09-30 12:05:28 +08:00
bfredl
5970157e1d refactor(map): enhanced implementation, Clean Code™, etc etc
This involves two redesigns of the map.c implementations:

1. Change of macro style and code organization

The old khash.h and map.c implementation used huge #define blocks with a
lot of backslash line continuations.

This instead uses the "implementation file" .c.h pattern. Such a file is
meant to be included multiple times, with different macros set prior to
inclusion as parameters. we already use this pattern e.g. for
eval/typval_encode.c.h to implement different typval encoders reusing a
similar structure.

We can structure this code into two parts. one that only depends on key
type and is enough to implement sets, and one which depends on both key
and value to implement maps (as a wrapper around sets, with an added
value[] array)

2. Separate the main hash buckets from the key / value arrays

Change the hack buckets to only contain an index into separate key /
value arrays
This is a common pattern in modern, state of the art hashmap
implementations. Even though this leads to one more allocated array, it
is this often is a net reduction of memory consumption. Consider
key+value consuming at least 12 bytes per pair. On average, we will have
twice as many buckets per item.
Thus old implementation:

  2*12 = 24 bytes per item

New implementation

  1*12 + 2*4 = 20 bytes per item

And the difference gets bigger with larger items.
One might think we have pulled a fast one here, as wouldn't the average size of
the new key/value arrays be 1.5 slots per items due to amortized grows?
But remember, these arrays are fully dense, and thus the accessed memory,
measured in _cache lines_, the unit which actually matters, will be the
fully used memory but just rounded up to the nearest cache line
boundary.

This has some other interesting properties, such as an insert-only
set/map will be fully ordered by insert only. Preserving this ordering
in face of deletions is more tricky tho. As we currently don't use
ordered maps, the "delete" operation maintains compactness of the item
arrays in the simplest way by breaking the ordering. It would be
possible to implement an order-preserving delete although at some cost,
like allowing the items array to become non-dense until the next rehash.

Finally, in face of these two major changes, all code used in khash.h
has been integrated into map.c and friends. Given the heavy edits it
makes no sense to "layer" the code into a vendored and a wrapper part.
Rather, the layered cake follows the specialization depth: code shared
for all maps, code specialized to a key type (and its equivalence
relation), and finally code specialized to value+key type.
2023-09-08 12:48:46 +02:00
Lewis Russell
dd0e77d48a fix(query_error): multiline bug 2023-08-31 15:12:17 +01:00
Amaan Qureshi
845d5b8b64 feat(treesitter): improve query error message 2023-08-31 13:33:40 +01:00
Lewis Russell
f85aa2e67f fix(treesitter.c): improve comments on fenv usage 2023-08-30 10:50:07 +01:00
bfredl
50a03c0e99 fix(treesitter): fix another TSNode:tree() double free
Unfortunately the gc=false objects can refer to a dangling tree if the
gc=true tree was freed first. This reuses the same tree object as the
node itself is keeping alive via the uservalue of the node userdata.
(wrapped in a table due to lua 5.1 restrictions)
2023-08-29 11:35:46 +02:00
nwounkn
6e45567b49 fix(treesitter): fix TSNode:tree() double free (#24796)
Problem: `push_tree`, every time its called for the same TSTree with
`do_copy=false` argument, creates a new userdata for it. Each userdata,
when garbage collected, frees the same TSTree C object.

Solution: Add flag to userdata, which indicates, should C object,
which userdata points to, be freed, when userdata is garbage collected.
2023-08-29 10:48:23 +02:00
bfredl
cefd774fac refactor(memline): distinguish mutating uses of ml_get_buf()
ml_get_buf() takes a third parameters to indicate whether the
caller wants to mutate the memline data in place. However
the vast majority of the call sites is using this function
just to specify a buffer but without any mutation. This makes
it harder to grep for the places which actually perform mutation.

Solution: Remove the bool param from ml_get_buf(). it now works
like ml_get() except for a non-current buffer. Add a new
ml_get_buf_mut() function for the mutating use-case, which can
be grepped along with the other ml_replace() etc functions which
can modify the memline.
2023-08-24 22:40:56 +02:00
Lewis Russell
8179d68dc1 fix(treesitter): logger memory leak 2023-08-13 11:23:17 +01:00
Christian Clason
2e824e89c1 build(deps): bump tree-sitter to HEAD - 0a1c4d846 (#24607)
adapt to breaking change in `ts_query_cursor_set_max_start_depth`
https://github.com/tree-sitter/tree-sitter/pull/2278
2023-08-08 14:57:06 +02:00
bfredl
67176c3f20 Merge pull request #15534 from bfredl/monomap
refactor(map): avoid duplicated khash_t implementations for values and support sets
2023-05-17 13:00:32 +02:00
Lewis Russell
189fb62032 feat(treesitter): improved logging (#23638)
- Add bindings to Treesitter ts_parser_set_logger and ts_parser_logger
- Add logfile with path STDPATH('log')/treesitter.c
- Rework existing LanguageTree loggin to use logfile
- Begin implementing log levels for vim.g.__ts_debug
2023-05-17 11:42:18 +01:00
bfredl
e2fdd53d8c refactor(map): avoid duplicated khash_t types for values
This reduces the total number of khash_t instantiations from 22 to 8.

Make the khash internal functions take the size of values as a runtime
parameter. This is abstracted with typesafe Map containers which
are still specialized for both key, value type.

Introduce `Set(key)` type for when there is no value.

Refactor shada.c to use Map/Set instead of khash directly.
This requires `map_ref` operation to be more flexible.
Return pointers to both key and value, plus an indicator for new_item.
As a bonus, `map_key` is now redundant.

Instead of Map(cstr_t, FileMarks), use a pointer map as the FileMarks struct is
humongous.

Make `event_strings` actually work like an intern pool instead of wtf it
was doing before.
2023-05-17 12:26:21 +02:00
Lewis Russell
e124672ce9 fix(treesitter): reset cursor max_start_depth 2023-05-11 12:08:33 +01:00
Lewis Russell
af040c3a07 feat(treesitter): add support for setting query depths 2023-05-11 11:13:32 +01:00
dundargoc
3b0df1780e refactor: uncrustify
Notable changes: replace all infinite loops to `while(true)` and remove
`int` from `unsigned int`.
2023-04-26 23:23:44 +02:00
Lewis Russell
090ade4af6 refactor(treesitter): delegate region calculation to treesitter (#22576) 2023-04-04 13:58:16 +02:00
Lewis Russell
4f75960660 fix(treesitter): correct include_bytes arg for parse() 2023-03-10 10:08:32 +00:00
Lewis Russell
ae263aff95 refactor(treesitter): use byte ranges from treesitter (#22589) 2023-03-09 16:09:39 +00:00
Lewis Russell
b9f19d3e28 Revert "refactor(treesitter): delegate region calculation to treesitter" (#22575)
Revert "refactor(treesitter): delegate region calculation to treesitter (#22553)"

This reverts commit 276b647fdb.
2023-03-08 17:59:45 +00:00
Lewis Russell
276b647fdb refactor(treesitter): delegate region calculation to treesitter (#22553) 2023-03-08 17:22:28 +00:00
Christian Clason
7e90f247e7 fix(treesitter): raise ts_match_limit to 256 (#22497)
Problem: Some complex queries may not return all matches.

Solution: Raise `ts_match_limit` from current 64 (twice the original
default) to 256 (which Helix uses, and seems to be enough for the reported
problematic cases).

If this leads performance regressions in other queries, we should add a
generic querying timeout instead of relying on a low value here.
2023-03-03 16:16:17 +01:00
bfredl
c002fd421e refactor(build): graduate libtreesitter features which are 1+ years old 2023-03-03 14:25:20 +01:00
Lewis Russell
774e59f3f9 feat(treesitter): expand the API 2023-02-26 16:53:33 +00:00
dundargoc
7224c889e0 build: enable MSVC level 3 warnings (#21934)
MSVC has 4 different warning levels: 1 (severe), 2 (significant), 3
(production quality) and 4 (informational). Enabling level 3 warnings
mostly revealed conversion problems, similar to GCC/clang -Wconversion
flag.
2023-02-11 10:25:24 +01:00
Matthieu Coudron
151b9fc52e feat(treesitter): show filetype associated with parser (#17633)
to ease debug. At one point I had an empty filetype and the current message was not helpful enough
2023-01-22 16:51:17 +01:00
dundargoc
50f03773f4 refactor: replace char_u with char 18 (#21237)
refactor: replace char_u with char

Work on https://github.com/neovim/neovim/issues/459
2023-01-09 22:37:34 +08:00
dundargoc
66360675cf build: allow IWYU to fix includes for all .c files
Allow Include What You Use to remove unnecessary includes and only
include what is necessary. This helps with reducing compilation times
and makes it easier to visualise which dependencies are actually
required.

Work on https://github.com/neovim/neovim/issues/549, but doesn't close
it since this only works fully for .c files and not headers.
2022-11-15 10:30:03 +01:00
Dundar Göc
04cdea5f4a refactor: replace char_u with char
Work on https://github.com/neovim/neovim/issues/459
2022-10-15 13:18:46 +02:00
dundargoc
df646572c5 docs: fix typos (#20394)
Co-authored-by: Raphael <glephunter@gmail.com>
Co-authored-by: smjonas <jonas.strittmatter@gmx.de>
Co-authored-by: zeertzjq <zeertzjq@outlook.com>
2022-09-30 09:53:52 +02:00
dundargoc
91e912f8d4 refactor: move klib out of src/nvim/ #20341
It's confusing to mix vendored dependencies with neovim source code. A
clean separation is simpler to keep track of and simpler to document.
2022-09-25 06:26:37 -07:00
Dundar Göc
c5322e752e refactor: replace char_u with char
Work on https://github.com/neovim/neovim/issues/459
2022-09-09 21:02:42 +02:00