clang-tools  8.0.0
Trigram.h
//===--- Trigram.h - Trigram generation for Fuzzy Matching ------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
///
/// \file
/// Trigrams are attributes of a symbol's unqualified name. They are used to
/// efficiently retrieve, from the inverted index, the symbols that can be
/// fuzzy-matched against a user query. For an identifier to match a query with
/// extracted trigram set Q, the set of trigrams T generated for the identifier
/// (unqualified symbol name) must contain every item of Q, i.e. Q ⊆ T.
///
/// The trigram sets extracted from an unqualified name and from a query differ:
/// the set of query trigrams contains only consecutive sequences of three
/// characters, which is a subset of all the trigrams generated for an
/// identifier.
///
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_DEX_TRIGRAM_H
#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_DEX_TRIGRAM_H

#include "Token.h"

#include <string>

namespace clang {
namespace clangd {
namespace dex {

/// Returns the list of unique fuzzy-search trigrams for an unqualified symbol
/// name. The trigrams give the 3-character query substrings this symbol can
/// match.
///
/// The symbol's name is broken into segments, e.g. "FooBar" has two segments.
/// A trigram can start at any character in the input. From each character we
/// can then move either to the next character or to the start of the next
/// segment.
///
/// Short trigrams (length 1-2) are used for short queries. These are:
/// - prefixes of the identifier, of length 1 and 2
/// - the first character + next head character
///
/// For "FooBar" we get the following trigrams:
/// {f, fo, fb, foo, fob, fba, oob, oba, bar}.
///
/// Trigrams are lowercase, as trigram matching is case-insensitive.
/// Trigrams in the returned list are deduplicated.
std::vector<Token> generateIdentifierTrigrams(llvm::StringRef Identifier);

/// Returns the list of unique fuzzy-search trigrams for a query.
///
/// The query is segmented using the FuzzyMatch API and converted to lowercase.
/// Then the simplest trigrams (sequences of three consecutive letters and
/// digits) are extracted and returned after deduplication.
///
/// For short queries (fewer than 3 characters with Head or Tail roles in Fuzzy
/// Matching segmentation) this returns a single trigram with the first
/// characters (up to 3) to perform a prefix match.
std::vector<Token> generateQueryTrigrams(llvm::StringRef Query);

} // namespace dex
} // namespace clangd
} // namespace clang

#endif // LLVM_CLANG_TOOLS_EXTRA_CLANGD_DEX_TRIGRAM_H
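The matching rule described in the file comment (Q ⊆ T) can be illustrated with a small standalone sketch. This is not the clangd implementation. The names `sketchQueryTrigrams`, `sketchCanMatch` and `sketchExample` are hypothetical. Plain `std::string` values stand in for `Token`. The FuzzyMatch segmentation is replaced by a simplifying assumption: every letter or digit is treated as a Head or Tail character, and every other character is skipped as a separator.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for generateQueryTrigrams(); not part of clangd.
// Assumption: letters and digits are kept (lowercased), everything else is
// treated as a separator and dropped before trigrams are extracted.
std::set<std::string> sketchQueryTrigrams(const std::string &Query) {
  std::string Chars;
  for (unsigned char C : Query)
    if (std::isalnum(C))
      Chars.push_back(static_cast<char>(std::tolower(C)));

  std::set<std::string> Result; // std::set deduplicates and keeps order.
  if (Chars.empty())
    return Result;
  if (Chars.size() < 3) {
    // Short query: a single short trigram used for prefix matching.
    Result.insert(Chars);
    return Result;
  }
  // Every window of three consecutive kept characters is one trigram.
  for (std::size_t I = 0; I + 3 <= Chars.size(); ++I)
    Result.insert(Chars.substr(I, 3));
  return Result;
}

// Returns true if every query trigram is also an identifier trigram,
// i.e. Q is a subset of T. std::includes needs sorted ranges, which
// std::set guarantees.
bool sketchCanMatch(const std::set<std::string> &QueryTrigrams,
                    const std::set<std::string> &IdentifierTrigrams) {
  return std::includes(IdentifierTrigrams.begin(), IdentifierTrigrams.end(),
                       QueryTrigrams.begin(), QueryTrigrams.end());
}

// Usage check against the "FooBar" trigram set quoted in the header comment.
void sketchExample() {
  const std::set<std::string> FooBar = {"f",   "fo",  "fb",  "foo", "fob",
                                        "fba", "oob", "oba", "bar"};
  assert(sketchCanMatch(sketchQueryTrigrams("fb"), FooBar));   // short query
  assert(sketchCanMatch(sketchQueryTrigrams("FooB"), FooBar)); // foo, oob
  assert(!sketchCanMatch(sketchQueryTrigrams("baz"), FooBar)); // baz missing
}
```

The sketch only covers the query side and the subset test. The identifier set is copied from the documented "FooBar" example instead of being generated, because the segment-skipping rules depend on the FuzzyMatch role calculation, which is not shown in this header.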
generateIdentifierTrigrams: defined at Trigram.cpp:24
generateQueryTrigrams: defined at Trigram.cpp:87