Search results

Jargonness
...guage corpus and <math>f_s</math> stands for its frequency in a scientific corpus. ...he word's jargonness. In case a word has no mention in the general English corpus, 3 is taken as its jargonness as suggested by the second part of the equati ...

3 KB (382 words) - 08:40, 5 October 2024
Second-order co-occurrence pointwise mutual information
...t neighbor words of the two target words from a large [[Corpus linguistics|corpus]]. ...occur with the same neighboring words. For example, the [[British National Corpus]] (BNC) has been used as a source of frequencies and contexts. ...

5 KB (852 words) - 20:30, 9 March 2022
Yarowsky algorithm
{{short description|Method in computational linguistics}} In [[computational linguistics]] the '''Yarowsky algorithm''' is an [[unsupervised learning]] [[algorithm] ...

6 KB (956 words) - 19:00, 28 January 2023
Kneser–Ney smoothing
.../assets/files/langmod.pdf 'Brown University: Introduction to Computational Linguistics ']</ref> however, Kneser–Ney smoothing corrects this by considering the fre ...ces of the word <math>w</math> followed by the word <math>w'</math> in the corpus. ...

6 KB (927 words) - 07:19, 13 February 2023
Draft:Mathematical linguistics
{{Draft topics|linguistics|stem}} | header = Example Applications of Mathematical Linguistics ...

15 KB (2,002 words) - 14:51, 15 February 2025
Noisy channel model
...processing: an introduction to natural language processing, computational linguistics, and speech recognition |date=2009 |others=James H. Martin |isbn=978-0-13-1 ...r Estimation |url=https://aclanthology.org/J93-2003 |journal=Computational Linguistics |volume=19 |issue=2 |pages=263–311}}</ref> ...

8 KB (1,267 words) - 19:51, 4 November 2024
Abstract Meaning Representation
...t Meaning Representation (AMR)|publisher=}}</ref> is a [[Formal semantics (linguistics)|semantic representation language]]. AMR graphs are rooted, labeled, direct ...cation=Philadelphia, Pennsylvania |publisher=Association for Computational Linguistics |pages=153–158 |doi=10.3115/100964.100979|doi-access=free }}</ref> they are ...

5 KB (764 words) - 06:51, 17 January 2025
Ōno's lexical law
...ar regression. In "Quantitative Linguistics Vol. 39, Japanese Quantitative Linguistics" (ed. Shizuo Mizutani) pp. 1–13, Bochum: Studienverlag Dr. N. Brockmeyer.</ [[Category:Corpus linguistics]] ...

5 KB (921 words) - 03:22, 8 October 2023
ELMo
...character-level as inputs and produces word-level embeddings, trained on a corpus of about 30 million sentences and 1 billion words. The architecture of ELMo accomplishes a [[Context (linguistics)|contextual]] understanding of [[Lexical token|tokens]]. Deep contextualize ...

8 KB (1,161 words) - 14:38, 7 November 2024
GloVe
...med on aggregated global word-word [[co-occurrence]] [[statistics]] from a corpus, and the resulting representations showcase interesting linear substructure ...representation which we call GloVe, for Global Vectors, because the global corpus statistics are captured directly by the model."</ref> and was launched in 2 ...

12 KB (1,784 words) - 20:35, 14 January 2025
Referring expression generation
...'', as for example in the work of [[John Lyons (linguist)|John Lyons]]. In linguistics, the study of reference relations belongs to [[pragmatics]], the study of l ...). Centering: A Parametric Theory and Its Instantiations. ''Computational Linguistics'' 30:309-363 [http://www.aclweb.org/anthology/J04-3003.pdf]</ref> and ideal ...

31 KB (4,473 words) - 16:24, 15 January 2024
IBM alignment models
...Translation |url=https://aclanthology.org/J90-2002/ |journal=Computational Linguistics |volume=16 |issue=2 |pages=79–85}}</ref> and the entire series is published === Learning from a corpus === ...

19 KB (3,163 words) - 19:10, 10 February 2025
Paraphrasing (computational linguistics)
...ain.62 |arxiv=2210.03568 }}</ref> of new samples to expand existing [[Text corpus|corpora]].<ref name=Barzilay /> ...nce alignment]] to generate sentence-level paraphrases from an unannotated corpus. This is done by ...

24 KB (3,312 words) - 23:25, 27 February 2025
Word list
{{Short description|Bare list of a language's words in corpus linguistics}} ...on words are not left out. Some major pitfalls are the corpus content, the corpus [[register (sociolinguistics)|register]], and the definition of "[[word]]". ...

26 KB (3,645 words) - 16:47, 4 February 2025
Vector space model
...he vocabulary (the number of distinct words occurring in the [[text corpus|corpus]]). Candidate documents from the corpus can be retrieved and ranked using a variety of methods. [[Relevance (inform ...

10 KB (1,468 words) - 02:57, 30 September 2024
Word n-gram language model
...s, each word's probability is slightly lower than its frequency count in a corpus. To calculate it, various methods were used, from simple "add-one" smoothin ...fixed vocabulary. In such a scenario, the ''n''-grams in the [[text corpus|corpus]] that contain an out-of-vocabulary word are ignored. The ''n''-gram probab ...

20 KB (2,793 words) - 20:42, 28 November 2024
Word2vec
...timates these representations by modeling text in a large [[corpus of text|corpus]]. Once trained, such a model can detect [[synonym]]ous words or suggest ad ...hundred [[dimensions]], with each unique word in the [[Corpus linguistics|corpus]] being assigned a vector in the space. ...

30 KB (4,458 words) - 00:34, 26 February 2025
SimRank
In a [[Text corpus|document corpus]], matching text may be used, and for collaborative filtering, similar user ...applied to scientific papers and their citations, or to any other document corpus with [[cross-reference]] information. ...

16 KB (2,520 words) - 21:33, 5 July 2024
Draft:Toolformer
...onal Linguistics |location=Online |publisher=Association for Computational Linguistics |pages=7315–7330 |doi=10.18653/v1/2020.acl-main.653}}</ref> ...

12 KB (1,704 words) - 04:02, 30 October 2024
BERT (language model)
...g/2020.tacl-1.54|journal=Transactions of the Association for Computational Linguistics|volume=8|pages=842–866|doi=10.1162/tacl_a_00349|arxiv=2002.12327|s2cid=2115 ...he model predicts if these two spans appeared sequentially in the training corpus, outputting either <code>[IsNext]</code> or <code>[NotNext]</code>. The fir ...

30 KB (4,224 words) - 00:58, 24 February 2025
