References in periodicals archive
(13) By way of illustration, a corpus search of the web corpus hrWaC 15.
Another such verb would be prokišnjavati, whose PF counterpart prokisnuti is attested in all contemporary dictionaries (HJP, VRH, RHJ); it is known to us, albeit marginally, and is very poorly attested in the hrWaC corpus (37 attestations, against about 1,500 attestations for prokišnjavati).
To support this claim, we searched the hrWaC corpus for the form smotr.* (42) and obtained only nine attestations, (43) mostly in texts from older stages of the language that are available on the internet.
We compare those numbers to the ones obtained on the Croatian, Bosnian and Serbian domains, showing that the second versions of the corpora (hrWaC and slWaC), which merge two crawls obtained with different tools and collected three years apart, show a smaller level of reduction (around 30%) at each step of near-duplicate removal, while the first versions of the corpora (bsWaC and srWaC), obtained with SpiderLing only and in one crawl, suffer more data loss in this process (around 35-40%).
Finally, for the analysis of reference corpora, Sketch Engine software (Kilgarriff 2014) is used to reveal the frequency of the extracted extended term-embedding collocations in hrWaC 2.0.
Extended term-embedding collocations in CroCon. Table columns: Extended term-embedding collocations; Frequency in CroCon; Frequency in hrWaC 2.0.
In the analysis that follows, lexical units based on the roots kus and tat have been selected, and each of them has been checked for meanings and contextual uses in the Croatian National Corpus (CNC), the Croatian Web Corpus (HrWaC) and the METU Turkish Corpus.
The noun okus appears in HrWaC almost 70,000 times; out of 200 randomly chosen tokens, only 12 refer to domains other than taste.
In HrWaC the verb kusati appears 10,448 times and the verb okusiti 4,671 times.
As previously mentioned, the Croatian noun okus 'taste' appears in HrWaC almost 70,000 times and among randomly chosen 200 tokens, the noun okus 'taste' refers to something other than food only in 12 examples.
Nevertheless, there are some examples of that kind in HrWaC, although they are rare and stylistically very marked, e.g.
(16) hrWaC corpus search: u roku nekoliko (380 results) / u roku od nekoliko (1,156).