[Info] OA STM Corpus
https://elsevierlabs.github.io/OA-STM-Corpus/
OA STM Corpus
A corpus, and small treebank, of Open Access journal articles from
multiple disciplines in Science, Technology, and Medicine
Corpus
-----------
To improve this situation, Elsevier is providing a selection of 110
journal articles from 10 different STM domains as a
freely-redistributable corpus. The articles were selected from our Open
Access content and have a Creative Commons CC-BY license. Therefore, they
are free to redistribute and use. The domains are agriculture, astronomy,
biology, chemistry, computer science, earth science, engineering,
materials science, math, and medicine. Currently we provide 11 articles
in each of the 10 domains. For each article in the corpus we provide:
* the XML source,
* a simple text version for easier text mining,
* several versions with different annotations. These include
part-of-speech tags, sentence breaks, NP and VP chunks, lemmas, syntactic
constituent parses, Wikipedia concept identification, and discourse
analysis. (Some of this is still under construction.)
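Since each article comes in several parallel versions, a first step after downloading is usually to check that the versions line up. The sketch below is only an illustration: it assumes that all versions of an article share one file name stem and that the XML and text versions end in `.xml` and `.txt`. That layout, and the helper names `pair_article_files` and `missing_versions`, are hypothetical and not taken from the corpus documentation, so adjust them to whatever the actual download looks like.

```python
from collections import defaultdict
from pathlib import Path


def pair_article_files(corpus_dir):
    """Group corpus files by article ID (the file name without its extension).

    Assumption (hypothetical layout): every version of an article shares
    one file stem, with the XML source ending in .xml and the plain-text
    version ending in .txt. Other file types are ignored.
    """
    articles = defaultdict(dict)
    for path in sorted(Path(corpus_dir).rglob("*")):
        if path.is_file() and path.suffix in {".xml", ".txt"}:
            # Map "xml" or "txt" to the file found for this article.
            articles[path.stem][path.suffix.lstrip(".")] = path
    return dict(articles)


def missing_versions(articles, required=("xml", "txt")):
    """Return the sorted IDs of articles lacking any required version."""
    return sorted(
        article_id
        for article_id, versions in articles.items()
        if any(kind not in versions for kind in required)
    )
```

With the post's figures (11 articles in each of 10 domains), `len(pair_article_files(path))` would be expected to come out at 110 if the layout assumption holds, and `missing_versions` should return an empty list.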
Related article:
New open access resource will support text mining and natural language
processing, http://goo.gl/ayB6UO
--
※ Posted via: PTT (ptt.cc), from: 147.46.219.86
※ Article URL: https://www.ptt.cc/bbs/Linguistics/M.1426231132.A.367.html