[Info] OA STM Corpus

Board: Linguistics (Language Learning) | Author: (just a nickname) | Time: 2015/03/13 15:18 | Pushes/boos: 0 (000)
0 comments, 0 participants, latest thread 1/1
https://elsevierlabs.github.io/OA-STM-Corpus/

OA STM Corpus

A corpus, and small treebank, of Open Access journal articles from multiple disciplines in Science, Technology, and Medicine

Corpus
-----------

To improve this situation, Elsevier is providing a selection of 110 journal articles from 10 different STM domains as a freely redistributable corpus. The articles were selected from our Open Access content and have a Creative Commons CC-BY license. Therefore, they are free to redistribute and use.

The domains are agriculture, astronomy, biology, chemistry, computer science, earth science, engineering, materials science, math, and medicine. Currently we provide 11 articles in each of the 10 domains.

For each article in the corpus we provide:

* the XML source,
* a simple text version for easier text mining,
* several versions with different annotations. These include part-of-speech tags, sentence breaks, NP and VP chunks, lemmas, syntactic constituent parses, Wikipedia concept identification, and discourse analysis. (Some of this is still under construction.)

Related article: New open access resource will support text mining and natural language processing, http://goo.gl/ayB6UO

--
※ Posted from: 批踢踢實業坊 (ptt.cc), from: 147.46.219.86
※ Article URL: https://www.ptt.cc/bbs/Linguistics/M.1426231132.A.367.html
Article ID (AID): #1L0ezSDd (Linguistics)