Research on Text Classification Algorithms Based on Support Vector Machines

About 47 pages, DOC format.




Graduation thesis, about 20,000 words.

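Abstract: With the rapid development of Internet technology, the number of electronic documents online has grown sharply. People now have far more information to choose from, but choosing among it has become tedious, so automatic text classification is attracting increasing attention. Support vector machines (SVMs) are well suited to the text classification problem, which has made SVM-based text classification a research focus in this field. An SVM is a classification model built on the structural risk minimization principle and is widely applied. In the classification task studied here, extracting and appropriately selecting text features is a key step. Text classification generally proceeds through preprocessing, statistics collection, feature extraction, training, and testing and evaluation.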
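The structural risk minimization framing above can be made concrete with the standard soft-margin SVM, which is also where the penalty parameter c and the Gaussian kernel parameter g tuned in this thesis enter. The formulation below is the textbook one, written with c and g to match the thesis's notation; that the thesis body uses exactly this form is an assumption.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + c \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,n

K(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x})^{\top}\phi(\mathbf{z}) = \exp\!\bigl(-g\,\lVert \mathbf{x} - \mathbf{z} \rVert^{2}\bigr)

f(\mathbf{x}) = \operatorname{sgn}\Bigl(\sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\Bigr), \qquad 0 \le \alpha_i \le c
```

A larger c penalizes training errors more heavily (narrower margin), while a larger g makes the kernel more local; this trade-off is why both are tuned together.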
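This thesis studies SVM-based text classification. First, English essays written on the same topic are taken from the Chinese Learner English Corpus and grouped by score, which turns essay classification into a text classification problem. Second, features are extracted from the essays and combined into feature vectors; they mainly cover the number of sentences, the total number of characters, the noun-to-pronoun ratio, and the definite-article frequency error. Finally, an SVM classifier assigns the essays to classes based on the extracted features; its predictions are compared with the corpus labels to measure accuracy, and the penalty parameter c and the parameter g of the Gaussian radial basis function (RBF) kernel are tuned to improve performance. Experiments show that the classifier reaches its highest accuracy, 78.7234%, when c = 1.4 and g = 0.08.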
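A minimal Python sketch of the kind of feature vector described above, using only the standard library. It is illustrative, not the thesis's implementation: the pronoun list and the reference rate for "the" (`REFERENCE_THE_RATE`) are hypothetical, the "definite-article frequency error" is assumed to mean the absolute deviation of the share of "the" from that reference rate, and because counting nouns requires a part-of-speech tagger, the noun-to-pronoun ratio is replaced here by a simpler pronoun share.

```python
import re

# Hypothetical stand-ins: the thesis does not publish its pronoun list
# or its expected rate for the definite article.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your", "he", "him",
            "his", "she", "her", "it", "its", "they", "them", "their"}
REFERENCE_THE_RATE = 0.06  # hypothetical expected share of "the" among words


def extract_features(essay: str) -> list[float]:
    """Return [sentence count, character count, pronoun share, article deviation]."""
    # Sentences: split on runs of ., ! or ? and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    # Words: lower-cased letter runs, allowing internal apostrophes (don't).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", essay.lower())
    n_words = len(words) or 1  # avoid division by zero on empty input
    # Simplified stand-in for the noun-to-pronoun ratio (no POS tagger here).
    pronoun_share = sum(w in PRONOUNS for w in words) / n_words
    the_rate = words.count("the") / n_words
    return [
        float(len(sentences)),
        float(len(essay)),  # total characters, including spaces and punctuation
        pronoun_share,
        abs(the_rate - REFERENCE_THE_RATE),  # assumed "frequency error"
    ]
```

Because the RBF kernel is distance-based, features like these would normally be scaled to comparable ranges before training; otherwise the character count would dominate.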

Keywords: feature extraction; text classification; support vector machine

Research on Support Vector Machine Classification Method
Abstract: With the rapid development of Internet technology, the number of electronic documents online has increased sharply. People have more information to choose from, but selecting it has become tedious, so automatic text classification is receiving more and more attention. Support vector machines fit the text classification problem well, which has made SVM-based text classification a popular research topic. A support vector machine is a classification model based on the structural risk minimization principle and is widely used. In this work, extracting and properly selecting text features is a key step in text classification. Text classification generally consists of preprocessing, statistics, feature extraction, training, and testing and evaluation.
This thesis studies SVM-based classification. First, English essays written on the same topic are taken from the Chinese Learner English Corpus and grouped by score, which turns essay scoring into a text classification problem. Next, features are extracted from the essays and assembled into feature vectors; these include the number of sentences, the total number of characters, the noun-to-pronoun ratio, and definite-article frequency errors. An SVM classifier then classifies the essays based on the extracted features, and its results are compared with the corpus classification to obtain the accuracy. The penalty parameter c and the Gaussian radial basis kernel parameter g are adjusted to improve classification performance. Experiments show that the classifier reaches its highest accuracy, 78.7234%, with penalty parameter c = 1.4 and kernel parameter g = 0.08.
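The tuning step described above (adjusting c and g and keeping the setting with the best accuracy) can be sketched as an exhaustive grid search. In this Python sketch, `train_and_score` is a hypothetical callback that would train an SVM with the given c and g and return accuracy on held-out essays; the c/g notation resembles the `-c` and `-g` options of LIBSVM's `svm-train`, but the abstract does not name a library, and any candidate grids passed in are illustrative.

```python
import math
from itertools import product
from typing import Callable, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], g: float) -> float:
    """Gaussian RBF kernel K(x, z) = exp(-g * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * sq_dist)


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the reference (corpus) labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def grid_search(
    train_and_score: Callable[[float, float], float],
    c_values: Sequence[float],
    g_values: Sequence[float],
) -> tuple[float, float, float]:
    """Evaluate every (c, g) pair and return (best_c, best_g, best_accuracy).

    train_and_score is a hypothetical callback: it should train an SVM with
    penalty c and RBF parameter g and return its held-out accuracy.
    """
    best_c, best_g, best_score = c_values[0], g_values[0], -math.inf
    for c, g in product(c_values, g_values):
        score = train_and_score(c, g)
        if score > best_score:  # ties keep the earlier pair in the grid
            best_c, best_g, best_score = c, g, score
    return best_c, best_g, best_score
```

As an arithmetic check, `accuracy` with 37 correct predictions out of 47 gives 0.787234..., that is, 78.7234%; this would be consistent with a 47-essay test set, although the abstract does not state the test-set size.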

Keywords: feature selection; text categorization; support vector machine

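Contents
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Research Status in China and Abroad
1.2.1 Status of Text Classification Research
1.2.2 Status of SVM Research
1.3 Outline of This Thesis
Chapter 2 Text Classification
2.1 Overview of Automatic Text Classification
2.2 Key Techniques of Text Classification
2.2.1 Text Representation
2.2.2 Text Feature Extraction
2.2.3 Weight Calculation
2.2.4 Common Text Classification Algorithms
2.3 Main Applications of Text Classification
Chapter 3 Introduction to Support Vector Machines
3.1 Origin and Development of SVM
3.2 Introduction to Support Vector Machines
3.3 Support Vector Machine Classification
3.3.1 Linearly Separable Support Vector Classifier
3.3.2 Approximately Linearly Separable Problems
3.3.3 Linearly Inseparable Problems
3.4 Common Kernel Functions
3.4.1 Kernel Functions and Their Properties
3.4.2 Criteria for Valid Kernel Functions and Common Kernel Functions
3.4.3 Common Kernel Functions
Chapter 4 Application of the SVM-Based Text Classification Algorithm to CET-6 Essay Classification
4.1 General Text Processing Workflow
4.2 Feature Extraction Process
4.3 Experimental Steps
4.4 Feature Extraction Flowchart
Chapter 5 Experimental Results
5.1 Experimental Process and Results
5.2 SVM Performance Testing
Chapter 6 Conclusion
6.1 Summary
6.2 Future Work
Acknowledgements
References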
