A study on imbalanced dataset of answer validation for question answering system (不平衡資料集應用於問答系統答案驗證之研究)


Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/111150

Title:	不平衡資料集應用於問答系統答案驗證之研究
Other title:	A study on imbalanced dataset of answer validation for question answering system
Author:	Tsai, Cheng-Chia (蔡承家)
Contributor:	Master's Program, Department of Information Management, Tamkang University; 戴敏育
Keywords:	Machine learning; Imbalanced dataset; Question answering; Support vector machine; Answer validation; University entrance examination; QA-Lab
Date:	2016
Uploaded:	2017-08-24 23:45:15 (UTC+8)
Abstract:	A question answering (QA) system takes a given question, uses machine reading to understand it, and produces an answer. A QA system usually consists of four stages: question analysis, document retrieval, answer extraction, and answer validation.

Although a considerable number of studies have addressed question answering systems, little is known about how imbalanced and balanced datasets affect the answer validation stage. The purpose of this study is to provide a comprehensive analysis of imbalanced and balanced datasets through machine learning.

This study uses the NTCIR-12 QA-Lab2 dataset of Japanese university entrance examination questions on world history. Unlike the datasets used in earlier question answering work, this dataset requires the system to first understand a short passage before it can answer the related questions that follow.

The study proposes a number of models for the imbalanced and balanced datasets, using f.select, parameter optimization, and cross-validation. The experimental results show that the best model achieved an accuracy of 90% on the imbalanced dataset. The main contributions of this thesis are a question answering system for Japanese university entrance examinations and evidence, obtained at the answer validation stage, that the model built on the imbalanced dataset outperformed the model built on the balanced dataset.
Appears in collections:	[Department and Graduate Institute of Information Management] Theses
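
The abstract's distinction between imbalanced and balanced datasets for answer validation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the four-option multiple-choice setup (one correct candidate, three incorrect), the use of random undersampling to build the balanced set, and all function names. It shows only how the class ratio, the majority-class baseline accuracy, and stratified cross-validation folds relate to one another; it does not train an SVM.

```python
import random
from collections import Counter


def make_candidates(num_questions, options_per_question=4):
    """Hypothetical labels: each question yields one correct (1) and the rest incorrect (0)."""
    labels = []
    for _ in range(num_questions):
        labels.extend([1] + [0] * (options_per_question - 1))
    return labels


def majority_baseline_accuracy(labels):
    """Accuracy of a trivial validator that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def group_by_class(labels):
    """Map each class label to the list of indices that carry it."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    return by_class


def undersample(labels, seed=0):
    """Indices of a balanced subset, made by randomly dropping majority-class items."""
    rng = random.Random(seed)
    by_class = group_by_class(labels)
    smallest = min(len(indices) for indices in by_class.values())
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.sample(indices, smallest))
    return sorted(chosen)


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds that each keep the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in group_by_class(labels).values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


labels = make_candidates(100)  # 100 correct and 300 incorrect candidates
imbalanced_baseline = majority_baseline_accuracy(labels)

balanced_idx = undersample(labels)
balanced_labels = [labels[i] for i in balanced_idx]
balanced_baseline = majority_baseline_accuracy(balanced_labels)

folds = stratified_folds(labels, k=5)

print(f"imbalanced baseline accuracy: {imbalanced_baseline:.2f}")
print(f"balanced baseline accuracy:   {balanced_baseline:.2f}")
print("fold sizes:", [len(fold) for fold in folds])
```

Under these assumptions the trivial baseline differs between the two datasets, which is one reason accuracy figures for imbalanced and balanced models are usually read alongside the class ratio of the data they were measured on.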


All items in the institutional repository are protected by their original copyright.
