Implementation of Lambda architecture recommender system on Apache Mesos


Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/114668

Title:	基於 Lambda 架構於 Apache Mesos 之推薦系統實作
Other Titles:	Implementation of Lambda architecture recommender system on Apache Mesos
Authors:	李忠和;Lee, Chung-Ho
Contributors:	Master's Program, Department of Computer Science and Information Engineering, Tamkang University; 林其誼; Lin, Chi-Yi
Keywords:	Apache Mesos;Apache Spark;Lambda Architecture;Recommender System
Date:	2017
Issue Date:	2018-08-03 15:00:11 (UTC+8)
Abstract:	The rapid development of the Internet has fostered many new kinds of cloud services, but it has also created new problems. As the number of Internet users grows quickly, enterprises must be able to serve many users at once and process the large volumes of data those users generate. To increase revenue and reduce customer churn, many enterprises have begun to analyze user behavior in order to learn each user's preferences and recommend items the user is likely to be interested in. Spotify, the well-known music streaming platform, for example, analyzes a user's listening history and recommends other songs and artists; the more accurate the recommendations, the more time users spend listening on Spotify. However, data is becoming more heterogeneous and is being generated ever faster, and earlier frameworks can no longer meet these demands. Complex event processing (CEP) and real-time processing were proposed in response.

In this research, we follow the Lambda architecture and deploy services on a cluster we built ourselves. In the Lambda architecture, data processing is divided into two main layers, the Batch Layer and the Speed Layer. The Batch Layer runs computations over large amounts of data and is therefore used for offline (non-real-time) applications. The Speed Layer runs small jobs: each job handles a small amount of data with relatively low computational complexity, so results are available within a short time, which suits real-time applications. Deploying the Lambda architecture requires installing many services and frameworks to build the two layers, which makes the cluster more complex and harder to manage and maintain. We therefore use Apache Mesos as the core cluster manager. Mesos allocates cluster resources dynamically and makes it easy to add compute nodes, which benefits future expansion of the system. Running the Lambda architecture on Mesos also lets services share resources and helps with fault tolerance. The frameworks used in this system, Apache Hadoop, Apache Spark, and Apache Kafka, are all deployed to the nodes through Mesos and managed centrally, which greatly reduces the administrative workload and improves resource utilization.

We deployed services and ran applications to verify that the system works, and we carried out several experiments. In the Batch Layer we deployed a recommender system and ran it under four resource configurations (different numbers of CPUs and amounts of memory) to understand how batch jobs behave on Mesos. In the Speed Layer we performed real-time data analysis by running a real-time machine learning algorithm for tasks such as classification and prediction, and evaluated the layer's performance. Based on these experiments and deployments, we believe container technology will become increasingly popular in the future.
Appears in Collections:	[Department of Computer Science and Information Engineering] Theses and Dissertations
