Exploratory data analysis (EDA) serves as a preliminary yet essential tool for summarizing the main characteristics of a data set before appropriate statistical modeling is applied. EDA typically employs traditional graphical techniques such as the boxplot, histogram and scatterplot, often combined with dimension reduction methods and computer-aided interactive functionality. It has been used to explore many data types, including survival data, time series data, functional data and longitudinal data. Conventionally, such data sets are arranged in a table with p columns corresponding to p variables, and each subject takes a single numerical value for each variable. Nowadays, collected data keep growing larger and more complex, and an observation is no longer necessarily recorded as a single value but may be an interval, a histogram and/or a distribution. These are examples of so-called symbolic data. This study intends to develop EDA with more visual methods for symbolic data. Two dimension reduction methods, principal component analysis (PCA) and sliced inverse regression (SIR), are also extended and used to reveal the underlying structure of symbolic objects embedded in a high-dimensional space. Meanwhile, statisticians face the challenge of analyzing big data that are gathered rapidly from diverse sources and come in complex types. Symbolic data analysis (SDA) supplies various data descriptions and has great capacity for handling big data. Consequently, exploratory symbolic data analysis (ESDA) is needed as a tool that supports the efficient, effective and practical exploration of symbolic data sets.
Relation:
9th Conference of the Asian Regional Section of the IASC (IASC-ARS 2015)