This paper defines the concept of minimum hyper sphere density and proposes an outlier detection algorithm based on it (minimum hyper sphere density, MHSD). The algorithm derives the effective neighbors of each datum from its k-nearest neighbors and reverse k-nearest neighbors. It then uses the minimum hyper sphere density and the effective neighbors to calculate the density deviation degree of each datum, and from that its isolation degree. Data whose isolation degree exceeds a specified threshold are regarded as outliers. The algorithm is evaluated by detecting the outliers in a two-dimensional artificial data set and two high-dimensional real data sets. Its results are compared with those of three classical algorithms: local outlier factor (LOF), influenced outlierness (INFLO) and density similarity neighbor based outlier factor (DSNOF). The experimental results show that MHSD detects the outliers in the data accurately and outperforms all three classical algorithms.
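The abstract names the pipeline (k-nearest neighbors, reverse k-nearest neighbors, effective neighbors, density deviation, isolation degree, threshold) but not the exact formulas. The sketch below is a hypothetical illustration of that pipeline under stated assumptions, not the paper's MHSD definition. It assumes that (1) the minimum hyper sphere density of a point is k divided by the volume of the smallest ball centred on the point that contains its k nearest neighbors, (2) the effective neighbors are the mutual neighbors (kNN ∩ reverse kNN), falling back to the plain kNN when that set is empty, and (3) the density deviation, the mean effective-neighbor density divided by the point's own density, is used directly as the isolation degree. The function names `isolation_degrees` and `detect_outliers` and the `threshold` value are illustrative only.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def k_nearest(data: List[Point], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbours of every point (ties broken by index)."""
    result = []
    for i, p in enumerate(data):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(data) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def reverse_neighbours(knn: List[List[int]]) -> List[Set[int]]:
    """rknn[j] holds every i whose k-nearest-neighbour list contains j."""
    rknn: List[Set[int]] = [set() for _ in knn]
    for i, neigh in enumerate(knn):
        for j in neigh:
            rknn[j].add(i)
    return rknn


def log_hypersphere_density(data: List[Point], knn: List[List[int]], i: int) -> float:
    """ASSUMPTION: density = k / volume of the smallest ball centred on point i
    that encloses its k nearest neighbours (radius = k-distance).
    Computed in log space so high-dimensional volumes do not under/overflow."""
    dim = len(data[i])
    radius = max(math.dist(data[i], data[j]) for j in knn[i])
    radius = max(radius, 1e-12)  # guard against duplicate points (radius 0)
    log_unit_ball = (dim / 2) * math.log(math.pi) - math.lgamma(dim / 2 + 1)
    return math.log(len(knn[i])) - log_unit_ball - dim * math.log(radius)


def isolation_degrees(data: List[Point], k: int) -> List[float]:
    """Hypothetical isolation degree for every point (requires k < len(data))."""
    knn = k_nearest(data, k)
    rknn = reverse_neighbours(knn)
    log_density = [log_hypersphere_density(data, knn, i) for i in range(len(data))]
    scores = []
    for i in range(len(data)):
        # ASSUMPTION: effective neighbours = mutual neighbours (kNN ∩ reverse kNN),
        # falling back to the plain kNN list when no neighbour is mutual.
        effective = [j for j in knn[i] if i in knn[j]] or knn[i]
        # ASSUMPTION: density deviation = mean neighbour density / own density,
        # used directly as the isolation degree.
        ratios = [math.exp(log_density[j] - log_density[i]) for j in effective]
        scores.append(sum(ratios) / len(ratios))
    return scores


def detect_outliers(data: List[Point], k: int, threshold: float) -> List[int]:
    """Indices of points whose isolation degree exceeds the threshold."""
    scores = isolation_degrees(data, k)
    return [i for i, s in enumerate(scores) if s > threshold]
```

For example, on a 3 × 3 grid with spacing 0.5 plus the far point `(10.0, 10.0)`, `detect_outliers(data, k=3, threshold=3.0)` flags only the far point. Its enclosing hypersphere is hundreds of times larger in area than those of its grid neighbors, while every grid point scores at most about 2. The `reverse_neighbours` helper is not used by `isolation_degrees` (the check `i in knn[j]` expresses the same membership test directly); it is kept to show the reverse-kNN sets explicitly.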
Computer Technology and Development
minimum hyper sphere
local density difference