Similar literature
 19 similar documents found (search time: 171 ms)
1.
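For similarity search over high-dimensional data, where locality-sensitive hashing is the most representative algorithm, this paper proposes a hybrid index structure that classifies data by occurrence count during search and processing and filters out invalid data in order to improve search efficiency. The approach is intended to improve search efficiency and reduce space consumption in similarity search applications.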

2.
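Sparsity in high-dimensional data is one of the main factors that degrade the recommendation quality of collaborative filtering. An algorithm combining a radial basis function network (RBFN) with item clustering is proposed to reduce data sparsity: the RBFN processes the high-dimensional sparse data to produce a complete matrix, and an item-clustering-based collaborative filtering algorithm then generates the recommendations. Experimental results show that the algorithm handles the sparsity problem in collaborative filtering better than the other algorithms compared.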

3.
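For topic classification of spoken documents based on word-fragment lattices, and to obtain higher recall from the multiple candidates in a lattice, the paper proposes extracting word fragments directly from the syllable lattice and, while processing the spoken word fragments, introduces non-negative matrix factorization (NMF) into spoken-document topic classification. The method avoids the loss of topic classification accuracy caused by low speech recognition rates. Experimental results show that when the recall of N-best is 91.66%, the recall of the confusion-network-based keyword spotting system is 96.67%, and when the F1 score of SVD is 83.38%, the F1 score of NMF is 96.944%.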

4.
陈晶 《大众科技》2010,(1):55-56
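For topic classification of spoken documents based on word-fragment lattices, and to obtain higher recall from the multiple candidates in a lattice, the paper proposes extracting word fragments directly from the syllable lattice and, while processing the spoken word fragments, introduces non-negative matrix factorization (NMF) into spoken-document topic classification. The method avoids the loss of topic classification accuracy caused by low speech recognition rates. Experimental results show that when the recall of N-best is 91.66%, the recall of the confusion-network-based keyword spotting system is 96.67%, and when the F1 score of SVD is 83.38%, the F1 score of NMF is 96.944%.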

5.
李聪  梁昌勇  董珂 《情报杂志》2008,27(3):85-87
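Item-based collaborative filtering recommendation algorithms compute item similarities offline, but the high-dimensional, sparse user-item rating matrix demands considerable server storage and wastes space. To address this, orthogonal-list (cross-linked list) storage is introduced into collaborative filtering to store the user-item rating matrix in compressed form, which effectively reduces the physical space occupied. The orthogonal-list storage program was implemented in C++ Builder 6.0.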
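The storage idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' C++ Builder program: a true orthogonal list chains nodes through row and column pointers, whereas this simplified version (an assumption made for brevity) indexes the same non-zero cells through two dictionaries. The class name `SparseRatings` and the sample IDs are hypothetical.

```python
from collections import defaultdict


class SparseRatings:
    """Store only the rated cells of a user-item rating matrix.

    Simplification (assumption): a real orthogonal list links nodes with
    row and column pointers; two dictionaries play that role here.
    """

    def __init__(self):
        self.by_user = defaultdict(dict)  # user -> {item: rating}
        self.by_item = defaultdict(dict)  # item -> {user: rating}

    def set(self, user, item, rating):
        # Each rating is reachable from both its row (user) and its column (item).
        self.by_user[user][item] = rating
        self.by_item[item][user] = rating

    def get(self, user, item):
        # Cells that were never stored are treated as "not rated" and return 0.
        return self.by_user.get(user, {}).get(item, 0)

    def stored_cells(self):
        # Number of cells actually held in memory.
        return sum(len(items) for items in self.by_user.values())


ratings = SparseRatings()
ratings.set("u1", "i3", 5)
ratings.set("u2", "i3", 4)
ratings.set("u2", "i7", 2)
```

Column access through `by_item` is what an item-based similarity computation would need, since it compares the users who rated each pair of items.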

6.
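To improve the efficiency of parallel application systems, the compressed communication of large sparse matrices is studied. A quantitative analysis of the relationship among matrix sparsity, network bandwidth, and processor computing power during compressed matrix communication yields a formula for the lower bound on sparsity. Analyzing the efficiency achieved by the algorithm at different sparsity levels gives the functional relationship between sparsity and communication efficiency in compressed communication. The results show that the algorithm clearly improves the efficiency of sparse matrix communication.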

7.
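CABOSFV clustering of customers based on customer knowledge   Total citations: 1, self-citations: 0, citations by others: 1
The diversity and multiple sources of customer knowledge make it high-dimensional, while its personalized and discrete nature makes it sparse. To handle these high-dimensional sparse characteristics, a customer-knowledge-based CABOSFV clustering algorithm for customers is proposed, drawing on sparse feature vectors and their additivity principle. A case analysis is carried out with the algorithm to examine its advantages over traditional clustering algorithms.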

8.
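A query algorithm for massive text data based on locality-sensitive hashing (LSH) is proposed. LSH maps the feature vectors of text data into hash buckets, which reduces computational complexity and improves retrieval efficiency. First, TF-IDF features represent each text as a feature vector, and a given set of hash functions maps the feature vectors into hash buckets. Next, the hash table is used to compute a histogram for each text, and text similarity is computed from histogram distances. Finally, the texts in the target collection are ranked by their similarity to the query text, and the highest-ranked texts are returned to the user as relevant. Experimental results show that, compared with existing methods, the proposed algorithm performs well on both MAP and the recall-precision curve.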
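The bucket-mapping step in the abstract above can be sketched as follows. The abstract does not say which hash family is used, so the random-hyperplane (sign of a random projection) family here is an assumption, and the function names, the three-dimensional vectors, and the document IDs are all hypothetical. The histogram-distance step is not shown.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    # One random Gaussian hyperplane per output bit (assumed hash family).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def bucket_key(vector, hyperplanes):
    # Each bit records which side of a hyperplane the vector falls on.
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def build_table(vectors, hyperplanes):
    # Group document IDs by bucket key.
    table = {}
    for doc_id, vec in vectors.items():
        table.setdefault(bucket_key(vec, hyperplanes), []).append(doc_id)
    return table


planes = make_hyperplanes(num_bits=8, dim=3)
# Toy stand-ins for TF-IDF vectors; "b" points in the same direction as "a".
docs = {"a": [0.2, 0.0, 0.9], "b": [0.4, 0.0, 1.8], "c": [0.9, 0.1, 0.0]}
table = build_table(docs, planes)
```

Because the key depends only on the signs of the projections, vectors pointing in the same direction always share a bucket, while vectors separated by a large angle are likely to be split.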

9.
武同宝  袁海燕  黄尊志  陈志伟 《科技通报》2019,35(7):143-146,151
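To address the long mapping time and low high-dimensional data conversion rate of traditional feature mapping methods, a minimum-entropy-based visual feature mapping method for high-dimensional electric power data is proposed. The high-dimensional power data are modeled spatially, and visual feature classification is completed through data preprocessing, transformation, discretization analysis, and feature classification. A scatter matrix of the power data classes is built, and the relative and discriminant values of the features are computed from it to complete feature extraction. Based on these classification and extraction results, entropy is used to describe the separability of the classes of high-dimensional power data: the entropy of the data is defined and taken as the separability criterion for data classes, the features with the smallest entropy are selected, and the overall entropy of the power data is used to map the high-dimensional data to low-dimensional data. Experimental results show that the proposed method classifies feature data with high accuracy, achieves an average high-dimensional data conversion rate of about 78%, and has a short mapping time, outperforming the traditional methods.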

10.
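An improved parallel spectral clustering algorithm is proposed. The algorithm improves the distance matrix and the similarity matrix and adds a kd-tree to sparsify large-scale data. For feature computation, the data are stored in Hadoop in the form of a Laplacian matrix, and the eigenvectors are obtained by running a distributed Lanczos computation. Finally, the efficient k-means clustering algorithm is applied to the transposed matrix of the eigenvectors to obtain the clustering result. Simulation results show that the proposed parallel spectral clustering algorithm brings a large performance improvement to large-scale data mining.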

11.
Similarity search with hashing has become one of the fundamental research topics in computer vision and multimedia. Current research on semantic-preserving hashing mainly focuses on exploring the semantic similarities between pointwise or pairwise samples in the visual space to generate discriminative hash codes. However, such learning schemes fail to explore the intrinsic latent features embedded in the high-dimensional feature space and have difficulty capturing the underlying topological structure of the data, yielding low-quality hash codes for image retrieval. In this paper, we propose an ordinal-preserving latent graph hashing (OLGH) method, which derives the objective hash codes from the latent space and preserves the high-order local topological structure of the data in the learned hash codes. Specifically, we devise a triplet-constrained topology-preserving loss to uncover the ordinal-inferred local features in binary representation learning. By virtue of this, the learning system can implicitly capture the high-order similarities among samples during the feature learning process. Moreover, a latent subspace learning scheme is built to acquire noise-free latent features based on sparse-constrained supervised learning. As such, the latent under-explored characteristics of the data are fully employed in subspace construction. Furthermore, the latent ordinal graph hashing is formulated by jointly exploiting latent space construction and ordinal graph learning. An efficient optimization algorithm is developed to solve the resulting problem. Extensive experiments conducted on diverse datasets show the effectiveness and superiority of the proposed method compared with several advanced learning-to-hash algorithms for fast image retrieval. The source code of this paper is available at https://github.com/DarrenZZhang/OLGH .

12.
A procedure for approximating fractional-order systems by means of integer-order state-space models is presented. It is based on the rational approximation of fractional-order operators suggested by Oustaloup. First, a matrix differential equation is obtained from the original fractional-order representation. Then, this equation is realized in a state-space form that has a sparse block-companion structure. The dimension of the resulting integer-order model can be reduced using an efficient algorithm for rational L2 approximation. Two numerical examples are worked out to show the performance of the suggested technique.

13.
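To address the sparse rating data in traditional collaborative filtering (CF) algorithms and the inaccurate user-user similarities that result, a method is proposed that corrects the user-item (I-U) rating matrix by replacing missing ratings with scores assigned to user behaviors, and that constrains the nearest-neighbor computation with a role-based weight coefficient K. Experiments show that the improved algorithm delivers better recommendation quality.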
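The gap-filling step in the abstract above might look like the following sketch. The abstract does not list the behaviors or their scores, so the `BEHAVIOR_SCORES` mapping, the function name, and the sample data are all hypothetical, and the role-based coefficient K is not modeled.

```python
# Hypothetical behavior-to-score mapping; the source does not give the values.
BEHAVIOR_SCORES = {"purchase": 5, "favorite": 4, "view": 2}


def fill_missing(ratings, behaviors):
    """Return a copy of ratings in which gaps are filled from observed behavior."""
    filled = {user: dict(items) for user, items in ratings.items()}
    for (user, item), behavior in behaviors.items():
        row = filled.setdefault(user, {})
        # Explicit ratings always win; only missing cells receive a behavior score.
        if item not in row and behavior in BEHAVIOR_SCORES:
            row[item] = BEHAVIOR_SCORES[behavior]
    return filled


ratings = {"u1": {"i1": 4}, "u2": {}}
behaviors = {
    ("u1", "i1"): "view",      # ignored: u1 already rated i1
    ("u1", "i2"): "purchase",  # fills a gap with 5
    ("u2", "i1"): "favorite",  # fills a gap with 4
}
filled = fill_missing(ratings, behaviors)
```

Returning a copy keeps the original explicit ratings available, which is convenient when comparing similarity results before and after the correction.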

14.
Searching hierarchically clustered document collections can be effective [6], but creating the cluster hierarchies is expensive, since there are both many documents and many terms. However, the information in the document-term matrix is sparse: documents are usually indexed by relatively few terms. This paper describes the implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity so that collections much larger than the algorithms' worst-case running times would suggest can be clustered. The implementations described in the paper have been used to cluster a collection of 12,000 documents.

15.
In this paper an algorithm is presented for listing all output sets for a large sparse square matrix A arising in large-scale systems applications, using network theory and degree switching operations. The algorithm exploits the zero/nonzero structure of matrix A and uses optimum data structures and data manipulation methods. The method is shown to be useful in finding all optimum assignments in an n x n optimum assignment problem and in generating all digraphs that can be associated with an n x n sparse matrix. The problem of testing whether there exists a set of vertex-disjoint cycles of specified lengths in a network is shown to be NP-complete.

16.
In recent years, sparse subspace clustering (SSC) has shown its advantages in the subspace clustering field. Generally, SSC first learns the representation matrix of the data by self-expression, and then constructs an affinity matrix based on the obtained sparse representation. Finally, the clustering result is obtained by applying spectral clustering to the affinity matrix. As described above, existing SSC algorithms often learn the sparse representation and the affinity matrix separately. As a result, they may not reach the optimum clustering result because the two steps are independent. To this end, we propose a novel clustering algorithm that learns the representation and the affinity matrix jointly. With the proposed method, the sparse representation and the affinity matrix are learned in a unified framework, where the procedure is guided by a graph regularizer derived from the affinity matrix. Experimental results show that the proposed method achieves better clustering results than other subspace clustering approaches.

17.
In this work, we investigate compressed sensing (CS) techniques based on the exploitation of prior knowledge to support telemedicine. In particular, prior knowledge is obtained by computing the probability of appearance of non-zero elements in each row of a sparse matrix, which is then employed in sensing matrix design and recovery algorithms for CS systems. A robust sensing matrix is designed by jointly reducing the average mutual coherence and the projection of the sparse representation error. A Probability-Driven Normalized Iterative Hard Thresholding (PD-NIHT) algorithm is developed as the recovery method, which also exploits the prior knowledge of the probability of appearance of non-zero elements and can bring performance benefits. Simulations are carried out on synthetic data and on endoscopy images of different organs, where the proposed sensing matrix and PD-NIHT algorithm achieve better performance than previously reported algorithms.

18.
The paper is concerned with large-scale similarity search, which efficiently and effectively finds data points similar to a query data point. An efficient way to accelerate similarity search is to learn hash functions. Existing approaches for learning hash functions aim to obtain low Hamming distances for similar pairs. However, these methods ignore the ranking order of these Hamming distances, which leads to poor accuracy in finding items similar to a query data point. In this paper, an algorithm referred to as top-k RHS (Rank Hash Similarity) is proposed, in which a ranking loss function is designed for learning a hash function. The hash function is hypothesized to be made up of l binary classifiers, so learning a hash function can be formulated as the task of learning l binary classifiers. The algorithm runs l rounds and learns one binary classifier in each round. Compared with existing approaches, the proposed method has the same order of computational complexity. Nevertheless, experimental results on three text datasets show that the proposed method obtains higher accuracy than the baselines.
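The abstract above turns on the ranking order of Hamming distances. A minimal sketch of the retrieval side only (not the top-k RHS learning procedure or its ranking loss) shows how binary codes are ranked against a query. The 4-bit codes, document IDs, and function names are hypothetical.

```python
def hamming(a, b):
    # Number of differing bits between two integer-encoded hash codes.
    return bin(a ^ b).count("1")


def top_k(query_code, database, k):
    # Rank by Hamming distance; ties are broken by item ID so the order is stable.
    ranked = sorted(
        database.items(),
        key=lambda kv: (hamming(query_code, kv[1]), kv[0]),
    )
    return [item_id for item_id, _ in ranked[:k]]


codes = {"d1": 0b1011, "d2": 0b1000, "d3": 0b0100, "d4": 0b1010}
result = top_k(0b1010, codes, k=2)
```

With short codes many items tie at the same distance, which is one reason the order of distances near the top of the list matters for retrieval accuracy.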

19.
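To address the shortcomings of single weighting methods, the analytic hierarchy process (AHP) and the coefficient of variation (CV) method, in evaluating the competitiveness of high-tech enterprises, a combined weighting method called AHP-CV is proposed. First, the registration dataset of high-tech enterprises is obtained from the Guangdong Province science and technology big data platform and assigned to preset enterprise dimensions according to preset dimension classification rules. An AHP-CV-based indicator weighting model then computes the score for each enterprise dimension. Finally, high-tech enterprises in different fields are grouped and analyzed along three enterprise dimensions, their development status is evaluated from the grouping results, and a monitoring system is designed that can track the overall competitiveness of high-tech enterprises in real time. Empirical analysis shows that the monitoring model monitors high-tech enterprises well. The monitoring system based on this algorithm has been deployed on the Guangdong Province science and technology big data platform, where it monitors the overall competitiveness of high-tech enterprises in the province in real time and provides a reference for decision-making by the relevant regulatory authorities at all levels.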
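A combined weighting of this kind can be sketched as follows. The coefficient-of-variation weights follow the standard definition (standard deviation divided by mean, normalized to sum to 1). The abstract does not say how the AHP and CV weights are merged, so the normalized product used in `combine` is an assumption, as are the function names, the AHP weights, and the indicator values.

```python
from statistics import mean, pstdev


def cv_weights(columns):
    # Coefficient of variation per indicator, normalized to sum to 1.
    # Assumes every indicator has a non-zero mean.
    cvs = [pstdev(col) / mean(col) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]


def combine(ahp, cv):
    # Assumed combination rule: element-wise product, renormalized.
    products = [a * c for a, c in zip(ahp, cv)]
    total = sum(products)
    return [p / total for p in products]


# One list per indicator, one value per enterprise (illustrative numbers).
indicators = [[10, 12, 14], [1, 5, 9], [100, 100, 101]]
ahp = [0.5, 0.3, 0.2]  # hypothetical subjective weights from an AHP step
weights = combine(ahp, cv_weights(indicators))
```

The CV component rewards indicators that actually discriminate between enterprises, so the nearly constant third indicator ends up with a very small weight even though its AHP weight is 0.2.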
