Similar Documents
20 similar documents found.
1.
Collaborative frequent itemset mining involves analyzing data shared by multiple business entities to find interesting patterns. However, this comes at the cost of high privacy risk, because some of these patterns may contain business-sensitive information and are therefore denoted as sensitive patterns; their revelation can disclose confidential information. Privacy-preserving data mining (PPDM) includes various sensitive pattern hiding (SPH) techniques, which ensure that sensitive patterns are not revealed when data mining models are applied to shared datasets. In the process of hiding sensitive patterns, some non-sensitive patterns also become infrequent, so SPH techniques affect the results of data mining models. Maintaining a balance between data privacy and data utility is an NP-hard problem, because it requires selecting sensitive items for deletion, and also selecting the transactions containing these items, such that the side effects of deletion are minimal. Researchers have proposed various algorithms that use evolutionary approaches such as the genetic algorithm (GA), particle swarm optimization (PSO) and ant colony optimization (ACO). These evolutionary SPH algorithms mask sensitive patterns through the deletion of sensitive transactions; failure to mask the sensitive patterns and loss of data have been the biggest challenges for such algorithms, and their performance degrades further on dense datasets. In this paper, a victim-item-deletion-based, PSO-inspired evolutionary algorithm named VIDPSO is proposed to sanitize dense datasets. Each particle of the population consists of n sub-particles derived from pre-calculated victim items, giving the algorithm a high exploration capability when searching the solution space for optimal transactions. Experiments on real and synthetic dense datasets show that VIDPSO outperforms GA-, PSO- and ACO-based SPH algorithms in terms of hiding failure, with minimal loss of data.
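The sanitization task described above can be illustrated with a deliberately simple greedy sketch. This is not VIDPSO itself; the smallest-supporting-transaction heuristic and all names below are illustrative assumptions.

```python
def support(itemset, txns):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in txns)

def sanitize(txns, sensitive, minsup, victim):
    """Greedy sanitization: delete the victim item from supporting
    transactions until the sensitive pattern falls below minsup."""
    txns = [set(t) for t in txns]
    while support(sensitive, txns) >= minsup:
        # pick the smallest supporting transaction, hoping to
        # minimize side effects on other (non-sensitive) patterns
        t = min((t for t in txns if sensitive <= t), key=len)
        t.discard(victim)
    return txns
```

Measuring how many non-sensitive patterns drop below minsup after `sanitize` returns gives exactly the side-effect cost that the evolutionary algorithms try to minimize.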

2.
Association rule mining is one of the main research directions in data mining. This paper analyzes, discusses and compares several classical association rule mining algorithms, and presents a support-matrix-based algorithm design that does not generate candidate itemsets. The algorithm assigns a binary vector to each item in the transaction database and constructs a support matrix via logical AND operations to mine frequent itemsets, which greatly saves storage space and improves running efficiency.
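The support-matrix idea (one binary vector per item, with supports obtained by logical AND) can be sketched as follows. This is a minimal illustration, not the paper's algorithm; Python integers serve as the binary vectors.

```python
from itertools import combinations

def item_bitmaps(transactions, items):
    """One binary vector (stored as a Python int) per item: bit t is
    set iff transaction t contains the item."""
    bm = {i: 0 for i in items}
    for t, txn in enumerate(transactions):
        for i in txn:
            bm[i] |= 1 << t
    return bm

def support(itemset, bm, n_txn):
    """Support = popcount of the bitwise AND of the item vectors."""
    v = (1 << n_txn) - 1          # start from 'every transaction'
    for i in itemset:
        v &= bm[i]                # logical AND narrows the cover
    return bin(v).count("1")

def frequent_itemsets(transactions, minsup):
    """Levelwise mining over the support matrix; stop at the first
    level with no frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    bm = item_bitmaps(transactions, items)
    n = len(transactions)
    freq = {}
    for k in range(1, len(items) + 1):
        level = {c: support(c, bm, n) for c in combinations(items, k)}
        level = {c: s for c, s in level.items() if s >= minsup}
        if not level:
            break
        freq.update(level)
    return freq
```

Because each itemset's cover is a single machine-word-packed integer, the AND plus popcount replaces a scan over the whole database, which is the storage and speed gain the abstract claims.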

3.
赵伟 《科技广场》2005,(10):8-12
To discover previously unknown association rules, association rule mining algorithms need an efficient way to compute the large itemsets in a database. Two factors affect mining efficiency: the size of the database and the efficiency of the algorithm. The algorithm in this paper highly compresses the database, greatly reducing the amount of data, and uses logical operations to compute itemset support counts, which makes the computation efficient.

4.
A hybrid uninterrupted multi-speed transmission (HUMST), based on the integration of a planetary gear set and a 3-speed automated manual transmission (3-AMT), is developed to satisfy the specific performance indexes of mining trucks. The power-split device can alleviate and eliminate the inherent torque interruption of the 3-AMT during gear shifts by implementing the designed cooperative shift control strategy, which is optimized with a quadratic performance index. To achieve fast torque coordination while guaranteeing driving comfort, the torque profiles of the power-split device and the traction motor are optimized by a linear-quadratic regulator (LQR) algorithm. Dynamic programming (DP) is implemented as a benchmark to demonstrate the maximum fuel efficiency of the proposed HUMST. Because of the high computational cost of optimal control strategies such as DP, an improved real-time control strategy (IRTCS) using a modified Gaussian distribution function is proposed to significantly reduce the computing load. Since an efficiency-oriented energy control strategy would result in frequent gear shifts, a multi-objective genetic algorithm (MGA) is integrated to achieve a desirable tradeoff between overall efficiency and shift stability. The detailed mathematical and dynamic model shows that the proposed shifting strategy with LQR can effectively suppress shift jerk, and that the proposed IRTCS with MGA can reduce shift frequency by 70.78% to improve drivability, while sacrificing only 4.86% of the overall efficiency achieved by DP.
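The LQR torque coordination mentioned above rests on the finite-horizon backward Riccati recursion. A scalar sketch shows the mechanics; the toy plant and all numbers are assumptions, not the paper's HUMST model.

```python
def lqr_gains(a, b, q, r, qf, n):
    """Backward Riccati recursion for the scalar plant
    x[k+1] = a*x[k] + b*u[k], stage cost q*x^2 + r*u^2 and terminal
    cost qf*x_N^2; returns the time-varying feedback gains."""
    p = qf
    gains = []
    for _ in range(n):
        k = (b * p * a) / (r + b * p * b)   # optimal feedback gain
        p = q + a * p * (a - b * k)         # Riccati update
        gains.append(k)
    gains.reverse()                          # gains[0] is for step 0
    return gains

def simulate(a, b, gains, x0):
    """Roll the closed loop u[k] = -gains[k] * x[k] forward."""
    x, traj = x0, [x0]
    for k in gains:
        x = a * x + b * (-k * x)
        traj.append(x)
    return traj
```

In the paper's setting the state would be a torque-tracking error and the gains would shape the shift transient, trading jerk (q) against actuator effort (r).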

5.
Machine understanding and thinking require prior knowledge consisting of explicit and implicit knowledge. Current knowledge bases contain various kinds of explicit knowledge but not implicit knowledge. As part of implicit knowledge, the typical characteristics of the things a concept refers to can be obtained through concept cognition for knowledge graphs. This paper therefore attempts to realize concept cognition for knowledge graphs from the perspective of mining multigranularity decision rules. Specifically, (1) we propose a novel multigranularity three-way decision model that merges the ideas of multigranularity (i.e., moving from coarse granularity to fine granularity) and three-way decision (i.e., acceptance, rejection, and deferred decision). (2) Based on this model, an algorithm for mining multigranularity decision rules is proposed. (3) The monotonicity of the positive and negative granule spaces ensures that the positive (or negative) granule space from a coarser granularity does not need to participate in the three-way classification at a finer granularity, which accelerates the mining of multigranularity decision rules. Experimental results show that multigranularity decision rules outperform two-way decision rules, frequent decision rules and single-granularity decision rules, and that the monotonicity of the positive and negative granule spaces does accelerate the mining process.
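A minimal sketch of the three-way decision combined with a coarse-to-fine pass follows. The thresholds, probabilities and names are illustrative assumptions, not the paper's model.

```python
def three_way(prob, alpha=0.7, beta=0.3):
    """Accept / reject / defer by thresholding a membership
    probability; alpha > beta are the two decision thresholds."""
    if prob >= alpha:
        return "accept"
    if prob <= beta:
        return "reject"
    return "defer"

def multigranularity_classify(objects, prob_at, levels,
                              alpha=0.7, beta=0.3):
    """Coarse-to-fine pass: objects decided at a coarser granularity
    are never re-examined at a finer one (the monotonicity speed-up
    the abstract describes)."""
    decided, pending = {}, list(objects)
    for g in levels:
        still_deferred = []
        for o in pending:
            d = three_way(prob_at(o, g), alpha, beta)
            if d == "defer":
                still_deferred.append(o)
            else:
                decided[o] = (d, g)
        pending = still_deferred
    return decided, pending
```

Only deferred objects survive to the next granularity, so the per-level work shrinks monotonically.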

6.
Mining maximal frequent itemsets is an important research topic in data mining. Apriori, a basic algorithm for mining frequent itemsets, suffers from generating a large number of candidate itemsets, which makes it very expensive. Building on the FP-tree, this paper proposes a new algorithm, FP-GDMA, for mining maximal frequent itemsets. By combining top-down and bottom-up search strategies, the algorithm effectively reduces the number of candidate itemsets generated and improves the efficiency of mining maximal frequent itemsets. Experiments compare FP-GDMA with the DMFIA algorithm.
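For contrast with FP-GDMA, the definition of a maximal frequent itemset (a frequent itemset with no frequent proper superset) can be written down directly. The brute-force sketch below is only a specification of the mining target, not an efficient algorithm.

```python
from itertools import combinations

def support(itemset, txns):
    """Number of transactions containing every item of `itemset`."""
    return sum(set(itemset) <= t for t in txns)

def maximal_frequent(txns, minsup):
    """Enumerate all frequent itemsets, then keep only those with no
    frequent proper superset."""
    items = sorted({i for t in txns for i in t})
    frequent = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if support(c, txns) >= minsup]
    return [f for f in frequent if not any(f < g for g in frequent)]
```

Algorithms such as FP-GDMA and DMFIA exist precisely because this enumeration is exponential in the number of items.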

7.
As text documents increase explosively on the Internet, hierarchical document clustering has proven useful for grouping similar documents in versatile applications. However, most document clustering methods still face challenges with high dimensionality, scalability, accuracy, and meaningful cluster labels. This paper presents an effective Fuzzy Frequent Itemset-Based Hierarchical Clustering (F2IHC) approach, which uses a fuzzy association rule mining algorithm to improve the clustering accuracy of the Frequent Itemset-Based Hierarchical Clustering (FIHC) method. In our approach, key terms are extracted from the document set, and each document is pre-processed into the designated representation for the subsequent mining process. A fuzzy association rule mining algorithm for text is then employed to discover a set of highly related fuzzy frequent itemsets, whose key terms serve as the labels of the candidate clusters. Finally, the documents are clustered into a hierarchical cluster tree by referring to these candidate clusters. Experiments on the Classic4, Hitech, Re0, Reuters, and Wap datasets show that our approach not only retains the merits of FIHC but also improves its accuracy.

8.
Sequential minimal optimization (SMO) is an efficient algorithm for training support vector machines. The most important step of this algorithm is the selection of the working set, which greatly affects the training speed. The feasible-direction strategy for working set selection can decrease the objective function, but it may increase the total cost of selecting the working set at each iteration. In this paper, a new candidate working set (CWS) strategy is presented that accounts for both the cost of working set selection and cache performance. The new strategy selects several of the greatest violating samples from the cache as the working sets for the next several optimization steps, which improves kernel-cache usage and reduces the computational cost of working set selection. Theoretical analysis and experiments demonstrate that the proposed method reduces training time, especially on large-scale datasets.
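The "greatest violating samples" idea can be illustrated with the well-known maximal-violating-pair selection used in LIBSVM-style SMO, plus a top-k variant standing in for the CWS strategy. This is a sketch under assumed conventions, not the paper's exact procedure.

```python
def violating_pair(grad, y, alpha, C):
    """Maximal-violating-pair working-set selection: i maximizes
    -y*grad over the 'up' index set, j minimizes it over the 'low'
    set; (i, j) most strongly violates the KKT conditions of the
    SVM dual problem."""
    n = len(y)
    up  = [t for t in range(n)
           if (y[t] > 0 and alpha[t] < C) or (y[t] < 0 and alpha[t] > 0)]
    low = [t for t in range(n)
           if (y[t] > 0 and alpha[t] > 0) or (y[t] < 0 and alpha[t] < C)]
    i = max(up,  key=lambda t: -y[t] * grad[t])
    j = min(low, key=lambda t: -y[t] * grad[t])
    return i, j

def candidate_working_set(grad, y, alpha, C, k=4):
    """CWS idea in miniature: keep the k most violating 'up' samples
    so that several subsequent steps can reuse cached kernel rows."""
    n = len(y)
    up = [t for t in range(n)
          if (y[t] > 0 and alpha[t] < C) or (y[t] < 0 and alpha[t] > 0)]
    return sorted(up, key=lambda t: y[t] * grad[t])[:k]
```

Selecting a batch of violators at once is what lets the cache be reused across iterations instead of being refilled after every single pair update.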

9.
We compare support vector machines (SVMs) with the Rocchio, Ide regular and Ide dec-hi algorithms for information retrieval (IR) of text documents using relevance feedback. It is assumed that a preliminary search finds a set of documents that the user marks as relevant or not, after which feedback iterations commence. Particular attention is paid to IR searches where the number of relevant documents in the database is low and the preliminary set of documents used to start the search has few relevant documents. Experiments show that if inverse document frequency (IDF) weighting is not used, because one is unwilling to pay the time penalty needed to obtain these features, then SVMs are better with either term-frequency (TF) or binary weighting. SVM performance is marginally better than Ide dec-hi if TF-IDF weighting is used and a reasonable number of relevant documents is found in the preliminary search. If the preliminary search is so poor that one must search through many documents to find even one relevant document, then the SVM is preferred.
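The Rocchio baseline used in this comparison has a compact standard form; below is a sketch with conventional (assumed) weights and a dict-based vector representation.

```python
def rocchio(query, relevant, nonrelevant,
            alpha=1.0, beta=0.75, gamma=0.15):
    """q' = alpha*q + beta*centroid(rel) - gamma*centroid(nonrel);
    vectors are dicts term -> weight, negative weights clipped to
    zero as is conventional."""
    def centroid(docs, t):
        return sum(d.get(t, 0.0) for d in docs) / len(docs) if docs else 0.0
    terms = set(query) | {t for d in relevant + nonrelevant for t in d}
    q2 = {}
    for t in terms:
        w = (alpha * query.get(t, 0.0)
             + beta * centroid(relevant, t)
             - gamma * centroid(nonrelevant, t))
        if w > 0:
            q2[t] = w
    return q2
```

Each feedback iteration re-runs the search with the updated query vector; the Ide variants differ mainly in how the relevant and non-relevant documents are summed.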

10.
Constraints are very common in practical control systems. For logical systems, the existing technique of pre-feedback is an effective way of treating state-dependent constraints when the state is measurable. However, it is inapplicable when measurement information is not available. In that situation, so that the control input does not violate the state-dependent constraint, the control at each step must be selected from the common admissible controls of all possible states. Motivated by this observation, we propose a novel technique, termed the subset transition method, for finite-time controllability and stabilization of probabilistic logical dynamic control systems (PLDCSs) with a state-dependent control constraint. The main idea is to construct an unconstrained deterministic logical control system over the power set of the state space, called the subset transition system (SubSTS), which characterizes the transition dynamics between subsets under common admissible controls. We prove that a control sequence is admissible with respect to all states in an initial subset if and only if it does not steer the SubSTS from the initial subset to the empty set. Based on this, necessary and sufficient conditions for set controllability and set stabilizability are obtained. Examples demonstrate the application of the obtained results.
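The subset-transition idea can be sketched for a toy deterministic system. The two-state dynamics and the admissibility map below are invented for illustration; the key property carried over from the paper is that reaching the empty set flags an inadmissible control sequence.

```python
def run_subsets(x0_set, seq, step, admissible):
    """Follow a control sequence on the subset transition system.
    A control is admissible for a subset iff it is admissible for
    every state in it; otherwise the subset maps to the empty set,
    signalling a possible constraint violation."""
    S = frozenset(x0_set)
    for u in seq:
        if not all(u in admissible(x) for x in S):
            return frozenset()     # some reachable state forbids u
        S = frozenset(y for x in S for y in step(x, u))
    return S
```

Here `step(x, u)` returns the set of possible successors, so the same skeleton covers the probabilistic case by treating every positive-probability successor as reachable.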

11.
杜晓昕  王波  孙明  王淼 《科技通报》2012,28(5):94-98
In a mining-area GIS, large-scale spatial features, i.e. "large nodes", greatly increase the overlap between nodes if inserted into a CP-tree without preprocessing, degrading query efficiency. This paper proposes a "large node" clipping algorithm for the mining-area GIS CP index tree based on optimal triangulation of convex polygons; the algorithm ensures that clipped nodes have good geometric shape so as to reduce the overlap produced by insertion. Experimental analysis shows that retrieval is much more efficient when "large nodes" are clipped before insertion than when they are inserted without preprocessing.

12.
A Simplified Model for Mine Asset Valuation Based on Option Pricing Theory
After a brief analysis of the shortcomings of traditional asset valuation methods in handling risk, the paper reviews option pricing models and their application to natural resource valuation. On this basis, the author proposes a simplified model for mine asset valuation based on option pricing theory. The model accounts for the initiative of economic agents under risk while being computationally simpler than existing models. Keywords: mineral resources; asset valuation; option pricing.
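As a concrete building block of option-based valuation, the Black-Scholes European call formula (the standard model, not necessarily the simplified model this paper proposes) can be computed with the standard library alone; in real-options terms, the deposit value plays the role of the underlying and the development cost that of the strike.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    """Black-Scholes value of a European call: the right, but not
    the obligation, to pay K at time T for an asset worth S today
    with volatility sigma, under risk-free rate r."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
```

The option value is always at least the discounted payoff of immediate development, which is precisely the "initiative under risk" that discounted-cash-flow valuation ignores.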

13.
Search patterns of documents and of information requests are only better or worse representatives of them, so it is important to examine the possibilities of designing self-learning information retrieval systems. Another important question is how to organize the set of document search patterns so as to obtain an acceptable response time of the information system to a given information request. The self-learning process of the proposed information system consists in determining, on the set of document and information request search patterns, a similarity relation in the sense of L. A. Zadeh. The organization of the set of document search patterns proposed in the paper limits the search, when retrieving a response to a given information request, to one (or several) of the previously determined subsets. This makes the information system's response time acceptable. The proposed information retrieval strategy is discussed in terms of fuzzy sets.

14.
Capability Differences between Firms and the Motives for Cooperative Innovation
罗炜  唐元虎 《预测》2001,20(3):20-23
Firms participate in cooperative innovation mainly for two motives, cost sharing and technology sharing, and the difference in capabilities between partners matters for these motives. Using a two-stage duopoly game model with knowledge spillovers, this paper compares the equilibrium outcomes of independent innovation, cost sharing and technology sharing, and concludes that when firms have homogeneous resources and capabilities, cost sharing is the main motive for cooperative innovation, whereas when firms' capabilities are complementary, technology sharing is the main motive.

15.
Information filtering (IF) systems usually filter data items by correlating a set of terms representing the user's interest (a user profile) with similar sets of terms representing the data items. Many techniques can construct user profiles automatically, but they usually yield large sets of terms. Various dimensionality-reduction techniques can be applied to reduce the number of terms in a user profile. We describe a new term selection technique with a dimensionality-reduction mechanism based on the analysis of a trained artificial neural network (ANN) model. Its novel feature is the identification of an optimal set of terms that can correctly classify data items relevant to a user. The proposed technique was compared with the classical Rocchio algorithm. We found that when all the distinct terms in the training set are used to train the ANN, the Rocchio algorithm outperforms the ANN-based filtering system; however, after applying the new dimensionality-reduction technique, leaving only an optimal set of terms, the improved ANN technique outperformed both the original ANN and the Rocchio algorithm.

16.
Egghe’s three papers regarding the universal IR surface (2004, 2007, 2008) clearly represent an original and significant contribution to the IR evaluation literature. However, Egghe’s attempt to find a complete set of universal IR evaluation points (P,R,F,M) fell short of his goal: his universal IR surface equation did not suffice in and of itself, and his continuous-extension argument was insufficient to find all the remaining points (quadruples). Egghe found only two extra universal IR evaluation points, (1,1,0,0) and (0,0,1,1), but it turns out that a total of 15 additional valid universal IR evaluation points exist. The gap first appeared in Egghe’s earliest paper and was carried into subsequent papers. The mathematical method used here for finding the additional universal IR evaluation points involves defining the relevance metrics P,R,F,M in terms of the Swets variables a,b,c,d. The maximum possible number of additional quadruples is then deduced, and finally all the invalid quadruples are eliminated so that only the valid universal IR points remain. Six of these points may be interpreted as continuous extensions of the universal IR surface, while the other nine may be interpreted as lying “off the universal IR surface.” This completely solves the problem of finding the maximum possible range of universal IR evaluation points.
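The Swets-variable definitions make the universal IR surface easy to check numerically. The definitions below, including miss M = c/(c+d), are the conventional ones and are an assumption here; they should be checked against Egghe's papers.

```python
def ir_metrics(a, b, c, d):
    """Precision, recall, fallout and miss from the Swets counts:
    a = relevant retrieved, b = nonrelevant retrieved,
    c = relevant not retrieved, d = nonrelevant not retrieved."""
    P = a / (a + b)
    R = a / (a + c)
    F = b / (b + d)
    M = c / (c + d)
    return P, R, F, M

def on_universal_surface(P, R, F, M, tol=1e-9):
    """Universal IR surface:
    P/(1-P) * (1-R)/R * F/(1-F) * (1-M)/M = 1.
    Substituting the definitions gives (a/b)(c/a)(b/d)(d/c) = 1,
    so any quadruple with a,b,c,d > 0 lies on the surface."""
    lhs = (P / (1 - P)) * ((1 - R) / R) * (F / (1 - F)) * ((1 - M) / M)
    return abs(lhs - 1.0) < tol
```

The extra evaluation points in the paper arise exactly where one or more of a, b, c, d are zero, so the product form above degenerates and a separate limiting analysis is needed.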

17.
Oil is a non-renewable resource. Most of the world's major oil-producing countries have already entered a period of declining production, and as oil resources shrink and extraction becomes more difficult, the production costs of mature oilfields keep climbing. Although the overall trend of upstream extraction costs is upward, new oilfields and new technologies occasionally lower costs for a period, so upstream oil and gas costs do not rise in a straight line but fluctuate in stages. By studying the cost trends of the different stages, one can therefore judge the future direction of costs, summarize the cost characteristics of each stage and the factors driving cost changes, and find macro- and micro-level measures to control cost increases and improve the economic performance of oil companies. This is of vital importance to China's strategic petroleum cost management and to the construction of its oil pricing system.

18.
This paper examines the relationship between technological innovation and the cost of equity capital. The empirical study finds that a firm's cost of equity capital follows different trends over the corporate life cycle: it declines during the growth stage and rises during the maturity and decline stages. In the early phase of innovation investment, reduced information asymmetry lowers the cost of equity capital, while the output brought in the later phase raises it; that is, technological innovation and the cost of equity capital exhibit a U-shaped relationship. Since firms have different financial and organizational characteristics at each life-cycle stage, technological innovation has a negative effect on the cost of equity capital in the growth and maturity stages and a positive effect in the decline stage.

19.
For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalents in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which this problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their native-language equivalents into one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to reaching the retrieval performance of Mono-Lingual Information Retrieval (MLIR) is translating query terms that cannot be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to this well-known challenge in CLIR. Experimental results on the NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves the performance of both mono-lingual and cross-language information retrieval.

20.
Latent semantic indexing (LSI) has been demonstrated to outperform lexical matching in information retrieval. However, the enormous cost of the singular value decomposition (SVD) of the large term-by-document matrix is a barrier to applying LSI in scalable information retrieval. This work shows that information filtering using level search techniques can reduce the SVD computation cost of LSI. For each query, level search extracts a much smaller subset of the original term-by-document matrix, containing on average 27% of the original non-zero entries. When LSI is applied to such subsets, the average precision can degrade by as much as 23% due to level-search filtering; for some document collections, however, an increase in precision has also been observed. Level search can be further enhanced by a pruning scheme that deletes terms connected to only one document from the query-specific submatrix. Such pruning has achieved a 65% reduction (on average) in the number of non-zeros with a precision loss of 5% for most collections.
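Level search and the one-document pruning scheme can be sketched on a term-document incidence structure. This is a toy illustration with an assumed data layout (term to document-id sets), not the paper's implementation.

```python
def level_search(term_docs, query_terms, levels=1):
    """BFS on the bipartite term-document graph: starting from the
    query terms, alternately gather incident documents and terms for
    the given number of levels; the result is the query-specific
    submatrix handed to a much cheaper SVD."""
    terms = set(query_terms) & set(term_docs)
    docs = set()
    for _ in range(levels):
        docs |= {d for t in terms for d in term_docs[t]}
        terms |= {t for t, ds in term_docs.items() if ds & docs}
    return {t: term_docs[t] & docs for t in terms}

def prune(sub):
    """Pruning scheme: drop terms connected to only one document
    in the query-specific submatrix."""
    return {t: ds for t, ds in sub.items() if len(ds) > 1}
```

Each extra level widens the submatrix, trading the 27%-of-non-zeros economy against the risk of losing co-occurrence context that LSI needs.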
