Similar Documents
20 similar documents found (search time: 31 ms)
1.
In recent years, sparse subspace clustering (SSC) has demonstrated clear advantages in the subspace clustering field. Generally, SSC first learns a representation matrix of the data through self-expression, then constructs an affinity matrix from the obtained sparse representation, and finally obtains the clustering result by applying spectral clustering to that affinity matrix. In other words, existing SSC algorithms learn the sparse representation and the affinity matrix separately, and this independent processing may prevent them from reaching the optimal clustering result. To this end, we propose a novel clustering algorithm that learns the representation and the affinity matrix jointly. With the proposed method, the sparse representation and the affinity matrix are learned in a unified framework, guided by a graph regularizer derived from the affinity matrix. Experimental results show that the proposed method achieves better clustering results than other subspace clustering approaches.
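As a point of reference for the separate two-step pipeline this abstract improves on, the sketch below builds an affinity matrix from a sparse self-expression step and then applies spectral clustering. It is an illustrative baseline only; the Lasso-based solver and function names are assumptions, not the paper's joint formulation.

```python
# Minimal sketch of the conventional two-step SSC pipeline (not the joint method).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc_two_step(X, n_clusters, alpha=0.01):
    """X: (n_samples, n_features) data matrix."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        # Express x_i as a sparse combination of the other samples (self-expression).
        idx = [j for j in range(n) if j != i]
        lasso = Lasso(alpha=alpha, max_iter=5000)
        lasso.fit(X[idx].T, X[i])
        C[i, idx] = lasso.coef_
    # Symmetric affinity matrix derived from the sparse representation.
    W = 0.5 * (np.abs(C) + np.abs(C).T)
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(W)
```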

2.
Multiple-prespecified-dictionary sparse representation (MSR) has shown powerful potential in compressive sensing (CS) image reconstruction, as it can exploit more of the sparse structure and prior knowledge of images during minimization. Because the popular L1 regularization only achieves a suboptimal solution to the L0 problem, nonconvex regularization often yields better results in CS reconstruction. This paper proposes a nonconvex, adaptively weighted Lp regularization CS framework built on the MSR strategy. We first propose a nonconvex MSR-based Lp regularization model, and then present two algorithms for minimizing the resulting nonconvex Lp optimization problem. Since the sparsity level of each regularizer varies with its prespecified dictionary, an adaptive scheme is proposed that weights each regularizer during optimization by exploiting these differences in sparsity level as prior knowledge. Simulation results show that the proposed nonconvex framework significantly improves CS reconstruction over convex L1 regularization, and that the proposed MSR strategy also outperforms the traditional nonconvex Lp regularization methodology.
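The following is a hedged sketch of the adaptive weighting idea: weights are derived from the estimated sparsity of each dictionary's coefficients, and the reconstruction is simply averaged over dictionaries. Both choices are assumptions made for illustration; the paper's actual model and solvers may differ.

```python
# Illustrative weighting of several dictionary-specific nonconvex Lp regularizers.
import numpy as np

def lp_norm(x, p=0.5, eps=1e-12):
    return np.sum((np.abs(x) + eps) ** p)

def adaptive_weights(codes, p=0.5):
    """codes: list of coefficient vectors, one per prespecified dictionary."""
    sparsity = np.array([lp_norm(c, p) for c in codes])
    w = 1.0 / (sparsity + 1e-12)        # sparser code -> larger weight
    return w / w.sum()                  # normalise so the weights sum to one

def objective(y, A, dictionaries, codes, lam=0.1, p=0.5):
    """Data-fidelity term plus adaptively weighted nonconvex Lp penalties.
    y: measurements, A: sensing matrix, dictionaries: list of D_k (assumed same size)."""
    x = np.mean([D @ c for D, c in zip(dictionaries, codes)], axis=0)
    w = adaptive_weights(codes, p)
    penalty = sum(wk * lp_norm(c, p) for wk, c in zip(w, codes))
    return 0.5 * np.linalg.norm(y - A @ x) ** 2 + lam * penalty
```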

3.
Social applications foster the involvement of end users in Web content creation, as a result of which a new source of vast amounts of data about users and their likes and dislikes has become available. Having access to users’ contributions to social sites and gaining insights into the consumers’ needs is of the utmost importance for marketing decision making in general, and for advertisement recommendation in particular. By analyzing this information, advertisement recommendation systems can attain a better understanding of the users’ interests and preferences, thus allowing these solutions to provide more precise ad suggestions. However, in addition to the already complex challenges that hamper the performance of recommender systems (i.e., data sparsity, cold-start, diversity, accuracy and scalability), new issues that should be considered have also emerged from the need to deal with heterogeneous data gathered from disparate sources. The technologies surrounding Linked Data and the Semantic Web have proved effective for knowledge management and data integration. In this work, an ontology-based advertisement recommendation system that leverages the data produced by users in social networking sites is proposed, and this approach is substantiated by a shared ontology model with which to represent both users’ profiles and the content of advertisements. Both users and advertisements are represented by means of vectors generated using natural language processing techniques, which collect ontological entities from textual content. The ad recommender framework has been extensively validated in a simulated environment, obtaining an aggregated f-measure of 79.2% and a Mean Average Precision at 3 (MAP@3) of 85.6%.
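A hedged sketch of the matching step described above: users and ads are both represented as vectors of ontological entities, and ads are ranked by cosine similarity. Entity extraction is assumed to happen elsewhere (e.g. in an NLP pipeline); the function names and top-3 cutoff are illustrative only (the MAP@3 figure suggests top-3 evaluation).

```python
# Rank ads by cosine similarity between ontological-entity vectors.
from collections import Counter
import math

def entity_vector(entities):
    """entities: list of ontology entity IDs extracted from text."""
    return Counter(entities)

def cosine(u, v):
    common = set(u) & set(v)
    num = sum(u[e] * v[e] for e in common)
    den = math.sqrt(sum(c * c for c in u.values())) * \
          math.sqrt(sum(c * c for c in v.values()))
    return num / den if den else 0.0

def recommend_ads(user_profile, ads, k=3):
    """ads: {ad_id: entity_vector}; returns the k most similar ad IDs."""
    scored = [(cosine(user_profile, vec), ad_id) for ad_id, vec in ads.items()]
    return [ad_id for _, ad_id in sorted(scored, reverse=True)[:k]]
```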

4.
Modeling user profiles is a necessary step for most information filtering systems – such as recommender systems – to provide personalized recommendations. However, most of them work with users or items as vectors, applying different types of mathematical operations between them and neglecting sequential or content-based information. Hence, in this paper we propose an adaptive mechanism to obtain user sequences using different sources of information, allowing the generation of hybrid recommendations as a seamless, transparent technique from the system viewpoint. As a proof of concept, we adapt the Longest Common Subsequence (LCS) algorithm as a similarity metric to compare the user sequences, where, in the process of adapting this algorithm to recommendation, we include different parameters to control the efficiency by reducing the information used in the algorithm (preference filter), to decide when a neighbor is considered useful enough to be included in the process (confidence filter), to identify whether two interactions are equivalent (δ-matching threshold), and to normalize the length of the LCS in a bounded interval (normalization functions). These parameters can be extended to work with any type of sequential algorithm. We evaluate our approach with several state-of-the-art recommendation algorithms using different evaluation metrics measuring the accuracy, diversity, and novelty of the recommendations, and analyze the impact of the proposed parameters. We find that our approach offers competitive performance, outperforming content, collaborative, and hybrid baselines, and producing positive results when either content- or rating-based information is exploited.
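Below is a minimal sketch of an LCS-based similarity between two user interaction sequences, with a δ-matching threshold and a simple length normalization. The concrete matching rule, normalization function, and filters are assumptions for illustration and do not reproduce the paper's exact parameterization.

```python
# LCS similarity between user sequences with delta-matching of interactions.
def lcs_length(seq_a, seq_b, matches):
    """matches(a, b) -> bool decides whether two interactions are equivalent."""
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if matches(seq_a[i - 1], seq_b[j - 1]):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def lcs_similarity(seq_a, seq_b, delta=0.5):
    # Two interactions match if they concern the same item and the ratings
    # differ by at most delta (the delta-matching threshold).
    matches = lambda a, b: a[0] == b[0] and abs(a[1] - b[1]) <= delta
    if not seq_a or not seq_b:
        return 0.0
    # Normalise to [0, 1] by the shorter sequence length.
    return lcs_length(seq_a, seq_b, matches) / min(len(seq_a), len(seq_b))

# Example: sequences of (item_id, rating) pairs.
u = [("i1", 4.0), ("i2", 3.0), ("i5", 5.0)]
v = [("i1", 4.5), ("i5", 5.0), ("i7", 2.0)]
print(lcs_similarity(u, v, delta=0.5))  # -> 0.666...
```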

5.
This paper describes a formalism for constructing certain kinds of algorithms useful for representing a structure over a set of data. It proves that, if the cost of an algorithm is not taken into account, memory can be partially replaced by an algorithm. It also proves that the remaining memory part is independent of the construction process. It then evaluates the effects of the cost of representing algorithms and gives the resulting memory gain obtained in two particular examples.

6.
To address the problems that arise when estimating the nonlinear parameters in post-nonlinear blind source separation, a post-nonlinear blind source separation method based on an improved adaptive genetic algorithm is proposed. The method introduces a new fitness function and uses feedback from the fitness values to adjust the choice of crossover and mutation probabilities; a priority-evolution strategy and a simulated annealing mechanism are also incorporated into the genetic algorithm, after which the separation matrix is obtained via a linear separation algorithm. Simulation results show that, compared with traditional methods, the proposed method converges faster and achieves higher separation accuracy.
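A hedged sketch of the adaptive-rate idea described above, assuming a fitness-feedback adjustment of the crossover/mutation probabilities (in the spirit of classic adaptive GAs) plus a simulated-annealing style acceptance step. The constants, rate ranges, and fitness function are placeholders, not the paper's exact design.

```python
# Fitness-feedback adaptation of GA operator rates and annealed acceptance.
import math
import random

def adaptive_rates(f, f_avg, f_max, pc_max=0.9, pc_min=0.5,
                   pm_max=0.1, pm_min=0.01):
    """f: fitness of the individual being varied; better-than-average
    individuals receive smaller crossover/mutation probabilities."""
    if f >= f_avg and f_max > f_avg:
        scale = (f_max - f) / (f_max - f_avg)
        return (pc_min + (pc_max - pc_min) * scale,
                pm_min + (pm_max - pm_min) * scale)
    return pc_max, pm_max          # below-average individuals vary freely

def anneal_accept(f_child, f_parent, temperature):
    """Accept worse offspring with a Boltzmann probability (simulated annealing)."""
    if f_child >= f_parent:
        return True
    return random.random() < math.exp((f_child - f_parent) / max(temperature, 1e-9))
```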

7.
Recommender Systems (RSs) aim to model and predict user preferences while they interact with items, such as Points of Interest (POIs). These systems face several challenges, such as data sparsity, limiting their effectiveness. In this paper, we address this problem by incorporating social, geographical, and temporal information into the Matrix Factorization (MF) technique. To this end, we model social influence based on two factors: similarities between users in terms of common check-ins and the friendships between them. We introduce two levels of friendship based on explicit friendship networks and high check-in overlap between users. We base our friendship algorithm on users’ geographical activity centers. The results show that our proposed model outperforms the state-of-the-art on two real-world datasets. More specifically, our ablation study shows that the social model improves the performance of our proposed POI recommendation system by 31% and 14% on the Gowalla and Yelp datasets in terms of Precision@10, respectively.
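An illustrative sketch (not the authors' exact formulation) of matrix factorization with a social regularization term that pulls a user's latent vector toward the average of their friends' vectors; the learning rate, regularization weights, and update rule are assumptions.

```python
# One SGD step of MF with a simple social-regularisation term.
import numpy as np

def sgd_step(P, Q, u, i, r_ui, friends_u, lr=0.01, lam=0.05, beta=0.1):
    """P: user factors, Q: item (POI) factors, r_ui: observed preference,
    friends_u: indices of u's friends (explicit or inferred from check-in overlap)."""
    err = r_ui - P[u] @ Q[i]
    social = P[friends_u].mean(axis=0) - P[u] if len(friends_u) else 0.0
    P[u] += lr * (err * Q[i] - lam * P[u] + beta * social)
    Q[i] += lr * (err * P[u] - lam * Q[i])
```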

8.
With the increasing popularity and social influence of search engines in IR, various studies have raised concerns about the presence of bias in search engines and the social responsibilities of IR systems. As an essential component of a search engine, ranking is a crucial mechanism for presenting search results or recommending items in a fair fashion. In this article, we focus on top-k diversity fairness ranking in terms of statistical parity fairness and disparate impact fairness. The former fairness definition provides a balanced overview of search results in which the number of documents from different groups is equal; the latter enables a realistic overview in which the proportion of documents from different groups reflects the overall proportion. Using 100 queries and the top 100 results per query from Google as the data, we first demonstrate how topical diversity bias is present in the top web search results. Then, with our proposed entropy-based metrics for measuring the degree of bias, we reveal that the top search results are unbalanced and disproportionate to their overall diversity distribution. We explore several fairness ranking strategies to investigate the relationship between fairness, diversity, novelty and relevance. Our experimental results show that, using a variant of the fair ε-greedy strategy, we can bring more fairness and enhance diversity in search results without a cost in relevance; in fact, we can improve both relevance and diversity by introducing diversity fairness. Additional experiments with TREC datasets containing 50 queries demonstrate the robustness of our proposed strategies and of our findings on the impact of fairness. We present a series of correlation analyses on the amount of fairness and diversity, showing that statistical parity fairness correlates highly with diversity while disparate impact fairness does not. This provides clear and tangible implications for future work that aims to balance fairness, diversity and relevance in search results.
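A hedged sketch of one possible entropy-style bias measure in the spirit of the abstract: compare the topical-group distribution of the top-k results with the overall collection distribution using relative entropy (KL divergence). The exact metric used in the paper may differ; names and the value of k are illustrative.

```python
# Relative-entropy comparison of top-k group proportions vs. the collection.
import math
from collections import Counter

def distribution(groups):
    c = Counter(groups)
    total = sum(c.values())
    return {g: n / total for g, n in c.items()}

def kl_divergence(p, q, eps=1e-12):
    return sum(pv * math.log((pv + eps) / (q.get(g, 0.0) + eps))
               for g, pv in p.items())

def topk_bias(result_groups, collection_groups, k=10):
    """0 means the top-k mirrors the overall group proportions
    (disparate-impact style fairness); larger values indicate more bias."""
    return kl_divergence(distribution(result_groups[:k]),
                         distribution(collection_groups))
```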

9.
In the era of big data, it is extremely challenging to decide what information to receive and what to filter out in order to effectively acquire high-quality information, particularly in social media where large-scale User Generated Content (UGC) is widely and quickly disseminated. Considering that each individual user in a social network can take actions that drive the process of information diffusion, it is naturally appealing to aggregate spreading information effectively at the individual level by regarding each user as a social sensor. Along this line, in this paper, we propose a framework for effective information acquisition in social media. To be more specific, we introduce a novel measurement, the preference-based Detection Ability, to evaluate the ability of social sensors to detect diffusing events, and the problem of effective information acquisition is then reduced to achieving social sensing maximization through discovering valid social sensors. In pursuit of social sensing maximization, we propose two algorithms to resolve the longstanding problems of traditional greedy methods from the perspectives of efficiency and performance. On the one hand, we propose an efficient algorithm termed LeCELF, which resolves the redundant re-evaluations in the traditional Cost-Effective Lazy Forward (CELF) algorithm. On the other hand, we observe the participation paradox phenomenon in the social sensing network, and proceed to propose a randomized selection-based algorithm called FRIENDOM to choose social sensors and improve the effectiveness of information acquisition. Experiments on a disease spreading network and real-world microblog datasets have validated that LeCELF greatly reduces the running time, whereas FRIENDOM achieves better detection performance. The proposed framework and corresponding algorithms are applicable in many other settings for resolving information overload problems.
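For context, here is a minimal sketch of the Cost-Effective Lazy Forward (CELF) idea that LeCELF builds on: marginal gains are kept in a max-heap and only re-evaluated when stale, which avoids the redundant re-evaluations of plain greedy selection. The gain function and node representation are assumed, and submodularity of the gain is taken for granted.

```python
# Lazy-greedy (CELF-style) selection of k sensors under a submodular gain.
import heapq

def celf(candidates, gain, k):
    """gain(S, v): marginal detection ability of adding sensor v to set S."""
    S = []
    # Heap entries: (-gain, node, round in which the gain was computed).
    heap = [(-gain(S, v), v, 0) for v in candidates]
    heapq.heapify(heap)
    while len(S) < k and heap:
        neg_g, v, stamp = heapq.heappop(heap)
        if stamp == len(S):              # gain is up to date: select the node
            S.append(v)
        else:                            # stale gain: recompute and push back
            heapq.heappush(heap, (-gain(S, v), v, len(S)))
    return S
```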

10.
The paper describes OntoNotes, a multilingual (English, Chinese, and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes involves word senses that are grouped into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Although a sense pool provides a set of near-synonymous senses of words, there is still no knowledge about whether two words in a pool are interchangeable in practical use. Therefore, this paper devises an unsupervised algorithm that incorporates Google n-grams and a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features are used to measure the degree of context mismatch for a substitution. The statistical test is then applied to determine whether the substitution is adequate based on the degree of mismatch. The proposed method is compared with a supervised method, namely Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method achieves performance comparable to that of the supervised method.
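A hedged sketch of the substitution test: replace a word with a near-synonym from the same sense pool, measure how much the n-gram context frequency drops, and accept the substitution only when the mismatch is small. The `ngram_count` callback is assumed to be backed by a corpus such as the Google n-grams, and the log-ratio mismatch score and threshold are placeholders rather than the paper's statistical test.

```python
# Context-mismatch score for a candidate word substitution, via n-gram counts.
import math

def context_mismatch(sentence, position, substitute, ngram_count, n=3):
    """Average log-ratio of n-gram counts before/after substituting at `position`.
    sentence: list of tokens; ngram_count: tuple-of-tokens -> corpus count."""
    swapped = sentence[:position] + [substitute] + sentence[position + 1:]
    ratios = []
    for start in range(max(0, position - n + 1),
                       min(position + 1, len(sentence) - n + 1)):
        orig = ngram_count(tuple(sentence[start:start + n])) + 1   # add-one smoothing
        new = ngram_count(tuple(swapped[start:start + n])) + 1
        ratios.append(math.log(orig / new))
    return sum(ratios) / len(ratios) if ratios else float("inf")

def substitutable(sentence, position, substitute, ngram_count, threshold=1.0):
    return context_mismatch(sentence, position, substitute, ngram_count) < threshold
```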

11.
A news article’s online audience provides useful insights about the article’s identity. However, fake news classifiers using such information risk relying on profiling. In response to the rising demand for ethical AI, we present a profiling-avoiding algorithm that leverages Twitter users during model optimisation while excluding them when an article’s veracity is evaluated. For this, we take inspiration from the social sciences and introduce two objective functions that maximise correlation between the article and its spreaders, and among those spreaders. We applied our profiling-avoiding algorithm to three popular neural classifiers and obtained results on fake news data discussing a variety of news topics. The positive impact on prediction performance demonstrates the soundness of the proposed objective functions to integrate social context in text-based classifiers. Moreover, statistical visualisation and dimension reduction techniques show that the user-inspired classifiers better discriminate between unseen fake and true news in their latent spaces. Our study serves as a stepping stone to resolve the underexplored issue of profiling-dependent decision-making in user-informed fake news detection.

12.
Credit default swap transaction data repositories are frequently used for credit default swap spread estimation and financial market risk assessment. In practice, however, they suffer from poor liquidity, missing data, and inaccurate definitions, and small samples tend to lead to poor prediction accuracy and poor adaptability of statistical algorithms. Data generation can effectively increase the sample size and improve the performance of risk assessment models. In this paper, a credit default swap data generation algorithm based on a sequence generative adversarial network (SeqGAN) is proposed, and the policy gradient algorithm from reinforcement learning is introduced to optimize the traditional generative adversarial network (GAN) algorithm, addressing its vanishing-gradient and poor data adaptability problems. The gradient vanishes because the generator network in a GAN is designed to adjust its output continuously, which does not work for discrete data generation. The proposed optimization algorithm is trained on randomly distributed sequence data and generates credit default swap transactions with diversity and good model applicability. The generated credit default swap data are verified with the synthetic ranking agreement (SRA) index. The results show that SeqGAN can effectively synthesize diverse simulation samples, which can support risk discrimination models.

13.
With the growing focus on what is collectively known as “knowledge management”, a shift continues to take place in commercial information system development: a shift away from the well-understood data retrieval/database model, to the more complex and challenging development of commercial document/information retrieval models. While document retrieval has had a long and rich legacy of research, its impact on commercial applications has been modest. At the enterprise level, most large organizations have little understanding of, or commitment to, high-quality document access and management. Part of the reason for this is that we still do not have a good framework for understanding the major factors which affect the performance of large-scale corporate document retrieval systems. The thesis of this discussion is that document retrieval—specifically, access to intellectual content—is a complex process which is most strongly influenced by three factors: the size of the document collection; the type of search (exhaustive, existence or sample); and the determinacy of document representation. Collectively, these factors can be used to provide a useful framework for, or taxonomy of, document retrieval, and highlight some of the fundamental issues facing the design and development of commercial document retrieval systems. This is the first of a series of three articles. Part II (D.C. Blair, The challenge of commercial document retrieval. Part II. A strategy for document searching based on identifiable document partitions, Information Processing and Management, 2001b, this issue) will discuss the implications of this framework for search strategy, and Part III (D.C. Blair, Some thoughts on the reported results of Text REtrieval Conference (TREC), Information Processing and Management, 2002, forthcoming) will consider the importance of the TREC results for our understanding of operating information retrieval systems.

14.
Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of documents and a user query, these functions are able to precisely predict a score for each of the documents, in turn exploited to effectively rank them. Although the scoring efficiency of LtR models is critical in several applications – e.g., it directly impacts the response time and throughput of Web query processing – it has received relatively little attention so far. The goal of this work is to experimentally investigate the scoring efficiency of LtR models along with their ranking quality. Specifically, we show that machine-learned ranking models exhibit a quality versus efficiency trade-off. For example, each family of LtR algorithms has tuning parameters that can influence both effectiveness and efficiency, where higher ranking quality is generally obtained with more complex and expensive models. Moreover, LtR algorithms that learn complex models, such as those based on forests of regression trees, are generally more expensive and more effective than other algorithms that induce simpler models like linear combinations of features. We extensively analyze the quality versus efficiency trade-off of a wide spectrum of state-of-the-art LtR algorithms, and we propose a sound methodology to devise the most effective ranker given a time budget. To guarantee reproducibility, we used publicly available datasets and we contribute an open-source C++ framework providing optimized, multi-threaded implementations of the most effective tree-based learners: Gradient Boosted Regression Trees (GBRT), Lambda-Mart (λ-MART), and the first public-domain implementation of Oblivious Lambda-Mart (Ωλ-MART), an algorithm that induces forests of oblivious regression trees. We investigate how the different training parameters impact the quality versus efficiency trade-off, and provide a thorough comparison of several algorithms in the quality-cost space. The experiments conducted show that there is no overall best algorithm; the optimal choice depends on the time budget.

15.
Sequential recommendation models a user’s historical sequence to predict future items. Existing studies utilize deep learning methods and contrastive learning for data augmentation to alleviate data sparsity. However, these existing methods cannot learn accurate high-quality item representations while augmenting data. In addition, they usually ignore data noise and user cold-start issues. To solve the above issues, we investigate combining a Generative Adversarial Network (GAN) with contrastive learning for sequential recommendation to balance data sparsity and noise. Specifically, we propose a new framework, Enhanced Contrastive Learning with Generative Adversarial Network for Sequential Recommendation (ECGAN-Rec), which models the training process as a GAN and treats the recommendation task as the main task of the discriminator. We design a sequence augmentation module and a contrastive GAN module to implement both data-level and model-level augmentations. In addition, the contrastive GAN learns more accurate high-quality item representations to alleviate data noise after data augmentation. Furthermore, we propose an enhanced Transformer recommender based on the GAN to optimize the performance of the model. Experimental results on three open datasets validate the efficiency and effectiveness of the proposed model and its ability to balance data noise and data sparsity. Specifically, the improvements of ECGAN-Rec over state-of-the-art models in two evaluation metrics (HR@N and NDCG@N) on the Beauty, Sports and Yelp datasets are 34.95%, 36.68%, and 13.66%, respectively. Our implemented model is available via https://github.com/nishawn/ECGANRec-master.
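A hedged sketch of a contrastive objective that pairs two augmented views of the same user sequence (an InfoNCE-style loss); the paper's exact loss, augmentation modules, and GAN coupling are not reproduced here, and the temperature value is an assumption.

```python
# InfoNCE-style contrastive loss over two augmentations of the same sequences.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.2):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same
    sequences; matching rows are positives, all other rows are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```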

16.
If only experimental measurements are available, direct data-driven control design becomes an appealing approach, as control performance is directly optimized based on the collected samples. The direct synthesis of a feedback controller from input-output data typically requires the blind choice of a reference model that dictates the desired closed-loop behavior. In this paper, we propose a data-driven design scheme for linear parameter-varying (LPV) systems to account for soft performance specifications. Within this framework, the reference model is treated as an additional hyper-parameter to be learned from data, while the user is asked to provide only indicative performance constraints. The effectiveness of the proposed approach is demonstrated on a benchmark simulation case study, showing the improvement achieved by allowing for a flexible reference model.

17.
Along with informed consent, anonymization is an accepted method of protecting the interests of research participants, while allowing data collected for official statistical purposes to be reused by other agencies within and outside government. The Decennial Census, carried out in a number of countries, including the United Kingdom, is a major event in the production of research data and provides an important resource for a variety of organizations. This article combines ethical evaluation, a review of relevant law and guidance, and analysis of 30 qualitative interviews (carried out during the period of the 2001 UK Census), in order to explore the adequacy of the current framework for the protection of informational privacy in relation to census data. Taking account of Nissenbaum's concept of “contextual integrity,” Vedder's concept of “categorical privacy,” and Sen's call to heed the importance of “actual behavior,” it is argued that the current “contractarian” view of the relationship between an individual participant and the organization carrying out the Census does not engage sufficiently with actual uses of data. As a result, people have expectations of privacy that are not matched by practice and that the current normative—including the governance—framework cannot capture.

18.
Real-world datasets often present different types of data quality problems, such as the presence of outliers, missing values, inaccurate representations and duplicate entities. In order to identify duplicate entities, a task named Entity Resolution (ER), we may employ a variety of classification techniques. Rule-based techniques for classification have gained increasing attention in the state of the art due to the possibility of incorporating automatic learning approaches for generating Rule-Based Entity Resolution (RbER) algorithms. However, these algorithms present a series of drawbacks: i) the generation of high-quality RbER algorithms usually requires high computational and/or manual labeling costs; ii) the impossibility of tuning RbER algorithm parameters; iii) the inability to incorporate user preferences regarding the ER results into the algorithm functioning; and iv) the logical (binary) nature of RbER algorithms usually falls short when tackling special cases, i.e., challenging duplicate and non-duplicate pairs of entities. To overcome these drawbacks, we propose Rule Assembler, a configurable approach that classifies duplicate entities based on confidence scores produced by logical rules, taking into account tunable parameters as well as user preferences. Experiments carried out using both real-world and synthetic datasets have demonstrated the ability of the proposed approach to enhance the results produced by baseline RbER algorithms and basic assembling approaches. Furthermore, we demonstrate that the proposed approach does not entail a significant overhead over the classification step and conclude that the Rule Assembler parameters APA, WPA, TβM and Max are better suited for use in practical scenarios.
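An illustrative sketch of assembling rule confidences into a match decision, in the spirit of the abstract. The weighted-average combination, the per-rule weights, and the decision threshold are placeholders and do not reproduce the paper's APA, WPA, TβM or Max configurations.

```python
# Combine per-rule confidence scores into a duplicate/non-duplicate decision.
def assemble(rule_scores, weights=None, threshold=0.5):
    """rule_scores: confidences in [0, 1] produced by each logical rule for a
    pair of entities; weights: optional user preferences per rule."""
    if weights is None:
        weights = [1.0] * len(rule_scores)
    total = sum(weights)
    combined = sum(w * s for w, s in zip(weights, rule_scores)) / total
    return combined >= threshold, combined

# Example: three rules fire with different confidences on a candidate pair.
is_dup, score = assemble([0.9, 0.4, 0.7], weights=[2.0, 1.0, 1.0])
print(is_dup, round(score, 3))   # -> True 0.725
```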

19.
The development of digital technology promotes the construction of Intangible Cultural Heritage (ICH) databases, but the data remain unorganized and poorly linked, which makes it hard for the public to grasp an overall picture of the ICH. An ICH knowledge graph (KG) can help the public understand the ICH and facilitate its protection. However, a general framework for ICH KG construction is still lacking. In this study, we take the Chinese ICH (national level) as an example and propose a framework for building a Chinese ICH KG that combines multiple data sources from Baike and the official website, which extends the scale of the KG. Moreover, because ICH data grow daily, an efficient model is required to extract knowledge from the data and keep the KG up to date. The KG is built on triples 〈entity, attribute, attribute value〉, which introduces the attribute value extraction (AVE) task. However, no public annotated Chinese ICH AVE corpus is available. To address this, we construct a Chinese ICH AVE corpus automatically based on Distant Supervision (DS) rather than employing traditional manual annotation. AVE is usually treated as a sequence tagging task; in this paper, we instead cast ICH AVE as a node classification task and propose an AVE model, BGC, combining a BiLSTM and a graph attention network, which can fuse and exploit word-level and character-level information by means of an ICH lexicon generated from the KG. We conduct extensive experiments and compare the proposed model with other state-of-the-art models. Experimental results show the superiority of the proposed model.

20.
Integrating different steps on a chip for cell manipulation and sample preparation is of foremost importance to fully exploit the possibilities of microfluidics and thus make tests faster, cheaper and more accurate. We demonstrated particle manipulation in an integrated microfluidic device by applying hydrodynamic, electroosmotic (EO), electrophoretic (EP), and dielectrophoretic (DEP) forces. The process involves generation of fluid flow by a pressure difference, particle trapping by the DEP force, and particle redirection by EO and EP forces. Both DC and AC signals were applied, taking advantage of DC EP, DC EO and AC DEP for on-chip particle manipulation. Since different types of particles respond differently to these signals, variations of the DC and AC signals make it possible to handle complex and highly variable colloidal and biological samples. The proposed technique can operate in a high-throughput manner, with thirteen independent radial channels for enrichment and separation on the microfluidic chip. We evaluated our approach by collecting polystyrene particles, yeast cells, and E. coli bacteria, which respond differently to electric field gradients. Live and dead yeast cells were separated successfully, validating the capability of our device to separate highly similar cells. Our results showed that this technique can achieve fast pre-concentration of colloidal particles and cells and separation of cells depending on their vitality. Hydrodynamic, DC electrophoretic and DC electroosmotic forces were used together, instead of a syringe pump, to achieve sufficient fluid flow and particle mobility for particle trapping and sorting. By eliminating bulky mechanical pumps, this new technique has wide applications for in situ detection and analysis.
