Similar Documents
 20 similar documents found (search time: 31 ms)
1.
To address the large amount of labeled data required by conventional deep learning algorithms for classifying steel plate surface defect images, an efficient classification method based on active learning is proposed. The method consists of a lightweight convolutional neural network and an uncertainty-based sample selection strategy for active learning. The network uses a simplified convolutional base for feature extraction and replaces the hidden layers of a conventional densely connected classifier with a global pooling layer to mitigate overfitting. To better measure the model's uncertainty about the classes of unlabeled image samples, each unlabeled sample is first fed into the model trained on the labeled samples to obtain its probability distribution over classes (PDC); the same model then predicts on the labeled samples to obtain the average PDC for each class. The KL-divergence between these two distributions serves as the uncertainty score for selecting unlabeled images for manual annotation. Comparative experiments on the open-source NEU-CLS defect dataset show that the method achieves 97% accuracy with only 44% of the data labeled, greatly reducing annotation cost.
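To make the screening step concrete, here is a minimal Python sketch of the KL-divergence-based selection, assuming softmax outputs are available; all names are illustrative, not the paper's code.

```python
# Hypothetical sketch: rank unlabeled images by the KL divergence between
# their probability distribution over classes (PDC) and the labeled set's
# average PDC for the predicted class; the most divergent are sent for
# manual annotation.
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

def select_for_annotation(unlabeled_pdc, class_mean_pdc, budget):
    """unlabeled_pdc: (n, k) softmax outputs for n unlabeled images.
    class_mean_pdc: (k, k), row c = mean PDC of labeled samples of class c.
    Returns indices of the `budget` most uncertain unlabeled samples."""
    scores = np.empty(len(unlabeled_pdc))
    for i, p in enumerate(unlabeled_pdc):
        c = int(np.argmax(p))                      # model's predicted class
        scores[i] = entropy(p, class_mean_pdc[c])  # KL(sample PDC || class mean)
    return np.argsort(scores)[-budget:]            # largest divergence = most uncertain
```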

2.
Previous studies have adopted unsupervised machine learning with dimension-reduction functions for cyberattack detection, but these methods struggle to perform robust anomaly detection on high-dimensional, sparse data. Most of them assume homogeneous parameters with a specific Gaussian distribution for each domain, ignoring robustness to data skewness. This paper proposes unsupervised ensemble autoencoders connected to a Gaussian mixture model (GMM) to adapt to multiple domains regardless of the skewness of each domain. In the hidden space of the ensemble autoencoder, the attention-based latent representation and the reconstructed features with minimum error are utilized. The expectation-maximization (EM) algorithm estimates the sample density in the GMM. When the estimated sample density exceeds the learning threshold obtained in the training phase, the sample is identified as an outlier associated with an attack anomaly. Finally, the ensemble autoencoder and the GMM are jointly optimized by transforming the objective function into a Lagrangian dual problem. Experiments on three public data sets validate that the performance of the proposed model is significantly competitive with the selected anomaly-detection baselines.
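The density-scoring stage reads roughly as the sketch below, assuming latent features from the autoencoder ensemble are already computed; the threshold rule is simplified to a quantile of the training energy (negative log-density), an assumption rather than the paper's exact formulation.

```python
# Illustrative GMM-based outlier scoring on autoencoder latent features.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_density_detector(z_train, n_components=4, q=0.99):
    """z_train: (n, d) latent features of normal training traffic."""
    gmm = GaussianMixture(n_components=n_components).fit(z_train)  # EM fit
    energy = -gmm.score_samples(z_train)      # negative log-density per sample
    threshold = np.quantile(energy, q)        # learning threshold from training
    return gmm, threshold

def is_attack(gmm, threshold, z):
    return -gmm.score_samples(z) > threshold  # True = outlier / attack anomaly
```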

3.
李一帆  王玙 《情报科学》2022,40(6):115-123
[Purpose/Significance] With the deepening of interdisciplinary crossing and integration, scientific research increasingly requires the collaboration of multiple scholars. Identifying potential collaborations and recommending suitable partners can effectively improve research efficiency. [Method/Process] Scholar collaboration prediction is studied on the basis of a dynamic network representation learning model. First, a dynamic network representation learning model, DynNE_Atten, is proposed. Second, a dynamic co-authorship network and a dynamic keyword co-occurrence network are constructed from the literature of the library and information science field; the DynNE_Atten model produces author and keyword vector representations, and author affiliation features are extracted. Finally, collaboration, topic, and affiliation features are fused to predict likely future collaborations. [Result/Conclusion] Experiments show that the proposed model achieves high accuracy in temporal link prediction while requiring relatively little input data; compared with scholar representations without fused features, the fusion model shows a clear advantage in collaboration prediction. [Innovation/Limitation] A new dynamic network representation learning model is proposed, and topic and affiliation features are fused for collaboration prediction with good results. The current fusion scheme handles heterogeneity only at the data level, not at the network level.
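As a rough illustration of how the fused features could score a candidate pair, here is a hedged sketch; the weights and the dictionary fields are placeholders, not DynNE_Atten's learned fusion.

```python
# Hypothetical collaboration scoring on top of the learned representations.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def collaboration_score(a, b, w=(0.6, 0.3, 0.1)):
    """a, b: dicts with 'embed' (author vector from DynNE_Atten),
    'topics' (centroid of the author's keyword vectors), 'affiliation' (str)."""
    s_embed = cosine(a["embed"], b["embed"])    # co-authorship-network signal
    s_topic = cosine(a["topics"], b["topics"])  # topic signal
    s_affil = 1.0 if a["affiliation"] == b["affiliation"] else 0.0
    return w[0] * s_embed + w[1] * s_topic + w[2] * s_affil
```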

4.
A texture image segmentation method based on active contours
Combining Gabor filters with anisotropic diffusion, this paper proposes an unsupervised active-contour-based texture image segmentation algorithm. Using a diffusion function based on total variation flow, anisotropic diffusion effectively smooths textured regions while preserving large-scale boundary information, yielding a simplified image that is easier to segment than the original. However, the loss of texture information during smoothing limits the generality and effectiveness of that approach. To exploit anisotropic diffusion while still extracting and using texture information, a bank of Gabor filters extracts a set of feature images characterizing texture orientation and scale, and the original image is included as an additional feature channel carrying gray-level information. A vector-valued anisotropic diffusion equation then performs edge-preserving smoothing on the feature images. The active contour segmentation method based on regional gray-level statistics is extended to vector space to segment the smoothed texture features. Experiments show that the algorithm achieves good results.
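The feature-channel construction can be sketched as below, assuming a grayscale input; the particular orientations and frequencies are illustrative choices.

```python
# Build texture feature channels: Gabor magnitude responses over several
# orientations and scales, plus the intensity image as one extra channel.
import numpy as np
from skimage.filters import gabor

def gabor_feature_stack(img, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4),
                        freqs=(0.1, 0.2, 0.4)):
    channels = [img.astype(float)]                 # gray-level channel
    for f in freqs:
        for t in thetas:
            real, imag = gabor(img, frequency=f, theta=t)
            channels.append(np.hypot(real, imag))  # Gabor magnitude response
    return np.stack(channels, axis=-1)             # H x W x (1 + scales*dirs)
```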

5.
This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories using deep learning. The architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Integrating the acquired terminology into the ontology extends the semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in the documents. The semantically rich representations from the first module serve as input to the document classification module, which finds the most appropriate category for each document through deep learning. Three deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations with various deep neural network configurations reveal that a feedforward network with three hidden layers of 1024 neurons obtains the highest document classification performance on the INFUSE dataset. For the same network configuration, the F1 score is further increased by almost five percentage points, to 78.10%, when the relevant terminology integrated into the ontology is applied to enrich the document representation. Furthermore, a comparative performance evaluation is conducted using various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.
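The best-performing configuration reported above can be sketched as a Keras model; the input dimensionality and the training hyperparameters are assumptions.

```python
# A minimal sketch of a feedforward classifier with three hidden layers of
# 1024 neurons over a (semantically enriched) document feature vector.
import tensorflow as tf

def build_classifier(input_dim, n_classes):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```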

6.
Pedestrian gender recognition is a very challenging problem, since viewpoint variations, illumination changes, occlusion, and poor image quality are common in pedestrian images. To address this problem, an effective HOG-assisted deep feature learning (HDFL) method is proposed in this paper. The key novelty lies in the design of the HDFL network, which effectively explores both deep-learned features and weighted histogram of oriented gradient (HOG) features for pedestrian gender recognition. Specifically, the deep-learned and weighted-HOG feature extraction branches are performed simultaneously on the input pedestrian image. A feature fusion process is subsequently conducted to obtain a more robust and discriminative feature, which is then fed to a softmax classifier for gender recognition. Extensive experiments on multiple existing pedestrian image datasets show that the proposed HDFL method effectively recognizes pedestrian gender and consistently outperforms state-of-the-art methods.
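A hedged PyTorch sketch of the two-branch fusion idea follows; the layer sizes and the simple learned HOG weighting are assumptions, not the paper's exact HDFL architecture.

```python
# Two-branch gender recognition: a small CNN branch plus a weighted HOG
# branch, concatenated before a linear softmax head.
import torch
import torch.nn as nn

class TwoBranchGender(nn.Module):
    def __init__(self, hog_dim, cnn_feat_dim=128, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, cnn_feat_dim),
        )
        self.hog_weight = nn.Parameter(torch.ones(hog_dim))  # weighted HOG branch
        self.head = nn.Linear(cnn_feat_dim + hog_dim, n_classes)

    def forward(self, image, hog_vec):
        fused = torch.cat([self.cnn(image), self.hog_weight * hog_vec], dim=1)
        return self.head(fused)  # logits for softmax cross-entropy
```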

7.
Anomalous data are data points that deviate from the bulk of normal data and often have negative impacts on various systems. Current anomaly detection technology suffers from low detection accuracy, high false alarm rates, and a lack of labeled data, yet anomaly detection is of great practical importance as an effective means of detecting anomalies and supporting the normal operation of various systems. In this paper, we propose an anomaly detection classification model that incorporates federated learning and mixed-Gaussian variational autoencoder networks, namely MGVN. The MGVN model first constructs a variational autoencoder with a mixed Gaussian prior to extract features from the input data, and then builds a deep support vector network on top of the mixed-Gaussian variational autoencoder to compress the feature space. MGVN finds the minimum hypersphere separating normal and abnormal data and measures the anomaly score by computing the Euclidean distance between a sample's features and the hypersphere center. Federated learning is finally incorporated with MGVN (FL-MGVN) so that multiple participants can collaboratively train a global model without sharing private data. Experiments on benchmark datasets such as NSL-KDD, MNIST, and Fashion-MNIST demonstrate that the proposed FL-MGVN achieves higher recognition performance and classification accuracy than the compared methods; the average AUC on MNIST and Fashion-MNIST reaches 0.954 and 0.937, respectively.
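The hypersphere scoring step reads roughly as below; taking the center as the mean of normal latent features and the radius as a fixed threshold are simplifying assumptions.

```python
# Deep-SVDD-style hypersphere test on encoded features.
import torch

def hypersphere_center(z_train):
    return z_train.mean(dim=0)                # center c from normal features

def anomaly_score(z, center):
    return torch.norm(z - center, dim=1)      # Euclidean distance to c

def is_anomaly(z, center, radius):
    return anomaly_score(z, center) > radius  # outside the sphere = anomaly
```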

8.
Teaching images, an important auxiliary tool in teaching and learning, are fundamentally different from general-domain images. Besides visually similar images being more likely to share common labels, teaching images also face the challenge of visual-knowledge inconsistency, including intra-knowledge visual difference and inter-knowledge visual similarity. To address these challenges, we present KBHN, a knowledge-aware bi-hypergraph network that considers not only coarse-grained visual features but also extracts fine-grained knowledge features reflecting the knowledge intention hidden in teaching images. In detail, a visual hypergraph is constructed to connect images with visual similarity; it enriches coarse-grained visual features by modeling high-order visual relations among teaching images. Moreover, a knowledge hypergraph based on typical images is built to aggregate images with similar knowledge information, extracting fine-grained knowledge features by modeling high-order knowledge correlations between local regions. A multi-head attention mechanism then fuses the visual and knowledge features to enrich the image representation. A teaching image dataset containing 20,744 real-world images annotated with 24 knowledge points is constructed to train and validate the model. Experimental results demonstrate that KBHN, incorporating visual-knowledge features, achieves state-of-the-art performance compared with existing methods.
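For orientation, here is a generic hypergraph-convolution step of the kind such networks build on; this is a textbook formulation, not KBHN's exact layer.

```python
# One spectral hypergraph convolution: normalize the incidence structure and
# propagate node (image/region) features along hyperedges.
import numpy as np

def hypergraph_conv(X, H, Theta):
    """X: (n, d) node features; H: (n, e) incidence matrix; Theta: (d, d')."""
    Dv = np.diag(1.0 / np.sqrt(H.sum(axis=1) + 1e-9))  # node degrees
    De = np.diag(1.0 / (H.sum(axis=0) + 1e-9))         # hyperedge degrees
    A = Dv @ H @ De @ H.T @ Dv                         # normalized propagation
    return np.maximum(A @ X @ Theta, 0)                # ReLU activation
```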

9.
With the widespread use of 3D capture devices, diverse 3D object datasets from different domains have emerged recently, so retrieving 3D objects across domains is becoming a significant and challenging task. Existing approaches mainly focus on retrieval within a single dataset, which significantly constrains their use in real-world applications. This paper addresses cross-domain object retrieval in an unsupervised manner, where labels are provided for samples from the source domain but unknown for samples from the target domain. We propose a joint deep feature learning and visual domain adaptation method (Deep-VDA) that solves the cross-domain 3D object retrieval problem with end-to-end learning. Specifically, benefiting from deep learning networks, Deep-VDA employs MVCNN for deep feature extraction and domain alignment for unsupervised domain adaptation. The framework minimizes the statistical and geometric shift between domains in an unsupervised manner by preserving both the common and the unique characteristics of each domain. Deep-VDA improves the robustness of object features from different domains, which is important for maintaining strong retrieval performance.
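A common way to realize such unsupervised alignment is a maximum mean discrepancy (MMD) penalty between domain features; the linear variant below is a stand-in sketch, as the abstract does not specify Deep-VDA's exact alignment loss.

```python
# Linear MMD: squared distance between the mean source and target features.
import torch

def mmd_linear(f_src, f_tgt):
    delta = f_src.mean(dim=0) - f_tgt.mean(dim=0)
    return torch.dot(delta, delta)  # 0 when the two domains align in the mean
```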

10.
Deep multi-view clustering (MVC) mines and employs the complex relationships among views to learn compact data clusters with deep neural networks in an unsupervised manner. Recent deep contrastive learning (CL) methods have shown promising performance in MVC by learning cluster-oriented deep feature representations, realized by contrasting positive and negative sample pairs. However, most existing deep contrastive MVC methods focus on only one side of contrastive learning, such as feature-level or cluster-level contrast, failing to integrate the two or to bring in further important aspects of contrast. Additionally, most of them work in a separate two-stage manner, i.e., feature learning first and data clustering second, so the two stages cannot mutually benefit each other. To address these challenges, in this paper we propose a novel joint contrastive triple-learning framework that learns multi-view discriminative feature representations for deep clustering. It is threefold: feature-level alignment-oriented CL, feature-level commonality-oriented CL, and cluster-level consistency-oriented CL. The first two submodules contrast the encoded feature representations of data samples at different feature levels, while the last contrasts the data samples in their cluster-level representations. Benefiting from the triple contrast, more discriminative representations of the views can be obtained. Meanwhile, a view weight learning module is designed to learn and exploit the quantitative complementary information across the learned discriminative features of each view. The contrastive triple-learning module, the view weight learning module, and the data clustering module operating on the fused features are thus jointly performed, so that the modules are mutually beneficial. Extensive experiments on several challenging multi-view datasets show the superiority of the proposed method over many state-of-the-art methods, including large improvements of 15.5% and 8.1% in accuracy on Caltech-4V and CCV. Given its promising performance on visual datasets, the method can be applied to many practical visual applications such as visual recognition and analysis. The source code of the proposed method is available at https://github.com/ShizheHu/Joint-Contrastive-Triple-learning.
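As a reference point for the feature-level contrast, here is the standard NT-Xent loss between two views; the paper's triple-learning objective combines several such terms, so this is only the basic building block.

```python
# NT-Xent contrastive loss: matching positions across two views are positives,
# everything else in the batch is a negative.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)     # (2N, d)
    sim = z @ z.t() / tau              # cosine similarities / temperature
    sim.fill_diagonal_(float("-inf"))  # exclude self-pairs
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```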

11.
Multi-feature fusion has achieved gratifying performance in image retrieval. However, some existing fusion mechanisms can make the result worse than expected due to the domain and visual diversity of images. A pressing problem in applying feature fusion is therefore how to determine and improve the complementarity of multi-level heterogeneous features. To this end, this paper proposes an adaptive multi-feature fusion method via cross-entropy normalization for effective image retrieval. First, various low-level features (e.g., SIFT) and high-level semantic features based on deep learning are extracted. For each feature level, the initial similarity scores of the query image w.r.t. the target dataset are calculated. Second, an independent reference dataset is used to approximate the tail of the resulting similarity-score ranking curve by cross-entropy normalization. The area under the ranking curve is then calculated as an indicator of the merit of the corresponding feature (a smaller area indicates a more suitable feature). Finally, fusion weights for each feature are assigned adaptively from these statistically estimated areas. Extensive experiments on three public benchmark datasets demonstrate that the proposed method achieves superior performance compared with existing methods, improving mAP by a relative 1.04% (Holidays) and 1.22% (Oxf5k), and the N-S score by 0.04 (UKbench).
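The adaptive weighting idea can be sketched as follows; the cross-entropy normalization of the tail is simplified here to a max normalization, so treat this as a schematic rather than the paper's estimator.

```python
# Weight each feature inversely to the area under its normalized
# similarity-ranking curve (smaller area = more suitable feature).
import numpy as np

def fusion_weights(ranking_curves):
    """ranking_curves: list of 1-D arrays of sorted similarity scores."""
    areas = np.array([np.trapz(c / (c.max() + 1e-9)) for c in ranking_curves])
    inv = 1.0 / (areas + 1e-9)
    return inv / inv.sum()  # adaptive weights, summing to 1
```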

12.
The advantages of user click data have greatly inspired its wide application in fine-grained image classification tasks. In previous click-data-based image classification approaches, each image is represented as a click frequency vector over a pre-defined query/word dictionary. However, this approach not only introduces high-dimensionality issues but also ignores the part of speech (POS) of a specific word as well as word correlations. To address these issues, we devise factorized deep click features to represent images. We first represent images as factorized TF-IDF click feature vectors to discover word correlations, constructing several word dictionaries for different POS. We then learn an end-to-end deep neural network on click feature tensors built from these factorized TF-IDF vectors. We evaluate our approach on the public Clickture-Dog dataset. The results show that: 1) the deep click feature learned on the click tensor performs much better than traditional click frequency vectors; and 2) compared with many state-of-the-art textual representations, the proposed deep click feature is more discriminative and achieves higher classification accuracy.
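Building the per-POS TF-IDF slices might look like the sketch below; the equal-size dictionaries and the tensor layout are assumptions for illustration.

```python
# Stack per-POS TF-IDF click vectors into a tensor for the deep network.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def click_tensor(click_docs, pos_dicts):
    """click_docs: one string of clicked queries per image.
    pos_dicts: {pos_tag: vocabulary list}; assumed here to share the same
    size so the slices stack (pad the vocabularies otherwise)."""
    slices = []
    for pos, vocab in pos_dicts.items():
        vec = TfidfVectorizer(vocabulary=vocab)
        slices.append(vec.fit_transform(click_docs).toarray())
    return np.stack(slices, axis=1)  # images x POS-channels x vocab-size
```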

13.
14.
Emotion recognition makes it possible to automatically perceive a user's emotional response to multimedia content through implicit annotation, which in turn supports effective user-centric services. Physiological-signal-based approaches have attracted increasing attention because of their objectivity in representing emotion. Conventional approaches to emotion recognition have mostly focused on extracting various kinds of hand-crafted features. However, hand-crafted features require domain knowledge for the specific task, and designing proper features can be time-consuming. Exploring the most effective physiological temporal feature representation has therefore become the core problem of most works on emotion recognition. In this paper, we propose a multimodal attention-based BLSTM network framework for efficient emotion recognition. First, raw physiological signals from each channel are transformed into spectrogram images to capture their time and frequency information. Second, attention-based bidirectional long short-term memory recurrent neural networks (LSTM-RNNs) automatically learn the best temporal features. The learned deep features are then fed into a deep neural network (DNN) to predict the emotional output probability for each channel. Finally, a decision-level fusion strategy predicts the final emotion. Experimental results on the AMIGOS dataset show that our method outperforms other state-of-the-art methods.
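A single-channel attention-over-BLSTM block of the kind described can be sketched as below; the dimensions and pooling form are assumptions.

```python
# Attention-weighted temporal pooling over a bidirectional LSTM.
import torch
import torch.nn as nn

class AttnBLSTM(nn.Module):
    def __init__(self, in_dim, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (B, T, in_dim) spectrogram frames
        h, _ = self.lstm(x)                     # (B, T, 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        ctx = (a * h).sum(dim=1)                # weighted temporal pooling
        return self.head(ctx)                   # per-channel emotion logits
```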

15.
Similarity search with hashing has become one of the fundamental research topics in computer vision and multimedia. Current research on semantic-preserving hashing mainly focuses on exploring the semantic similarities between pointwise or pairwise samples in the visual space to generate discriminative hash codes. However, such learning schemes fail to explore the intrinsic latent features embedded in the high-dimensional feature space and struggle to capture the underlying topological structure of the data, yielding low-quality hash codes for image retrieval. In this paper, we propose an ordinal-preserving latent graph hashing (OLGH) method, which derives the objective hash codes from the latent space and preserves the high-order local topological structure of the data in the learned hash codes. Specifically, we conceive a triplet-constrained topology-preserving loss to uncover ordinal-inferred local features in binary representation learning. By virtue of this, the learning system can implicitly capture high-order similarities among samples during feature learning. Moreover, a well-designed latent subspace learning component acquires noise-free latent features based on sparse-constrained supervised learning, so that the latent, under-explored characteristics of the data are fully employed in subspace construction. Furthermore, latent ordinal graph hashing is formulated by jointly exploiting latent space construction and ordinal graph learning, and an efficient optimization algorithm is developed to solve the resulting problem. Extensive experiments on diverse datasets show the effectiveness and superiority of the proposed method compared with advanced learning-to-hash algorithms for fast image retrieval. The source codes of this paper are available at https://github.com/DarrenZZhang/OLGH.
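The triplet-constrained, topology-preserving idea is loss-wise akin to the sketch below on relaxed (real-valued) codes; OLGH's full objective adds the latent-subspace and ordinal-graph terms, so this is only the triplet component.

```python
# Triplet margin loss on relaxed hash codes: an anchor should be closer to a
# topological neighbor than to a non-neighbor by a margin.
import torch
import torch.nn.functional as F

def triplet_hash_loss(b_anchor, b_pos, b_neg, margin=2.0):
    d_pos = (b_anchor - b_pos).pow(2).sum(dim=1)
    d_neg = (b_anchor - b_neg).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```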

16.
Detecting sentiment in natural language is tricky even for humans, making its automated detection more complicated still. This research proffers a hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data. It reinforces the strengths of deep learning networks in combination with machine learning to deal with two specific semiotic systems, the textual (written text) and the visual (still images), and their combination within online content, using decision-level multimodal fusion. The proposed contextual ConvNet-SVMBoVW model has four modules: discretization, text analytics, image analytics, and decision. The input to the model is multimodal text, m ∈ {text, image, info-graphic}. The discretization module uses Google Lens to separate the text from the image; the two are then processed as discrete entities by the respective text analytics and image analytics modules. The text analytics module determines sentiment using a convolutional neural network (ConvNet) enriched with the contextual semantics of SentiCircle, and an aggregation scheme is introduced to compute the hybrid polarity. A support vector machine (SVM) classifier trained on bag-of-visual-words (BoVW) features predicts the sentiment of the visual content. A Boolean decision module with a logical OR operation validates and categorizes the output into five fine-grained sentiment categories (truth values): 'highly positive', 'positive', 'neutral', 'negative', and 'highly negative'. The accuracy achieved by the proposed model is nearly 91%, an improvement over the accuracy obtained by the text and image modules individually.
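The decision module can be sketched as below; the score-level combination and the thresholds are illustrative stand-ins for the paper's Boolean OR validation.

```python
# Map the hybrid text polarity and the BoVW-SVM image sentiment onto the five
# fine-grained categories.
def fuse(text_polarity, image_sentiment):
    """text_polarity in [-1, 1]; image_sentiment in {'pos', 'neu', 'neg'}."""
    score = text_polarity + {"pos": 0.5, "neu": 0.0, "neg": -0.5}[image_sentiment]
    if score > 0.75:
        return "highly positive"
    if score > 0.25:
        return "positive"
    if score >= -0.25:
        return "neutral"
    if score >= -0.75:
        return "negative"
    return "highly negative"
```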

17.
A genetic programming based method for synthetic aperture radar (SAR) image recognition is proposed. Five features are first extracted from the SAR images as the original feature set; genetic programming then synthesizes new features from these five, and a support vector machine performs the classification. Experimental results demonstrate the effectiveness of the algorithm.
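The pipeline can be illustrated as follows; the hand-written expression stands in for an evolved genetic-programming tree, which the real method searches for automatically.

```python
# Append a GP-synthesized feature to the five original SAR features and train
# an SVM on the result.
import numpy as np
from sklearn.svm import SVC

def synthesized_feature(f):
    """f: (n, 5) original SAR features; stand-in for an evolved tree,
    e.g. (f0 * f3) / (f2 + f4)."""
    return (f[:, 0] * f[:, 3]) / (f[:, 2] + f[:, 4] + 1e-9)

def train_classifier(f_train, y_train):
    X = np.column_stack([f_train, synthesized_feature(f_train)])
    return SVC(kernel="rbf").fit(X, y_train)
```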

18.
19.
Automatically assessing academic papers has enormous potential to reduce the peer-review burden and individual bias. Existing studies build sophisticated deep neural networks to identify academic value from comprehensive data, e.g., academic graphs and full papers. However, such data are not always easy to access, and in a fair assessment the content of the paper, rather than features outside it, should be what matters. Furthermore, while BERT models can maintain general semantics by pre-training on large-scale corpora, they tend to over-smooth because of stacked self-attention layers over unfiltered input tokens. It is therefore nontrivial to extract the distinguishing value of an academic paper from its limited content. In this study, we propose a novel deep neural network, Dual-view Graph Convolutions Enhanced BERT (DGC-BERT), for estimating academic paper acceptance. We combine the title and abstract of the paper as input, and a pre-trained BERT model extracts the paper's general representations. Apart from the hidden representations of the final layer, we highlight the first and last few layers as lexical and semantic views. In particular, we re-examine the dual-view filtered self-attention matrices by constructing two graphs, after which two multi-hop graph convolutional networks (GCNs) separately capture pivotal and distant dependencies between tokens. Moreover, the dual-view representations are facilitated by each other through biaffine attention modules, and a re-weighting gate is proposed to further streamline them with the help of the original BERT representation. Finally, whether the submitted paper is likely to be accepted is predicted from the original language model features together with the dual-view dependencies. Extensive data analyses and full-paper-based MHCNN studies provide insights into the task and the structural functions. Comparison experiments on two benchmark datasets demonstrate that the proposed DGC-BERT significantly outperforms alternative approaches, especially state-of-the-art models such as MHCNN and BERT variants. Additional analyses reveal the significance and explainability of the proposed modules in the DGC-BERT. Our code and settings have been released on GitHub (https://github.com/ECNU-Text-Computing/DGC-BERT).
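One dual-view step, building a token graph from a filtered self-attention matrix and running a GCN hop over it, can be sketched as below; the threshold rule and shapes are assumptions.

```python
# Threshold an averaged BERT self-attention matrix into an adjacency matrix,
# symmetrically normalize it, and propagate token states one hop.
import torch

def gcn_hop(h, attn, W, tau=0.1):
    """h: (T, d) token states; attn: (T, T) attention weights; W: (d, d')."""
    A = (attn > tau).float()      # filtered attention graph
    A = A + torch.eye(A.size(0))  # add self-loops
    d_inv_sqrt = A.sum(dim=1).rsqrt()
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # sym. normalization
    return torch.relu(A_hat @ h @ W)  # one graph-convolution hop
```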

20.
Automated visual inspection of fabric defects is challenging due to the diversity of fabric patterns and defects. Although many automated inspection methods exist for identifying fabric defects, most process images containing fabric patterns classified as crystallographic group p1 and implicitly assume the patterns are arranged in fixed directions. This paper proposes an automated defect inspection method that calibrates the fabric image and then segments it into non-overlapping sub-images called lattices. The image is thus represented by hundreds of lattices sharing common features instead of millions of unrelated pixels. The defect inspection problem is transformed into comparing lattice similarity based on the shared features and identifying defective lattices as outliers in the feature space. The performance of the proposed method, ILS (Isotropic Lattice Segmentation), is evaluated on databases of images containing fabric patterns arranged orthogonally and arbitrarily. Comparing the resulting images with ground-truth images yields an overall detection rate of 0.955, which is comparable with state-of-the-art methods.
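The lattice-level outlier test can be sketched with a robust distance rule; ILS's actual lattice features and comparison are more elaborate, so the MAD-based rule below is an assumption.

```python
# Flag lattices whose feature vectors lie far from the feature-space median.
import numpy as np

def defective_lattices(lattice_feats, k=3.0):
    """lattice_feats: (n_lattices, d) features of the segmented lattices."""
    med = np.median(lattice_feats, axis=0)
    dist = np.linalg.norm(lattice_feats - med, axis=1)
    mad = np.median(np.abs(dist - np.median(dist))) + 1e-9
    return dist > np.median(dist) + k * mad  # robust outlier rule
```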
