Similar Literature
20 similar records found (search time: 31 ms)
1.
Detecting sentiment in natural language is difficult even for humans, which makes its automated detection all the more complicated. This research proposes a hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data. It reinforces the strengths of deep learning networks in combination with machine learning to handle two specific semiotic systems, namely the textual (written text) and the visual (still images), and their combination within online content using decision-level multimodal fusion. The proposed contextual ConvNet-SVMBoVW model has four modules: discretization, text analytics, image analytics, and decision. The input to the model is multimodal content, m ∈ {text, image, infographic}. The discretization module uses Google Lens to separate the text from the image; the two are then processed as discrete entities and sent to the respective text analytics and image analytics modules. The text analytics module determines sentiment using a convolutional neural network (ConvNet) enriched with the contextual semantics of SentiCircle; an aggregation scheme is introduced to compute the hybrid polarity. A support vector machine (SVM) classifier trained on bag-of-visual-words (BoVW) features predicts the sentiment of the visual content. A Boolean decision module with a logical OR operation is appended to the architecture, validating and categorizing the output into five fine-grained sentiment categories (truth values): ‘highly positive,’ ‘positive,’ ‘neutral,’ ‘negative,’ and ‘highly negative.’ The proposed model achieves an accuracy of nearly 91%, an improvement over the accuracies obtained by the text and image modules individually.
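The decision-level fusion described above can be sketched as follows. The aggregation weight and the category thresholds here are illustrative assumptions, not values from the paper:

```python
def fuse_polarities(text_polarity: float, image_polarity: float,
                    text_weight: float = 0.6) -> str:
    """Fuse per-modality polarity scores in [-1, 1] and map the result
    onto five fine-grained sentiment categories."""
    fused = text_weight * text_polarity + (1 - text_weight) * image_polarity
    if fused > 0.5:
        return "highly positive"
    if fused > 0.1:
        return "positive"
    if fused >= -0.1:
        return "neutral"
    if fused >= -0.5:
        return "negative"
    return "highly negative"
```

For example, strongly positive text and image polarities fuse into the ‘highly positive’ category, while near-zero scores fall into ‘neutral.’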

2.
Image and text matching bridges visual and textual modality differences and plays a considerable role in cross-modal retrieval. Much progress has been achieved through semantic representation and alignment. However, the distribution of multimedia data is severely unbalanced and contains many low-frequency occurrences, which are often ignored and cause performance degradation, i.e., the long-tail effect. In this work, we propose a novel rare-aware attention network (RAAN), which explores and exploits textual rare content to tackle the long-tail effect in image and text matching. Specifically, we first design a rare-aware mining module, which contains global prior-information construction and a rare fragment detector for modeling the characteristics of rare content. Then, the rare attention matching module utilizes the prior information as attention to guide the representation enhancement of rare content, and introduces a rareness representation to strengthen the similarity calculation. Finally, we design a prior-information loss to optimize the model together with the triplet loss. We perform quantitative and qualitative experiments on two large-scale databases and achieve leading performance. In particular, we conduct a zero-shot test for rare content and improve rSum by 21.0 and 41.5 on Flickr30K (155,000 image-text pairs) and MSCOCO (616,435 image-text pairs), demonstrating the effectiveness of the proposed method against the long-tail effect.
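The triplet ranking loss that the prior-information loss is optimized alongside can be sketched as a standard hinge-based formulation. The margin value is an assumption, and the abstract does not specify the exact prior-information loss, which is omitted here:

```python
def triplet_loss(sim_pos: float, sim_neg_img: float, sim_neg_txt: float,
                 margin: float = 0.2) -> float:
    """Hinge-based triplet ranking loss for image-text matching: the
    matched pair must score at least `margin` above the hardest negative
    image and the hardest negative caption."""
    return (max(0.0, margin - sim_pos + sim_neg_img)
            + max(0.0, margin - sim_pos + sim_neg_txt))
```

When the positive pair already outscores both negatives by the margin, the loss is zero and the pair contributes no gradient.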

3.
Image–text matching is a crucial branch of multimedia retrieval that relies on learning inter-modal correspondences. Most existing methods focus on global or local correspondence and fail to explore fine-grained global–local alignment. Moreover, the issue of how to infer more accurate similarity scores remains unresolved. In this study, we propose a novel unifying knowledge iterative dissemination and relational reconstruction (KIDRR) network for image–text matching. In particular, the knowledge graph iterative dissemination module is designed to iteratively broadcast global semantic knowledge, enabling relevant nodes to be associated and yielding fine-grained intra-modal correlations and features. Vector-based similarity representations are thus learned from multiple perspectives to model multi-level alignments comprehensively. The relation graph reconstruction module is further developed to enhance cross-modal correspondences by constructing similarity relation graphs and adaptively reconstructing them. We conducted experiments on the Flickr30K and MSCOCO datasets, which contain 31,783 and 123,287 images, respectively. Experiments show that KIDRR achieves improvements of nearly 2.2% and 1.6% in Recall@1 on Flickr30K and MSCOCO, respectively, compared to the current state-of-the-art baselines.

4.
5.
Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimedia databases that improves both the effectiveness and efficiency of traditional CBIR by exploiting a secondary medium. We perform retrieval in a two-stage fashion: first rank by a secondary medium, and then perform CBIR only on the top-K items. Effectiveness is thus improved by performing CBIR on a ‘better’ subset and, with a relatively ‘cheap’ first stage, efficiency is also improved through the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e., estimated per query to optimize a predefined effectiveness measure. We show that our dynamic two-stage method can be significantly more effective and robust than similar setups with static thresholds previously proposed. In additional experiments that use local-feature derivatives in the visual stage instead of global features, such as the emerging visual codebook approach, we find that the two-stage method does not work very well. We attribute the weaker performance to the enhanced visual diversity produced by the textual stage, which diminishes the codebook’s advantage over global features. Furthermore, we compare dynamic two-stage retrieval to traditional score-based fusion of results retrieved visually and textually. We find that fusion is also significantly more effective than single-medium baselines. Although there is no clear winner between two-stage and fusion, the methods exhibit different robustness characteristics; moreover, two-stage retrieval provides efficiency benefits over fusion.
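The two-stage scheme can be sketched as follows. Here `k` is passed in as a fixed parameter, whereas the paper's main novelty is estimating it per query; the scoring functions are placeholders:

```python
def two_stage_retrieval(query, collection, text_score, visual_score, k):
    """Stage 1: rank the whole collection by the cheap textual score.
    Stage 2: rerank only the top-k items by the expensive visual (CBIR)
    score; the remainder keeps its textual order."""
    first = sorted(collection, key=lambda d: text_score(query, d), reverse=True)
    top_k, rest = first[:k], first[k:]
    reranked = sorted(top_k, key=lambda d: visual_score(query, d), reverse=True)
    return reranked + rest
```

Because the visual scorer touches only k items rather than the whole collection, the expensive CBIR cost scales with k, not with the database size.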

6.
Visual Question Answering (VQA) requires reasoning about the visually grounded relations in the image and question context. A crucial aspect of solving complex questions is reliable multi-hop reasoning, i.e., dynamically learning the interplay between visual entities at each step. In this paper, we investigate the potential of reasoning graph networks on multi-hop reasoning questions, especially those over 3 “hops.” We call this model QMRGT: a Question-Guided Multi-hop Reasoning Graph network. It constructs a cross-modal interaction module (CIM) and a multi-hop reasoning graph network (MRGT), and infers an answer by dynamically updating the inter-associated instruction between the two modalities. Our graph reasoning module can be applied to any multi-modal model. Experiments on the VQA 2.0 and GQA datasets (in fully supervised and O.O.D. settings) show that both QMRGT and pre-trained V&L models combined with MRGT improve performance on visual question answering tasks. Graph-based multi-hop reasoning provides an effective signal for the visual question answering challenge, both for O.O.D. and high-level reasoning questions.
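One hop of question-guided graph reasoning can be sketched as attention-weighted message passing. This is a generic formulation under assumed shapes, not the paper's exact update rule:

```python
import numpy as np

def reasoning_hop(node_feats: np.ndarray, adjacency: np.ndarray,
                  instruction: np.ndarray) -> np.ndarray:
    """One guided hop: soft attention over nodes is derived from the
    question-derived instruction vector, then attended features are
    propagated along graph edges and added back residually."""
    logits = node_feats @ instruction            # (n,) relevance scores
    att = np.exp(logits - logits.max())
    att = att / att.sum()                        # softmax attention
    messages = adjacency @ (att[:, None] * node_feats)
    return node_feats + messages                 # residual update
```

Stacking several such hops, each with a freshly updated instruction vector, is what lets the model follow multi-step relational chains.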

7.
Teaching images, as an important auxiliary tool in teaching and learning, are fundamentally different from general-domain images. Besides visually similar images being more likely to share common labels, teaching images also face the challenge of visual-knowledge inconsistency, including intra-knowledge visual difference and inter-knowledge visual similarity. To address these challenges, we present KBHN, a knowledge-aware bi-hypergraph network, which not only considers coarse-grained visual features but also extracts fine-grained knowledge features that reflect the knowledge intention hidden in teaching images. In detail, a visual hypergraph is constructed to connect images with visual similarity; it further enriches coarse-grained visual features by modeling the high-order visual relations among teaching images. Moreover, a knowledge hypergraph based on typical images is built to aggregate images with similar knowledge information, which innovatively extracts fine-grained knowledge features by modeling high-order knowledge correlations between local regions. Furthermore, a multi-head attention mechanism is adopted to fuse visual-knowledge features and enrich the image representation. A teaching image dataset containing 20,744 real-world images annotated with 24 knowledge points is constructed to train and validate our model. Experimental results demonstrate that KBHN, incorporating visual-knowledge features, achieves state-of-the-art performance compared to existing methods.

8.
杨丹 《科教文汇》2014,(11):202-203
The design of a university visual identity system is not a simple image design, still less a simple patchwork of related elements, but a unified whole composed of concrete components. By analyzing the current state of visual identity system construction at local colleges and universities, this paper proposes corresponding construction strategies for their visual identity systems.

9.
Think tanks have proved helpful for decision-making in various communities. However, collecting information manually for think tank construction implies too much time and labor cost, as well as inevitable subjectivity. A probable solution is to retrieve webpages of renowned experts and institutes similar to a given example, denoted query by webpage (QBW). Considering users’ searching behaviors, a novel QBW model based on webpages’ visual and textual features is proposed. Specifically, a visual feature extraction module based on pre-trained neural networks and a heuristic pooling scheme is proposed, which bridges the gap that existing extractors fail to extract snapshots’ high-level features and are sensitive to the noise introduced by images. Moreover, a textual feature extraction module is proposed to represent textual content at both the term and topic grain, while most existing extractors focus merely on the term grain. In addition, a series of similarity metrics is proposed, including a textual similarity metric based on feature bootstrapping to improve the model’s robustness and an adaptive weighting scheme to balance the effect of different types of features. The proposed QBW model is evaluated on expert and institute introduction retrieval tasks in academic and medical scenarios, where the average MAP improves by 10% over existing baselines. Practically, useful insights can be derived from this study for various applications involving webpage retrieval beyond think tank construction.
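The adaptive weighting scheme that balances the different feature types can be sketched as confidence-normalized mixing. The confidence inputs here are assumptions, since the abstract does not define how the weights are derived:

```python
def combined_similarity(vis_sim: float, txt_sim: float,
                        vis_conf: float, txt_conf: float) -> float:
    """Mix per-modality similarities with weights proportional to each
    modality's confidence, normalized to sum to one."""
    w_vis = vis_conf / (vis_conf + txt_conf)
    return w_vis * vis_sim + (1.0 - w_vis) * txt_sim
```

With equal confidences this reduces to a plain average; a zero confidence on one modality falls back entirely to the other.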

10.
Visual Basic has powerful graphics and image processing capabilities and is widely used in graphic design and image processing. This paper describes the presentation of urban greening effects in the Visual Basic environment.

11.
Existing approaches to learning path recommendation for online learning communities mainly rely on the individual characteristics of users or the historical records of their learning processes, but pay less attention to the semantics of users’ postings and the context. To facilitate knowledge understanding and personalized learning in online learning communities, it is necessary to conduct a fine-grained analysis of user data to capture users’ dynamic learning characteristics and potential knowledge levels, so as to recommend appropriate learning paths. In this paper, we propose a fine-grained and multi-context-aware learning path recommendation model for online learning communities based on a knowledge graph. First, we design a multidimensional knowledge graph to solve the problem of monotonous and incomplete entity information presentation in single-layer knowledge graphs. Second, we use the topic preference features of users’ postings to determine the starting point of learning paths, and we strengthen the distant relationships of knowledge in the global context using the multidimensional knowledge graph when generating and recommending learning paths. Finally, we build a user background similarity matrix to establish user connections in the local context, recommending users with similar knowledge levels and learning preferences and synchronizing their subsequent postings. Experimental results show that the proposed model can recommend appropriate learning paths for users, and that the recommended similar users and postings are effective.
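The user background similarity matrix can be sketched as pairwise cosine similarity over users' knowledge-level vectors; the encoding of users into such vectors is an assumption here:

```python
import numpy as np

def background_similarity_matrix(user_vecs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between users' knowledge/preference
    vectors; entry (i, j) is the similarity between user i and user j."""
    norms = np.linalg.norm(user_vecs, axis=1, keepdims=True)
    unit = user_vecs / np.clip(norms, 1e-12, None)  # avoid divide-by-zero
    return unit @ unit.T
```

Users whose vectors point in the same direction score 1 regardless of magnitude, which matches the intent of comparing knowledge profiles rather than activity volume.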

12.
邓锋 《科技广场》2007,(7):166-168
Building on an account of the software design process of a virtual waveform generator based on Visual Basic 6.0 (Chinese edition), this paper implements on a computer the control panel and main functions of a traditional waveform generator.

13.
Eye-in-hand systems have received significant attention for the task of approaching an object identified by a reference image shown in advance. Image-based visual servoing (VS) methods demonstrate robustness to image noise but encounter difficulties, especially when the camera displacement is large. More challenges arise when the object is cylindrical with little texture. This paper proposes a new feature set and a pertinent trajectory planning scheme to achieve a convergent path with tolerable violation of camera field-of-view (FOV) limits, allowing transient loss of part of the observed object. Specifically, new features and interaction matrices are developed to achieve global convergence. Feature trajectories are then planned with tolerable FOV violation through a constrained nonlinear minimization path-planning technique, and are later tracked via an adapted IBVS controller in a switching manner. Simulation with two views of a real drinking vessel validates the proposed method and demonstrates its adaptation to partial FOV violation.
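The classical IBVS control law that such adapted controllers build on can be sketched as follows; the gain value is illustrative, and the paper's specific feature set and interaction matrices are not reproduced here:

```python
import numpy as np

def ibvs_velocity(L: np.ndarray, error: np.ndarray,
                  lam: float = 0.5) -> np.ndarray:
    """Classical image-based visual servoing law: camera velocity
    v = -lambda * pinv(L) @ e, where L is the interaction matrix and
    e the image-feature error, driving e exponentially toward zero."""
    return -lam * np.linalg.pinv(L) @ error
```

The pseudo-inverse handles non-square or rank-deficient interaction matrices, which is the usual case when the number of features differs from the six camera degrees of freedom.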

14.
[Purpose/Significance] Cultural-relic image resources are increasingly important in the construction of digital humanities infrastructure. To overcome the negative impact of the “semantic gap” between the content and form of image resources on their development and utilization, it is necessary to study the fine-grained mapping between the low-level visual features and high-level semantic features of cultural-relic images and their fine-grained knowledge representation. [Method/Process] Based on an analysis of the knowledge representation needs and strategies for cultural-relic image resources, this paper proposes a fine-grained knowledge representation model based on knowledge element construction. Building on the model design, and taking the famous painting 《历代帝王图卷》 as an example, it describes the concrete workflow of knowledge element extraction, construction, and data publishing for the fine-grained representation of cultural-relic image resources. [Result/Conclusion] Experimental results show that the proposed knowledge-element-based fine-grained knowledge representation method can establish effective semantic mappings between low-level visual features and high-level semantic features, and, through data linkage with external knowledge bases, achieve deep integration with the open linked data network. [Innovation/Limitation] This paper proposes a fine-grained knowledge representation method for cultural-relic image resources from the perspective of knowledge elements; future research should further explore the automatic extraction of knowledge elements from cultural-relic images and knowledge discovery methods based on them.

15.
戎鹏柱 《科教文汇》2012,(12):128-128,143
Intuitive (visual) teaching is one of the most important instructional methods in biology; almost every biology lesson can make use of it, and how well it is used directly determines students’ learning outcomes. Teachers must therefore have a thorough understanding of the significance, types, and application of intuitive teaching, and master the relevant skills and techniques, in order to continuously improve teaching quality.

16.
The advantages of user click data have greatly inspired its wide application in fine-grained image classification tasks. In previous click-data-based image classification approaches, each image is represented as a click frequency vector over a pre-defined query/word dictionary. However, this approach not only introduces high-dimensionality issues but also ignores the part of speech (POS) of a specific word as well as word correlations. To address these issues, we devise factorized deep click features to represent images. We first represent images as factorized TF-IDF click feature vectors to discover word correlations, wherein several word dictionaries of different POS are constructed. Afterwards, we learn an end-to-end deep neural network on click feature tensors built from these factorized TF-IDF vectors. Evaluations on the public Clickture-Dog dataset show that: 1) the deep click feature learned on the click tensor performs much better than traditional click frequency vectors; and 2) compared with many state-of-the-art textual representations, the proposed deep click feature is more discriminative and yields higher classification accuracy.
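The TF-IDF weighting of per-image click frequencies can be sketched as below. The add-one smoothing in the idf term is a common convention, assumed here rather than taken from the paper:

```python
import math

def tfidf_click_vector(click_counts, doc_freq, n_images):
    """Weight each word's click count (tf) by an idf term that
    down-weights words clicked for many images.

    click_counts[w]: clicks this image received for word w
    doc_freq[w]:     number of images with any clicks for word w
    """
    return [tf * math.log(n_images / (1 + doc_freq[w]))
            for w, tf in enumerate(click_counts)]
```

A word clicked for nearly every image gets an idf near zero, so it contributes little to discriminating fine-grained classes.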

17.
18.
Driver behavior recognition has attracted extensive attention recently, and numerous methods have been developed on the basis of various deep neural networks. However, existing models still face challenges in this downstream fine-grained recognition task, because many driver behaviors share similar global shapes/appearances and can only be discriminated by subtle but meaningful actions of local body parts. To remedy this emerging issue of fine-grained driver behavior recognition, we present a bilinear full-scale residual Swin-Transformer network (BiRSwinT) to learn and fuse the global shape/appearance and the local discriminative cues of driver actions. Specifically, a transformer-based dual-stream structure is designed, which contains two parallel branches: a global representation branch and a local residual branch. To learn the multi-scale local cues implied in driver actions, a set of full-scale skip connections is introduced in the local residual branch. Then a bilinear fusion with a multi-step training strategy is employed to leverage and combine the global representation and local detail features of driver actions. The proposed method has been validated on the AUC V1 and V2 datasets, achieving average Top-1 accuracies of 93.235% and 92.253%, respectively. Comparison experiments show that the proposed method outperforms state-of-the-art methods, and generalization experiments on the State-Farm dataset further demonstrate the superiority of the proposed framework and strategy. The source code is available at https://github.com/shadow2469/BiRSwinT.git.
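Bilinear fusion of the two branch features can be sketched as an outer product followed by signed-square-root and L2 normalization; the normalization steps are a common convention from bilinear CNN models, assumed here rather than confirmed by the abstract:

```python
import numpy as np

def bilinear_fuse(global_feat: np.ndarray,
                  local_feat: np.ndarray) -> np.ndarray:
    """Bilinear pooling: outer product of the global and local branch
    features, flattened, signed-square-rooted, and L2 normalized."""
    b = np.outer(global_feat, local_feat).ravel()
    b = np.sign(b) * np.sqrt(np.abs(b))   # signed square-root
    n = np.linalg.norm(b)
    return b / n if n > 0 else b
```

The outer product captures every pairwise interaction between global and local feature dimensions, which is what lets subtle local cues modulate the global representation.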

19.
Visual communication design is a discipline that conveys information through creative thinking. By cultivating creative thinking in visual communication design students, instructors encourage them to accumulate personal experience, improve their understanding and observation of things, and apply creative methods of thought to visual expression, producing creative works through interaction between teachers and students.

20.
Visual Data Exploration and Its Applications
余红梅  梁战平 《情报科学》2007,25(4):599-603
Visual data exploration, with its distinctive feature of interacting directly with datasets, occupies a very important position in information visualization research. After explaining the concept of visual data exploration and its related techniques, this paper uses three examples to illustrate its applications and its role in intelligence analysis.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号