Similar Articles (20 results found)
1.
《Research Policy》2023,52(3):104706
Concern that the selection of research projects by peer review disfavors risky science has called attention to ways of incorporating risk into the evaluation of research proposals. This discussion often occurs in the absence of well-defined and developed concepts of what risk and uncertainty mean in science. This paper sets out to address that void, with the goal of providing building blocks for further discussion of the meaning of risk and uncertainty in science. The core contributions of the paper are fourfold. First, we outline the meaning of risk in science, drawing on insights from the literatures on risk and uncertainty. Second, based on this outline, we discuss ways in which funding programs can embrace a more comprehensive concept of risk and embed it in the peer review of proposals, with the goal of not penalizing risky proposals with the potential for high return when funding decisions are made. Third, we draw an important distinction between high-risk research projects and projects whose evaluation is subject to ambiguity/radical uncertainty. Fourth, we discuss ways in which funding agencies can address ambiguity/radical uncertainty.

2.
《Research Policy》2023,52(3):104707
In their Discussion Paper, Franzoni and Stephan (F&S, 2023) discuss the shortcomings of existing peer review models in shaping the funding of risky science. Their discussion offers a conceptual framework for incorporating risk into peer review of research proposals by leveraging the Subjective Expected Utility (SEU) approach to decouple reviewers' assessments of a project's potential value from its risk. In my Response, I build on F&S's discussion and attempt to shed light on three additional yet core considerations of risk in science: 1) how risk and reward in science are related to assessments of a project's novelty and feasibility; 2) how the sunk cost literature can help articulate why reviewers tend to perceive new research areas as riskier than continued investigation of existing lines of research; and 3) how drawing on different types of expert reviewers (i.e., based on domain and technical expertise) can yield alternative evaluation assessments that better inform resource allocation decisions. The spirit of my Response is to sharpen our understanding of risk in science and to offer insights on how future theoretical and empirical work, leveraging experiments, can test and validate the SEU approach for the purpose of funding more risky science that advances the knowledge frontier.

3.
With the rapid development of the internet, text data is becoming ever richer, but most of it is unstructured, which makes it harder to exploit than statistical data. Applying informetrics to financial network text mining is therefore a useful supplement to the traditional research methods of finance. This paper attempts to forecast exchange rate volatility through informetrics applied to financial network text by means of affective computing. We find that when only the volume of network text is used for prediction, just the peaks and valleys of its volatility are synchronous with the volatility of the exchange rate, whereas the volatility of the emotional intensity of words in the network text can accurately predict not only drastic volatility of the exchange rate but also moderate volatility.
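The comparison described above can be sketched as computing rolling volatilities of a sentiment-intensity series and an exchange-rate series and checking whether they move together. A minimal illustration with synthetic data and a plain rolling standard deviation (the paper's actual affective-computing pipeline and data are not reproduced here):

```python
import statistics

def rolling_volatility(series, window):
    """Rolling (population) standard deviation over a fixed window."""
    return [statistics.pstdev(series[i - window:i])
            for i in range(window, len(series) + 1)]

# Hypothetical daily series: sentiment intensity of news text and an exchange rate.
sentiment_intensity = [0.1, 0.3, 0.2, 0.8, 0.9, 0.2, 0.1, 0.15, 0.6, 0.7]
exchange_rate = [6.50, 6.51, 6.50, 6.62, 6.70, 6.55, 6.52, 6.52, 6.60, 6.66]

vol_sentiment = rolling_volatility(sentiment_intensity, window=3)
vol_rate = rolling_volatility(exchange_rate, window=3)

# Simple co-movement check: do the two volatility series rise and fall together?
agreement = sum((s2 > s1) == (r2 > r1)
                for s1, s2, r1, r2 in zip(vol_sentiment, vol_sentiment[1:],
                                          vol_rate, vol_rate[1:]))
print(f"direction agreement: {agreement}/{len(vol_sentiment) - 1}")
```

A real study would replace the synthetic lists with a sentiment series extracted from financial news and a market exchange-rate series.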

4.
5.
Using data mining techniques to derive the attributes and features of tourism texts has become an important area of tourism research. Studying the topics of tourism microblog posts helps tourism agencies shape their image and spread their content, and is of value for improving their microblog information supply and the tourism image. This study first reviews the use of content analysis in tourism research and related work on tourism microblogs at home and abroad. Next, taking the text of the Sina Weibo account of the China National Tourism Administration as the research object, it extracts and filters high-frequency feature words with the Rost word parser frequency-analysis software. It then applies content analysis, combined with social network and co-word analysis, to obtain the social network relations among the high-frequency words. Finally, by exploring the attributes of the high-frequency words and the connections among them, the study distills the account's content into four topics: cultural attractions, natural attractions, tourist travel, and tourism administrative information.

6.
Assessments of the quality and productivity of academic research programs are becoming more and more important in gaining financial support, in hiring and promoting research staff, and in building academic reputation. Most assessments are based on peer review or on bibliometric information. In this paper we analyze both bibliometric data and peer review assessments of 169 research groups in economics, econometrics, and business administration. The evaluations were carried out in two independent rounds, in 1995 and 2001, permitting replication of our study.

The purpose of this study is twofold. In the first part we examine to what degree bibliometric information relates to peer review judgments. The results convey how evaluators weight different output categories in their final overall judgment of academic quality. They also have practical meaning, since they indicate the predictive ability of bibliometric data for future peer review outcomes. In the second part we aim to explain differences in research output quality and productivity by organizational factors, such as the size of the research group, composition of staff, sources of research funding, and academic discipline. In this part, a composite indicator represents the review committees' overall assessment, and the bibliometric data most strongly related to that assessment are used to construct data envelopment analysis (DEA) efficiency scores as a measure of research productivity.

The main conclusions are that the number of publications in international top journals is the best predictor of peer review assessment results; changes in the classification of bibliometric information, introduced in the second evaluation round, do not alter this conclusion. Size of the research group appears to be the only permanent characteristic associated with research quality and productivity: size is positively related to research quality but negatively related to research productivity. Larger groups appear to have the potential to improve quality, but as groups grow they also experience problems in maintaining the productivity of the team's members. The remaining organizational characteristics appear to be only temporarily related to research quality and productivity. In the first evaluation round, both are associated with the discipline variable: programs in more quantitative areas with a higher level of paradigm development, such as econometrics and operations research, achieved higher quality and productivity than programs in more diverse and less quantitative areas such as business administration. This relation is not permanent, however, as it becomes insignificant in the second evaluation round. Instead, funding relations become more apparent in the second round: the relative amount of national funding in the group's funding becomes positively related to academic quality, whereas the portion of income from committed research is negatively related to the academic quality of the programs' research output. This may have been caused by the increased importance of alternative sources of research funding in the period of the second review.

7.
Automatic text classification is the task of organizing documents into pre-determined classes, generally using machine learning algorithms. It is one of the most important methods for organizing and exploiting the enormous amounts of information that exist in unstructured textual form, and it is a widely studied research area in language processing and text mining. In traditional text classification, a document is represented as a bag of words in which terms are cut off from their finer context, i.e., their location in a sentence or document. Only the broader document context is used, with some form of term-frequency information in the vector space. Consequently, the semantics of a word that can be inferred from its location in a sentence and its relations with neighboring words are usually ignored. Yet the meaning of words and the semantic connections between words, documents, and even classes clearly matter, since methods that capture semantics generally reach better classification performance. Several surveys have analyzed approaches to traditional text classification, and most cover the application of semantic term-relatedness methods to some degree, but they do not specifically target semantic text classification algorithms and their advantages over traditional text classification. To fill this gap, we undertake a comprehensive discussion of semantic versus traditional text classification. This survey explores past and recent advances in semantic text classification and organizes existing approaches into five fundamental categories: domain knowledge-based, corpus-based, deep learning-based, word/character sequence-enhanced, and linguistically enriched approaches. It also highlights the advantages of semantic text classification algorithms over traditional text classification algorithms.
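The loss of "finer context" in a bag-of-words representation is easy to demonstrate: two sentences with different meanings can map to the identical representation. A minimal illustration:

```python
from collections import Counter

# Word order (the "finer context") is discarded in a bag-of-words model,
# so these two sentences with different meanings become indistinguishable.
doc_a = "the drug improved outcomes but increased risk"
doc_b = "the drug increased outcomes but improved risk"

bow_a = Counter(doc_a.split())
bow_b = Counter(doc_b.split())
print(bow_a == bow_b)  # the two documents get the same representation
```

Semantic approaches that model word position or neighboring-word relations can, in principle, keep these two documents apart.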

8.
This paper analyzes in depth the mechanisms by which the US National Institutes of Health manages conflicts of interest in its peer review activities, covering the legal and institutional framework, the definition of conflict of interest, the types of conflicts and the measures for handling them, and waiver provisions. It summarizes the NIH's experience in conflict-of-interest management, with the aim of providing a reference for building the peer review system of China's science funds.

9.
Both structured and unstructured data, as well as structured data representing several different types of tuples, may be integrated into a single list for browsing or retrieval. Data may be arranged in the Gray code order of the features and metadata, producing an optimal ordering for browsing. We provide several metrics for evaluating the performance of systems supporting browsing, given some constraints. Metadata and indexing terms are used as sorting keys and attributes for structured data, as well as for semi-structured or unstructured documents, images, media, etc. Economic and information-theoretic models are suggested that enable the ordering to adapt to user preferences. Different relational structures and unstructured data may be integrated into a single, optimal ordering for browsing or for displaying tables in digital libraries, database management systems, or information retrieval systems. Adaptive displays of data are discussed.
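Gray-code ordering, as described above, can be sketched by treating each item's metadata as a binary feature vector and sorting items by their position in the binary-reflected Gray code sequence, so that successive items tend to differ in few features. A small sketch with hypothetical documents and features:

```python
def gray_rank(bits):
    """Position of a bit pattern in the binary-reflected Gray code sequence."""
    b = g = int("".join(map(str, bits)), 2)
    g >>= 1
    while g:  # invert g = n ^ (n >> 1) to recover the sequence index
        b ^= g
        g >>= 1
    return b

# Hypothetical binary features (e.g. has-image, is-structured, in-English).
items = {
    "doc1": (0, 0, 1),
    "doc2": (1, 1, 1),
    "doc3": (0, 1, 1),
    "doc4": (0, 1, 0),
}
ordered = sorted(items, key=lambda name: gray_rank(items[name]))
print(ordered)  # adjacent documents share most feature values
```

In the full Gray code sequence (000, 001, 011, 010, 110, 111, ...) each step changes exactly one bit, which is why this ordering minimizes feature changes between neighboring items in a browsing list.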

10.
Starting from the concept of a "paradigm," this paper distinguishes two modes of innovation in scientific research, cumulative incremental advance and revolutionary breakthrough, and points out the inherent dilemma that science funding agencies face when relying on a consensus-seeking peer review mechanism to select revolutionary, non-consensus innovative research projects. The US National Science Foundation's policy evolution, from establishing small exploratory research grants to supporting transformative research, shows that funding agencies must not only further improve the peer review mechanism to effectively identify…

11.
A Zipfian model of an automatic bibliographic system is developed using parameters describing the contents of its database and its inverted file. The underlying structure of the Zipf distribution is derived, with particular emphasis on its application to word frequencies, especially with regard to the inverted files of an automatic bibliographic system. Andrew Booth developed a form of Zipf's law that estimates the number of words of a particular frequency for a given author and text. His formulation is adopted here as the basis of a model of term dispersion in an inverted file system. The model is also distinctive in its consideration of the proliferation of spelling errors in free text and its inclusion of all searchable elements from the system's inverted file. The model is applied to the National Library of Medicine's MEDLINE. It carries implications for determining database storage requirements, search response time, and search exhaustiveness.
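One common statement of Booth's low-frequency form of Zipf's law (an assumption here; the abstract does not spell out the formula) is that the number of words occurring exactly n times, I_n, satisfies I_n ≈ 2·I_1 / (n(n+1)), where I_1 is the number of words occurring exactly once. A sketch under that assumption, with an illustrative I_1 rather than MEDLINE figures:

```python
def booth_estimate(i1, n):
    """Estimated number of distinct words occurring exactly n times,
    assuming the I_n = 2*I_1 / (n*(n+1)) form of Booth's law."""
    return 2 * i1 / (n * (n + 1))

i1 = 10_000  # hypothetical count of words that occur exactly once
spectrum = {n: booth_estimate(i1, n) for n in range(1, 6)}
for n, count in spectrum.items():
    print(n, round(count))
```

The rapid decay of the spectrum is what makes such a model useful for estimating inverted-file storage requirements: most distinct terms occur only a handful of times.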

12.
[Purpose/Significance] In recent years scientific literature has grown explosively, and a large share of it still carries unstructured abstracts. Unstructured abstracts are hard for scholars to read and understand, and they hinder automatic knowledge extraction from, and retrieval over, the information inside the abstract. Studying knowledge representation models for unstructured abstracts of scientific literature and methods for extracting them automatically matters both for rapid reading by scholars and for automated machine processing. [Method/Process] On the basis of an analysis of the structure of unstructured abstracts, and drawing on knowledge-element ontology theory, this paper builds a knowledge-element ontology model for unstructured abstracts of scientific literature. By analyzing the writing features of unstructured abstracts, it divides the text at the sentence level into three elements, purpose, method, and result/conclusion; compiles statistics on the cue words, sentence patterns, and positions of each element sentence; builds a corresponding rule base; and constructs an extraction algorithm from the ontology model and rule base. Finally, papers from the journal 《计算机技术与发展》 (Computer Technology and Development) were downloaded for experiments. [Results/Conclusions] By enlarging the sentence-pattern and cue-word sets, the elements of unstructured abstracts were refined and a knowledge-element ontology model was built. The experimental results show that the proposed model can effectively extract knowledge elements from unstructured abstracts. [Limitations] A shortcoming of the experiment is that the sentence patterns and cue words in abstracts must be summarized manually.
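The rule-based element labeling described above can be sketched with a cue-word lookup. The cue lists below are illustrative stand-ins; the paper's rule base also uses sentence patterns and sentence position, omitted here:

```python
# Illustrative cue words per abstract element (not the paper's actual rule base).
CUE_WORDS = {
    "purpose": ("objective", "aims to", "in order to"),
    "method": ("we propose", "based on", "algorithm"),
    "result": ("results show", "we find", "conclude"),
}

def classify_sentence(sentence):
    """Assign a sentence to the first element whose cue words it contains."""
    s = sentence.lower()
    for element, cues in CUE_WORDS.items():
        if any(cue in s for cue in cues):
            return element
    return "unknown"

abstract = [
    "The objective of this study is to extract knowledge elements.",
    "We propose an ontology model combined with a rule base.",
    "Results show the model extracts elements effectively.",
]
labels = [classify_sentence(s) for s in abstract]
print(labels)
```

The manual effort the paper notes as a limitation corresponds to curating the `CUE_WORDS` table and the sentence-pattern rules by hand.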

13.
Argumentation mining is a rising subject in computational linguistics that focuses on extracting structured arguments from natural, often unstructured or noisy, text. Initial approaches to modeling arguments aimed to identify flawless arguments in specific fields (law, scientific papers) serving specific needs (completeness, effectiveness). With the emergence of Web 2.0 and the explosion in the use of social media, both the diffusion of the data and the argument structure have changed. In this survey article, we bridge the gap between theoretical approaches to argumentation mining and pragmatic schemes that satisfy the needs of social media-generated data, recognizing the need for more flexible and expandable schemes capable of adjusting to the argumentation conditions that exist in social media. We review, compare, and classify existing approaches, techniques, and tools, identify the positive outcome of combining tasks and features, and propose a conceptual architecture framework: an argumentation mining scheme able to identify the distinct sub-tasks and capture the needs of social media text, underscoring the need for more flexible and extensible frameworks.

14.
The change in consumers' choices and expectations, stemming from emerging technology and the wide availability of different products and services, has created a highly competitive landscape in many customer service sectors, including the financial industry. The Canadian banking industry, in particular, has become highly competitive owing to threats and disruptions caused not only by direct competitors but also by new entrants to the market. The primary objective of this paper is to construct a predictive churn model using big data, including structured archival data integrated with unstructured data from sources such as online web pages, website visit counts, and phone conversation logs, for the first time in the financial industry. It also examines the effect of different aspects of customer behavior on churning decisions. The Datameer big data analytics tool on the Hadoop platform and predictive techniques in the SAS business intelligence system were applied to study the client retirement journey path and to create a churn prediction model. By deploying these systems, we were able to uncover a wealth of data and information in over 3 million customer records within the retiree segment of the target bank, from 2011 to 2015.

15.
Log parsing is a critical task that converts unstructured raw logs into structured data for downstream tasks. Existing methods often rely on manual string-matching rules to extract template tokens, leading to low adaptability across log datasets. To address this issue, we propose an automated log parsing method, PVE, which leverages a Variational Auto-Encoder (VAE) to build a semi-supervised model for categorizing log tokens. Inspired by the observation that log template tokens often consist of words, we choose common words and their combinations as training data to enhance the diversity of the structure features of template tokens. Specifically, PVE constructs two types of embedding vectors, the sum embedding and the n-gram embedding, for each word and word combination. The structure features of template tokens can be learned by training the VAE on these embeddings. During parsing, PVE categorizes a token as a template token if it is similar to the training data. To improve efficiency, we use the average similarity between the token embedding and VAE samples to determine the token type, rather than the reconstruction error. Evaluations on 16 real-world log datasets demonstrate that our method has an average accuracy of 0.878, which outperforms comparison methods in terms of parsing accuracy and adaptability.
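The two token representations named above can be sketched as follows: a token's n-gram embedding sums vectors for its character n-grams, and the sum embedding sums whole-word vectors for a word combination. The dimension and the hashed one-hot "vectors" are illustrative stand-ins, not PVE's learned encoder:

```python
import hashlib

DIM = 16  # illustrative embedding dimension

def char_ngrams(token, n=3):
    """Character n-grams of a token (the token itself if it is too short)."""
    return [token[i:i + n] for i in range(len(token) - n + 1)] or [token]

def hashed_vector(text):
    """Deterministic one-hot stand-in for a learned vector."""
    vec = [0.0] * DIM
    h = int(hashlib.md5(text.encode()).hexdigest(), 16)
    vec[h % DIM] = 1.0
    return vec

def ngram_embedding(token):
    """'n-gram embedding': sum of vectors of the token's character n-grams."""
    vec = [0.0] * DIM
    for gram in char_ngrams(token):
        for i, v in enumerate(hashed_vector(gram)):
            vec[i] += v
    return vec

def sum_embedding(words):
    """'sum embedding': sum of whole-word vectors for a word combination."""
    vec = [0.0] * DIM
    for word in words:
        for i, v in enumerate(hashed_vector(word)):
            vec[i] += v
    return vec

print(sum(ngram_embedding("connection")))  # one unit of mass per n-gram
```

In PVE these vectors are training inputs for the VAE; at parse time a token whose embedding is, on average, similar to VAE samples is labeled a template token.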

16.
Traditional machine learning systems lack a pathway for a human to integrate their domain knowledge into the underlying algorithms. Using such systems in domains where decisions can have serious consequences (e.g., medical decision-making and crime analysis) requires the incorporation of human experts' domain knowledge. The challenge, however, is how to effectively combine domain expert knowledge with machine learning algorithms to develop effective models for better decision making. In crime analysis, the key challenge is to identify plausible linkages in unstructured crime reports for hypothesis formulation. Crime analysts painstakingly perform time-consuming searches of many different structured and unstructured databases to collate these associations, without any proper visualization. To tackle these challenges and facilitate crime analysis, this paper examines unstructured crime reports through text mining to extract plausible associations. Specifically, we present an associative-questioning-based searching model to elicit multi-level associations among crime entities, and we couple this model with partition clustering to develop an interactive, human-assisted knowledge discovery and data mining scheme. The proposed human-centered scheme for crime text mining can extract plausible associations between crimes, identify crime patterns, group similar crimes, and elicit co-offender networks and suspect lists based on spatial-temporal and behavioral similarity. These similarities are quantified through Cosine, Jaccard, and Euclidean distances, and each suspect in the plausible suspect list is ranked by a similarity score. The associations are then visualized in a two-dimensional re-configurable crime cluster space along with a bipartite knowledge graph. The scheme also addresses the grand challenge of integrating effective human interaction with machine learning algorithms through a visualization feedback loop: the analyst can feed in domain knowledge, including the choice of similarity functions for identifying associations, dynamic feature selection for interactive clustering of crimes, and weights for each component of the crime pattern to rank suspects for an unsolved crime. We demonstrate the scheme through a case study using an anonymized burglary dataset, and find that it facilitates human reasoning and analytic discourse for intelligence analysis.
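The three measures used to quantify spatial-temporal and behavioral similarity can be sketched directly; the binary behavioral feature vectors below are hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def jaccard(a, b):
    """Jaccard similarity of two binary feature vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical binary behavior indicators for two burglaries
# (e.g. forced entry, night-time, occupied dwelling, ...).
crime_a = [1, 0, 1, 1, 0]
crime_b = [1, 1, 1, 0, 0]

print(cosine(crime_a, crime_b), jaccard(crime_a, crime_b), euclidean(crime_a, crime_b))
```

In the scheme above, the analyst's choice among these functions is itself part of the domain knowledge fed into the visualization feedback loop.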

17.
Data availability and access to various platforms are changing the nature of Information Systems (IS) studies. Such studies often use large datasets, which may incorporate structured and unstructured data from various platforms, and the questions they address may apply methods from computational science such as sentiment mining, text mining, network science, and image analytics to derive insights. However, many of these studies make only a weak theoretical contribution. We point out the need for such studies to contribute back to the IS discipline, so that findings explain more about the phenomenon surrounding the interaction of people with technology artefacts and the ecosystem within which this contextual usage is situated. Our opinion paper attempts to address this gap and provides insights on the methodological adaptations required for "big data studies" to be converted into "IS research" and contribute to theory building in information systems.

18.
Named entity recognition aims to detect pre-determined entity types in unstructured text. There are few studies of this task for low-resource languages such as Turkish. We provide a comprehensive study of Turkish named entity recognition, comparing the performance of existing state-of-the-art models on datasets with varying domains to understand their generalization capability, and further analyzing why such models fail or succeed in this task. Our experimental results, supported by statistical tests, show that the highest weighted F1 scores are obtained by Transformer-based language models, ranging from 80.8% on tweets to 96.1% on news articles. We find that Transformer-based language models are more robust than traditional models to entity types with small sample sizes and to longer named entities, yet all models perform poorly on longer named entities in social media. Moreover, when we shuffle 80% of the words in a sentence to imitate the flexible word order of Turkish, we observe greater performance deterioration in well-written texts (12%) than in noisy text (7%).
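The word-order perturbation described above can be sketched as picking 80% of a sentence's word positions and permuting the words among them, leaving the rest fixed; the NER re-evaluation itself is not shown. The implementation details (seeding, rounding) are illustrative assumptions:

```python
import random

def shuffle_80_percent(sentence, seed=0):
    """Permute the words at a random 80% of positions, leaving the rest fixed."""
    words = sentence.split()
    k = round(0.8 * len(words))
    rng = random.Random(seed)
    positions = rng.sample(range(len(words)), k)  # positions to disturb
    targets = positions[:]
    rng.shuffle(targets)
    out = words[:]
    for src, dst in zip(positions, targets):
        out[dst] = words[src]
    return " ".join(out)

original = "Mustafa Kemal founded the Republic of Turkey in 1923"
print(shuffle_80_percent(original))
```

Because only positions are permuted, the perturbed sentence contains exactly the original words, so any performance drop is attributable to word order alone.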

19.
This study investigates the possibilities offered by the development of graphene environment technology, which can contribute to new and effective approaches for the transition to an ecologically benign ecosystem. To analyze research and development progress, South Korean research trends in graphene environment technology for the years 2009–2020 are investigated using information on national research and development projects obtained from the National Science and Technology Information Service. Both structured and unstructured data are analyzed using diverse text-mining methods, such as keyword frequency analysis, association rule mining, and topic modelling. The results indicate that graphene research in South Korea focuses primarily on batteries and energy-storage devices, such as solar cells, fuel cells, and secondary batteries. The study helps clarify how the South Korean government has been investing in the research and development of graphene environment technology, and it discusses future applications and prospects of graphene for the next decade.

20.
[Purpose/Significance] Classifying semantic relations between entities is one of the key tasks of information extraction; converting unstructured text into structured knowledge is foundational for building domain ontologies and knowledge graphs and for developing question-answering and information retrieval systems. [Method/Process] This paper traces the development of entity semantic relation classification in detail, reviewing and summarizing the latest domestic and international research of the past five years from two perspectives, technical methods and application domains, and points out the shortcomings of current research and future research directions. [Results/Conclusions] Popular deep learning methods discard the laborious feature engineering of traditional shallow machine learning and learn text features automatically. Experiments show that incorporating lexical and syntactic features into neural network models and introducing attention mechanisms can effectively improve relation classification performance.
