Similar Documents
1.
王云丰 《科技广场》2007,(11):252-253
Graduate employment dispatch is an important routine task in universities and requires many kinds of reports, such as dispatch forms, archive forms, and household-registration transfer forms. The original system could only handle basic employment registration and therefore could not adequately meet practical needs. This paper offers a preliminary discussion of these problems, laying a foundation for further improvement.

2.
Directory management presents the files and directories of an entire storage system as a file view under the operating system, and it is among the most frequently invoked operations in an OS. When a user of the file system wants to list the contents of a directory, or to obtain the metadata of all files and subdirectories under it, the file system calls the directory management module. This paper designs the directory management of HIFS (Hust Installable File System), a client file system for an object storage system under Windows.
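As a rough illustration of the interface such a module exposes (using Python's standard library here, not HIFS itself), the sketch below returns the names and metadata of every entry under a directory:

```python
import os

def list_directory(path):
    """Return name-plus-metadata records for every entry under `path`,
    analogous to what a directory-management module hands back."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            st = entry.stat(follow_symlinks=False)
            entries.append({
                "name": entry.name,
                "is_dir": entry.is_dir(follow_symlinks=False),
                "size": st.st_size,    # bytes
                "mtime": st.st_mtime,  # last modification time
            })
    return entries

if __name__ == "__main__":
    for e in list_directory("."):
        print(e)
```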

3.
For embedded system applications, and based on an analysis of the principle of the traditional cyclic redundancy check (CRC) algorithm, this paper proposes a table-lookup CRC algorithm that improves checksum processing speed while preserving data reliability in embedded systems.
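The abstract gives no code, but the table-lookup technique it refers to is standard: precompute the CRC of every possible byte value once, then process input one byte per lookup instead of one bit per loop iteration. A minimal Python sketch for the common CRC-32 polynomial (the same pattern applies to other CRC widths):

```python
def make_crc32_table(poly=0xEDB88320):
    """Precompute the 256-entry lookup table for the reflected CRC-32."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = make_crc32_table()

def crc32(data: bytes) -> int:
    """One table lookup per byte instead of eight bit-level steps."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

if __name__ == "__main__":
    import zlib
    assert crc32(b"hello world") == zlib.crc32(b"hello world")
```

The table costs 256 entries of memory, the usual trade an embedded system makes for roughly an eightfold reduction in per-byte work.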

4.
韩姣红 《中国科技信息》2005,(16A):47-47,43
There are many ways to implement a database cross-tab (pivot table), but earlier methods cannot build the cross-tab when the field values of the original table that are to serve as column names are not known in advance. The implementation method proposed in this paper resolves this case, and its concrete realization is illustrated with an example.
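The paper's own implementation is not reproduced here; as a sketch of the same idea in Python with pandas, `pivot_table` discovers the column headings from the data itself, so the cross-tab can be built even when those field values are unknown beforehand:

```python
import pandas as pd

# Toy ledger; the distinct values of "quarter" are not known in advance.
rows = pd.DataFrame({
    "product": ["A", "A", "B", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q3"],
    "amount":  [10, 15, 7, 12, 9],
})

# The column headings ("Q1", "Q2", "Q3") are taken from the data itself,
# so the cross-tab works even when those values are unknown beforehand.
crosstab = rows.pivot_table(index="product", columns="quarter",
                            values="amount", aggfunc="sum", fill_value=0)
print(crosstab)
```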

5.
Along with the proliferation of big data technology, organizations find themselves awash in an ocean of data, and its sheer volume leaves them at a loss in the face of frequent data breaches caused by inefficient data security management. Data classification has become a hot topic as a cornerstone of data protection, especially in China in recent years, categorizing information types and distinguishing protective measures at different classification levels. Both the text and the tables of the promulgated data classification-related regulations (for simplicity, laws, regulations, policies, and standards are collectively referred to as “regulations”) contain a wealth of valuable information that can guide the work of data classification. To best assist data practitioners, in this paper we automatically “grasp” expert experience on how to classify data from the analysis of such regulations. We design a framework, GENONTO, that automatically extracts data classification practices (DCPs), such as information types and their corresponding sensitivity levels, to construct an information type lexicon as well as to encode a generic ontology on top of 38 real-world regulations promulgated in China. GENONTO employs machine learning and natural language processing (NLP) techniques to parse unstructured text and tables. To our knowledge, GENONTO is the first work that extracts critical information such as the category and sensitivity of information types from regulations and organizes it in the structured form of an ontology, characterizing the subsumptive relations between different information types. Our research helps provide a well-defined integrated view across regulations and bridges the gap between what experts say and how data practitioners act.

6.
Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times, since the network latencies between users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes, because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.
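As a hedged sketch of the general idea (not any of the paper's three strategies, which optimize quality loss, response time, or workload under index-size constraints), a greedy planner might replicate each region's most-requested documents up to a per-center capacity:

```python
def plan_replication(access_counts, capacity):
    """Greedy sketch: for each data center (region), replicate the locally
    most-requested documents until the index capacity is reached.

    access_counts: {region: {doc_id: request_count}}
    capacity:      max number of documents replicated per data center
    Returns {region: set(doc_id)}. Illustrative only.
    """
    plan = {}
    for region, counts in access_counts.items():
        ranked = sorted(counts, key=counts.get, reverse=True)
        plan[region] = set(ranked[:capacity])
    return plan

demo = {
    "us-east": {"d1": 90, "d2": 40, "d3": 5},
    "eu-west": {"d2": 70, "d3": 60, "d1": 10},
}
print(plan_replication(demo, capacity=2))
```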

7.
As industrial production and scientific research place ever higher demands on data applications, database technology has become an important part of control systems. Following the national technical guidelines for energy-consumption monitoring databases of government office buildings and large public buildings, this paper summarizes the composition of a SQL Server database, discusses the characteristics of energy-consumption data and the makeup of its data tables as required by the guidelines, and describes the design of the database table structure. The resulting system is easy to operate, general-purpose, and convenient, so the results have broad prospects for development and application.

8.
Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting lists caching. We show that a text self-index (i.e., an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out the posting lists generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.

9.
Figures and tables in scientific articles serve as data sources for various academic data mining tasks, and these tasks require the input data in its entirety. However, existing studies measure the performance of detection algorithms using the same IoU (Intersection over Union) or IoU-based metrics that were devised for natural images. There is a gap between a high IoU and detection entirety in scientific figure and table detection tasks. In this paper, we demonstrate the existence of this gap and argue that the leading cause is detection error in the boundary area. We propose an effective detection method that cascades semantic segmentation and contour detection. The semantic segmentation model adopts a novel loss function that raises the weights of boundary parts, together with a categorized Dice metric to evaluate the imbalanced pixels in the segmentation result. Under rigorous testing criteria, the method proposed in this paper yields a page-level F1 of 0.983, exceeding state-of-the-art academic figure and table detection methods. These results can significantly improve data quality and reduce data cleaning costs for downstream applications.
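To make the "entirety gap" concrete, here is the standard IoU computation in Python; a detection that clips the edge of a table can still score a high IoU even though rows of data are lost:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prediction that cuts off the top 5% of a table still scores IoU 0.95,
# which is exactly the gap between high IoU and detection entirety:
print(iou((0, 0, 100, 100), (0, 5, 100, 100)))  # 0.95, yet data is lost
```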

10.
We propose a new query reformulation approach, using a set of query concepts that are introduced to precisely denote the user’s information need. Since a document collection is considered to be a domain which includes latent primitive concepts, we identify those concepts through a local pattern discovery and a global modeling using data mining techniques. For a new query, we select its most associated primitive concepts and choose the most probable interpretations as query concepts. We discuss the issue of constructing the primitive concepts from either the whole corpus or from the retrieved set of documents. Our experiments are performed on the TREC8 collection. The experimental evaluation shows that our approach is as good as current query reformulation approaches, while being particularly effective for poorly performing queries. Moreover, we find that the approach using the primitive concepts generated from the set of retrieved documents leads to the most effective performance.

11.
With the development of the information age, the volume of data held in databases is growing rapidly. To ease the use and maintenance of the ever larger tables this creates, databases introduced partitioning technology. This paper introduces the concept of Oracle database partitioning, table partitions, index partitions, and partitioning strategies, and gives concrete applications of Oracle partitioning with examples.
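As an illustrative sketch (the table, column names, and connection details are placeholders), an Oracle range-partitioned table is declared like this, issued here through the python-oracledb driver:

```python
import oracledb  # python-oracledb driver; credentials below are placeholders

DDL = """
CREATE TABLE sales (
    sale_id   NUMBER,
    sale_date DATE,
    amount    NUMBER(10, 2)
)
PARTITION BY RANGE (sale_date) (
    PARTITION p2022 VALUES LESS THAN (TO_DATE('2023-01-01', 'YYYY-MM-DD')),
    PARTITION p2023 VALUES LESS THAN (TO_DATE('2024-01-01', 'YYYY-MM-DD')),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
)"""

with oracledb.connect(user="demo", password="demo",
                      dsn="localhost/XEPDB1") as conn:
    # Queries filtered to one year now scan a single partition
    # instead of the whole table (partition pruning).
    conn.cursor().execute(DDL)
```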

12.
On Using Database Retrieval Systems for Bibliometric Analysis (cited 6 times: 1 self-citation, 5 citations by others)
陈光祚 《情报科学》1998,16(2):122-127
The author built a bibliographic database, supported by the CDS/ISIS software, containing more than 60,000 document records in library science, information science, documentation, and archival science. Besides journal articles, it also covers monographs, textbooks, reference works, conference papers, and dissertations, as well as selected chapters and sub-entries of large handbooks and textbooks, forming a comprehensive database of multiple document types. By exploiting functions built into the retrieval software, such as creating sub-databases, editing inverted files, combining keyword indexing with single-Chinese-character searching of titles to strengthen the retrieval language, post-controlled vocabularies (ANY tables), and numeric-field function retrieval on publication year, the author performed bibliometric analyses, across various indicators, of the literature of China's library and information science disciplines since the 1980s. The author argues that retrieval systems built on databases spanning multiple document types and long time periods, combined with bibliometric methods that exploit the software's full range of functions, are the direction in which bibliometric analysis in China should develop.

13.
Automatic document summarization using citations is based on summarizing what others explicitly say about the document, by extracting a summary from text around the citations (citances). While this technique works quite well for summarizing the impact of scientific articles, other genres of documents as well as other types of summaries require different approaches. In this paper, we introduce a new family of methods that we developed for legal documents summarization to generate catchphrases for legal cases (where catchphrases are a form of legal summary). Our methods use both incoming and outgoing citations, and we show how citances can be combined with other elements of cited and citing documents, including the full text of the target document, and catchphrases of cited and citing cases. On a legal summarization corpus, our methods outperform competitive baselines. The combination of full text sentences and catchphrases from cited and citing cases is particularly successful. We also apply and evaluate the methods on scientific paper summarization, where they perform at the level of state-of-the-art techniques. Our family of citation-based summarization methods is powerful and flexible enough to target successfully a range of different domains and summarization tasks.

14.
This paper analyzes the problems of traditional design methods in enterprise part design, studies parametric part design using Pro/E family tables, and presents the advantages of this method for part design.

15.
Existing unsupervised keyphrase extraction methods typically emphasize the importance of the candidate keyphrase itself, ignoring other important factors such as the influence of uninformative sentences. We hypothesize that the salient sentences of a document are particularly important as they are most likely to contain keyphrases, especially for long documents. To our knowledge, our work is the first attempt to exploit sentence salience for unsupervised keyphrase extraction by modeling hierarchical multi-granularity features. Specifically, we propose a novel position-aware graph-based unsupervised keyphrase extraction model, which includes two model variants. The pipeline model first extracts salient sentences from the document, followed by keyphrase extraction from the extracted salient sentences. In contrast to the pipeline model which models multi-granularity features in a two-stage paradigm, the joint model accounts for both sentence and phrase representations of the source document simultaneously via hierarchical graphs. Concretely, the sentence nodes are introduced as an inductive bias, injecting sentence-level information for determining the importance of candidate keyphrases. We compare our model against strong baselines on three benchmark datasets including Inspec, DUC 2001, and SemEval 2010. Experimental results show that the simple pipeline-based approach achieves promising results, indicating that keyphrase extraction task benefits from the salient sentence extraction task. The joint model, which mitigates the potential accumulated error of the pipeline model, gives the best performance and achieves new state-of-the-art results while generalizing better on data from different domains and with different lengths. In particular, for the SemEval 2010 dataset consisting of long documents, our joint model outperforms the strongest baseline UKERank by 3.48%, 3.69% and 4.84% in terms of F1@5, F1@10 and F1@15, respectively. We also conduct qualitative experiments to validate the effectiveness of our model components.
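The paper's position-aware hierarchical model is not reproduced here; as a baseline illustration of graph-based unsupervised keyphrase extraction in general, a TextRank-style sketch builds a co-occurrence graph over the document and ranks words by PageRank:

```python
import networkx as nx

def textrank_keywords(tokens, window=3, top_k=5):
    """Generic graph-based baseline (TextRank-style), not the paper's model:
    words co-occurring within a window become edges, and PageRank scores
    rank the candidates."""
    graph = nx.Graph()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph.add_edge(word, other)
    scores = nx.pagerank(graph)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

doc = ("unsupervised keyphrase extraction ranks candidate phrases by "
       "building a graph over the document and scoring graph nodes").split()
print(textrank_keywords(doc))
```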

16.
In order to evaluate the effectiveness of Information Retrieval (IR) systems, it is key to collect relevance judgments from human assessors. Crowdsourcing has successfully been used to scale up the collection of manual relevance judgments, and previous research has investigated the impact of different judgment task design elements (e.g., highlighting query keywords in the document) on judgment quality and efficiency. In this work we investigate the positive and negative impacts of presenting crowd assessors with more than just the topic and the document to be judged. We deploy different variants of crowdsourced relevance judgment tasks following a between-subjects design, in which we present different types of metadata to the human assessor. Specifically, we investigate the effect of human metadata (e.g., what other assessors think of the current document, such as the relevance level already selected by the majority of crowd workers) and machine metadata (e.g., how IR systems scored the document, such as its average position in ranked lists, or statistics about the document, such as term frequencies). We look at the impact of metadata on judgment quality (i.e., the level of agreement with trained assessors) and cost (i.e., the time it takes workers to complete the judgments), as well as at how metadata quality positively or negatively impacts the collected judgments.

17.
In this paper, we propose a document reranking method for Chinese information retrieval. The method is based on a term weighting scheme that integrates the local and global distribution of terms as well as document frequency, document positions, and term length. The weighting scheme allows a larger portion of the retrieved documents to be randomly designated as relevance feedback, removing the worry that too few relevant documents appear among the top retrieved documents. It also helps to improve the performance of maximal marginal relevance (MMR) in document reranking. The method was evaluated by MAP (mean average precision), a recall-oriented measure. Significance tests showed that our method achieves significant improvement over standard baselines and consistently outperforms related methods.
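MMR itself is a standard formula: at each step, pick the candidate that maximizes λ·relevance minus (1−λ)·similarity to what has already been selected. A minimal Python sketch (the relevance scores and similarity function are assumptions supplied by the caller):

```python
def mmr_rerank(doc_ids, relevance, similarity, lam=0.7, k=10):
    """Maximal Marginal Relevance: iteratively pick the document that is
    relevant to the query but dissimilar to documents already selected.

    relevance:  {doc_id: score against the query}
    similarity: f(doc_a, doc_b) -> pairwise similarity in [0, 1]
    lam:        trade-off between relevance and novelty
    """
    selected = []
    candidates = set(doc_ids)
    while candidates and len(selected) < k:
        def mmr_score(d):
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```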

18.
This remote-controlled convenience table is a mechatronic device: patients who cannot get out of bed to move a table can move, turn, raise, and lower it flexibly through a remote control. The project has been granted a utility model patent, patent No. ZL201220436108.1. ① The table is equipped with an infrared remote control and a fast-responding sensor; operation is simple and convenient for patients. ② The tabletop is raised and lowered electrically, safely and reliably. According to the patient's needs for treatment, reading, or Internet use, the table can be moved forward and backward, turned left and right, and its top raised or lowered, either manually or by remote control. ③ A single-chip microcontroller runs a program that lets the table move automatically: the patient presses a single key on the remote, and the table advances, retreats, turns, and stops on its own. ④ It is light, sturdy, space-saving, portable, and stackable, with a total weight of about 5 kg.

19.
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system—the Query, Cluster, Summarize (QCS) system—which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic.

We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines.

Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence “trimming” and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format.

Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
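The retrieval-and-clustering front end of such a pipeline can be sketched with off-the-shelf scikit-learn parts (this omits QCS's HMM sentence selection and pivoted QR summarization, and the corpus below is a toy):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

docs = [
    "latent semantic indexing maps documents into a low-rank space",
    "spherical k-means clusters unit-length document vectors",
    "hidden markov models help select sentences for extract summaries",
    "query terms are projected into the same latent space as documents",
]

# LSI: TF-IDF followed by a truncated SVD (rank kept tiny for the toy corpus).
tfidf = TfidfVectorizer()
lsi = TruncatedSVD(n_components=2, random_state=0)
vectors = lsi.fit_transform(tfidf.fit_transform(docs))

# Normalizing to unit length before k-means approximates spherical k-means,
# which clusters by cosine rather than Euclidean distance.
labels = KMeans(n_clusters=2, n_init=10,
                random_state=0).fit_predict(normalize(vectors))
print(labels)  # cluster id per document
```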

20.
Using Pro/E family tables for product serialization greatly improves design efficiency and has strong practical value. This paper introduces the basic approach followed in developing a Pro/E standard parts library and, through typical real examples, describes how to serialize products directly with Pro/E family tables and how to create standard bolt and nut parts.
