Corpus cleaning method, device and equipment and medium

A corpus cleaning technology, applied in the field of data science, that addresses the high labor consumption and high labor cost of cleaning corpora manually.

Active Publication Date: 2019-05-10
THE FOURTH PARADIGM BEIJING TECH CO LTD

AI Technical Summary

Problems solved by technology

Either way requires a lot of labor fo



Embodiment Construction

[0040] Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like numerals refer to like parts throughout. The embodiments are described below in order to explain the present invention by referring to the figures.

[0041] Figure 1 is a flowchart showing a corpus cleaning method according to an exemplary embodiment of the present invention. The method shown in Figure 1 can be implemented by a computer program or by a dedicated corpus cleaning device.

[0042] In step S110, the sentence vector extraction model structure is obtained.

[0043] The sentence vector extraction model structure is used to extract the sentence vector of the input question or answer sentence. In the present invention, the sentence vector extraction model structure can be taken from a part of the question-answer pair model that has been trained in advance to evaluate the matching situation betw


Abstract

The invention provides a corpus cleaning method, apparatus, device, and medium. The method comprises: obtaining a sentence vector extraction model structure, which is taken from a part of a pre-trained question-answer pair model that evaluates how well a question and an answer match, and which extracts a sentence vector from an input question or answer; extracting at least a portion of the corpora from all corpora to be cleaned, each of which serves as a question-answer pair; obtaining annotation results for that portion; training a classification model on a training set composed of that portion and its annotation results, where the classification model evaluates whether a corpus item is suitable as a question-answer pair based on the sentence vectors that the sentence vector extraction model structure extracts from its question and answer; and using the trained classification model to screen out, from the unannotated corpora, those suitable as question-answer pairs. In this way, a large quantity of high-quality corpora can be obtained from a small amount of manual annotation.
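
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `toy_encoder` is a hypothetical hashed bag-of-words stand-in for the sentence vector extraction model structure (which the source takes from a pre-trained question-answer pair model), the classifier is a plain logistic regression chosen only for brevity, and the names `pair_features`, `train_classifier`, `clean_corpus`, and `label_fn` are invented for this example. Combining the two sentence vectors by concatenation plus an element-wise product is likewise an assumption.

```python
import math
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]   # (question, answer)
Vector = List[float]


def toy_encoder(sentence: str, dim: int = 16) -> Vector:
    """Hypothetical stand-in for the sentence vector extraction structure."""
    vec = [0.0] * dim
    for token in sentence.lower().split():
        # Deterministic bucket per token (avoids Python's randomized str hash).
        vec[sum(map(ord, token)) % dim] += 1.0
    return vec


def pair_features(pair: Pair, encode: Callable[[str], Vector]) -> Vector:
    """Assumed feature layout: question vector, answer vector, their product."""
    q, a = encode(pair[0]), encode(pair[1])
    return q + a + [qi * ai for qi, ai in zip(q, a)]


def predict(weights: Vector, bias: float, x: Vector) -> float:
    """Probability that a pair is suitable, under the logistic model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(X: List[Vector], y: List[int],
                     epochs: int = 200, lr: float = 0.1) -> Tuple[Vector, float]:
    """Logistic regression fitted by stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            grad = predict(weights, bias, x) - target
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def clean_corpus(corpus: List[Pair], encode: Callable[[str], Vector],
                 label_fn: Callable[[Pair], int],
                 sample_size: int, seed: int = 0) -> List[Pair]:
    """Label a sample, train on it, then screen the unlabeled remainder."""
    rng = random.Random(seed)
    sampled = set(rng.sample(range(len(corpus)), sample_size))
    # label_fn stands in for the manual annotation step (1 = suitable).
    X = [pair_features(corpus[i], encode) for i in sorted(sampled)]
    y = [label_fn(corpus[i]) for i in sorted(sampled)]
    weights, bias = train_classifier(X, y)
    return [pair for i, pair in enumerate(corpus)
            if i not in sampled
            and predict(weights, bias, pair_features(pair, encode)) >= 0.5]
```

A call such as `clean_corpus(corpus, toy_encoder, label_fn, sample_size=100)` would ask `label_fn` for 100 annotations and return the unannotated pairs that the classifier scores at 0.5 or above. In practice the encoder would be replaced by the structure taken from the pre-trained question-answer pair model, and the 0.5 threshold is an arbitrary choice for this sketch.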
