Paragraph merging method, device, storage medium and electronic equipment

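A paragraph-merging technique, applied in the field of paragraph merging methods, devices, storage media and electronic equipment. It addresses the difficulty of dividing paragraphs correctly and the resulting limit on paragraph-division accuracy, with the effects of more accurate judgment results, reduced manual effort and improved accuracy.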

Active Publication Date: 2019-10-22
BEIJING SHANNON HUIYU TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] Existing methods for segmenting document paragraphs are generally based on the indentation and line spacing of the sentences in a document. These methods only work for simple plain-text documents: inserted images, inserted tables, page breaks and the like can leave the sentences of one paragraph far apart in the document, which makes it difficult to divide paragraphs correctly and limits the accuracy of paragraph division.
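The limitation described in [0003] can be illustrated with a minimal sketch. The source does not specify the exact heuristics used by existing methods, so the rule below (an indented line opens a new paragraph, any other line continues the current one) and the function name `split_by_indent` are assumptions made purely for illustration.

```python
from typing import List


def split_by_indent(lines: List[str]) -> List[List[str]]:
    """Hypothetical layout heuristic: an indented line opens a new paragraph."""
    paragraphs: List[List[str]] = []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no text
        if line.startswith((" ", "\t")) or not paragraphs:
            paragraphs.append([line.strip()])
        else:
            paragraphs[-1].append(line.strip())
    return paragraphs


# One logical paragraph interrupted by an inserted (indented) table row.
lines = [
    "    The model reads each sentence",
    "and scores it against its neighbours,",
    "    | step | score |",
    "then merges the matching ones.",
]
result = split_by_indent(lines)
```

In this example the final sentence is attached to the table row rather than to the paragraph it continues, which is the kind of long-distance error that a purely layout-based rule cannot avoid.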



Embodiment Construction

[0052] In describing the present invention, it should be understood that orientations or positional relationships indicated by terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise" are based on the orientations or positional relationships shown in the drawings. They are used only for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be construed as limiting the present invention.

[0053] In addition, the terms "first" and "second" are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indic...


Abstract

The invention provides a paragraph merging method, a paragraph merging device, a storage medium and electronic equipment. The method comprises: determining a position vector and a semantic vector for each piece of text data; sequentially selecting a plurality of pieces of target text data from the document content; determining a hidden layer vector for each piece of target text data; judging, according to its hidden layer vector, whether a piece of target text data belongs to the same paragraph as the other target text data; selecting the next target text data in sequence and repeating the process until all text data in the document content has been traversed; and collecting all judgment results and combining all text data that belongs to the same paragraph into one paragraph in position order. Because the judgment is based on both the position vector and the semantic vector, contextual semantic information over a larger range can be taken into account, the judgment result is more accurate, and the accuracy of paragraph merging can be improved.
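The traversal-and-merge loop in the abstract can be sketched as follows. This is not the patented model: the step that derives hidden layer vectors from position and semantic vectors is replaced by a caller-supplied `same_paragraph` callback, and the function name, the `window` parameter and the union-find bookkeeping are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for the model's judgment on the sentences at i and j.
Judge = Callable[[Sequence[str], int, int], bool]


def merge_paragraphs(sentences: Sequence[str], same_paragraph: Judge,
                     window: int = 3) -> List[List[str]]:
    """Slide a window over the text, collect judgments, merge in position order."""
    n = len(sentences)
    parent = list(range(n))  # union-find over sentence positions

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Select target sentences window by window until the document is traversed.
    for start in range(n):
        for j in range(start + 1, min(start + window, n)):
            if same_paragraph(sentences, start, j):
                a, b = find(start), find(j)
                if a != b:
                    parent[max(a, b)] = min(a, b)  # root = earliest position

    # Combine sentences judged to share a paragraph, keeping position order.
    groups: Dict[int, List[str]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(sentences[i])
    return [groups[root] for root in sorted(groups)]


# Toy judge (assumption): sentences sharing a first letter share a paragraph.
def toy_judge(s: Sequence[str], i: int, j: int) -> bool:
    return s[i][0] == s[j][0]


merged = merge_paragraphs(["A one", "B table", "A two", "C x"], toy_judge)
```

With the toy judge, "A one" and "A two" end up in the same paragraph even though "B table" sits between them, which mirrors the long-distance case the abstract targets. In practice the callback would be replaced by a trained classifier over the hidden layer vectors.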

Description

Technical field

[0001] The present invention relates to the technical field of text processing, and in particular to a method, device, storage medium and electronic equipment for merging paragraphs.

Background technique

[0002] With the application and development of information technology, people write and create more and more documents, and the text content of those documents is increasingly diverse. In the field of machine learning, documents are generally divided into sentences. When information extraction needs to search for relevant information, however, the large number of sentences makes sentence-level division inefficient. In this case, sentences in the same paragraph are generally merged so that the document content is divided into paragraphs, which makes it easier to find relevant information and can effectively improve the efficiency of information extraction.

[0003] Existing methods for s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/253, G06F40/211, G06F40/279, G06F40/30
Inventor: 任翔远
Owner: BEIJING SHANNON HUIYU TECH CO LTD