Text segmentation method and device, electronic equipment and storage medium

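A text and target-text technique, applied in electric digital data processing, instruments, computing, and related fields. It addresses problems such as redundant and bulky processing workflows, errors in sentence generation, and insufficient part-of-speech coverage, and achieves the effect of ensuring accuracy and processing efficiency.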

Active Publication Date: 2021-05-28
BEIJING BAIDU NETCOM SCI & TECH CO LTD
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The prior-art workflow, which first performs word segmentation and then generates clauses according to part of speech, is redundant and bulky. Word segmentation relies on a large lexicon and a word segmentation algorithm. After segmentation, the part of speech of each word must be looked up to generate sentences, which in turn relies on a large part-of-speech model. The combined sentences may still contain errors caused by insufficient part-of-speech coverage or part-of-speech conflicts.


Examples


Embodiment Construction

[0028] Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0029] Figure 1 is a schematic diagram of the text segmentation method provided by the first embodiment of the present disclosure. As shown in Figure 1, the method includes:

[0030] S101: Divide the text to be processed based on punctuation marks to obtain L first clauses; L is an integer greater than or equal to 1;

[0031] S102: Determine M clauses to be output based on ...
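Step S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the punctuation set `PUNCTUATION` and the function name `split_by_punctuation` are hypothetical, since the text above does not list which punctuation marks are used.

```python
import re

# Assumed punctuation set (Chinese and ASCII marks); not specified in the source.
PUNCTUATION = "，。！？；,.!?;"


def split_by_punctuation(text):
    """S101 sketch: divide the text on punctuation marks into first clauses."""
    # Build a character class from the assumed marks; runs of marks count as one break.
    pattern = "[" + re.escape(PUNCTUATION) + "]+"
    # Drop empty fragments (for example, after a trailing full stop).
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


first_clauses = split_by_punctuation("Hello, world. How are you?")
print(len(first_clauses), first_clauses)
```

For a non-empty text with no punctuation at all, the sketch returns the whole text as a single first clause, which matches the case L = 1.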


Abstract

The invention discloses a text segmentation method and device, electronic equipment, and a storage medium, and relates to the field of information processing. In the specific implementation scheme, a text to be processed is divided based on punctuation marks to obtain L first clauses, where L is an integer greater than or equal to 1. M clauses to be output are determined based on the L first clauses and are taken as the segmentation result of the text to be processed, where M is an integer greater than or equal to 1. Determining the M clauses to be output based on the L first clauses comprises: when the length of the i-th first clause among the L first clauses is greater than a preset length threshold, processing the i-th first clause based on a matching rule to obtain clauses to be output, where i is an integer greater than or equal to 1 and less than or equal to L.
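The two-stage flow in the abstract can be sketched as below. Several parts are assumptions made only for illustration: the threshold value `MAX_LEN`, the matching rule `RULE` (here, breaking before a few English connectives), and the choice to pass clauses at or below the threshold through unchanged. The abstract does not specify the matching rule or the threshold, so none of these should be read as the patented method.

```python
import re

MAX_LEN = 30  # assumed preset length threshold, in characters

# Hypothetical matching rule: break at whitespace that precedes a connective.
RULE = re.compile(r"\s+(?=(?:and|but|because|so)\b)")


def split_first_clauses(text):
    """Divide the text on punctuation marks to obtain the L first clauses."""
    return [c.strip() for c in re.split(r"[,.;!?，。；！？]+", text) if c.strip()]


def clauses_to_output(first_clauses, max_len=MAX_LEN):
    """Determine the M clauses to be output from the L first clauses."""
    output = []
    for clause in first_clauses:
        if len(clause) > max_len:
            # Over the threshold: process the clause with the matching rule.
            output.extend(piece for piece in RULE.split(clause) if piece)
        else:
            # Assumption: short clauses are output as they are.
            output.append(clause)
    return output


text = ("It rained all day but we still went hiking "
        "because we had planned it, then we slept.")
result = clauses_to_output(split_first_clauses(text))
print(result)
```

In this example L is 2 and M is 4: the first clause exceeds the assumed threshold and is broken into three pieces by the rule, while the second is kept whole. No word segmentation or part-of-speech model is involved, which is the contrast with the prior art that the document draws.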

Description

Technical field

[0001] The present disclosure relates to the field of information processing, in particular to the field of text information processing.

Background

[0002] In the prior art, text is usually segmented by first performing word segmentation and then generating clauses according to part of speech. This workflow is redundant and bulky. Word segmentation relies on a large lexicon and a word segmentation algorithm. After segmentation, the part of speech of each word must be looked up to generate sentences, which in turn relies on a large part-of-speech model. The combined sentences may still contain errors caused by insufficient part-of-speech coverage or part-of-speech conflicts.

Summary of the invention

[0003] The disclosure provides a text segmentation method, device, electronic equipment, and storage medium.

[0004] According t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/289
CPC: G06F40/211, G06F40/289
Inventor: 常炎隆
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD