Label data cleaning device and method

A labeled-data cleaning technology, applied in the field of data processing, that addresses problems such as poor learning performance and achieves improved efficiency, improved cleaning quality, and a simplified operation process.

Active Publication Date: 2020-05-19
SHANGHAI YITU NETWORK SCI & TECH

AI Technical Summary

Benefits of technology

This technology divides an annotated dataset into smaller subsets, trains and tests a model on them, and collects the errors that occur during testing. The test subsets act as validation data that help identify specific labeling issues through the models evaluated on them. By repeating these steps across the subsets, more accurate results can be gathered without relying on manual labelers. Fewer cleaning passes may also be needed than with traditional approaches such as random sampling. Overall, the method aims to improve cleaning accuracy and efficiency while reducing the complexity and cost of handling large amounts of labeled data.

Problems solved by technology

Existing methods for improving the accuracy of labeled data have limitations: they either require human intervention to select good samples manually, or they rely on complex algorithms designed specifically for each dataset being processed. Even after such steps, some badly labeled samples may remain. These residual errors can degrade the performance of models trained on the data.

Embodiment Construction

[0051] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0052] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations.

[0053] The inventors of this solution found

Abstract

The invention discloses an annotation data cleaning device comprising an annotation database, an algorithm model structure, a data equal-division module, a sub-dataset setting module, a training module, and a testing module. The sub-dataset setting module selects each of the N sub-datasets in turn as a sub-training set, with the remaining N-1 sub-datasets serving as sub-test sets; one sub-training set and its N-1 sub-test sets form a training test group. The training module uses the sub-training set in each training test group to train the algorithm model structure. The testing module uses each sub-test set in turn to test the corresponding trained algorithm model and form a test result, collects all error examples from the test results, and cleans the labeled data corresponding to those error examples. The invention further discloses an annotation data cleaning method. According to the invention, automatic cleaning of labeled data can be realized, and both cleaning efficiency and cleaning quality can be improved.
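
The procedure in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`split_equally`, `train_centroids`, `predict`, `collect_error_examples`) are hypothetical, a toy nearest-centroid classifier stands in for the unspecified "algorithm model structure", and the equal division is approximated with a simple stride split. The final cleaning step (for example, re-reviewing or dropping the flagged samples) is left out because the text above does not say how it is performed.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# Assumption: a sample is a (feature vector, label) pair.
Sample = Tuple[Sequence[float], Hashable]


def split_equally(n_items: int, n_parts: int) -> List[List[int]]:
    """Divide indices 0..n_items-1 into n_parts subsets of near-equal size."""
    return [list(range(start, n_items, n_parts)) for start in range(n_parts)]


def train_centroids(samples: List[Sample]) -> Dict[Hashable, List[float]]:
    """Toy stand-in model: the mean feature vector of each label."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for j, value in enumerate(features):
            sums[label][j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids: Dict[Hashable, List[float]],
            features: Sequence[float]) -> Hashable:
    """Return the label whose centroid is closest to the given features."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2
                              for a, b in zip(features, centroids[label])),
    )


def collect_error_examples(data: List[Sample], n_parts: int) -> Set[int]:
    """Train on each subset in turn, test on the other N-1 subsets,
    and return the indices of every sample that was misclassified."""
    subsets = split_equally(len(data), n_parts)
    errors: Set[int] = set()
    for train_pos, train_idx in enumerate(subsets):
        model = train_centroids([data[i] for i in train_idx])
        for test_pos, test_idx in enumerate(subsets):
            if test_pos == train_pos:
                continue  # the training subset is not tested against itself
            for i in test_idx:
                features, label = data[i]
                if predict(model, features) != label:
                    errors.add(i)  # candidate for label cleaning
    return errors
```

Note that, unlike conventional N-fold cross-validation, each model here is trained on a single subset and tested on the remaining N-1, so every sample is checked by N-1 different models. The returned indices are only candidates: a misclassification may reflect a weak model rather than a wrong label.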

Application Information

Owner SHANGHAI YITU NETWORK SCI & TECH