Method for clustering network-based short texts

A clustering method and text clustering technology, applied in text database clustering/classification, unstructured text data retrieval, and special data processing applications. It addresses the problems of scarce clustering studies, unsatisfactory clustering results, and results that are very sensitive to chosen values, and achieves high clustering accuracy, a good clustering effect, and strong practicability.

Active Publication Date: 2015-08-26
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0009] Grid-based clustering methods (the CLIQUE clustering method, etc.): the processing time of grid clustering depends on the number of cells into which each dimension is divided, and these methods are sensitive to outliers and cannot handle large data sets, which to a certain extent reduces the quality and accuracy of the clustering;
[0010] The classic partition-based clustering method is the traditional K-means method. Because its initial cluster centers are selected at random, the accuracy of the clustering results is reduced, and the alg


Examples


Example Embodiment

[0062] Example:

[0063] 1. Experiment on weight calculation with the TFIDF formula during preprocessing.

[0064] User comments obtained from Zhongguancun Online serve as the experimental data set. The traditional TFIDF formula is tried first. The data set is segmented with ICTCLAS, the word segmentation software of the Chinese Academy of Sciences. Table 1 below shows part of the experimental texts after stop-word removal.

[0065]

[0066] We now select the first text in Table 1 (after stop-word removal) and use the original TFIDF formula to calculate the weights of its feature items. The results are shown in Table 2 below.

[0067]

[0068] The number of texts containing each feature item of the first text shows that the item with the highest count is not necessarily the most important: although some words occur in a large number of texts, they are not important keywords for distinguishing texts. It can be seen that
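The observation in [0068] can be illustrated with a minimal sketch. It assumes the classic form of the weight, tf × log(N / df), which may differ in detail from the formula used in the patent. The function name `tfidf_weights` and the sample tokens are hypothetical and are not taken from the Zhongguancun Online data set.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (already segmented, stop words removed).
    Assumed formula: weight = (count / doc length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many texts each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Invented sample texts, for illustration only.
docs = [
    ["screen", "clear", "battery", "good"],
    ["battery", "weak", "price", "good"],
    ["screen", "good", "price", "high"],
]
weights = tfidf_weights(docs)

# "good" occurs in every text, so log(N / df) is zero and so is its weight.
for term, w in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {w:.4f}")
```

Under this assumed formula, a term that appears in every text receives a weight of zero, and a term that appears in only one text receives the largest weight, which matches the point that a high text count does not make a word a good distinguishing keyword.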



Abstract

The present invention discloses a method for clustering network-based short texts. The specific implementation process comprises: first acquiring network-based comments; pre-processing the acquired comments, wherein the pre-processing comprises performing word segmentation on the comments, removing stop words, extracting keywords, and calculating a weight for each keyword; and clustering the pre-processed texts. Compared with the prior art, the method collects and analyzes massive data from the network so that a user can conveniently search for valuable information. The method clusters network-based short texts with high precision and thereby meets the practical needs of the user. The method therefore has great practicability and is easy to popularize.
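The steps listed in the abstract (segmentation, stop-word removal, keyword weighting, clustering) can be sketched end to end as follows. This is only an illustration under several assumptions: whitespace splitting stands in for a real word segmenter such as ICTCLAS, the stop-word list is hypothetical, the weight is the classic tf × log(N / df), and the clustering step is a simple single-pass cosine-similarity grouping chosen for brevity, not the clustering method claimed in the patent.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "is", "a", "and", "very"}  # hypothetical stop-word list


def preprocess(comment):
    # Stand-in for word segmentation: lowercase and split on whitespace,
    # then drop stop words.
    return [w for w in comment.lower().split() if w not in STOP_WORDS]


def tfidf_vectors(token_lists):
    # Assumed weight: (count / doc length) * log(N / df).
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(vectors, threshold=0.1):
    # Stand-in clustering: each text joins the most similar existing cluster
    # (compared against that cluster's first member) if the similarity
    # reaches the threshold; otherwise it starts a new cluster.
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, vectors[members[0]])
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


# Invented comments, for illustration only.
comments = [
    "the battery is very good",
    "battery life good and long",
    "the screen is cracked",
    "screen cracked and dim",
]
clusters = single_pass_cluster(tfidf_vectors([preprocess(c) for c in comments]))
print(clusters)
```

With these four invented comments, the two battery comments share weighted terms and end up in one group, while the two screen comments form another. The threshold of 0.1 is an arbitrary illustrative value.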
