Web crawler method and system based on improved PageRank

A web crawler and web page technology in the Internet field that addresses low crawler efficiency, achieving fast data collection, improved crawler efficiency, and highly targeted crawling.

Active Publication Date: 2022-04-29
HEFEI UNIV
Cites: 3; Cited by: 0

AI Technical Summary

Benefits of technology

This technology applies the PageRank algorithm inside a web crawler to compute a weight for each crawled web page from the link relationships between pages, rather than from how often keywords appear in a page. This counters pages that repeat keywords to inflate their search ranking. The stated aim is better search performance with less duplicated effort than traditional methods.

Problems solved by technology

The patent addresses two technical problems of web crawlers ("web spiders"): collecting large amounts of data otherwise requires manual effort, and some pages reuse keywords to improve their search ranking.


Embodiment Construction

[0037] In order to make the technical means, creative features, goals and effects achieved by the present invention easy to understand, the present invention will be further clarified below in conjunction with specific drawings.

[0038] In the prior art, although web crawlers can extract web page information automatically, some pages reuse keywords to improve their search ranking. To address this, the technical concept of the present invention is to use the PageRank algorithm in the web crawler: a relationship matrix is generated from the access relationships between crawled web pages, an initial probability matrix is then generated according to the number of web pages, and finally the web page weights are calculated iteratively and the converged results are output in descending order. This method solves the problem of pages reusing keywords to improve their search ranking in the web crawler.
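The pipeline in [0038] (relationship matrix, initial probability matrix, iterative weight calculation, descending output) can be sketched in Python as below. This is a minimal sketch of standard PageRank under stated assumptions, not the patent's improved variant, whose details are not visible here: the function names `relationship_matrix` and `pagerank` are hypothetical, and the damping coefficient `d=0.85`, the tolerance `tol`, and the rule that a page with no outlinks is treated as linking uniformly to every page are all assumptions. Because each weight depends only on the link structure, repeating keywords in a page's text has no effect on its score.

```python
from typing import Dict, List, Tuple


def relationship_matrix(links: Dict[str, List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Build a column-stochastic matrix m where m[i][j] is the probability of
    moving from page j to page i by following one of j's outlinks."""
    # Every page that appears as a source or a target gets a row/column.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    index = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    m = [[0.0] * n for _ in range(n)]
    for src in pages:
        targets = sorted(set(links.get(src, [])))  # duplicate links count once
        j = index[src]
        if targets:
            share = 1.0 / len(targets)
            for dst in targets:
                m[index[dst]][j] = share
        else:
            # Assumption: a dangling page (no outlinks) links to all pages uniformly.
            for i in range(n):
                m[i][j] = 1.0 / n
    return pages, m


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> List[Tuple[str, float]]:
    """Iterate r <- (1 - d)/n + d * m @ r until the L1 change is below tol,
    then return (page, weight) pairs sorted by weight, highest first."""
    pages, m = relationship_matrix(links)
    n = len(pages)
    pr = [1.0 / n] * n  # initial probability vector: uniform over the n pages
    for _ in range(max_iter):
        new = [(1 - d) / n + d * sum(m[i][j] * pr[j] for j in range(n))
               for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new, pr))
        pr = new
        if delta < tol:  # assumed convergence test on the L1 norm
            break
    return sorted(zip(pages, pr), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical link graph: C is linked from A, B and D, so it ranks first.
    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    print([page for page, _ in pagerank(links)])  # → ['C', 'A', 'B', 'D']
```

Page D has no incoming links, so its weight settles at exactly (1 - d)/n = 0.0375, no matter what text it contains.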

[0039] Specifi


Abstract

The present invention belongs to the technical field of the Internet, and in particular relates to a web crawler method and system based on improved PageRank. The method includes: (1) crawling web pages; (2) obtaining web page relationships; (3) obtaining the relationship matrix; (4) obtaining the initial probability matrix; and (5) PageRank calculation: once the relationship matrix, the initial probability matrix and the damping coefficient are obtained, the PR value of each web page is calculated iteratively, and the iteration terminates when the probability matrix converges. The method provided by the present invention solves the problem of web page deception, in which some pages reuse keywords to improve their search rankings in web crawlers. The web crawler system provided by the present invention is easy to use, stores, converts and calculates data in easy-to-understand forms, and can calculate the weight of each web page efficiently and quickly. It also includes a relatively complete crawler program that demonstrates real-world applications of crawlers, including picture and file download, Baidu Encyclopedia search, video playback, and web page relationship visualization.
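For reference, step (5) corresponds to the textbook PageRank update below. The patent's exact improved formula is not visible in this excerpt, so this is the standard form written with assumed symbols: $N$ is the number of crawled pages, $d$ the damping coefficient, $B(p_i)$ the set of pages linking to $p_i$, $L(p_j)$ the number of outlinks of $p_j$, $\mathbf{r}^{(t)}$ the vector of PR values at iteration $t$, $M$ the relationship matrix, and $\varepsilon$ an assumed convergence threshold.

```latex
\begin{aligned}
\mathbf{r}^{(0)} &= \tfrac{1}{N}\,\mathbf{1} \\
PR^{(t+1)}(p_i) &= \frac{1-d}{N} + d \sum_{p_j \in B(p_i)} \frac{PR^{(t)}(p_j)}{L(p_j)} \\
\mathbf{r}^{(t+1)} &= \frac{1-d}{N}\,\mathbf{1} + d\,M\,\mathbf{r}^{(t)},
  \qquad M_{ij} = \begin{cases} 1/L(p_j) & \text{if } p_j \text{ links to } p_i \\ 0 & \text{otherwise} \end{cases} \\
&\text{stop when } \lVert \mathbf{r}^{(t+1)} - \mathbf{r}^{(t)} \rVert_1 < \varepsilon
\end{aligned}
```

The second and third lines are the same update, written per page and in matrix form. When every page has at least one outlink, the entries of $\mathbf{r}^{(t)}$ keep summing to 1 at each iteration, so the converged vector can be read directly as page weights.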
