High-anonymity crawler system with multi-threaded intelligent scheduling
An intelligent scheduling crawler system technology in the field of computer networks. It addresses problems such as crawlers being blocked, accounts being banned, and website access restrictions. It aggregates web data quickly and efficiently and improves crawling efficiency.
Detailed Description of Embodiments
[0036] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.
[0037] As shown in Figure 1, the present invention discloses a high-anonymity crawler system with multi-threaded intelligent scheduling. The system is used for efficient crawling when the target website applies an anti-crawling strategy. It improves the crawling efficiency and robustness of the crawler, as well as its performance and stability in a distributed crawler environment, so that web page information can be aggregated quickly and efficiently into a large retrieval library.
[0038] The high-anonymity crawler system with multi-threaded intelligent scheduling mainly comprises the following six modules: a proxy IP pool module, a Cookies pool module, a resource scheduling module, a multi-thread crawler module, a task queue generation module, and a background management module.
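The six modules can be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: one round-robin `RotatingPool` class stands in for both the proxy IP pool and the Cookies pool, a `Scheduler` stands in for the resource scheduling module, a `queue.Queue` for the task queue generation module, worker threads for the multi-thread crawler module, and a `stats` dictionary for what the background management module might display. All names, the treatment of HTTP 403/429 as "blocked" signals, and the retry limit are hypothetical. `fetch` is supplied by the caller so the sketch runs without network access.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, Tuple

# Assumption: these status codes mean the site's anti-crawling strategy blocked us.
BLOCK_STATUSES = {403, 429}

# Hypothetical fetch signature: (url, proxy, cookie) -> (status, body)
Fetch = Callable[[str, str, str], Tuple[int, str]]


class RotatingPool:
    """Thread-safe round-robin pool with banning (stand-in for proxy IP / Cookies pools)."""

    def __init__(self, items: Iterable[str]):
        self._items = list(items)
        self._banned: set = set()
        self._next = 0
        self._lock = threading.Lock()

    def acquire(self) -> str:
        with self._lock:
            live = [x for x in self._items if x not in self._banned]
            if not live:
                raise RuntimeError("pool exhausted")
            item = live[self._next % len(live)]
            self._next += 1
            return item

    def ban(self, item: str) -> None:
        with self._lock:
            self._banned.add(item)


class Scheduler:
    """Resource scheduling module (assumed design): one proxy and one cookie per task."""

    def __init__(self, proxies: RotatingPool, cookies: RotatingPool):
        self.proxies, self.cookies = proxies, cookies

    def assign(self) -> Tuple[str, str]:
        return self.proxies.acquire(), self.cookies.acquire()


def run_crawl(urls, fetch: Fetch, proxies, cookies, n_threads: int = 4, max_tries: int = 3):
    """Multi-thread crawler module: worker threads drain a shared task queue."""
    tasks: "queue.Queue" = queue.Queue()  # task queue generation module
    for url in urls:
        tasks.put((url, 0))  # (url, attempts so far)
    scheduler = Scheduler(RotatingPool(proxies), RotatingPool(cookies))
    results: Dict[str, str] = {}
    stats = {"ok": 0, "blocked": 0, "failed": 0}  # data a management view might show
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this thread
                tasks.task_done()
                return
            url, tries = item
            try:
                try:
                    proxy, cookie = scheduler.assign()
                    status, body = fetch(url, proxy, cookie)
                except Exception:  # pool exhausted or fetch error: count as failure
                    with lock:
                        stats["failed"] += 1
                    continue  # the finally clause still calls task_done()
                with lock:
                    if status == 200:
                        results[url] = body
                        stats["ok"] += 1
                    elif status in BLOCK_STATUSES:
                        stats["blocked"] += 1
                        scheduler.proxies.ban(proxy)  # retire the blocked proxy IP
                        if tries + 1 < max_tries:
                            tasks.put((url, tries + 1))  # re-queue before task_done()
                        else:
                            stats["failed"] += 1
                    else:
                        stats["failed"] += 1
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_threads)]
    for t in threads:
        t.start()
    tasks.join()  # returns once every task, including re-queued ones, is done
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return results, stats
```

For example, `run_crawl(["a", "b"], my_fetch, ["10.0.0.1:8080", "10.0.0.2:8080"], ["session=abc"])` (hypothetical addresses and cookie) returns a `(results, stats)` pair. A proxy that answers 403 or 429 is banned and its URL is re-queued on the next live proxy. Re-queueing before `task_done()` is what keeps `tasks.join()` from returning while retries are still pending.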