Search - lucene crawler

Search list

[JSP/Java] lucene

Description: Lucene is the Java version of a reusable search engine module. Using this module, I have developed a web page crawler.
Platform: | Size: 395264 | Author: chenbaoji | Hits:

[Search Engine] IndexFiles

Description: A Lucene-based web page index generation tool. Given a page repository that a web crawler has downloaded from the Internet, this software builds an index of those pages. Generating the page index is one of the core parts of search engine design; this component is also called the page preprocessing subsystem. The program is designed on top of Lucene.
Platform: | Size: 3340288 | Author: 纯哲 | Hits:
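The indexing step described in the IndexFiles entry rests on an inverted index: a map from each term to the pages that contain it. The sketch below is a minimal, pure-Python illustration of that idea only; it is not Lucene's API or the IndexFiles program's code, and `build_index`, `search_all`, and the sample pages are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each term to the sorted list of page IDs containing it.

    `pages` is a dict of page ID -> page text, standing in for the
    page repository a crawler would have downloaded (hypothetical).
    """
    postings = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            postings[term].add(page_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, query):
    """Return IDs of pages containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

# Hypothetical sample pages.
pages = {
    "p1": "Lucene is a Java search library",
    "p2": "A web crawler downloads pages",
    "p3": "Lucene indexes pages downloaded by a crawler",
}
index = build_index(pages)
print(index["lucene"])                     # ['p1', 'p3']
print(search_all(index, "crawler pages"))  # ['p2', 'p3']
```

Building the index once lets each query intersect short posting lists instead of rescanning every page, which is the speedup the indexing subsystem exists to provide.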

[Search Engine] AnalyzerViewer_source

Description: Lucene.Net is a high-performance information retrieval (IR) library, also known as a search engine library. Lucene.Net provides powerful APIs for creating full-text indexes and adding advanced, precise search capabilities to your programs. Some people confuse Lucene.Net with a ready-to-use application such as a web search engine, a crawler, or a file search application, but Lucene.Net is not such an application; it is a framework library that lets you implement these difficult technologies yourself. Lucene.Net makes no distinction about what you index and search, which gives you far more power than other full-text indexing and searching implementations: you can index anything that can be represented as text. There are also ways to get Lucene.Net to index HTML, Office documents, PDF files, and much more.
Platform: | Size: 320512 | Author: Yu-Chieh Wu | Hits:
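Lucene-style analyzers turn raw text into index terms by chaining a tokenizer with token filters, and a tool named AnalyzerViewer presumably lets you inspect what such a chain emits. The following is a minimal Python sketch of the chaining idea, assuming a split-on-non-alphanumerics tokenizer and a made-up stop-word list; it does not reproduce the behavior of any actual Lucene.Net analyzer.

```python
import re

# A small, hypothetical stop-word list; real analyzers ship their own.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}

def tokenize(text):
    """Split text on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def lowercase_filter(tokens):
    """Normalize case so 'Lucene' and 'lucene' become the same term."""
    return [t.lower() for t in tokens]

def stop_filter(tokens, stop_words=STOP_WORDS):
    """Drop very common words that add little to search relevance."""
    return [t for t in tokens if t not in stop_words]

def analyze(text):
    """Run the tokenizer and filters in order, like an analyzer chain."""
    return stop_filter(lowercase_filter(tokenize(text)))

print(analyze("Lucene.Net is a high-performance IR library"))
# ['lucene', 'net', 'high', 'performance', 'ir', 'library']
```

Because the same chain must run at both index time and query time, seeing its output directly is the quickest way to explain why a query does or does not match.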

[Search Engine] Crawler_src_code

Description: A web crawler (also called a web spider or ant) is a program that browses the World Wide Web in a methodical, automated manner. Crawlers are mainly used to fetch large numbers of pages for later processing by a search engine; specialized programs (such as Lucene or DotLucene) then index the downloaded pages to make searches fast. Crawlers can also automate maintenance tasks on a web site, such as checking links or validating HTML code, and can gather specific types of information from web pages, such as e-mail addresses. A newer use is checking e-mail addresses to combat trackback spam.
Platform: | Size: 55296 | Author: lisi | Hits:
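The fetch, extract-links, enqueue cycle described in the Crawler_src_code entry can be sketched as a breadth-first traversal. This minimal Python version is an illustration, not that project's implementation: the `fetch` callback and the in-memory `WEB` dict are hypothetical stand-ins for real HTTP requests, and a real crawler would also need politeness delays, robots.txt handling, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, extract its links, enqueue unseen ones.

    `fetch` maps a URL to HTML text, or None if the page is unavailable;
    passing it in keeps this sketch free of real network access.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited

# Hypothetical in-memory "web" standing in for real HTTP fetches.
WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.com/b": '<a href="/missing">gone</a>',
}
print(crawl("http://example.com/", WEB.get))
# ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

The `seen` set prevents revisiting pages that link to each other, and `max_pages` bounds the crawl so it terminates even on a very large link graph.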

[JSP/Java] onto

Description: An ontology-based topic crawler. By building domain ontology fragments and using the Lucene API together with a database, it collects and stores topic-specific information from the web.
Platform: | Size: 27648 | Author: 吕西 | Hits:
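A topic crawler like the one in the onto entry must decide, page by page, whether content is relevant enough to store and follow. The project uses domain ontology fragments for this; the sketch below swaps in a much simpler keyword-overlap score purely to show where such a relevance test sits. `relevance`, `is_on_topic`, the topic vocabulary, and the 0.2 threshold are all hypothetical.

```python
import re

def relevance(text, topic_terms):
    """Fraction of the page's terms that belong to the topic vocabulary."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    if not terms:
        return 0.0
    hits = sum(1 for t in terms if t in topic_terms)
    return hits / len(terms)

def is_on_topic(text, topic_terms, threshold=0.2):
    """Decide whether a crawled page should be stored and its links followed."""
    return relevance(text, topic_terms) >= threshold

# Hypothetical topic vocabulary (a stand-in for an ontology fragment).
TOPIC = {"lucene", "index", "search", "crawler"}
print(relevance("Lucene search index", TOPIC))         # 1.0
print(is_on_topic("Weather report for today", TOPIC))  # False
```

In a focused crawler, this check would run after each fetch, so off-topic pages are neither stored nor used as a source of new links.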

[JSP/Java] MySearch

Description: lucene htmlparser paoding customSpider webservice: a complete search engine built on the Lucene toolkit and the Paoding Chinese word segmenter, plus a custom crawler that collects and analyzes the data. It can be used with only minor changes.
Platform: | Size: 44039168 | Author: zhangming | Hits:
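Chinese text has no spaces between words, so a segmenter such as Paoding must split it into words before Lucene can index it. Paoding's own algorithm is not described here; the sketch below instead uses forward maximum matching, a common and simpler dictionary-based technique, to show what segmentation does. `fmm_segment` and the toy dictionary are hypothetical.

```python
def fmm_segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary word become single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit;
        # a single character is always accepted, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in dictionary:
                tokens.append(word)
                i += size
                break
    return tokens

# Hypothetical toy dictionary: "search", "search engine", "engine", "web page", "crawler".
DICT = {"搜索", "搜索引擎", "引擎", "网页", "爬虫"}
print(fmm_segment("网页搜索引擎爬虫", DICT))
# ['网页', '搜索引擎', '爬虫']
```

Preferring the longest match keeps "搜索引擎" (search engine) as one term rather than splitting it into "搜索" and "引擎"; production segmenters add ambiguity resolution that this greedy approach lacks.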

[JSP/Java] LireV2.0.1

Description: LIRE is a Lucene-based image search technology, and a very powerful one.
Platform: | Size: 2829312 | Author: | Hits:

[Search Engine] 4pm

Description: This article builds a web search application with Lucene and Heritrix. Lucene is a Java-based full-text information retrieval package and is currently an open-source project in the Apache Jakarta family. Lucene is powerful, but no matter how powerful a search engine tool is, it needs something in the background to support it: a web crawler, or spider. Web crawlers are also called spiders, web robots, bots, and so on; the name does not matter. What matters is recognizing that crawlers are what give a search engine its rich pool of resources. Heritrix is an open-source web crawler written entirely in Java, which users can use to fetch the resources they want from the web. It comes from www.archive.org. Heritrix's greatest strength is its extensibility: developers can extend each of its components to implement their own crawling logic.
Platform: | Size: 2989056 | Author: 曹志聪 | Hits:

[Internet-Network] Lucene

Description: A small search engine that implements a web crawler, downloads pages, builds a page index, and provides keyword search.
Platform: | Size: 1439744 | Author: | Hits:

[JSP/Java] lucene

Description: This is the Java version of a reusable search engine module (Lucene). Using this module, I have developed a web page crawler.
Platform: | Size: 2239488 | Author: 付平 | Hits:

[Internet-Network] CquNews

Description: A Lucene-based news search engine that uses a web crawler written in Java to fetch its data.
Platform: | Size: 13422592 | Author: 顾佳诚 | Hits:

CodeBus www.codebus.net