Larbin: Multi-purpose web crawler

Introduction

Larbin is a web crawler (also called a (web) robot, spider, scooter…). It is intended to fetch a large number of web pages to fill the database of a search engine. With a sufficiently fast network connection, Larbin should be able to fetch more than 100 million pages on a standard PC.

Larbin is (just) a web crawler, NOT an indexer. You have to write some code yourself to save the fetched pages or index them in a database.
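To give an idea of the kind of glue code involved, here is a minimal, self-contained C++ sketch that appends each fetched page to a flat dump file for later indexing. The Page struct and savePage() function are hypothetical stand-ins, not part of Larbin's API; in a real setup you would call equivalent logic from the output hook described in "How to customize Larbin".

    // Minimal sketch of user-supplied page-saving code.
    // NOTE: Page and savePage() are hypothetical stand-ins, not Larbin API;
    // wire equivalent logic into the output hook documented in
    // "How to customize Larbin".
    #include <cstdio>

    struct Page {            // stand-in for the crawler's page object
        const char *url;     // the page's URL
        const char *content; // the raw fetched body
    };

    // Append one fetched page to a flat dump file; a real deployment would
    // more likely insert into a database or feed an indexing pipeline.
    bool savePage(const Page &page, const char *path = "pages.dump") {
        FILE *out = std::fopen(path, "a");
        if (out == nullptr) return false;
        std::fprintf(out, "URL: %s\n%s\n\n", page.url, page.content);
        std::fclose(out);
        return true;
    }

    int main() {
        // Usage example with a dummy page.
        Page p = {"http://example.com/", "<html><body>hello</body></html>"};
        return savePage(p) ? 0 : 1;
    }

A flat dump file is only the simplest option; the same hook is the natural place to push pages into whatever storage or indexing backend the search engine uses.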

Larbin was initially developed for the XYLEME project in the VERSO team at INRIA. The goal of Larbin was to fetch XML pages on the web to fill the database of an XML-oriented search engine. Thanks to these origins, Larbin is very general-purpose (and easy to customize).

How to use Larbin
How to customize Larbin
