If you're reading this, chances are you've seen a Tokenizer robot visiting your site while looking through your server logs. Our software obeys robots.txt files and robot META tags in HTML. These are the standard mechanisms for webmasters to tell web robots which portions of a site a robot is welcome to access.
We'd like to hear about any bad behavior by our crawler. You can reach us at agent[at]tokenizer.org.
Our software obeys the robots.txt exclusion standard, described at http://www.robotstxt.org/wc/exclusion.html#robotstxt. To ban the Tokenizer crawler from your site, place the following in your robots.txt file:
User-agent: Tokenizer
Disallow: /
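If you want to check the rule before deploying it, one option is Python's standard-library urllib.robotparser. The sketch below is only an illustration: the example.com URL and the "OtherBot" user agent are hypothetical, and it shows how Python's parser reads the rule, which is not necessarily identical to how the Tokenizer crawler matches it.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from this page, one directive per line.
ROBOTS_TXT = """\
User-agent: Tokenizer
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL and second user agent, used only for the check.
    url = "http://example.com/page.html"
    for agent in ("Tokenizer", "OtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Note that each directive must sit on its own line; a parser cannot be expected to split "User-agent" and "Disallow" if they share one line.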
If you do not have permission to edit the /robots.txt file on your server, you can still tell robots not to index your pages or follow your links. The standard mechanism for this is the robots META tag, as described at http://www.robotstxt.org/wc/meta-user.html.
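As an illustration of the robots META tag, the sketch below embeds a sample page carrying the commonly documented "noindex, nofollow" directives and extracts them with Python's standard-library html.parser. The sample markup and the helper names (RobotsMetaParser, robots_directives) are hypothetical and are not part of the Tokenizer software.

```python
from html.parser import HTMLParser

# Hypothetical page asking robots not to index it or follow its links.
SAMPLE_PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow">'
    "</head><body>Hello</body></html>"
)


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots META directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

The tag belongs in the head section of each page you want to protect; unlike robots.txt, it has to be added page by page.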
If you have problems with or questions about the Tokenizer crawler, please send an email to the Tokenizer team at agent[at]tokenizer.org.
If you have any technology-related questions, contact fuad[at]efendi.ca, an independent consultant and open source contributor specializing in enterprise software development, data mining, natural language processing, and search. For instance, you may wish to implement your own Solr-based faceted browsing on your website. Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.
The Shopping Price Engine is currently powered by four servers: Quad-Core Opterons, SCSI Seagate Cheetah 15K.5 drives, and SLES 10. Thanks to AMD: their chips outperform Intel's by a factor of 100 in memory access routines.
You may wonder how fast that is, including network latency: the average response time is under 0.4 seconds over MxDSL, which is four ADSL lines running in parallel. The only thing we still need is a high-speed internet connection (such as a dedicated 100 Mbps symmetric line), and only to increase the frequency of the crawl.