I am building a web spider to crawl several different sites, but one of them uses JavaScript buttons instead of links for several functions. While I could learn to follow them, that adds an extra layer of complexity I would rather avoid if possible. On the other hand, the mobile site, easily accessible at m.example.com, uses pure HTML links.
So here is my question: assuming I follow all other reasonable rules of crawling, is there any reason it would be considered rude or malicious to crawl the mobile site exclusively?
The answer you linked to is quite extensive about what you shouldn’t do. The only thing which, in my opinion, is missing there is POST requests: the crawler shouldn’t submit forms or issue any AJAX POST requests.
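As a rough illustration of that rule, here is a minimal sketch of a link collector that only ever gathers plain `<a href>` targets and never touches forms. The names `LinkCollector` and `collect_links` are made up for this example, and it assumes the crawler fetches whatever this returns with GET only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects targets of <a href> tags; <form> elements are ignored entirely."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are followed. Forms, buttons, and inputs are skipped,
        # so the crawler never has anything to submit.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip JavaScript pseudo-links; they are not fetchable URLs.
        if href and not href.lower().startswith("javascript:"):
            self.links.append(urljoin(self.base_url, href))


def collect_links(base_url, html):
    """Return absolute URLs of all plain links found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

This sketch does not filter out other schemes such as `mailto:`, so a real crawler would probably want an additional check before fetching.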
Mobile sites are no different from desktop ones. You shouldn’t consider it rude or malicious to crawl them.
However:
- A mobile site is sometimes not a full replacement for its desktop variant. Often, mobile sites are not only deprived of many desktop features (and ads, that’s a good thing), but may also lack content that you would like to index.
- If your search engine sends someone who uses a PC to the mobile version of a website, the person may be lost (especially if she knows how the site is usually displayed on a PC). If the site does not automatically redirect PC users from a mobile page to the corresponding desktop page (as opposed to the home page), you need to implement the mapping yourself in your search engine.
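The mapping mentioned in the last point could look something like the sketch below. It rests on a big assumption: that the desktop site lives at `www.example.com` and uses exactly the same paths and query strings as `m.example.com`. Many sites do not work that way, so check this against the actual site first. The function name `to_desktop_url` and its parameters are invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def to_desktop_url(mobile_url, mobile_prefix="m.", desktop_prefix="www."):
    """Swap the mobile host prefix for the desktop one, keeping the rest of the URL.

    Assumes both hosts share the same path structure (hypothetical).
    """
    parts = urlsplit(mobile_url)
    host = parts.netloc
    if host.startswith(mobile_prefix):
        host = desktop_prefix + host[len(mobile_prefix):]
    # URLs that are not on the mobile host are returned unchanged.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

You would crawl and index the mobile pages, but store or display the result of `to_desktop_url(...)` for PC users, ideally after verifying that the desktop URL actually resolves.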