Updated: 04-12

Googlebot Crawls Through HTML Forms


Google will stop at nothing in its quest to index the world's information. Last year it ate through 100 exabytes of data, but there's still a lot that it can't get access to. That inaccessible portion is known as the deep web (or hidden web, or invisible web), and it is estimated to hold the majority of online data, hidden safely from Google's prying eyes: private intranets, unlinked pages, some non-textual content, and, until today, dynamic content returned via form input. Google today announced that its Googlebot web crawler would begin to fill out HTML forms and crawl the results.

"For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made," explained Jayant Madhavan and Alon Halevy in a blog post. "If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page."
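To make that description concrete, here is a rough sketch of how candidate URLs for a GET form could be generated once values have been picked for each input. It is an illustration only, not Google's actual method: the function name `candidate_urls`, the `limit` parameter, the example.com address, and the field names and values are all assumptions made up for this example.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_urls(action, fields, limit=10):
    """Build GET URLs from combinations of candidate values per form field.

    `fields` maps each input name to the values chosen for it (hypothetical
    data: words from the site for text boxes, HTML option values for menus).
    `limit` caps how many URLs are generated.
    """
    names = list(fields)
    urls = []
    for combo in product(*(fields[name] for name in names)):
        if len(urls) >= limit:
            break
        query = urlencode(dict(zip(names, combo)))
        urls.append(action + "?" + query)
    return urls


# Hypothetical form with one text box and one select menu.
fields = {
    "q": ["crawler", "index"],       # text box: words taken from the site
    "category": ["news", "blogs"],   # select menu: values found in the HTML
}
urls = candidate_urls("http://example.com/search", fields)
for url in urls:
    print(url)
```

Each generated URL corresponds to one query a user might have submitted through the form. A real crawler would still need to fetch each one and decide whether the resulting page is valid and worth indexing, which this sketch does not attempt.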

Google, which says that the crawling of dynamic form results doesn't affect the "crawling, ranking, or selection of other web pages in any significant way," also assured webmasters today that its enhanced crawl would respect robots.txt as usual. Any form forbidden in robots.txt won't be crawled.
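For webmasters who want to keep form results out of the crawl, a quick way to check what a robots.txt rule blocks is Python's standard `urllib.robotparser` module. The sketch below is an assumption-laden example: the robots.txt contents, the `/search` path, and the example.com URLs are invented, and Google's own robots.txt handling may differ in detail from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the site's search form results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the rules above."""
    return parser.can_fetch(agent, url)


# A form-result URL under /search is blocked; other pages are not.
print(allowed("http://example.com/search?q=crawler"))
print(allowed("http://example.com/about"))
```

With this rule in place, any URL generated from a form whose action is `/search` would be off limits to a compliant crawler.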

It is estimated that the deep web is several orders of magnitude larger than the regular, public World Wide Web. While there is some content that Google will never (and should never) get its hands on, by crawling form results Google is now peering just a little bit deeper into the Internet. As Matt Cutts points out, this is less about indexing search results (something Google has generally not liked to do) and more about finding new links that are only available via dynamically created pages.

It should be noted that Google is only crawling GET forms (i.e., forms used to retrieve dynamic content, such as search results) and not POST forms. That's mildly disappointing as we were looking forward to befriending Googlebot on MySpace...


