webcrawler


How to login to any website using Curl from the command line or shell script

There are times you need to scrape/crawl some field on a page, but the page requires authentication (logging in). Unless the site is using Basic Auth, where you can put the username and password directly in the URL like http://username:password@example.com/, you'll need to use curl with more sophistication. Besides curl, there are other web tools you can use on the command line, such as links/elinks (elinks is an enhanced version of links which also supports JavaScript to a very limited extent). Links and curl will not execute JavaScript though, so if that's necessary to get...
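
For a rough idea of the cookie-based approach, here is a minimal sketch (the login URL and the form field names are placeholders, not taken from any particular site; check the target site's login form for the real ones):

# Step 1: POST the login form and save the session cookie to a cookie jar
curl -c cookies.txt -d "username=myuser" -d "password=mypass" https://example.com/login

# Step 2: send the saved cookie back to fetch a page that requires being logged in
curl -b cookies.txt https://example.com/members/profile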

Selenium IDE vs Selenium Webdriver vs CasperJS

Or more specifically: Selenium IDE (Firefox plugin) vs Selenium Webdriver (Python and other languages) vs CasperJS (and PhantomJS or SlimerJS)

Selenium allows you, a programmer or non-programmer, to control a web browser and make it do things you would otherwise do manually. With that ability, you can test your website over and over (and automatically from cron), simulate users, or visit any number of web pages, read data from them (web scraping), and save it to a file for processing.
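
As a taste of the WebDriver flavour, here is a minimal Python sketch (it assumes the selenium package and a Firefox driver are installed; the URL and the field name are placeholders, not from the article):

# Drive a real Firefox window from Python with Selenium WebDriver
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://example.com")
print(driver.title)                       # read data off the page
box = driver.find_element(By.NAME, "q")   # locate a form field by its name
box.send_keys("web scraping")             # type into it like a user would
box.submit()                              # submit the form
driver.quit()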

If you go to the Selenium website you will...