Maybe it is because I’m kinda new to Ruby, but this took me a while to find. I wanted to build a Ruby script that checks just the HTTP status code for an array of URLs.
To handle a lot of the dirty work I used the following libraries: Nokogiri and open-uri to grab a sitemap.xml file and parse it, and Mechanize to check the HTTP status code.
```ruby
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'mechanize'
```
OK, building the code to grab a sitemap.xml and parse the URLs into an array was easy.
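```ruby
sitemap_url = 'http://www.example.com'
urls = []

## load sitemap.xml
## (URI.open comes from open-uri; on Ruby 3.0+ a bare open() no longer fetches URLs)
doc = Nokogiri::XML(URI.open(sitemap_url + '/sitemap.xml'))

## sitemaps declare a default XML namespace; drop it so the plain XPath below matches
doc.remove_namespaces!

## grab all of the loc's
doc.xpath('//url/loc').each { |link|
  urls.push(link.content.strip)
}
```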
I know I could have checked the HTTP status code inside the xpath each loop, but I plan on doing a few more things with the array of URLs, not just this single check.
At this point I have an array of urls. Now I need to check the HTTP status of each one…
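```ruby
a = Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}

urls.each { |url|
  begin
    page = a.head(url)
    if page.code.to_i == 200
      puts url.to_s + ' is good'
    else
      puts url.to_s + ' is bad'
    end
  rescue Mechanize::ResponseCodeError => e
    ## Mechanize raises on 4xx/5xx responses instead of returning a page
    puts url.to_s + ' is bad (' + e.response_code.to_s + ')'
  end
}
```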
After instantiating the Mechanize class, I iterate through the array and perform a “head” call to check the status of each URL. The result object that is returned holds the HTTP status of the request in a property called code. I make sure that code is an integer (to_i) and then compare it to 200; if it equals 200, the URL is good.
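For comparison, here is a rough sketch of the same HEAD-and-compare idea using only Net::HTTP from the standard library instead of Mechanize. This is a swapped-in technique, not what the script above does, and the helper names head_status and status_label are made up for illustration. One difference worth knowing: Net::HTTP does not follow redirects on its own, so a 301 or 302 would be reported as bad here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: send a HEAD request and return the status code as an Integer.
# Net::HTTP does not follow redirects and does not raise on 4xx/5xx responses.
def head_status(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # request_uri is the path plus any query string
    http.head(uri.request_uri).code.to_i
  end
end

# Hypothetical helper: same good/bad rule as the Mechanize loop.
# Accepts a String or an Integer, since the response object exposes the code as a String.
def status_label(code)
  code.to_i == 200 ? 'is good' : 'is bad'
end

# Example usage (performs real network requests):
#   urls.each { |url| puts "#{url} #{status_label(head_status(url))}" }
```

Keeping the label logic in its own small method means the good/bad rule can be changed (say, to accept any 2xx code) without touching the request code.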
I’m really starting to dig Ruby !!!