This feed contains pages in the “ruby” category.

ELinks hacks

The ELinks browser has the nice feature of being scriptable in a number of languages (Ruby, Perl, Guile, and Lua, plus Python in the development version). I was browsing Amazon and getting annoyed at the adverts and whatnot, and also at the single-pixel transparent GIFs on that and some other sites (I force all images to be displayed, so that I can open them in an image viewer even if they have no ALT text), so I decided to do something about it.

The really useful bit in this case is ELinks::pre_format_html_hook, which is passed the URL and the page content, and is expected to return the page content, modified or unmodified.

I originally tried using regular expressions, which worked fine for removing images (/<img [^>]*src="[^"]*(spacer|transparent-pixel).gif"[^>]*\/?>/ is what I used, for the record). For more complex structures, however, this approach is less useful: a regular expression can't make sure that the nesting of several elements is correct (i.e. that every element that was opened is closed at the right point).
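For comparison, here is a minimal sketch of a regex-only version of the hook. It is not my actual hooks.rb; it assumes a stub `ELinks` module so it can run outside the browser, and it escapes the `.` before `gif` so that only a literal dot matches. The last few lines show the nesting problem: a lazy match stops at the first `</div>`, which belongs to the inner element.

```ruby
# Stub so this runs outside ELinks (an assumption for testing only);
# inside ELinks the module already exists, so nothing is redefined.
module ELinks; end unless defined?(ELinks)

# The image regex from above, with the dot before "gif" escaped.
SPACER_IMG = /<img [^>]*src="[^"]*(spacer|transparent-pixel)\.gif"[^>]*\/?>/

def ELinks::pre_format_html_hook(url, html)
  # url is unused here: the same images are stripped on every site.
  html.gsub(SPACER_IMG, '')
end

page = '<p><img src="/i/spacer.gif" width="1"><img src="cat.png"></p>'
stripped = ELinks::pre_format_html_hook('http://example.com/', page)
# stripped == '<p><img src="cat.png"></p>'

# The nesting problem: .*? stops at the first </div>, which closes the
# inner div, so the tail of the "ad" is left behind.
ad = '<div class="ad"><div>inner</div> still ad</div><p>content</p>'
leftover = ad.sub(/<div class="ad">.*?<\/div>/m, '')
# leftover == ' still ad</div><p>content</p>'
```

A greedy `.*` fails the other way, swallowing everything up to the last `</div>` on the page.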

Enter Hpricot. Hpricot is a very forgiving HTML parsing library that lets you use XPath or CSS selector syntax, as well as (I think) a DOM-style API. You can do things like doc.search('div.nonmemberEnclosure').remove, which removes the Amazon Prime nag box, or more complicated things.
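The thing a parser like Hpricot gets right, and a regex can't, is keeping count of how deeply elements are nested. To make that concrete, here is a hypothetical helper, `remove_balanced`, written in core Ruby only (it is not part of Hpricot or ELinks): it finds an opening tag, then counts opening and closing tags of the same name until the count returns to zero. It is only a sketch; it ignores comments, tags inside attribute values and self-closing tags, all of which a real parser copes with.

```ruby
# Hypothetical helper (not Hpricot's API): remove the first element that
# starts with open_tag, plus everything nested inside it, by counting
# opening and closing tags named `name` until the depth is back to zero.
def remove_balanced(html, open_tag, name)
  start = html.index(open_tag) or return html
  tag = /<(\/?)#{Regexp.escape(name)}\b[^>]*>/i
  depth = 0
  pos = start
  while (m = tag.match(html, pos))
    depth += m[1].empty? ? 1 : -1   # "<div ...>" opens, "</div>" closes
    pos = m.end(0)
    return html[0...start] + html[pos..-1] if depth.zero?
  end
  html # never balanced: leave the page untouched
end

ad = '<div class="ad"><div>inner</div> still ad</div><p>content</p>'
cleaned = remove_balanced(ad, '<div class="ad"', 'div')
# cleaned == '<p>content</p>'
```

With Hpricot the same job is a one-liner in the style shown above, along the lines of doc.search('div.ad').remove.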

My hooks.rb, therefore, contains the following:

def ELinks::pre_format_html_hook(url, html)
    require 'rubygems'
    require 'hpricot'
    doc = Hpricot(html)
    # Match the URL with =~: String#grep (Ruby 1.8) returns an array, and
    # even an empty array is true in a condition, so the elsif never ran.
    if url =~ /amazon\.co(m|\.uk)/
        doc.search('div').collect!{|n|n if /A9Ads/ =~ n[:id] }.compact.remove # ads
        doc.search('img').collect!{|n|
            n if /(transparent-pixel|navPackedSprites)/ =~ n[:src]
        }.compact.remove # random pointless images
        doc.search('div.nonmemberEnclosure').remove # Amazon Prime nag box
        doc.search('div#more-buying-choice-content-div//table').remove # tl;dr
    elsif url =~ /smile\.co\.uk/
        doc.search('img').collect!{|n|
            n if /(blackarrow|littlesmile)\.(gif|png)/ =~ n[:src]
        }.compact.remove
    end
    # Spacers and ad-server images, stripped on every site
    doc.search('img').collect!{|n|
        n if /(spacer\.gif|doubleclick\.net)/ =~ n[:src]
    }.compact.remove
    doc.to_html
end
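The image-stripping blocks in the hook differ only in which URL they apply to and which src they match, so one possible refactoring (a sketch, not what my hooks.rb does) is to keep the patterns in a table and test each image with a single predicate. `SITE_IMAGE_RULES`, `GLOBAL_IMAGE_RULE` and `unwanted_image?` are hypothetical names; the patterns are the ones from the hook, with the dots escaped.

```ruby
# Hypothetical rule table: URL pattern => image-src pattern to strip there.
SITE_IMAGE_RULES = {
  /amazon\.co(m|\.uk)/ => /transparent-pixel|navPackedSprites/,
  /smile\.co\.uk/      => /(blackarrow|littlesmile)\.(gif|png)/,
}.freeze

# Stripped on every site.
GLOBAL_IMAGE_RULE = /spacer\.gif|doubleclick\.net/

# True if an <img> with this src should be removed from the page at url.
# Uses =~ rather than String#grep, whose array result is always truthy.
def unwanted_image?(url, src)
  return false if src.nil?
  return true if src =~ GLOBAL_IMAGE_RULE
  SITE_IMAGE_RULES.any? { |site, pattern| url =~ site && src =~ pattern }
end
```

Inside the hook, the per-site img blocks would then collapse into one pass in the same collect! style: doc.search('img').collect!{|n| n if unwanted_image?(url, n[:src]) }.compact.remove. Adding a site becomes a one-line change to the table.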

I also wanted to test out the Python scripting capabilities of ELinks 0.12, so I came up with the following, using BeautifulSoup:

from BeautifulSoup import BeautifulSoup

def pre_format_html_hook(url, html):
    doc = BeautifulSoup(html)
    if "wikipedia.org" in url:
        # Drop the icon cells of Wikipedia's message boxes
        for e in doc.findAll('td', {'class': 'mbox-image'}):
            e.extract()

    return doc.prettify()

Ruby vs. Python

Well, after Dan’s Python presentation earlier this week, I decided to give Python another go, and attempt to properly evaluate the two.

Python Pros:

  • Faster than Ruby, at least until Ruby 2.0 with YARV is released.
  • Looks cleaner, because there are no end statements all over the place.
  • More library support.
  • Python 3000 will use Unicode strings throughout and treat source files as UTF-8 by default: no more faffing around converting, no more having to mark every file I write with -*- coding: utf-8 -*- so I can use copyright symbols, and no more obscure errors trying to process reST (I know that it isn’t an ASCII character, I never claimed it was!).
  • Ruby doesn’t have a native implementation of reST, whereas Python has docutils. I like reST as a text markup language; it seems less frivolous than Markdown.

Ruby Pros:

  • .each{} is nicer than for i in j:. On the other hand, Python has list comprehensions, which can be even nicer.
  • No need to type parentheses around method arguments unless leaving them out would be ambiguous.
  • Blocks are more powerful than Python’s lambda, though Python’s lambda syntax looks cleaner (compare lambda {|i|i*i}.call(4) to (lambda i: i*i)(4)). The catch is that a Python lambda can only contain a single expression.
  • More objecty goodness—I like having everything as an object. I also greatly dislike having some stuff as methods and some as functions, like Python does—either go entirely OO or entirely not.
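A few lines of Ruby to make the comparisons above concrete (the variable and method names are just for illustration): .each and .map with blocks, the lambda from the example above, a multi-statement lambda of the kind a Python lambda can't express, and a method of my own that takes a block via yield.

```ruby
# .each{} instead of `for i in j:`
squares = []
[1, 2, 3].each { |i| squares << i * i }
# squares == [1, 4, 9]

# .map is the nearest equivalent of a list comprehension like [i*i for i in j]
mapped = [1, 2, 3].map { |i| i * i }
# mapped == [1, 4, 9]

# The lambda example: lambda {|i| i*i}.call(4)
sixteen = lambda { |i| i * i }.call(4)
# sixteen == 16

# A block can hold several statements; a Python lambda is one expression.
describe = lambda do |i|
  kind = i.even? ? 'even' : 'odd'
  "#{i} is #{kind}"
end
description = describe.call(3)
# description == "3 is odd"

# Any method can take a block and run it with yield.
def twice
  yield
  yield
end
count = 0
twice { count += 1 }
# count == 2
```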
bugger!

Bugger! is the bug-tracking software that I’m writing (and yes, the exclamation mark is part of the name). It’s written in Ruby and designed to have as few external dependencies as possible: so far it looks likely to need an MTA, a web server, or a command shell to manipulate the database, and possibly GnuPG for authentication. Other than that, I hope to keep it to just the core Ruby libraries.

Partly I’m writing this for programming experience, or just to fill my free time. Partly it’s because I haven’t yet found a bug-tracking system that I like: Trac is quite good, but its Darcs support is still experimental; Bugzilla is awful and requires MySQL (though I think it can use Postgres now); Debbugs works the way I want in general, but seems very complicated, probably because it’s Debian-specific.

Bugger lives in my public Darcs repository: darcs.bmalee.eu/repo/bugger/; the web interface is at darcs.bmalee.eu and the project home page is right here.

missing ruby

After using Python a lot recently, I’m starting to miss Ruby; it has some nice features that Python lacks. There are object.each iterators, for a start, and the very nice Date (or is it DateTime, or Time?) class, which parses input in gods know how many different formats and then lets you output it again in any format you like. I know Python has strptime, but it doesn’t guess at the format the way Ruby can.
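To show what I mean, using the date library from Ruby's standard library: Date.parse works the format out for itself, and strftime writes the date back out however you ask (Time.parse, after require 'time', does the same for times). Python's datetime.strptime, by contrast, needs the format string spelled out. The sample strings here are just examples.

```ruby
require 'date'

# Date.parse guesses the format from the string itself...
dates = ['2001-02-03', '20010203', '3rd Feb 2001'].map { |s| Date.parse(s) }
# all three are the same date: 3 February 2001

# ...and strftime writes it out in whatever format you like.
long_form  = dates.first.strftime('%A %d %B %Y')  # "Saturday 03 February 2001"
short_form = dates.first.strftime('%d/%m/%y')     # "03/02/01"
```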

It’s a grass-is-always-greener thing, of course; if I were using Ruby a lot I’d miss Python’s syntax (I hate having to type out “begin” and “end”, partly because I cock it up so often).

I’ve been looking at Haskell recently, but I’m a long way from understanding functional programming well enough to actually use it. All I’m sure of is that I don’t want to have to use Perl much in the near future (dollar signs get on my wick).