This feed contains pages in the “python” category.
The ELinks browser has the nice feature of being scriptable in a number of languages (Ruby, Perl, Guile, and Lua, plus Python in the development version). I was browsing Amazon and getting annoyed at the adverts and whatnot, and also at the single-pixel transparent GIFs on that and some other sites (I force all images to be displayed, so that I can open them in an image viewer even if they have no ALT text), so I decided to do something about it.
The really useful bit in this case is ELinks::pre_format_html_hook, which is passed a URL and the page content and expects the page content (modified or unmodified) to be returned.
I originally tried using regular expressions, which work fine for removing images (/<img [^>]*src="[^"]*(spacer|transparent-pixel).gif"[^>]*\/?>/ is what I used, for the record). However, for more complex structures this approach is less useful, as regular expressions cannot check that several elements are correctly nested (i.e. that the number of elements that close is the same as the number that opened).
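For illustration, here is roughly the same pattern using Python's re module. It is a sketch rather than the hook I actually ran: the helper name strip_spacer_images and the sample markup are made up for the example, and the dot before "gif" is escaped here, which the pattern quoted above does not do.

```python
import re

# Matches an <img> tag whose src ends in spacer.gif or transparent-pixel.gif.
SPACER_IMG = re.compile(
    r'<img [^>]*src="[^"]*(?:spacer|transparent-pixel)\.gif"[^>]*/?>',
    re.IGNORECASE,
)


def strip_spacer_images(html):
    """Return html with spacer images removed (hypothetical helper)."""
    return SPACER_IMG.sub("", html)


# Made-up sample markup: one spacer image and one real image.
sample = '<p>Hi<img width="1" src="/img/spacer.gif" alt=""/> there <img src="cat.png"></p>'
print(strip_spacer_images(sample))
```

This works because an img tag has no children. As soon as the thing to remove can contain other elements, the pattern has no way of knowing which closing tag belongs to it.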
Enter Hpricot. Hpricot is a very forgiving HTML parsing library, which allows you to use XPath or CSS selector syntax, as well as (I think) a DOM-style API. You can do things like doc.search('div.nonmemberEnclosure').remove, which removes the Amazon Prime nag box, or more complicated things.
My hooks.rb, therefore, contains the following:
def ELinks::pre_format_html_hook(url, html)
  require 'rubygems'
  require 'hpricot'
  doc = Hpricot(html)
  # Use =~ rather than url.grep: grep returns an array, and even an empty
  # array is truthy in Ruby, so the elsif branch would never be reached.
  if url =~ /amazon\.co(m|\.uk)/
    doc.search('div').collect!{|n| n if /A9Ads/ =~ n[:id] }.compact.remove # ads
    doc.search('img').collect!{|n|
      n if /(transparent-pixel|navPackedSprites)/ =~ n[:src]
    }.compact.remove # random pointless images
    doc.search('div.nonmemberEnclosure').remove # Amazon Prime nag box
    doc.search('div#more-buying-choice-content-div//table').remove # tl;dr
  elsif url =~ /smile\.co\.uk/
    doc.search('img').collect!{|n|
      n if /(blackarrow|littlesmile)\.(gif|png)/ =~ n[:src]
    }.compact.remove
  end
  # Spacers and ad-server images, removed on every site.
  doc.search('img').collect!{|n|
    n if /(spacer\.gif|doubleclick\.net)/ =~ n[:src]
  }.compact.remove
  html = doc.to_html
  return html
end
I also wanted to test out the Python scripting capabilities of ELinks 0.12, so came up with the following, using BeautifulSoup:
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, for Python 2

def pre_format_html_hook(url, html):
    doc = BeautifulSoup(html)
    if "wikipedia.org" in url:
        # Drop the image cell from Wikipedia's message boxes.
        for e in doc.findAll('td', {'class': 'mbox-image'}):
            e.extract()
    return doc.prettify()
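To show what the parser is buying you over a regular expression, here is a rough sketch of the same removal using only the html.parser module from the Python 3 standard library. It is not what ELinks runs: the class MboxImageStripper and the helper strip_mbox_images are names invented for the example, and unlike Hpricot or BeautifulSoup it assumes that every non-void tag inside the removed cell is properly closed.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MboxImageStripper(HTMLParser):
    """Re-emit HTML, dropping <td class="mbox-image"> cells and their contents."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a removed cell

    @staticmethod
    def _unwanted(tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return tag == "td" and "mbox-image" in classes

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # one level deeper inside the removed cell
        elif self._unwanted(tag, attrs):
            self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the depth.
        if not self._unwanted(tag, attrs):
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1  # reaches 0 at the cell's own </td>
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit("&%s;" % name)

    def handle_charref(self, name):
        self._emit("&#%s;" % name)

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._emit("<!%s>" % decl)


def strip_mbox_images(html):
    """Return html without mbox-image cells (hypothetical helper)."""
    parser = MboxImageStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Made-up sample: the first cell contains nested elements.
sample = ('<table><tr><td class="mbox-image"><div><img src="x.png"></div></td>'
          '<td class="mbox-text">Needs &amp; citations</td></tr></table>')
print(strip_mbox_images(sample))
```

The depth counter is the part a regular expression cannot do: a td nested inside the removed cell bumps the counter rather than ending the removal early.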
I wrote a script to display Twitter updates via notification-daemon (used quite heavily in GNOME, I believe, but I’m not sure about KDE). It’s written in Python, and it’s here.
Well, after Dan’s Python presentation earlier this week, I decided to give Python another go, and attempt to properly evaluate it against Ruby.
Python Pros:
- Faster than Ruby, at least until Ruby 2.0 with YARV is released.
- Looks cleaner, because there are no end statements all over the place.
- More library support.
- Python 3000 will use Unicode for all strings and UTF-8 as the default source encoding: no more faffing around converting, no more having to mark every file I write with -*- coding: utf-8 -*- so I can use copyright symbols, no more obscure errors trying to process reST (I know that it isn’t an ASCII character, I never claimed it was!).
- Ruby doesn’t have a native implementation of reST. I like reST, as a text markup language; it seems less frivolous than markdown.
Ruby Pros:
- .each{} is nicer than for i in j:. On the other hand, Python has list comprehensions, which can be even nicer.
- No need to type () unless not doing so would be ambiguous.
- Blocks are more powerful than Python’s lambda, though lambda looks cleaner (compare lambda {|i|i*i}.call(4) to (lambda i: i*i)(4)). A Python lambda can only be a single expression, though.
- More objecty goodness—I like having everything as an object. I also greatly dislike having some stuff as methods and some as functions, like Python does—either go entirely OO or entirely not.
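To make the Python side of that comparison concrete, here is a small sketch. The names (numbers, square_and_log and so on) are made up for the example; it only illustrates the loop, list comprehension and lambda points from the list above.

```python
numbers = [1, 2, 3, 4]

# Explicit loop, the counterpart of Ruby's numbers.each { |i| ... }.
squares_loop = []
for i in numbers:
    squares_loop.append(i * i)

# List comprehension: the same result as a single expression.
squares_comp = [i * i for i in numbers]

# A lambda holds exactly one expression and is called like any function.
square = lambda i: i * i


# Anything that needs statements (assignments, loops, try/except) has to
# be a named function instead; a Ruby block could do this inline.
def square_and_log(i, log):
    result = i * i
    log.append("squared %d" % i)
    return result


log = []
print(squares_loop, squares_comp, square(4), square_and_log(5, log), log)
```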
A few tweaks to the template used by Planet TermiSoc, and now the Atom feed includes the poster’s name in the title. This means I can actually tell who the post is by. :)
I can do the same to the RSS feeds, if you like, but frankly Atom is a better format, so you should be using that anyway. :)
After using Python a lot recently, I’m starting to miss Ruby; it has some nice features that Python lacks (object.each iterators, for a start, and the very nice Date (or is it Datetime or Time?) object that parses input in gods know how many different formats then allows you to output it again in any format you like - I know Python has strptime, but it doesn’t guess at the format like Ruby can).
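The nearest I can get with the standard library alone is to try a list of formats in turn, roughly as sketched below. The format list and the helper name guess_datetime are assumptions made for the example, and this is nowhere near as forgiving as the Ruby parser: anything not in the list is rejected, and an ambiguous date like 03/04/2007 is always read day-first.

```python
from datetime import datetime

# Candidate formats, tried in order. The list is an assumption for this
# sketch; extend it to cover whatever input you actually expect.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y",
    "%d %B %Y",
)


def guess_datetime(text):
    """Parse text with the first matching format (hypothetical helper)."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # wrong format, try the next one
    raise ValueError("no known format matches %r" % text)


parsed = guess_datetime("25/12/2007")
print(parsed.strftime("%Y-%m-%d"))
```

Once parsed, strftime will write the date back out in any format you like, which covers the output half of what I miss.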
It’s a grass-is-always-greener thing, of course; if I was using Ruby a lot I’d miss Python’s syntax (I hate having to type out “begin” and “end”, partly because I cock it up so often).
I’ve been looking at Haskell recently, but I’m a long way from understanding functional programming well enough to actually use it. All I’m sure of is that I don’t want to have to use Perl much in the near future (dollar signs get on my wick).
I couldn’t find a terminal emulator that suited me: gnome-terminal is too bloated and depends on a stack of GNOME libraries; xterm appears to eat meta keys, which makes using irssi or anything else that needs Alt key combinations a little awkward; and XFCE-Terminal is not only uninstallable in Debian, it doesn’t compile even with some major hacking around, since the last update was over a year ago.
So, I bashed out a basic terminal using PyGTK and VTE. It more-or-less seems to work, despite having taken less than half an hour. It’s not at all configurable, unless you want to edit the source. But it works.