The Dark Balloon

A weblog by Hao Lian.
A journey into the soft of night.
A terrible secret guarded by golems.

§
The carefully woven ack duality amid the surreptitious cloak of night.

Update: Thanks, AndyArmstrong! This magic isn't specific to ack; it comes from pl2bat, the tool that wraps a Perl script in a batch file. Unfortunately, I can't track down the person who wrote that sexy voodoo, so I am removing the attribution below. There are similar tools for Python (by Christian Schaller) and Ruby.

How does Ack on Windows work? After all, Windows' command prompt doesn't support shebangs or any other terminal features invented in the last century. When you type ack, the command prompt actually finds the ack.bat file in your PATH. And when you open that, you get a huge surprise: somebody has smuggled an entire Perl program into a batch script. Furthermore, Emacs is in Perl mode for no apparent reason. What the hell is going on?

@rem = '--*-Perl-*--
@echo off
perl -x -S %0 %*
goto endofperl
@rem ';
#!/usr/local/bin/perl
#line 15

use warnings; use strict;
our $VERSION = '1.84';
use App::Ack ();

[rest of the code]

:endofperl

These are the first lines of ack.bat, the executable for Windows, lightly edited. It's part of the infinitely superior grep alternative, Ack. (To install, type cpan App::Ack into your terminal unless you're using ActiveState Perl. In that case, you should switch to Strawberry Perl to preserve sanity.)

Why is this terribly clever? The @rem = '--*-Perl-*-- line should tip you off. To cmd.exe, it's an MS-DOS comment (rem is the comment command, and the leading @ keeps the line from being echoed), but to Perl it's an array assignment. In fact, the single quotation mark starts a string that doesn't end until the quote right before the semicolon on the @rem '; line, at which point the ack script takes over.

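Now the big picture emerges: when you type ack, cmd.exe runs ack.bat. That, in turn, runs perl -x -S %0 %*, where %0 is the name the batch file was invoked as and %* is the list of arguments you passed to it; -S tells Perl to go find that name on the PATH. And, boom, the rest follows: cmd.exe never runs the Perl bits, because @rem turns the first line into a comment and goto endofperl jumps straight over everything else to the :endofperl label at the bottom of the file. Perl, meanwhile, skips the command prompt bits: -x tells it to throw away everything before the first #! line that mentions perl, and if you ever run the file without -x, those bits are just an assignment to the fake array @rem.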

Two more bits, juicier and more obvious: whenever Perl does parse the assignment to @rem, it doesn't trip strict, because it comes before use strict. And the -*-Perl-*- buried in --*-Perl-*-- is an Emacs mode line, which is why Emacs (and any other editor that reads Emacs-style mode lines) switches to Perl mode.
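To watch the Perl half of the trick without a Windows box, here is a small sketch, not taken from ack or pl2bat. It writes a hypothetical, stripped-down wrapper shaped like the header above (the #line 7 and the one-line payload are made up) to a temporary file, then runs it through Perl twice: once with -x, the way the batch half invokes it, and once without. It assumes a Unix-like shell behind qx{}. Note the __END__ before :endofperl; without it Perl would choke on the label, since :endofperl isn't valid Perl.

```perl
use strict; use warnings;
use File::Temp qw(tempfile);

# Hypothetical minimal wrapper shaped like the top of ack.bat.
# The Perl payload just echoes its arguments.
my $wrapper = <<'END';
@rem = '--*-Perl-*--
@echo off
perl -x -S %0 %*
goto endofperl
@rem ';
#!perl
#line 7
print "args: @ARGV\n";
__END__
:endofperl
END

my ($fh, $path) = tempfile(SUFFIX => '.bat', UNLINK => 1);
print {$fh} $wrapper;
close $fh or die "close: $!";

# With -x, Perl discards every line before "#!perl".
my $with_x = qx{"$^X" -x "$path" one two};

# Without -x, Perl parses the batch lines as one big string
# assigned to @rem, then runs the same payload.
my $without_x = qx{"$^X" "$path" one two};

print "with -x:    $with_x";
print "without -x: $without_x";
```

Both runs print args: one two, so the same file really is a working Perl program either way.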

[(2008 June 21, 3!) .]


§
Puzzle: brass instrument?

Deduce what this does, and I’ll talk about you on this fine literary establishment. Clue: one of the simplest brass instruments. Difficulty: 3/4.


use strict; use warnings; use 5.010;
my $d = 'http://www.timesonline.co.uk';
while (<>) {
    next unless /value="(.*?)"/;
    my $u = $1; $u = $d . $u if $u =~ m{^/};
    my $o = `wget -q -O - "$u"`;
    say for $o =~ m/ef="(.*?)">.*?DOW/;
}
[(2008 April 8, 1!) .]

§
2007 predictions, final updates

Let’s wrap up the 2007 predictions with as little crying as possible, OK?

[Mac OS X failing], with Vista, will leave a power vacuum in the OS market, one that is not filled by Linux, which will lack an OS from which to copy any more features.

TRUE. 2007 was not a good year for operating systems. Unless, of course, you count the release of Emacs 22.1. What happened to version 22, you ask? It’s currently lost in the beard of Richard Stallman, and no amount of open source communism is enough to entice anybody to go and retrieve it.

Web 2.0 will stagnate.

TRUE. Web 2.0 is no longer cool. Lame Internet fads are cool again.

Language interpreters will become a bottleneck.

TRUE. Python is cool now. We all know Python is not interpreted; it's, in fact, fed through van Rossum's head before he manually flips the diodes in your monitor. That's right, I'm making this public: Python only runs on LCDs. Deal with it, bitches.

Perl 6.0 will not be released. Python 3000 will not be released.

DOUBLE TRUE. Perl now has Rakudo, Pugs, and about twenty other mini-languages. Apparently, “focus” in the Perl community is a weird way of spelling “vapor.” As in vaporware. MORPHEME BURN. Sizzle sizzle, bitches. (Seriously, when C++0x adds a whole bunch of unnecessary features, it’s lame. But when Perl 6 does it, it’s “Come on, we have nightly builds?”)

dotfloofy dotblog will reach its fourth anniversary.

FALSE. We had a continuity jump and we are, indeed, at our sixth anniversary by now. Scientists, with their sciencing, widely believe this is due to anomalies caused by the pesto sauce I spilled back around June of 2007.

[(2008 April 1) .]

§
Perls.

Would you like to change your singular nouns to plural? You can do that and much, much more in Perl (via). Note that the authors have struggled with people over octopuses and viruses. Clearly they have yet to meet real grammar nazi-quibblers.

[(2008 January 21, 3!) .]

§
Hum-drum.

This started as really ugly Perl code. But, like a flower or a cocoon or a flower named Cocoon, it blossomed into something beautiful after an hour of reduction, that wonderful sweet process where everything clicks and hums. Java folks call it "refactoring," Haskell folks call it "second-level static typed meromorphism arrow structure on the category Dog": the fact of the matter is that this is a universal, pan-language feeling.

Anyway, what this does is take a Genbank full listing of an organism's genes (for example, Salmonella's genes) and convert it into a tab-delimited file with, for each gene, the name, the start and stop positions (a negative start marks a gene on the complement strand), and the description. If you have the Salmonella genome, you can then pull all the genes from Salmonella using slurp_ecogene.pl or whatever file we'll have 1000 revisions from now. A word of caution: the Genbank links go to extremely large web pages on the order of megabytes. Megabytes, like all four of them.

use strict; use warnings; use 5.010;

# Slurp the whole Genbank listing named on the command line.
my $file = shift or die "usage: $0 genbank-file\n";
open(my $h, '<', $file) or die "Can't open $file: $!\n";
my @lines = <$h>;
close($h);

sub infer {
    my ($i, $start, $stop) = @_;

    # Lucky for us, the next line is always the /gene. Then we'll
    # grope around in the dark and try to find /product, without
    # running off the end of the file.
    my ($gene) = $lines[++$i] =~ /gene="(.*?)"/;
    $i++ until $i > $#lines or $lines[$i] =~ /product=/;

    # Some products span more than one line. We'll need to remove
    # initial whitespace and newlines before extracting $desc.
    my $desc;
    my $last = $i + 3 > $#lines ? $#lines : $i + 3;
    for (join '::', @lines[$i .. $last]) {
        s/::\s+//g; s/[\r\n]/ /g;
        ($desc) = /product="(.*?)"/;
    }

    # Missing fields come out as empty columns instead of warnings.
    return join "\t", map { $_ // '' } $gene, $start, $stop, $desc;
}

# Doing a foreach loop over <$h> (linear) and parsing (decidedly
# nonlinear) is hard, so we'll sacrifice elegance for sanity.
foreach my $i (0 .. $#lines) {
    local $_ = $lines[$i];
    next unless /^\s+CDS/;
    # Escape the dots: we want a literal "..", not any two characters.
    my ($start, $stop) = /(\d+)\.\.(\d+)/;

    # We're matching for complement(#..#). It's good to know, so
    # we'll mark it with a negative $start.
    $start *= -1 if /complement\(/;
    print infer($i, $start, $stop), "\n";
}
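To make the input and output concrete, here is a self-contained sketch that runs the same steps as the script above (find each CDS, pull the coordinates, negate complement starts, glue a multi-line /product back together) over a tiny, made-up record in Genbank's feature-table layout. The gene names, coordinates, and products are invented for illustration; real records carry more qualifiers, such as /translation.

```perl
use strict; use warnings;

# Hypothetical two-gene record; every value here is made up.
my @lines = split /^/, <<'END';
     CDS             complement(190..255)
                     /gene="abcA"
                     /locus_tag="EX0001"
                     /product="example leader
                     protein"
     CDS             337..2799
                     /gene="abcB"
                     /product="second example protein"
END

my @rows;
for my $i (0 .. $#lines) {
    next unless $lines[$i] =~ /^\s+CDS/;
    my ($start, $stop) = $lines[$i] =~ /(\d+)\.\.(\d+)/;
    $start *= -1 if $lines[$i] =~ /complement\(/;    # complement strand
    my ($gene) = $lines[$i + 1] =~ /gene="(.*?)"/;   # /gene follows CDS

    # Walk forward to /product, then glue up to three continuation lines.
    my $j = $i + 1;
    $j++ until $j > $#lines or $lines[$j] =~ /product=/;
    my $last = $j + 3 > $#lines ? $#lines : $j + 3;
    my $chunk = join '::', @lines[$j .. $last];
    $chunk =~ s/::\s+//g;      # drop separators plus continuation indent
    $chunk =~ s/[\r\n]/ /g;    # newlines become single spaces
    my ($desc) = $chunk =~ /product="(.*?)"/;

    push @rows, join "\t", $gene, $start, $stop, $desc;
}
print "$_\n" for @rows;
```

The first row comes out as abcA, -190, 255, and "example leader protein", tab-separated: the complement location turns into a negative start, and the two-line product reads as one description.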
[(2008 January 20) .]