Implementation notes

Obscure solution to a common problem.

----------------------------------------------------------------------
0. Problem

The common problem here is transferring binary files between machines.
In particular, these are machines where I have SSH access, but because
of how the network is set up, I can't use public-key authentication,
so I have to type in a few passwords every time I want to get stuff in
and out of them, and passwords are always difficult to type correctly
by design.  This common problem is a common loss of productivity for
me.

I figured, since I already have a terminal open for these connections,
without doing any extra magic, I can transfer *text* fairly easily via
copy & paste.  I used to do something like this:

   tar cf - some_files* | bzip2 -9 -c | uuencode a.tar.bz2

Then on the receiving end, I can do something like this:

   uudecode -o - | tar xjf -

This works fairly well, except uuencode is quite inefficient: each
61-character line (one length character plus 60 data characters)
carries only 45 bytes.  I know I can do much better than that.

----------------------------------------------------------------------
1. Encoding

The main problem with uuencode is its base64-style format, which uses
4 characters to encode 3 bytes.  Maybe there were only 64 printable
characters back in 1987, but these days we have 94 in ASCII (not
counting space), so we can do 4 bytes in 5 characters:

   256^4 < 94^5

If we are going to use 5 characters, we don't need to use all 94
ASCII characters, just 85 will do:

   256^4 < 85^5 < 94^5
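
The bound above can be checked mechanically.  The following is a
small Python sketch (Python is used only for illustration here, and
the helper name min_base is hypothetical) that searches for the
smallest alphabet size whose fixed-width numbers cover a given number
of values:

```python
BLOCK_VALUES = 256 ** 4  # number of distinct 4-byte blocks


def min_base(chars: int, values: int) -> int:
    """Return the smallest base whose `chars`-digit numbers cover `values` values."""
    base = 2
    while base ** chars < values:
        base += 1
    return base


print(min_base(5, BLOCK_VALUES))         # 85: smallest alphabet for 5 characters
print(84 ** 5 < BLOCK_VALUES)            # True: 84 symbols fall just short
print(BLOCK_VALUES < 85 ** 5 < 94 ** 5)  # True: the inequality above
```

The same helper gives 64 for 3 bytes in 4 characters, which is where
the base64 alphabet size comes from.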

Encoding 4-byte blocks works as follows:

   x = the base, some integer from 85 to 94
   a, b, c, d = input bytes
   A, B, C, D, E = output digits, each in 0..x-1 (+33 to make them
                   printable)

   a*256^3 + b*256^2 + c*256 + d = A*x^4 + B*x^3 + C*x^2 + D*x + E
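
As a rough illustration of the equation above, here is a minimal
Python sketch of converting one full 4-byte block.  The function name
encode_block and the default x = 85 are assumptions made for this
example; this is not the code of the actual tools.

```python
def encode_block(block: bytes, x: int = 85) -> str:
    """Encode exactly 4 bytes as 5 printable characters in base x."""
    if len(block) != 4:
        raise ValueError("block must be exactly 4 bytes")
    if not 85 <= x <= 94:
        raise ValueError("x must be an integer from 85 to 94")
    # a*256^3 + b*256^2 + c*256 + d
    n = int.from_bytes(block, "big")
    digits = []
    for _ in range(5):
        n, r = divmod(n, x)  # peel off the least significant digit
        digits.append(r)
    # Most significant digit (A) first; +33 makes each digit printable.
    return "".join(chr(33 + d) for d in reversed(digits))


print(encode_block(b"Man "))            # 9jqo^ (same as Ascii85 when x = 85)
print(encode_block(b"\x00\x00\x00\x00"))  # !!!!!
```

With x = 85 and an offset of 33 this matches the usual Ascii85
alphabet, which makes the output easy to cross-check against other
base85 tools.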

If the input block is shorter than 4 bytes, we can chop the unneeded
characters off the output: n input bytes need only n+1 output
characters, since:

   256^3 < 85^4 < 256^4
   256^2 < 85^3 < 256^3
   256^1 < 85^2 < 256^2

This means with some input padding and output truncating logic, we can
encode binary data of any size in base85.
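
One way the padding and truncating logic could look is sketched below
in Python.  It assumes the Ascii85-style scheme: pad a short final
block with zero bytes, encode it, and keep only n+1 characters; when
decoding, pad a short final group with the highest digit and keep
only n bytes.  The names encode and decode, the default x = 85, and
the absence of line wrapping are all choices made for this example,
not a description of the actual tools.

```python
OFFSET = 33  # added to each digit to make it printable


def encode(data: bytes, x: int = 85) -> str:
    """Encode bytes of any length, 5 characters per full 4-byte block."""
    out = []
    for i in range(0, len(data), 4):
        block = data[i:i + 4]
        n = int.from_bytes(block.ljust(4, b"\0"), "big")  # zero-pad input
        digits = []
        for _ in range(5):
            n, r = divmod(n, x)
            digits.append(chr(OFFSET + r))
        # digits are least significant first: reverse, then truncate
        out.append("".join(reversed(digits))[:len(block) + 1])
    return "".join(out)


def decode(text: str, x: int = 85) -> bytes:
    """Inverse of encode(); expects no whitespace or line breaks."""
    out = bytearray()
    for i in range(0, len(text), 5):
        group = text[i:i + 5]
        if len(group) == 1:
            raise ValueError("a trailing group needs at least 2 characters")
        n = 0
        for ch in group.ljust(5, chr(OFFSET + x - 1)):  # pad with max digit
            d = ord(ch) - OFFSET
            if not 0 <= d < x:
                raise ValueError("character outside the alphabet: %r" % ch)
            n = n * x + d
        if n >= 256 ** 4:
            raise ValueError("group does not fit in 4 bytes")
        out += n.to_bytes(4, "big")[:len(group) - 1]
    return bytes(out)


print(encode(b"."))                              # /c
print(decode(encode(b"hello", 90), 90))          # b'hello'
```

Padding the decoder's input with the highest digit rounds the value
up rather than down, so the bytes that are kept come out unchanged.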

This isn't a new idea; searching for base85 will find quite a few
tools and libraries that already implement it, including PostScript.
The thought of running PostScript to transfer files doesn't really
appeal to me though, and I thought it would be fun to implement it
just for kicks, so I did.

----------------------------------------------------------------------
2. Implementation

Following the base conversion described in the previous section, the
code is trivial to implement.  It is even simpler than PostScript's
ASCII85 encoding, because I left out the 'z' shortcut that encodes a
block of four zero bytes as a single character.  I usually filter
input and output with bzip2, so streams of zeroes are fairly uncommon
in my input, and it's not worth the extra cycles to special-case
them.

Since any number between 85 and 94 would work, I picked 90 due to my
obsession with Touhou characters, of which both Cirno and Akyuu have
a strong association with the number "9".  Cirno is the encoder and
Akyuu is the decoder: it seems more intuitive to me that Cirno can
freeze data while Akyuu can recall it.

For convenience, both the encoder and decoder are implemented in Perl,
so that I can transfer the decoder itself over text terminals via copy
and paste, and no recompile is needed because every real operating
system has Perl installed.  Perl is mostly fine except it's a bit
slow, so I also implemented both the encoder and decoder in C.  These
tools are now suitable for everyday use.

I also implemented these tools in PostScript for comparison.  The
source size is very small because PostScript has base85 built-in, and
the running time is just horrendous because it's PostScript.

----------------------------------------------------------------------
3. Testing

test.pl feeds deterministic and random input through one encode->decode
cycle, and checks that the output is identical to input.
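
The shape of such a round-trip check is sketched below in Python.
This is not test.pl: the function run_tests and its parameters are
hypothetical, and Python's built-in base64.a85encode/a85decode stand
in for the encoder and decoder under test.

```python
import base64
import random


def run_tests(encode, decode, seed=0, random_cases=100, max_len=1024):
    """Push fixed and random inputs through encode -> decode.

    Returns (number of cases, list of inputs that did not round-trip).
    """
    # Deterministic inputs: empty, every short length, all byte values,
    # and runs of 0x00 and 0xff.
    cases = [b""] + [bytes(range(n)) for n in range(1, 9)]
    cases += [bytes(range(256)), b"\x00" * 64, b"\xff" * 64]
    # Random inputs from a seeded generator, so failures are reproducible.
    rng = random.Random(seed)
    for _ in range(random_cases):
        length = rng.randrange(max_len + 1)
        cases.append(bytes(rng.getrandbits(8) for _ in range(length)))
    failures = [c for c in cases if decode(encode(c)) != c]
    return len(cases), failures


total, failures = run_tests(base64.a85encode, base64.a85decode)
print("%d cases, %d failures" % (total, len(failures)))
```

Seeding the random generator keeps the "random" half of the test
repeatable, which helps when comparing several implementations
against the same inputs.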

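You can see the speed differences between the different
implementations by timing test.pl.  Test results on my machine, as
printed by the csh-style time builtin (user seconds, system seconds,
elapsed time, CPU percentage):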

Perl:
13.666u 0.966s 0:10.51 139.1%   0+0k 0+0io 0pf+0w
13.645u 0.956s 0:10.50 138.9%   0+0k 0+0io 0pf+0w
13.723u 0.950s 0:10.56 138.9%   0+0k 0+0io 0pf+0w

PostScript:
24.873u 2.658s 0:17.40 158.1%   0+0k 0+0io 0pf+0w
24.827u 2.694s 0:17.41 158.0%   0+0k 0+0io 0pf+0w
24.843u 2.644s 0:17.37 158.2%   0+0k 0+0io 0pf+0w

C:
4.224u 0.720s 0:04.90 100.8%    0+0k 0+0io 0pf+0w
4.189u 0.745s 0:04.89 100.6%    0+0k 0+0io 0pf+0w
4.249u 0.734s 0:04.94 100.6%    0+0k 0+0io 0pf+0w

C is fastest by far.  No surprises here ;)

----------------------------------------------------------------------
4. Finally...

Now I can transfer data at 60 bytes per line!
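
For comparison with uuencode's 45 bytes per line, the arithmetic can
be sketched in Python.  The 75-character line width is an inference
from the 60-byte figure (60 bytes * 5/4), not something stated
elsewhere in these notes, and the helper name bytes_per_line is
hypothetical.

```python
def bytes_per_line(line_chars, bytes_per_group, chars_per_group,
                   overhead_chars=0):
    """Payload bytes carried by one line of encoded text."""
    groups = (line_chars - overhead_chars) // chars_per_group
    return groups * bytes_per_group


# uuencode: 1 length character + 60 data characters, 3 bytes per 4 chars
print(bytes_per_line(61, 3, 4, overhead_chars=1))  # 45
# 5 characters per 4 bytes, assuming 75-character lines
print(bytes_per_line(75, 4, 5))                    # 60
```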

Actually, I have increased the transfer efficiency a lot more just by
increasing my terminal's scrollback buffer size.