Implementation notes

Obscure solution to a common problem.

----------------------------------------------------------------------
0. Problem

The common problem here is transferring binary files across machines.
In particular, these are machines where I have SSH access, but because
of how the network is set up, I can't use publickey authentication, so
I have to type in a few passwords every time I want to get stuff in and
out of them, and passwords are difficult to type correctly by design.
This common problem is a common loss of productivity for me.

I figured that since I already have a terminal open for these
connections, I can transfer *text* fairly easily via copy & paste,
without doing any extra magic.  I used to do something like this:

    tar cf - some_files* | bzip2 -9 -c | uuencode a.tar.bz2

Then on the receiving end, I can do something like this:

    uudecode -o - | tar xjf -

This works fairly well, except uuencode is quite inefficient, encoding
only 45 bytes per line.  I know I can do much better than that.

----------------------------------------------------------------------
1. Encoding

The main problem with uuencode is its base64-style format, which uses 4
characters to encode 3 bytes.  Maybe there were only 64 printable
characters back in 1987, but these days we have about 94 in ASCII, so
we can do 4 bytes in 5 characters:

    256^4 < 94^5

If we are going to use 5 characters, we don't need to use all 94 ASCII
characters, just 85 will do:

    256^4 < 85^5 < 94^5

Encoding 4-byte blocks works as follows:

    x             = some integer between 85 and 94
    a, b, c, d    = input bytes
    A, B, C, D, E = output digits, each in 0..x-1
                    (+33 to make them printable)

    a*256^3 + b*256^2 + c*256 + d = A*x^4 + B*x^3 + C*x^2 + D*x + E

If the input block is shorter than 4 bytes, a block of n bytes needs
only n+1 output characters, so we can chop the rest off of the output,
since:

    256^3 < 85^4 < 256^4
    256^2 < 85^3 < 256^3
    256^1 < 85^2 < 256^2

This means that with some input padding and output truncating logic,
we can encode binary data of any size in base85.

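As a rough illustration of the block conversion above, here is a
minimal Python sketch.  It is not the Perl or C code these notes
describe.  It assumes x = 90, zero bytes as input padding, and
most-significant digit first; the names BASE, OFFSET, encode_block and
encode are made up for this example.

```python
BASE = 90    # assumed choice of x; any integer from 85 to 94 satisfies 256^4 < x^5
OFFSET = 33  # added to every digit so the output is printable ASCII


def encode_block(block: bytes) -> str:
    """Encode 1 to 4 input bytes as 2 to 5 printable characters."""
    n = len(block)
    if not 1 <= n <= 4:
        raise ValueError("block must be 1 to 4 bytes long")
    # Pad a short block with zero bytes (an assumption of this sketch) and
    # read it as the integer a*256^3 + b*256^2 + c*256 + d.
    value = int.from_bytes(block.ljust(4, b"\0"), "big")
    # Peel off the five base-x digits, least significant first.
    digits = []
    for _ in range(5):
        value, digit = divmod(value, BASE)
        digits.append(digit)
    digits.reverse()  # now A, B, C, D, E
    # n input bytes need only n + 1 output characters; chop the rest.
    return "".join(chr(d + OFFSET) for d in digits[: n + 1])


def encode(data: bytes) -> str:
    """Encode data of any length, 4 bytes at a time."""
    return "".join(encode_block(data[i:i + 4]) for i in range(0, len(data), 4))
```

For example, encode(b"\0\0\0\0") gives "!!!!!", because every digit is
zero and 0 + 33 is the character '!'.  The sketch emits one unbroken
string and leaves out line wrapping.
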
This isn't a new idea: searching for base85 will find quite a few tools
and libraries that already implement it, including PostScript.  The
thought of running PostScript to transfer files doesn't really appeal
to me though, and I thought it would be fun to implement it just for
kicks, so I did.

----------------------------------------------------------------------
2. Implementation

Following the base conversion described in the previous section, the
code is trivial to implement.  It is more trivial than PostScript's
ASCII85 encoding, because I don't have the 0^4 = 'z' logic (the special
case that encodes four zero bytes as a single 'z').  I usually filter
input and output with bzip2, so streams of zeroes are fairly uncommon
in my input, and it's not worth the extra cycles to special-case them.

Since any number between 85 and 94 would work, I picked 90 due to my
obsession with Touhou characters, of which both Cirno and Akyuu have
strong associations with the number "9".  Cirno is the encoder and
Akyuu is the decoder: it seems more intuitive to me that Cirno can
freeze data while Akyuu can recall it.

For convenience, both the encoder and decoder are implemented in Perl,
so that I can transfer the decoder itself over text terminals via copy
and paste, and no recompile is needed because every real operating
system has Perl installed.

Perl is mostly fine except it's a bit slow, so I also implemented both
the encoder and decoder in C.  These tools are now suitable for
everyday use.

I also implemented these tools in PostScript for comparison.  The
source size is very small because PostScript has base85 built-in, and
the running time is just horrendous because it's PostScript.

----------------------------------------------------------------------
3. Testing

test.pl feeds deterministic and random input through one encode->decode
cycle, and checks that the output is identical to the input.  You can
see the speed differences between the different implementations by
timing test.pl.

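The kind of round-trip check that test.pl performs can be sketched in
Python as follows.  This is not test.pl itself, and the encoder and
decoder here are stand-ins rather than the Perl or C tools: they assume
base 90, an offset of 33, zero padding on encode, and padding with the
largest digit on decode.  All names (encode, decode, run_checks) are
hypothetical.

```python
import random

BASE, OFFSET = 90, 33  # assumed parameters: base 90, '!' stands for digit 0


def encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 4):
        block = data[i:i + 4]
        value = int.from_bytes(block.ljust(4, b"\0"), "big")
        chars = []
        for _ in range(5):
            value, digit = divmod(value, BASE)
            chars.append(chr(digit + OFFSET))
        # Most significant digit first; keep len(block) + 1 characters.
        out.append("".join(reversed(chars))[: len(block) + 1])
    return "".join(out)


def decode(text: str) -> bytes:
    out = bytearray()
    for i in range(0, len(text), 5):
        group = text[i:i + 5]
        if len(group) < 2:
            raise ValueError("a group needs at least 2 characters")
        digits = [ord(ch) - OFFSET for ch in group]
        if any(not 0 <= d < BASE for d in digits):
            raise ValueError("character outside the assumed alphabet")
        # Pad a short group with the largest digit so the value rounds up
        # past the truncation and the leading bytes come back unchanged.
        digits += [BASE - 1] * (5 - len(group))
        value = 0
        for d in digits:
            value = value * BASE + d
        if value >= 256 ** 4:
            raise ValueError("group does not fit in 4 bytes")
        out += value.to_bytes(4, "big")[: len(group) - 1]
    return bytes(out)


def run_checks(trials: int = 200, seed: int = 0) -> bool:
    """Feed fixed and random inputs through one encode->decode cycle."""
    rng = random.Random(seed)  # seeded so that failures are reproducible
    samples = [b"", b"\0" * 9, bytes(range(256))]
    for _ in range(trials):
        length = rng.randrange(64)
        samples.append(bytes(rng.randrange(256) for _ in range(length)))
    return all(decode(encode(s)) == s for s in samples)
```

Calling run_checks() returns True when every sample survives the cycle
unchanged.  Padding with the largest digit on decode is what makes the
truncated groups safe: the padded value never falls below the original
and never overshoots far enough to change the bytes that are kept.
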
Test results on my machine:

    Perl:
        13.666u 0.966s 0:10.51 139.1% 0+0k 0+0io 0pf+0w
        13.645u 0.956s 0:10.50 138.9% 0+0k 0+0io 0pf+0w
        13.723u 0.950s 0:10.56 138.9% 0+0k 0+0io 0pf+0w

    PostScript:
        24.873u 2.658s 0:17.40 158.1% 0+0k 0+0io 0pf+0w
        24.827u 2.694s 0:17.41 158.0% 0+0k 0+0io 0pf+0w
        24.843u 2.644s 0:17.37 158.2% 0+0k 0+0io 0pf+0w

    C:
        4.224u 0.720s 0:04.90 100.8% 0+0k 0+0io 0pf+0w
        4.189u 0.745s 0:04.89 100.6% 0+0k 0+0io 0pf+0w
        4.249u 0.734s 0:04.94 100.6% 0+0k 0+0io 0pf+0w

C is fastest by far.  No surprises here ;)

----------------------------------------------------------------------
4. Finally...

Now I can transfer data at 60 bytes per line!

Actually, I have increased the transfer efficiency a lot more just by
increasing my terminal's scrollback buffer size.