Implementation notes

Taking md5sum and making it even less secure.

----------------------------------------------------------------------
0. Concept

About a year ago when TRICK 2015 was happening, there was this
question on how to uniquely identify each entry:
https://twitter.com/mametter/status/660836410149089280

So the scheme used for TRICK 2013 was apparently to just take the
first 32 bits of MD5.  And memorizing 32-bit numbers is indeed
cumbersome.

Thus I set out to write this utility, which takes just the minimal
subset of bits from MD5, and then translates those bits to a human
readable string.

----------------------------------------------------------------------
1. Selecting bits

Despite MD5 having been widely declared as unfit for anything that
demands security, it's still a fine scheme for differentiating files.
And it's so good at doing it, most of the time the first 8 bits would
uniquely identify each file in a set.

Obviously, there are going to be some occasional collisions if we use
just 8 bits, and guaranteed collisions if there are more than 256
files.  This can mostly be avoided with a slightly more complicated
scheme: instead of always using the first 8 bits, select 8 bits from
somewhere in the middle of the MD5 digest, and use 4 extra bits to
encode the byte offset that was selected.

12 bits, as it turns out, work extremely well.  But if even this
fails (e.g. if there are more than 4096 files), we have a fallback
scheme that selects a variable number of bytes.  In effect, this
scheme is never worse than MD5.
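
A minimal sketch of the 12-bit selection, written in Python for
illustration rather than taken from the utility itself.  The notes
above do not say how the offset is chosen or how the bits are laid
out, so this assumes that one byte offset is shared by the whole set,
that the lowest offset separating every file is used, and that the
offset sits in the high 4 bits.  The names select_bits, md5_digest,
and the file names are hypothetical, and the variable-length fallback
is not shown.

```python
import hashlib


def md5_digest(data: bytes) -> bytes:
    """Return the raw 16-byte MD5 digest of data."""
    return hashlib.md5(data).digest()


def select_bits(digests):
    """Map each name to a 12-bit ID, or return None if 12 bits are not enough.

    digests maps a name to its 16-byte MD5 digest.  Assumption: one byte
    offset is shared by the whole set, and the lowest offset that
    separates every file wins.
    """
    for offset in range(16):  # 16 byte offsets fit in 4 bits
        values = {name: digest[offset] for name, digest in digests.items()}
        if len(set(values.values())) == len(values):
            # Assumed layout: offset in the high 4 bits, byte in the low 8.
            return {name: (offset << 8) | byte for name, byte in values.items()}
    return None  # caller would switch to the variable-length fallback


# Hypothetical file names and contents, for illustration only.
files = {"entry1.rb": b"puts 1", "entry2.rb": b"puts 2"}
ids = select_bits({name: md5_digest(body) for name, body in files.items()})
```

Trying each of the 16 byte offsets in turn is what makes the 4 extra
bits worthwhile: a set that collides on the first byte only needs to
be collision-free at one of the other 15 offsets.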

----------------------------------------------------------------------
2. Generating words

Having selected 12 bits, we would like to translate these into easily
memorizable words.  This is basically an arbitrary selection of
dictionary words; anything other than numbers is probably fine.  We
don't want numbers because that was what made MD5 difficult to
memorize in the first place.

In the end, I chose fruits.

   4 bits -> taste or attribute
   4 bits -> color or material
   4 bits -> fruit
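
The mapping above can be sketched as three 16-entry lookup tables,
shown here in Python for illustration.  The word lists are
placeholders I made up for this example, not the ones the utility
uses, and which 4 bits feed which word is also an assumption.

```python
# Hypothetical word lists: 16 entries each, one per 4-bit value.
TASTES = ["sweet", "sour", "bitter", "salty", "spicy", "juicy", "crisp",
          "ripe", "fresh", "dried", "frozen", "tiny", "giant", "round",
          "soft", "hard"]
COLORS = ["red", "orange", "yellow", "green", "blue", "purple", "pink",
          "white", "black", "brown", "golden", "silver", "wooden", "glass",
          "iron", "paper"]
FRUITS = ["apple", "banana", "cherry", "grape", "kiwi", "lemon", "lime",
          "mango", "melon", "guava", "peach", "pear", "plum", "papaya",
          "fig", "date"]


def to_words(bits12: int) -> str:
    """Translate a 12-bit value into "taste color fruit"."""
    if not 0 <= bits12 < 4096:
        raise ValueError("expected a 12-bit value")
    taste = TASTES[(bits12 >> 8) & 0xF]  # assumed: high 4 bits
    color = COLORS[(bits12 >> 4) & 0xF]  # assumed: middle 4 bits
    fruit = FRUITS[bits12 & 0xF]         # assumed: low 4 bits
    return f"{taste} {color} {fruit}"


print(to_words(0x123))  # prints "sour yellow grape"
```

Since each list has exactly 16 distinct words, every one of the 4096
possible values gets its own three-word name.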

Another alternative that I considered was animals, but multi-colored
fruits felt less strange to me compared to multi-colored animals.

Yet another alternative considered was different versions of Mobuko,
such as "Tsundere Mobuko", "Doji Mobuyo", etc.  But I think these will
be just as difficult to memorize as numbers.

----------------------------------------------------------------------
3. Testing and formatting

This time around, I have decided on the template image before writing
any code.  Because this is a utility for differentiating files, I
thought the various roles that Mobuko played in "Tesagure!  Bukatsumono"
would be perfect.

Because I hadn't published any ASCII art for a year, I wanted a
simpler project to get me back into shape, so I decided to write this
in Perl.  Perl is especially nice for a few reasons:

- An MD5 library is part of the standard distribution.

- Fitting text to a template can be easily automated.

- Since fitting text to the template would be automated, I didn't
  bother recording the coding process, which meant lower development
  overhead.

So the task was well defined, and the desired end result was planned
ahead of time.  What was left was to just write the first version,
write some shell scripts to verify functionality, and then
incrementally reduce the file size until it was appropriate for ASCII
art.  It was all very straightforward; the only difficulty was finding
time to write it.

----------------------------------------------------------------------
4. Finally...

It has been a while since I wrote recreational code.  Every time I do
it, I do it with slightly more rigor than the previous time.
Eventually I will turn everything into "work".

But I guess that's fine.