Implementation notes

Taking md5sum and making it even less secure.

----------------------------------------------------------------------
0. Concept

About a year ago when TRICK 2015 was happening, there was this
question on how to uniquely identify each entry:

   https://twitter.com/mametter/status/660836410149089280

So the scheme used for TRICK 2013 was apparently to just take the
first 32 bits of MD5. And memorizing 32-bit numbers is indeed
cumbersome. Thus I set out to write this utility, which takes just
the minimal subset of bits from MD5, and then translates those bits
to a human-readable string.

----------------------------------------------------------------------
1. Selecting bits

Despite MD5 having been widely declared unfit for anything that
demands security, it's still a fine scheme for differentiating files.
And it's so good at doing it that, most of the time, the first 8 bits
would uniquely identify each file in a set.

Obviously, there are going to be some occasional collisions if we use
just 8 bits, and guaranteed collisions if there are more than 256
files. This can be avoided with a slightly more complicated scheme:
select 8 bits from somewhere in the middle of the MD5 digest instead
of always using the first 8 bits, and use 4 extra bits to encode the
byte offset that was selected.

12 bits, as it turns out, works extremely well. But if even this
fails (e.g. if there are more than 4096 files), we have a fallback
scheme that selects a variable number of bytes. In effect, this
scheme is never worse than MD5.

----------------------------------------------------------------------
2. Generating words

Having selected 12 bits, we would like to translate these into words
that are easy to memorize. This is basically an arbitrary selection
of dictionary words; anything other than numbers is probably fine. We
don't want numbers because that was what made MD5 difficult to
memorize in the first place.

In the end, I chose fruits.

   4 bits -> taste or attribute
   4 bits -> color or material
   4 bits -> fruit

Another alternative that I considered was animals, but multi-colored
fruits felt less strange to me than multi-colored animals.

Yet another alternative I considered was different versions of
Mobuko, such as "Tsundere Mobuko", "Doji Mobuyo", etc. But I think
these would be just as difficult to memorize as numbers.

----------------------------------------------------------------------
3. Testing and formatting

This time around, I decided on the template image before writing any
code. Because this is a utility for differentiating files, I thought
the various roles that Mobuko played in "Tesagure! Bukatsumono" would
be perfect.

Because I haven't published any ASCII art for a year, I wanted a
simpler project to get me back into shape, so I decided to write this
in Perl. Perl is especially nice for a few reasons:

 - MD5 library is part of the standard distribution.
 - Fitting text to template can be easily automated.
 - Since fitting text to template will be automated, I didn't bother
   recording the coding process, which meant lower development
   overhead.

So the task was well defined, and the desired end result was planned
ahead of time. What was left was to just write the first version,
write some shell scripts to verify functionality, and then
incrementally reduce the file size until it was appropriate for ASCII
art.

It was all very straightforward; the only difficulty was finding time
to write it.

----------------------------------------------------------------------
4. Finally...

It has been a while since I wrote recreational code. Every time I do
it, I do it with slightly more rigor than the previous time.
Eventually I will turn everything into "work". But I guess that's
fine.
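
----------------------------------------------------------------------
Sketch of the 12-bit scheme

For readers who want to see the idea from sections 1 and 2 in code,
here is a rough sketch. It is written in Python for readability and is
not the original Perl utility. Several details are assumptions, since
the notes above do not spell them out: the offset is chosen greedily
(the first byte offset whose offset/byte pair is still unused), the
offset goes in the top 4 bits, the 12 bits are split high-to-low into
attribute, color and fruit, and all three word lists are made up for
illustration. The variable-length fallback is not shown.

```python
import hashlib

# Hypothetical word lists, 16 entries each so that 4 bits index one word.
# The vocabulary of the real utility is not given in these notes.
ATTRIBUTES = ["sweet", "sour", "bitter", "salty", "spicy", "juicy",
              "crispy", "chewy", "frozen", "dried", "ripe", "tiny",
              "giant", "fuzzy", "smooth", "sticky"]
COLORS = ["red", "amber", "yellow", "green", "blue", "purple", "pink",
          "white", "black", "brown", "gold", "silver", "glass",
          "wooden", "paper", "stone"]
FRUITS = ["apple", "banana", "cherry", "grape", "kiwi", "lemon",
          "mango", "melon", "orange", "papaya", "peach", "pear",
          "plum", "lime", "fig", "guava"]


def assign_ids(blobs):
    """Give each (distinct) byte string a unique 12-bit ID.

    ID layout (an assumption): (byte offset << 8) | digest[offset].
    """
    used = set()
    ids = []
    for data in blobs:
        digest = hashlib.md5(data).digest()  # 16 bytes, so offsets 0..15
        for offset in range(16):
            candidate = (offset << 8) | digest[offset]
            if candidate not in used:
                used.add(candidate)
                ids.append(candidate)
                break
        else:
            # All 16 offsets collided; the real tool falls back to a
            # variable number of bytes here, which this sketch omits.
            raise ValueError("no free 12-bit ID for this input")
    return ids


def to_words(id12):
    """Translate a 12-bit ID into 'attribute color fruit'."""
    attribute = ATTRIBUTES[(id12 >> 8) & 0xF]
    color = COLORS[(id12 >> 4) & 0xF]
    fruit = FRUITS[id12 & 0xF]
    return f"{attribute} {color} {fruit}"


if __name__ == "__main__":
    for blob, id12 in zip([b"a", b"b", b"c"], assign_ids([b"a", b"b", b"c"])):
        print(blob, format(id12, "03x"), to_words(id12))
```

One property of this layout is that an ID can be checked against a
file without knowing the rest of the set: byte number (ID >> 8) of the
file's MD5 digest must equal (ID & 0xFF).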