\input texinfo @setfilename manual.texi @settitle Hazuki @titlepage @title Hazuki @subtitle Moon Phase Encryption @end titlepage @contents @ifnottex @node Top @top Hazuki @end ifnottex Moon phase encryption. @menu * Usage:: Command line settings * Features:: Not a typical encryption utility * Internals:: More than security by obscurity * Miscellaneous:: Random thoughts that inspired this utility @end menu @c ---------------------------------------------------------------------- @c {{{ @node Usage @chapter Usage To encrypt: @example ./hazuki @var{yyyy-mm-dd} @var{key} @var{input} @var{output} @end example To decrypt: @example ./hazuki @var{key} @var{input} @var{output} @end example Arguments: @itemize @bullet @item @var{yyyy-mm-dd} - date which the file can be decrypted. @item @var{key} - encryption/decryption key. Only the first 1K is used. Empty files can be used as keys. @item @var{input} - input file name. If this is not specified or if it's @code{-}, Hazuki will read from stdin. @item @var{output} - output file name. If this is not specified or if it's @code{-}, Hazuki will write to stdout. @end itemize Hazuki decides whether to operate in encryption mode or decryption mode based on the first argument, and will always operate in encryption mode if the first argument looks like a date. Thus it's not a good idea to have the key file name start with a digit. For encryption, moon phase is computed from the date argument, which determines the cipher to be used. For decryption, moon phase is computed from current date. If the moon phases do not match, encryption and decryption ciphers will not match, and the files will not decrypt correctly. Example: encrypt a file named @code{input.txt} to @code{output.bin}, using an empty key, and set the target date so that the output can be decrypted today (or ~29 days from today): @example ./hazuki `date +'%Y-%m-%d'` /dev/null input.txt output.bin @end example @c }}} @c ---------------------------------------------------------------------- @c {{{ @node Features @chapter Features Bits that are specific to Hazuki and not found in many typical encryption utilities. @menu * Moon phase:: * Compile time options:: * Embedded text:: @end menu @node Moon phase @section Moon phase The main reason for using Hazuki over some other encryption utility is the built-in moon phase calculator. Hazuki will compute moon phase for a particular date, and use that moon phase to construct the encryption cipher. For decryption, current date is used, and files intended for a different moon phase will not decrypt correctly. Caveat for Windows users: For both encryption and decryption, timestamps are aligned to midnight local time. @code{localtime} is used to get current time, and @code{mktime} is used to convert to seconds and align to midnight. These functions don't always handle time zones the same way, be sure to set @code{TZ} environment string to something sane to ensure consistent behavior, such as @code{GMT+7}. @node Compile time options @section Compile time options Hazuki comes with 3 constants that can be adjusted at compile time. That is, there are some constants that can be easily edited in the source, and recompiling @code{hazuki.cc} with those edits produces a new tool. All these constants can be found on line 14. @itemize @bullet @item @code{swap_byte_order} - whether to swap byte order or not on reads and writes. This should be set to 1 if you are on a big-endian machine, so that files encrypted by little-endian machines can be decrypted. The default is 0, so no byte order swaps are done. This makes Hazuki run slightly faster. Hazuki has not been tested on big-endian machines. @item @code{R} - number of encryption rounds. Increasing number of rounds makes Hazuki run slower, but potentially increasing encryption strength. Default is 64 rounds. Setting it below 32 is likely a bad idea. @item @code{B} - block size for reads and writes. Must be a multiple of 16. This is for tuning IO performance only. 16K has been found to be most optimal on my machine. @end itemize @node Embedded text @section Embedded text Peculiar strings embedded in @code{hazuki.cc}: @itemize @bullet @item CRC32 for the code can be found at line 58. It would be great if Hazuki can check herself to resist tampering, but doing that requires quite a bit more extra code. @item The source code also runs under brainf***. (This is quite common for code found on uguu.org, 6 out of the 20 most recent programs have brainf*** programs embedded). @end itemize @c }}} @c ---------------------------------------------------------------------- @c {{{ @node Internals @chapter Internals This chapter is meant to give a bit more details to how Hazuki operates, the intent is to show that Hazuki provides more than just security by obscurity. @menu * Encrypted file format:: * Salt:: * Final block length:: * Block operation:: * Cipher:: * Pseudorandom number generator:: @end menu @node Encrypted file format @section Encrypted file format An encrypted file consists of 3 parts: @enumerate @item 16 bytes of salt. @xref{Salt}. @item Encrypted contents, in 16 byte blocks. @xref{Block operation}. @item Length of last block as a single byte. @xref{Final block length}. @end enumerate Thus the output file size is always a multiple of 16 plus 1. Minimum output file size is 17 bytes for an empty input file. @node Salt @section Salt Encrypted files are prefixed with 16 bytes of salt, which affects how the rest of the file is encrypted (@xref{Block operation}). Presence of this salt means all encrypted files are different, even if the original plaintext is the same. This is a countermeasure against dictionary attacks. The salt is generated from a random number generator (@xref{Pseudorandom number generator}), which is seeded from encryption key, current time, and current process ID. Current time is determined using @code{gettimeofday}, which typically yields at least at least millisecond resolution (~10 bits of entropy from the sub-second portion alone, the second portion is predictable), and process IDs are usually 16 bits wide, so the salt contains about ~26 bits of entropy. @node Final block length @section Final block length Because Hazuki operates in 16 byte blocks, the final input block needs to be padded for encryption if it's not already 16 bytes long. To determine the actual size of the final block for decryption, a single byte it appended to the end of the file. This byte contains a nonzero value from 1 to 15 if the final block is not 16 bytes long. If the final block is an even 16 bytes, this value will be zero. This final block length byte is not encrypted in any way. In earlier designs, there were many variations where the file length, the final block length, or some other indication of where to truncate the output were stored encrypted somehow. All of these has the same weakness: while the file contents are expected to be fairly unpredictable, the file length is much easier to guess, and by storing the encrypted length, that becomes a known plaintext that an attacker can use. @node Block operation @section Block operation Hazuki operates in cipher-block chaining mode, meaning the ciphertext of the previous block is XORed with the plaintext of the current block before encryption. For the first block, salt is XORed with plaintext (@xref{Salt}). Any change in an earlier part of the input (including salt), will affect all subsequent output. Thus, even if the input file consists of identical 16 byte blocks repeated over and over, all output blocks will be different. Due to random salt and cipher-block chaining, encrypting the same file twice will produce two different output files. For decryption, a copy of the ciphertext for each block is saved before decrypting that block, and the saved ciphertext will be used to decrypt the next block. Because of this property, if the encrypted file is corrupted, only the corrupted block plus the one following block will fail to decrypt, all subsequent blocks will still decrypt correctly. @node Cipher @section Cipher The heart of Hazuki is a 128-bit block cipher. This cipher is a substitution-permutation network, with similar constructions as the Serpent cipher. In fact, the permutation part is copied directly from Serpent. The substitution parts (also known as s-boxes) are unique to Hazuki. These are 8 bit s-boxes with the following qualities: @itemize @bullet @item For any single bit flip in input, there will always be at least 2 bits flipped in output. This is a countermeasure against differential attacks. @item Every s-box contains one 256 substitution cycle. That is, applying @code{x=sbox[x]} 256 times will result in the original @code{x} being returned, but all 256 @code{x} will be different along the way. @end itemize The second item is not found in the original Serpent design, probably because it doesn't matter too much for 4-bit s-boxes. My thinking is this: if we have a s-box with more than one cycle, such as the following: @example 1 -> 2 -> 3 -> 1 4 -> 5 -> 4 @end example Then it's possible to determine which group an input is in. In the example above, a byte is either in the 1-2-3 group or the 4-5 group regardless of the number of substitutions that are applied, and this is a weakness that reduces the range of the substitutions. S-boxes used by Hazuki do not have this weakness. Hazuki applies 16 different s-boxes for encryption, in the following sequence: @example xor key 1 -> apply sbox 1 -> permute xor key 2 -> apply sbox 2 -> permute ... @end example The default setting for Hazuki is 64 encryption rounds, so each s-box is used 4 times. There are actually 9 sets of 16 s-boxes defined. Hazuki selects one set from the first 8 depending on current moon phase, and use the last set for key schedule. Because each moon phase results in a completely different set of 16 s-boxes, a different moon phase results in a different cipher. This adds 3 bits to the encryption strength. I would have just used Serpent since it was the strongest cipher. But typical of cipher code, Serpent is implemented using a lot of preprocessor magic to inline critical operations. These preprocessor bits makes Serpent run a lot faster, but makes the code unfriendly to making ASCII art. Since I care more about ASCII art than I do about speed, I have decided to build my own cipher out of straightforward (but slow) array operations. I still care about the strength of the cipher though, and hopefully the rest of the manual is convincing that Hazuki is not totally weak. @node Pseudorandom number generator @section Pseudorandom number generator Hazuki uses pseudorandom number generator (PRNG) in 3 places: @itemize @bullet @item The 128-bit used internally for encryption is generated by seeding a PRNG with the key file, skip the first 8192 values, then use the next four 32-bit values as the key. @item Random values are used to generate salt (@xref{Salt}). @item Random bytes are padded to the last block to make sure that it's 16 bytes long. @end itemize IBAA pseudorandom number generator by Jenkins (a predecessor to ISAAC) is used to generate the random numbers. I didn't use the state of the art ISAAC because the code needed to implement IBAA is slightly smaller. @c }}} @c ---------------------------------------------------------------------- @c {{{ @node Miscellaneous @chapter Miscellaneous I was fortunate to observe two interesting astronomical events in recent months: there was the annular solar eclipse on 2012-05-20, and the transit of Venus on 2012-06-05. Both were pretty fun, and the transit of Venus was an especially rare event to observe. After the solar eclipse and in anticipation of the transit of Venus, I had this epiphany of how ancient people had more connection with these astronomical events, much more so than the modern day person. For example, there are some ancient monuments where some special effects can only be observed under some alignment of the stars. Most people don't build new stuff that requires those stellar alignments anymore. What a shame, I thought. So I figured, the next thing I write will require some astronomical alignment. This seems like a perfect requirement for an encryption utility -- if I created an encrypted file that can only be decrypted when the sun and moon align in some way, this drastically decreases the time window that an attacker has to break the encryption. A more enterprising attacker will of course just fork the decryption code to account for all the different alignments, but that increases the resource requirements for the attacker by that much. Of course, I could just increase the key length by a few bits and achieve the same effect, but requiring astronomical alignments is just that much more fun. I try hard not to pass on opportunities to build elaborate jokes. Having decided what I wanted to do, my next concern was which alignment to use. The first thing that immediately came to mind was moon phase, which was easy to compute, but adds only ~3 bits of protection. My next thought was to also incorporate the weekday into the equation, which will give me ~2 more bits, but adding any more extra constraints makes the utility a lot less convenient (as if the moon phase requirement wasn't already enough inconvenience). In the end, it was moon phase only: if you missed the 3 day window to decrypt the file, it will be another ~26 days before you can try again. Since the cipher depends on moon phase, it's only fitting that the corresponding ASCII art would come from "Tsukuyomi - Moon Phase", which is what I made. As a result of this process, I learned a bit more about encryption and construction of ciphers, and I now have an encryption utility that I believe has sufficient strength and utility. Furthermore, I now have a new connection to astronomical alignments. It's all for the best. @c }}}