Implementation notes Exercise the preprocessor. ---------------------------------------------------------------------- 0. Concept January 2020 was another season for IOCCC submissions, and usually the time when I think about unique features and misfeatures of C. One such feature is the preprocessor, which in earlier years has seen a fair bit of abuse, but not so much in the past 2 years. Also, while there has been a great use of the macro expansion features, there hasn't been much use of the conditional evaluations since Daniel Vik's entry from 2004. Thus for IOCCC 2020, I thought I would try to do some compile-time computations with the preprocessor. Except I didn't, probably due to the same realization that many other people already had: it's just too tedious to write code using only the preprocessor. And the preprocessor pragmas takes too much space to fit with IOCCC rules. So instead of writing preprocessor code directly, I opted to generating preprocessor code. The usual thing I implement when I wanted to generate code is a filter that converts input to programs with some flavor of encryption, and that is what I did here. ---------------------------------------------------------------------- 1. Preprocessor computation C preprocessor provides enough primitives for manipulating essentially bits: #ifdef / #define / #undef Arithmetic is doable if we remember digital logic design classes, and left/right bitwise shifts are trivial. Next we just need loops to make things Turing-complete programming language. An infinite loop is easily accomplished like this: #include __FILE__ (well, if there were no compiler bugs, see "testing" section) A finite loop requires some bitwise arithmetic to count the number of loop iterations. Figuring out how to do this is an interesting exercise that I encourage everyone to try once. The pattern I came up with for a 3-bit loop counter looks like this: #ifdef BIT2 #undef BIT2 #include __FILE__ #define BIT1 #include __FILE__ #define BIT1 #endif #ifdef BIT1 #undef BIT1 #include __FILE__ #define BIT0 #include __FILE__ #define BIT0 #endif #ifdef BIT0 /* Computation that needs to be repeated here */ #endif So now I have loops, and I know how to do bitwise shifts. This is enough to implement LFSR: https://en.wikipedia.org/wiki/Linear-feedback_shift_register Enough features to make an interesting filter. ---------------------------------------------------------------------- 2. Preprocessor obfuscation Since I got a pseudorandom number generator, conceptually I have enough to implement a stream cipher. But actually LFSR isn't cryptographically secure, so I will market this as a thing that just "obfuscates" the input. Seems appropriate. This obfuscator was implemented in these steps: - Implement LFSR encoder in C. - Port the C code to preprocessor. This will become the decoder. - Embed that decoder in the original C encoder code. - Golf it down to size. Core of this obfuscator is a 16bit LFSR. It's 16bits because: - Every extra bit costs an extra layer of #include, and supposedly the standard only requires compilers to support 15, so 16 seemed like a good cutoff threshold. - Limiting to 16 bits avoids issues with shifting signed 32bit integers. - More than 16 bits will take too long to run anyways. The main motivation for the code golf part was to make ASCII art of course. It's always about the ASCII art. The ASCII art for this year came from a certain light novel with a very long name, as is usually the trend for light novels these days. ---------------------------------------------------------------------- 3. Testing The encoder source code has been verified to work just fine in various environments: + gcc 9.2.0 on Cygwin + gcc 8.3.0 on Linux + gcc 7.4.0 on Cygwin + gcc 6.1.0 on JSLinux + gcc 5.4.0 on Linux + clang 8.0.1 on Linux + clang 8.0.1 on Cygwin + tcc 0.9.27 on Linux + tcc 0.9.25 on JSLinux Output has been verified to *not* work with a subset of the above. I knew that doing "#include __FILE__" a lot of times would be slow, what I didn't expect was Clang failing after merely a few thousand includes: https://bugs.llvm.org/show_bug.cgi?id=44480 GCC and TCC were a lot better, but still very slow because each "#include __FILE__" causes the entire source file to be read all over again, so compiling these kind of files result in O(n^2) behavior. In the end, I just wrote my own preprocessor that doesn't have this O(n^2) behavior, and also get an extra speed boost by not bothering with macro substitutes. If we stick to encoding small messages, or if we have enough patience with GCC before it runs out of memory, we can assert that the output is generally reasonable C code. ---------------------------------------------------------------------- 4. Finally... Mile won the "Best abuse of CPP" award in IOCCC 2020.