Implementation notes

Exercise the preprocessor.

----------------------------------------------------------------------
0. Concept

January 2020 was another season for IOCCC submissions, and usually the
time when I think about unique features and misfeatures of C.  One
such feature is the preprocessor, which in earlier years has seen a
fair bit of abuse, but not so much in the past 2 years.  Also, while
there has been a great use of the macro expansion features, there
hasn't been much use of the conditional evaluations since Daniel Vik's
entry from 2004.  Thus for IOCCC 2020, I thought I would try to do
some compile-time computations with the preprocessor.

Except I didn't, probably due to the same realization that many other
people already had: it's just too tedious to write code using only the
preprocessor.  And the preprocessor pragmas takes too much space to
fit with IOCCC rules.  So instead of writing preprocessor code
directly, I opted to generating preprocessor code.

The usual thing I implement when I wanted to generate code is a filter
that converts input to programs with some flavor of encryption, and
that is what I did here.

----------------------------------------------------------------------
1. Preprocessor computation

C preprocessor provides enough primitives for manipulating essentially
bits:

   #ifdef / #define / #undef

Arithmetic is doable if we remember digital logic design classes, and
left/right bitwise shifts are trivial.  Next we just need loops to
make things Turing-complete programming language.  An infinite loop is
easily accomplished like this:

   #include __FILE__

(well, if there were no compiler bugs, see "testing" section)

A finite loop requires some bitwise arithmetic to count the number of
loop iterations.  Figuring out how to do this is an interesting
exercise that I encourage everyone to try once.  The pattern I came up
with for a 3-bit loop counter looks like this:

   #ifdef BIT2
      #undef BIT2
      #include __FILE__
      #define BIT1
      #include __FILE__
      #define BIT1
   #endif

   #ifdef BIT1
      #undef BIT1
      #include __FILE__
      #define BIT0
      #include __FILE__
      #define BIT0
   #endif

   #ifdef BIT0
      /* Computation that needs to be repeated here */
   #endif

So now I have loops, and I know how to do bitwise shifts.  This is
enough to implement LFSR:
https://en.wikipedia.org/wiki/Linear-feedback_shift_register

Enough features to make an interesting filter.

----------------------------------------------------------------------
2. Preprocessor obfuscation

Since I got a pseudorandom number generator, conceptually I have
enough to implement a stream cipher.  But actually LFSR isn't
cryptographically secure, so I will market this as a thing that just
"obfuscates" the input.  Seems appropriate.

This obfuscator was implemented in these steps:

- Implement LFSR encoder in C.
- Port the C code to preprocessor.  This will become the decoder.
- Embed that decoder in the original C encoder code.
- Golf it down to size.

Core of this obfuscator is a 16bit LFSR.  It's 16bits because:

- Every extra bit costs an extra layer of #include, and supposedly the
  standard only requires compilers to support 15, so 16 seemed like a
  good cutoff threshold.

- Limiting to 16 bits avoids issues with shifting signed 32bit
  integers.

- More than 16 bits will take too long to run anyways.

The main motivation for the code golf part was to make ASCII art of
course.  It's always about the ASCII art.  The ASCII art for this year
came from a certain light novel with a very long name, as is usually
the trend for light novels these days.

----------------------------------------------------------------------
3. Testing

The encoder source code has been verified to work just fine in various
environments:
+ gcc 9.2.0 on Cygwin
+ gcc 8.3.0 on Linux
+ gcc 7.4.0 on Cygwin
+ gcc 6.1.0 on JSLinux
+ gcc 5.4.0 on Linux
+ clang 8.0.1 on Linux
+ clang 8.0.1 on Cygwin
+ tcc 0.9.27 on Linux
+ tcc 0.9.25 on JSLinux

Output has been verified to *not* work with a subset of the above.  I
knew that doing "#include __FILE__" a lot of times would be slow, what
I didn't expect was Clang failing after merely a few thousand includes:
https://bugs.llvm.org/show_bug.cgi?id=44480

GCC and TCC were a lot better, but still very slow because each
"#include __FILE__" causes the entire source file to be read all over
again, so compiling these kind of files result in O(n^2) behavior.  In
the end, I just wrote my own preprocessor that doesn't have this
O(n^2) behavior, and also get an extra speed boost by not bothering
with macro substitutes.

If we stick to encoding small messages, or if we have enough patience
with GCC before it runs out of memory, we can assert that the output
is generally reasonable C code.

----------------------------------------------------------------------
4. Finally...

Mile won the "Best abuse of CPP" award in IOCCC 2020.