[CONTACT]

[ABOUT]

[POLICY]

Another Tiny C Factorial S.

Found at: sdf.org:70/users/sgilles/phlog/2017-03-02_another_tiny_c_factorial.txt

Another Tiny C Factorial

S. Gilles

2017-03-02

An article has been making the rounds on the internet recently,
demonstrating a C one-liner with recursion in main and no #includes. In
53 columns or so, it computes the factorial of argc, returning the result
as the process status.

The article makes the claim that, without #includes, there is no way to
parse input, or to return output other than by the process status. I'm
not sure about the last, but the first is definitely false. Here's
a demonstration:

    int main(int c,char **v){return (c<2||*v[1]<50)?1:415-"๐™‡'"[*v[1]-49];}

    $ ./a 0; printf %s\\n $?
    1
    $ ./a 5; printf %s\\n $?
    120
    $ ./a 99; printf %s\\n $?
    158                               # Oh well - not in spec.

That's 70 columns or so, which isn't bad for having error handling.
It can be reduced to 52 columns (one better than the recursive program
if you count by wcwidth(3)!) as the following, which doesn't do
input-processing.

    int main(int c,char **v){return c<3?1:415-"ฯ๐™‡'"[c];}

Exercise for the reader: why is 415 = 256 + 159 used instead of 159,
even though the two are equivalent mod 256?

(Hint: the answer isn't โ€œFor obfuscationโ€)

I consider these programs more amusing than the recursive version, because
they illustrate the absurdity of the output mechanism.  Since process
status is reduced mod 256, there are exactly six valid inputs for this
program: the numbers 0 through 5.  There is no use actually computing
the results when we can just store them.  Since the goal is to pack
tight-ranged data into a short visual space, the clear answer is to take
byte lookup of a fancily encoded string. Since UTF-8 is the standard
these days, let's use that.

Two of the outputs are 1, and they occur at the bottom of the valid input
range, so error checking handles those.  That leaves four bytes worth of
data that need to be stored, which by coincidence is the maximum number of
bytes in a UTF-8 character.  Of course, things don't work out that nicely,
and I wasn't able to come up with a short enough function that would
transform a valid codepoint into {2, 6, 24, 120} (or the reverse). That's
not to say there' isn't one, but there are about seven characters to
build the function with, so I'd consider one impressive.

So the first byte is ignored and the apostrophe on the end carries the
value 120.  It remains to find a Unicode character whose last three bytes
(when encoded with UTF-8) can be easily transformed into {2, 6, 24} -
say, by subtracting from a constant.  A quick brute force gives a bunch
of codepoints I don't have fonts for, plus the two codepoints

  ๐™‡ U+1D647 MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL L

  ๐Ÿ›‰ U+1F6C9 BOYS SYMBOL

(they use different constants, so replacing one with the other won't
immediately work). Of the two, MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL
L demonstrates the absurdity of Unicode most clearly, so I chose it.
Consider the following:

    $ curl -s http://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt \
        | grep -E 'MATHEMATICAL .* CAPITAL L$'
    1D40B   MATHEMATICAL BOLD CAPITAL L
    1D43F   MATHEMATICAL ITALIC CAPITAL L
    1D473   MATHEMATICAL BOLD ITALIC CAPITAL L
    1D4DB   MATHEMATICAL BOLD SCRIPT CAPITAL L
    1D50F   MATHEMATICAL FRAKTUR CAPITAL L
    1D543   MATHEMATICAL DOUBLE-STRUCK CAPITAL L
    1D577   MATHEMATICAL BOLD FRAKTUR CAPITAL L
    1D5AB   MATHEMATICAL SANS-SERIF CAPITAL L
    1D5DF   MATHEMATICAL SANS-SERIF BOLD CAPITAL L
    1D613   MATHEMATICAL SANS-SERIF ITALIC CAPITAL L
    1D647   MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL L
    1D67B   MATHEMATICAL MONOSPACE CAPITAL L

All that, and I still can't type subscript โ€˜bโ€™, though subscript
โ€˜aโ€™ is at U+2090.

Detouring back to the original topic: as a runner-up, the program

    int main(int n,char **v){return --n?(1500176%(2*n+21))*n:1;}

does roughly the same thing, but in a more number-theoretic way:
the magic constant was obtained using the Chinese Remainder Theorem.
It's far too long, though, at 60 columns.


AD: