# Another Tiny C Factorial S.

## Found at: sdf.org:70/users/sgilles/phlog/2017-03-02_another_tiny_c_factorial.txt

Another Tiny C Factorial

S. Gilles

2017-03-02

An article has been making the rounds on the internet recently,

demonstrating a C one-liner with recursion in main and no #includes. In

53 columns or so, it computes the factorial of argc, returning the result

as the process status.

The article makes the claim that, without #includes, there is no way to

parse input, or to return output other than by the process status. I'm

not sure about the last, but the first is definitely false. Here's

a demonstration:

int main(int c,char **v){return (c<2||*v[1]<50)?1:415-"๐'"[*v[1]-49];}

$ ./a 0; printf %s\\n $?

1

$ ./a 5; printf %s\\n $?

120

$ ./a 99; printf %s\\n $?

158 # Oh well - not in spec.

That's 70 columns or so, which isn't bad for having error handling.

It can be reduced to 52 columns (one better than the recursive program

if you count by wcwidth(3)!) as the following, which doesn't do

input-processing.

int main(int c,char **v){return c<3?1:415-"ฯ๐'"[c];}

Exercise for the reader: why is 415 = 256 + 159 used instead of 159,

even though the two are equivalent mod 256?

(Hint: the answer isn't โFor obfuscationโ)

I consider these programs more amusing than the recursive version, because

they illustrate the absurdity of the output mechanism. Since process

status is reduced mod 256, there are exactly six valid inputs for this

program: the numbers 0 through 5. There is no use actually computing

the results when we can just store them. Since the goal is to pack

tight-ranged data into a short visual space, the clear answer is to take

byte lookup of a fancily encoded string. Since UTF-8 is the standard

these days, let's use that.

Two of the outputs are 1, and they occur at the bottom of the valid input

range, so error checking handles those. That leaves four bytes worth of

data that need to be stored, which by coincidence is the maximum number of

bytes in a UTF-8 character. Of course, things don't work out that nicely,

and I wasn't able to come up with a short enough function that would

transform a valid codepoint into {2, 6, 24, 120} (or the reverse). That's

not to say there' isn't one, but there are about seven characters to

build the function with, so I'd consider one impressive.

So the first byte is ignored and the apostrophe on the end carries the

value 120. It remains to find a Unicode character whose last three bytes

(when encoded with UTF-8) can be easily transformed into {2, 6, 24} -

say, by subtracting from a constant. A quick brute force gives a bunch

of codepoints I don't have fonts for, plus the two codepoints

๐ U+1D647 MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL L

๐ U+1F6C9 BOYS SYMBOL

(they use different constants, so replacing one with the other won't

immediately work). Of the two, MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL

L demonstrates the absurdity of Unicode most clearly, so I chose it.

Consider the following:

$ curl -s http://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt \

| grep -E 'MATHEMATICAL .* CAPITAL L$'

1D40B MATHEMATICAL BOLD CAPITAL L

1D43F MATHEMATICAL ITALIC CAPITAL L

1D473 MATHEMATICAL BOLD ITALIC CAPITAL L

1D4DB MATHEMATICAL BOLD SCRIPT CAPITAL L

1D50F MATHEMATICAL FRAKTUR CAPITAL L

1D543 MATHEMATICAL DOUBLE-STRUCK CAPITAL L

1D577 MATHEMATICAL BOLD FRAKTUR CAPITAL L

1D5AB MATHEMATICAL SANS-SERIF CAPITAL L

1D5DF MATHEMATICAL SANS-SERIF BOLD CAPITAL L

1D613 MATHEMATICAL SANS-SERIF ITALIC CAPITAL L

1D647 MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL L

1D67B MATHEMATICAL MONOSPACE CAPITAL L

All that, and I still can't type subscript โbโ, though subscript

โaโ is at U+2090.

Detouring back to the original topic: as a runner-up, the program

int main(int n,char **v){return --n?(1500176%(2*n+21))*n:1;}

does roughly the same thing, but in a more number-theoretic way:

the magic constant was obtained using the Chinese Remainder Theorem.

It's far too long, though, at 60 columns.