fc417fc802 3 days ago

It's a neat encoding, but the writeup is needlessly confusing. The introduction would really benefit from a graph or two, or even just a textual sequence showing how the value evolves as you iterate on it.

Similarly, while the end result is quite elegant, the consequences of the choices made to get there aren't really explained. It's quite a bit of extra effort to reason out what would have happened if things had been done differently.
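
For what it's worth, here's roughly what such a sequence could look like. This is a minimal sketch under an assumed direct sign-bit reading (one sign per level of the exponent tower, with a hypothetical `decode` helper); the post's actual bit convention may differ in its details:

    def decode(bits):
        # Assumed reading: decode([]) = 0, and each bit contributes the sign
        # of the next level of the tower: decode([b, ...rest]) = (+/-1) * 2^decode(rest)
        if not bits:
            return 0.0
        return (1.0 if bits[0] else -1.0) * 2.0 ** decode(bits[1:])

    for width in range(1, 6):
        print([1] * width, decode([1] * width))
    # [1] 1.0
    # [1, 1] 2.0
    # [1, 1, 1] 4.0
    # [1, 1, 1, 1] 16.0
    # [1, 1, 1, 1, 1] 65536.0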

  • BreakMaker9000 3 days ago

    Agreed. I had to ask ChatGPT to explain this to me with an example number, and then I was able to understand the approach.

    • fc417fc802 3 days ago

      It actually managed to run through the math? I'm surprised. That's quite impressive.

Retr0id 3 days ago

How feasible is it to do arithmetic directly in this representation?

  • contravariant 3 days ago

    Well, the good news is that multiplication of n-bit numbers reduces to handling the sign and adding (n-1)-bit numbers.
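
    A quick sketch of that reduction, assuming a hypothetical direct sign-bit `decode` (sign bit first, then the remaining bits encode the exponent in the same format; the post's convention may differ):

        def decode(bits):
            if not bits:
                return 0.0
            return (1.0 if bits[0] else -1.0) * 2.0 ** decode(bits[1:])

        x_bits, y_bits = [1, 1, 0, 1], [1, 0, 1]
        x, y = decode(x_bits), decode(y_bits)

        # Product sign: XNOR of the two sign bits. Product magnitude:
        # 2^(E_x + E_y), where E_x and E_y are the (n-1)-bit exponents.
        sign = 1.0 if x_bits[0] == y_bits[0] else -1.0
        product = sign * 2.0 ** (decode(x_bits[1:]) + decode(y_bits[1:]))
        assert abs(product - x * y) < 1e-12

    The remaining (lossy) step is re-encoding the exponent sum back into the (n-1)-bit format, which is the addition problem mentioned below.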

  • jillesvangurp 3 days ago

    For low-bit representations, you could just precalculate all the possible answers and cache them. A simple algorithm would be to decode the numbers, do the arithmetic, and re-encode the answer. With the seven-bit example used in the article there are only 2^7 = 128 unique numbers, so you'd need 16KB of memory to store all possible products (128 × 128 one-byte entries). With 16-bit numbers that grows to 2^32 entries, i.e. 8GB at two bytes each. Beyond that it rapidly gets too expensive. You might get some savings out of the symmetry between positive and negative values.
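
    A rough sketch of that table-building idea, reusing the same hypothetical direct sign-bit `decode` as above, with a brute-force nearest-value search standing in for a real encoder. I'm using 5-bit codes here so every decoded value stays comfortably within float range:

        import itertools

        def decode(bits):
            if not bits:
                return 0.0
            return (1.0 if bits[0] else -1.0) * 2.0 ** decode(bits[1:])

        N = 5
        codes = list(itertools.product((0, 1), repeat=N))  # 2^5 = 32 codes
        value = {c: decode(c) for c in codes}

        def encode_nearest(x):
            # stand-in encoder: the code whose value is closest to x
            return min(codes, key=lambda c: abs(value[c] - x))

        # 32 * 32 = 1024-entry multiplication table
        mul_table = {(a, b): encode_nearest(value[a] * value[b])
                     for a in codes for b in codes}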

  • kleiba 3 days ago

    That's exactly the crux. Addition and subtraction in log space are not exactly straightforward.
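
    For a single level of logs, the classic workaround is the Gaussian-logarithm identity: with a = log2(x), b = log2(y), and a >= b, log2(x + y) = a + log2(1 + 2^(b - a)). A quick check:

        import math

        a, b = 10.0, 8.0  # log2(1024), log2(256)
        log2_sum = a + math.log2(1.0 + 2.0 ** (b - a))
        assert math.isclose(log2_sum, math.log2(1024 + 256))

    The correction term is transcendental rather than a simple bit operation, and with iterated logs you would need something like it at every level of the tower.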

xixixao 3 days ago

Given an integer n, what is the number of bits required to represent every integer in the range -n..n?

  • maggit 3 days ago

    Exact integers don't seem to be its strong suit. Can it even represent 3 exactly?

    Running code from the linked notebook (https://github.com/AdamScherlis/notebooks-python/blob/main/m...), I can see that the 32-bit representation of the number 3 decodes to the following float: 2.999999983422908

    (This is from running `decode(encode(3, 32))`)

    • dullcrisp 3 days ago

      Yeah I think any numbers away from 0 or infinity aren’t its strong suit.

  • lmm 3 days ago

    > Given an integer n, what is the number of bits required to represent every integer in the range -n..n?

    log2(n) + 1

    • maggit 3 days ago

      That's true for a normal binary encoding of integers, but I think we should understand the question in the context of the post: what's the number of bits required in iterated log coding?

      • yorwba 3 days ago

        Empirically, it seems to grow more like 2*log2(n)+1. A handwavy argument: the first bit serves to distinguish the positive values from the negative ones, but after that, on average, every second bit only adds more precision to values that are already distinguishable or out of range, without helping to separate values whose representations share the same prefix. I don't know how to make that airtight, though...
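
        A rough way to reproduce the empirical claim (with the same caveat as elsewhere in the thread: this assumes a hypothetical direct sign-bit `decode`, so the post's exact counts may be shifted by a bit or two): find the smallest width at which every integer in -n..n gets its own distinct nearest code.

            import itertools, math
            from bisect import bisect_left

            def decode(bits):
                if not bits:
                    return 0.0
                s = 1.0 if bits[0] else -1.0
                try:
                    return s * 2.0 ** decode(bits[1:])
                except OverflowError:
                    return s * math.inf

            def bits_needed(n):
                # smallest width w such that each integer in -n..n has a
                # distinct nearest value among the finite w-bit codes
                w = 1
                while True:
                    vals = sorted({v for c in itertools.product((0, 1), repeat=w)
                                   if math.isfinite(v := decode(c))})
                    def nearest(x):
                        i = bisect_left(vals, x)
                        return min(vals[max(0, i - 1):i + 1], key=lambda v: abs(v - x))
                    if len({nearest(k) for k in range(-n, n + 1)}) == 2 * n + 1:
                        return w
                    w += 1

            for n in (1, 2, 4, 8, 16):
                print(n, bits_needed(n))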

dullcrisp 3 days ago

It feels like you could use e (or, I guess, any number) as the logarithm base rather than 2. I wonder if that'd make it a bit more uniform in some sense.

DannyBee 3 days ago

Isn't this just a variant of Dirac's solution for representing any number using sqrt, log, and the number 2?

  • yorwba 3 days ago

    It is not. (There are no square roots, for instance.)

    • karparov 3 days ago

      It very much is. sqrt(x) is just x^(1/2), which is x^(2^-1). Dirac's solution uses iterated square roots of 2, effectively generating a sequence similar to the one used in this post.

      • yorwba 3 days ago

        Okay, but iterating square roots like √√2 = (2^(2^-1))^(2^-1) recurses into the base, whereas the equivalent iterated log is 2^(2^-1 × 2^-1) = 2^(2^-2) = +2^(+2^(-2^(+2^0))) with the bit representation [1 1 0 0 1 0 0 ...], i.e. it recurses into the exponent.
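
        A one-line numeric check that the two forms coincide:

            import math

            sqrt_form = math.sqrt(math.sqrt(2.0))  # recursing into the base
            log_form = 2.0 ** (2.0 ** -2.0)        # recursing into the exponent
            assert math.isclose(sqrt_form, log_form)  # both are 2^(1/4)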

        • DannyBee 3 days ago

          So uh, how is that not a variation, exactly?

          Dirac's solution was also arbitrarily restricted by the problem definition.

          Do you believe that if you asked someone familiar with the solution to come up with a bit-efficient variant, they would not have trivially come up with the encoding in this post and called it a variation?

          I don't, for a second, believe that.

  • snarkconjecture 3 days ago

    Not really. Dirac's trick works entirely at a depth of two logs, using sqrt like a unary counter to increment the number. It requires O(n) symbols to represent the number n, i.e. O(2^n) symbols for n bits of precision. This thing has arbitrary nesting depth of logs (or exps) and can represent a number to n bits of precision in O(n) symbols.
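
    For reference, the trick being compared against. This is a sketch of Dirac's closed form for a positive integer n, which spends n nested square roots, i.e. unary, O(n)-symbol cost:

        import math

        def dirac(n):
            # n = -log2(log2(sqrt(sqrt(...sqrt(2))))), with n nested sqrts:
            # sqrt^n(2) = 2^(2^-n), so taking log2 twice and flipping the
            # sign recovers n
            x = 2.0
            for _ in range(n):
                x = math.sqrt(x)
            return -math.log2(math.log2(x))

        print(dirac(7))  # 7.0, up to float rounding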

    • DannyBee 3 days ago

      Sure, but that's because of an arbitrary restriction on the problem Dirac solved.

      Still, it seems a fairly simple variation once you remove that arbitrary restriction. To the point: I don't believe for a second that anyone familiar with his solution, asked to make it more bit-efficient, would not have come up with this. Nor do I believe they would call it anything other than a variation.

      That doesn't make it less cool but I don't think it's like amazingly novel.

millipede 3 days ago

Neat. Looks like it would work well for values clustered around 0.5, or ones that hang close to 0 and 1.

rossant 3 days ago

Nice and elegant! Surprising that it was not discovered earlier (unless it was and we're unaware of it).

  • DannyBee 3 days ago

    It was. It's a variant of Dirac's solution to representing any number using sqrt, log, and the number 2.

lblume 3 days ago

> Any value representable in n bits is representable in n+1 bits

> [1 1 1 1 1 1 1] 2.004e+19728

Does that mean that the 8-bit version has numbers larger than this? Doesn't seem very useful, since 10^100 is already infinity for all practical purposes.

  • fc417fc802 3 days ago

    > Does that mean that the 8-bit version has numbers larger than this?

    Yes. Not just a bit larger but an even more ridiculous leap. Notice that you're iterating exponents. That last one (7 bits) was 2^65536, so the next one (8 bits) will be 2^(2^65536).

    Python refused to print 2^65536 for me, complaining that the base-10 string representation of the integer would exceed 4300 digits, so it gave up.
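
    To be precise, computing the integer is fine; it's the int-to-decimal-string conversion that CPython (3.11 and later) caps at 4300 digits by default, and the cap can be raised:

        import sys

        x = 2 ** 65536                     # the integer itself is no problem
        sys.set_int_max_str_digits(30000)  # default limit is 4300 digits
        print(len(str(x)))                 # 19729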

    > Doesn't seem very useful

    By that logic we should just use fixed point, because who needs to work with numbers as large as 2^1023 (64-bit IEEE 754)? These things aren't useful unless you're doing something that needs them, in which case they are. I could see the 5- or 6-bit variant of such an iterated scheme potentially being useful as an intermediate representation for certain machine learning applications.

  • dullcrisp 3 days ago

    Who said anything about practical purposes?

    • lblume 3 days ago

      I usually like my float representations to have some use. But sure, this does seem like a very interesting idea.

      • dullcrisp 3 days ago

        > I usually like my float representations to have some use.

        If this does have some use, I don’t think representing real-world numbers is one of them, for the reasons you intuited.

zombot 3 days ago

I thought it was some log file hackery, but it's actually about logarithms.

vednig 3 days ago

Holy father! This is amazing, the things a human mind can do. Ask an LLM for that and it'll give you 10 more complicated and unmaintainable ways to do it.

maggit 3 days ago

Cute! I wonder if it would be amenable to use as a variable-width encoding for, say, DCT coefficients in a JPEG-like codec...?

bionhoward 3 days ago

Could be good for AI representation theory. Cool idea.

bflesch 3 days ago

Feels a bit like binary encoding with + and - instead of 0 and 1.