fc417fc802 3 days ago

It's a neat encoding, but the writeup is needlessly confusing. The introduction would really benefit from a graph or two, or even just a textual sequence showing how the value evolves as you iterate on it.

Similarly, while the end result is quite elegant, the consequences of the choices made to get there aren't really explained. It's quite a bit of extra effort to reason out what would have happened if things had been done differently.
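
For what it's worth, here's roughly what such a sequence could look like. This is a minimal sketch under an assumed direct sign-bit reading (one sign per level of the exponent tower, with a hypothetical `decode` helper); the post's actual bit convention may differ in its details:

    def decode(bits):
        # Assumed reading: decode([]) = 0, and each bit contributes the sign
        # of the next level of the tower: decode([b, ...rest]) = (+/-1) * 2^decode(rest)
        if not bits:
            return 0.0
        return (1.0 if bits[0] else -1.0) * 2.0 ** decode(bits[1:])

    for width in range(1, 6):
        print([1] * width, decode([1] * width))
    # [1] 1.0
    # [1, 1] 2.0
    # [1, 1, 1] 4.0
    # [1, 1, 1, 1] 16.0
    # [1, 1, 1, 1, 1] 65536.0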

  • BreakMaker9000 3 days ago

    Agreed. I had to ask ChatGPT to explain this to me with an example number, and then I was able to understand the approach.

    • fc417fc802 3 days ago

      It actually managed to run through the math? I'm surprised. That's quite impressive.

Retr0id 3 days ago

How feasible is it to do arithmetic directly in this representation?

  • contravariant 3 days ago

    Well, the good news is that multiplication of n-bit numbers reduces to handling the sign and adding (n-1)-bit numbers.
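
    A quick sketch of that reduction, assuming a hypothetical direct sign-bit `decode` (sign bit first, then the remaining bits encode the exponent in the same format; the post's convention may differ):

        def decode(bits):
            if not bits:
                return 0.0
            return (1.0 if bits[0] else -1.0) * 2.0 ** decode(bits[1:])

        x_bits, y_bits = [1, 1, 0, 1], [1, 0, 1]
        x, y = decode(x_bits), decode(y_bits)

        # Product sign: XNOR of the two sign bits. Product magnitude:
        # 2^(E_x + E_y), where E_x and E_y are the (n-1)-bit exponents.
        sign = 1.0 if x_bits[0] == y_bits[0] else -1.0
        product = sign * 2.0 ** (decode(x_bits[1:]) + decode(y_bits[1:]))
        assert abs(product - x * y) < 1e-12

    The remaining (lossy) step is re-encoding the exponent sum back into the (n-1)-bit format, which is the addition problem mentioned below.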

  • jillesvangurp 3 days ago

    For low-bit representations, you could just precalculate all the possible answers and cache them. A simple algorithm would be to decode the numbers, do the arithmetic, and re-encode the answer. With the seven-bit example used in the article there are only 2^7 = 128 unique numbers, so you'd need 16KB of memory to store all possible products (128 × 128 one-byte entries). With 16-bit numbers that grows to 2^32 entries, i.e. 8GB at two bytes each. Beyond that it rapidly gets too expensive. You might get some savings out of the symmetry between positive and negative values.
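
    A rough sketch of that table-building idea, reusing the same hypothetical direct sign-bit `decode` as above, with a brute-force nearest-value search standing in for a real encoder. I'm using 5-bit codes here so every decoded value stays comfortably within float range:

        import itertools

        def decode(bits):
            if not bits:
                return 0.0
            return (1.0 if bits[0] else -1.0) * 2.0 ** decode(bits[1:])

        N = 5
        codes = list(itertools.product((0, 1), repeat=N))  # 2^5 = 32 codes
        value = {c: decode(c) for c in codes}

        def encode_nearest(x):
            # stand-in encoder: the code whose value is closest to x
            return min(codes, key=lambda c: abs(value[c] - x))

        # 32 * 32 = 1024-entry multiplication table
        mul_table = {(a, b): encode_nearest(value[a] * value[b])
                     for a in codes for b in codes}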

  • kleiba 3 days ago

    That's exactly the crux. Addition and subtraction in log space are not exactly straightforward.
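
    For a single level of logs, the classic workaround is the Gaussian-logarithm identity: with a = log2(x), b = log2(y), and a >= b, log2(x + y) = a + log2(1 + 2^(b - a)). A quick check:

        import math

        a, b = 10.0, 8.0  # log2(1024), log2(256)
        log2_sum = a + math.log2(1.0 + 2.0 ** (b - a))
        assert math.isclose(log2_sum, math.log2(1024 + 256))

    The correction term is transcendental rather than a simple bit operation, and with iterated logs you would need something like it at every level of the tower.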

xixixao 3 days ago

Given an integer n, what is the number of bits required to represent every integer in the range -n..n?

  • maggit 3 days ago

    Exact integers don't seem to be its strong suit. Can it even represent 3 exactly?

    Running code from the linked notebook (https://github.com/AdamScherlis/notebooks-python/blob/main/m...), I can see that the 32-bit representation of the number 3 decodes to the following float: 2.999999983422908

    (This is from running `decode(encode(3, 32))`)

    • dullcrisp 3 days ago

      Yeah I think any numbers away from 0 or infinity aren’t its strong suit.

  • lmm 3 days ago

    > Given an integer n, what is the number of bits required to represent every integer in the range -n..n?

    log2(n) + 1

    • maggit 3 days ago

      That's true for a normal binary encoding of integers, but I think we should understand the question in the context of the post: what's the number of bits required in iterated log coding?

      • yorwba 3 days ago

        Empirically, it seems to grow more like 2*log2(n)+1. A handwavy argument: the first bit serves to distinguish the positive values from the negative ones, but after that, on average, every second bit only adds more precision to values that are already distinguishable or out of range, without helping to separate values whose representations share the same prefix. I don't know how to make that airtight, though...
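
        A rough way to reproduce the empirical claim (with the same caveat as elsewhere in the thread: this assumes a hypothetical direct sign-bit `decode`, so the post's exact counts may be shifted by a bit or two): find the smallest width at which every integer in -n..n gets its own distinct nearest code.

            import itertools, math
            from bisect import bisect_left

            def decode(bits):
                if not bits:
                    return 0.0
                s = 1.0 if bits[0] else -1.0
                try:
                    return s * 2.0 ** decode(bits[1:])
                except OverflowError:
                    return s * math.inf

            def bits_needed(n):
                # smallest width w such that each integer in -n..n has a
                # distinct nearest value among the finite w-bit codes
                w = 1
                while True:
                    vals = sorted({v for c in itertools.product((0, 1), repeat=w)
                                   if math.isfinite(v := decode(c))})
                    def nearest(x):
                        i = bisect_left(vals, x)
                        return min(vals[max(0, i - 1):i + 1], key=lambda v: abs(v - x))
                    if len({nearest(k) for k in range(-n, n + 1)}) == 2 * n + 1:
                        return w
                    w += 1

            for n in (1, 2, 4, 8, 16):
                print(n, bits_needed(n))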

dullcrisp 3 days ago

It feels like you could use e (or, I guess, any number) as the logarithm base rather than 2. I wonder if that'd make it a bit more uniform in some sense.

DannyBee 3 days ago

Isn't this just a variant of Dirac's solution for representing any number using sqrt, log, and the number 2?

  • yorwba 3 days ago

    It is not. (There are no square roots, for instance.)

    • karparov 3 days ago

      It very much is. sqrt(x) is just x^(1/2), which is x^(2^-1). Dirac's solution uses iterated square roots of 2, effectively generating a sequence similar to the one used in this post.

      • yorwba 3 days ago

        Okay, but iterating square roots like √√2 = (2^(2^-1))^(2^-1) recurses into the base, whereas the equivalent iterated log is 2^(2^-1 × 2^-1) = 2^(2^-2) = +2^(+2^(-2^(+2^0))) with the bit representation [1 1 0 0 1 0 0 ...], i.e. it recurses into the exponent.
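
        A one-line numeric check that the two forms coincide:

            import math

            sqrt_form = math.sqrt(math.sqrt(2.0))  # recursing into the base
            log_form = 2.0 ** (2.0 ** -2.0)        # recursing into the exponent
            assert math.isclose(sqrt_form, log_form)  # both are 2^(1/4)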

        • DannyBee 3 days ago

          So uh, how is that not a variation, exactly?

          Dirac's solution was also arbitrarily restricted by the problem definition.

          Do you believe that if you asked someone familiar with the solution to come up with a bit-efficient variant, they would not have trivially come up with the encoding in this post and called it a variation?

          I don't, for a second, believe that.

  • snarkconjecture 3 days ago

    Not really. Dirac's trick works entirely at a depth of two logs, using sqrt like a unary counter to increment the number. It requires O(n) symbols to represent the number n, i.e. O(2^n) symbols for n bits of precision. This thing has arbitrary nesting depth of logs (or exps) and can represent a number to n bits of precision in O(n) symbols.
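
    For reference, the trick being compared against. This is a sketch of Dirac's closed form for a positive integer n, which spends n nested square roots, i.e. unary, O(n)-symbol cost:

        import math

        def dirac(n):
            # n = -log2(log2(sqrt(sqrt(...sqrt(2))))), with n nested sqrts:
            # sqrt^n(2) = 2^(2^-n), so taking log2 twice and flipping the
            # sign recovers n
            x = 2.0
            for _ in range(n):
                x = math.sqrt(x)
            return -math.log2(math.log2(x))

        print(dirac(7))  # 7.0, up to float rounding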

    • DannyBee 3 days ago

      Sure, but that's because of an arbitrary restriction on the problem Dirac solved.

      Still, it seems a fairly simple variation once you remove that arbitrary restriction. To the point: I don't believe for a second that anyone familiar with his solution, asked to make it more bit-efficient, would not have come up with this. Nor do I believe they would call it anything other than a variation.

      That doesn't make it less cool but I don't think it's like amazingly novel.

millipede 3 days ago

Neat. Looks like it would work well for values clustered around 0.5, or ones that hang close to 0 and 1.

rossant 3 days ago

Nice and elegant! Surprising that it was not discovered earlier (unless it was and we're unaware of it).

  • DannyBee 3 days ago

    It was. It's a variant of Dirac's solution to representing any number using sqrt, log, and the number 2.

lblume 3 days ago

> Any value representable in n bits is representable in n+1 bits

> [1 1 1 1 1 1 1] 2.004e+19728

Does that mean that the 8-bit version has numbers larger than this? Doesn't seem very useful, since 10^100 is already infinity for all practical purposes.

  • fc417fc802 3 days ago

    > Does that mean that the 8-bit version has numbers larger than this?

    Yes. Not just a bit larger but an even more ridiculous leap. Notice that you're iterating exponents. That last one (7 bits) was 2^65536, so the next one (8 bits) will be 2^(2^65536).

    Python refused to print 2^65536 for me, complaining that the base-10 string representation of the integer would exceed 4300 digits, so it gave up.
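
    To be precise, computing the integer is fine; it's the int-to-decimal-string conversion that CPython (3.11 and later) caps at 4300 digits by default, and the cap can be raised:

        import sys

        x = 2 ** 65536                     # the integer itself is no problem
        sys.set_int_max_str_digits(30000)  # default limit is 4300 digits
        print(len(str(x)))                 # 19729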

    > Doesn't seem very useful

    By that logic we should just use fixed point, because who needs to work with numbers as large as 2^1023 (64-bit IEEE 754)? These things aren't useful unless you're doing something that needs them, in which case they are. I could see the 5- or 6-bit variant of such an iterated scheme potentially being useful as an intermediate representation for certain machine learning applications.

  • dullcrisp 3 days ago

    Who said anything about practical purposes?

    • lblume 3 days ago

      I usually like my float representations to have some use. But sure, this does seem like a very interesting idea.

      • dullcrisp 3 days ago

        > I usually like my float representations to have some use.

        If this does have some use, I don’t think representing real-world numbers is one of them, for the reasons you intuited.

zombot 3 days ago

I thought it was some log file hackery, but it's actually about logarithms.

vednig 3 days ago

Holy father! This is amazing, the things a human mind can do. Ask an LLM for that and it'll give you 10 more complicated and unmaintainable ways to do it.

maggit 3 days ago

Cute! I wonder if it would be amenable to use as a variable-width encoding for, say, DCT coefficients in a JPEG-like codec...?

bionhoward 3 days ago

Could be good for AI representation theory. Cool idea.

bflesch 3 days ago

Feels a bit like binary encoding with + and - instead of 0 and 1.