Send Data with Sound

26 points by amrrs 11 hours ago

ASalazarMX 10 hours ago

This post was probably inspired by all the fuss gibberlink made last week. It uses ggwave, another data-over-audio protocol.

https://github.com/PennyroyalTea/gibberlink

karmakaze 9 hours ago

I don't feel great about gibberlink. LLMs got AIs to interact the way humans do, and the same goes for multimodal models. gibberlink could evolve into a highly efficient form of machine communication that leaves humans out of the loop, for better or worse. We (or it) could make it even more efficient by applying AI.

littlekey 9 hours ago

I had no idea this was real! I saw the video earlier and thought it was just faked for social media.

tdeck 9 hours ago

This is a cool concept but it actually seems slower than if they'd just continued to speak words.
- thamer 9 hours ago
  
  It's probably not slower than words; English speech runs at only about 150-200 words per minute.
  That said, the "gibberlink" demo is definitely much slower than even a 28.8k modem (that's kilobit). It sounds cool because we can't understand it and it seems kinda fast, but this is a terribly inefficient way for machines to communicate. It's hard to say how fast they're exchanging data from just listening, but it can't be much more than ~100 bits/sec if I had to guess.
  Even in the audible range you could absolutely go hundreds of times faster, but it's much easier to train an LLM that has some audio input capabilities if you keep this low rate and likely very distinct symbols, rather than implementing a proper modem.
  But why use a modem at all? Limiting communication to audio-only is a severe restriction. When AIs are going to "call" other AIs, they will use APIs… not ancient phone lines.
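
The rate comparison above can be put into rough numbers. This is only a back-of-the-envelope sketch: the 150-200 words per minute, the ~100 bits/sec guess, and the 28.8 kilobit modem come from the comment, while the characters-per-word and bits-per-character figures are assumptions chosen for illustration.

```python
# Rough comparison of spoken English, the guessed demo rate, and a 28.8k modem.
BITS_PER_CHAR = 8    # assumption: plain 8-bit text, no compression
CHARS_PER_WORD = 6   # assumption: about 5 letters plus a space

def speech_bits_per_second(words_per_minute):
    """Bit rate of speech if each spoken word were sent as raw text."""
    return words_per_minute * CHARS_PER_WORD * BITS_PER_CHAR / 60

GUESSED_DEMO_BPS = 100   # the comment's guess for the gibberlink demo
MODEM_BPS = 28_800       # 28.8 kilobit modem

if __name__ == "__main__":
    for wpm in (150, 200):
        print(wpm, "wpm is about", speech_bits_per_second(wpm), "bit/s as raw text")
    print("modem / demo ratio:", MODEM_BPS / GUESSED_DEMO_BPS)
```

Under these assumptions speech and the guessed demo rate land in the same order of magnitude, while the modem is a few hundred times faster.
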
- ASalazarMX 9 hours ago
  
  Text is incredibly efficient and compressible. Combine it with some of the other projects mentioned here, and it would be like:
  - Shall we switch to audio data for more efficient communication?
  - Yes. [MODEM NOISES START]
  - tdeck 5 hours ago
    
    I assume the long-winded "shall we switch" dialog was more for effect in the demo, but there's no reason why it couldn't hear "I'm an AI" and just send a quick enquiry data burst without having to continue the conversation in English.

textninja 3 hours ago

> Doooooooooo dooodeeedoooodeeee doooooooooo doooooooooooo bshshhhhhzhhhhhhzhhhh

Anyone?

tanepiper 10 hours ago

12 years ago, I worked on this prototype - https://github.com/tanepiper/adOn-soundlib

The original plan was to develop what were essentially "audio QR codes": short codes transmitted as sound that certain apps could parse and use to drive different interactions.

jagged-chisel 4 hours ago

What was the UX like? QR is entirely passive, requires neither batteries nor logic, and continues to exist on paper.
Does some device listen for apps nearby? Do I need to walk up and press a button?

matja 10 hours ago

There's also http://www.whence.com/minimodem/ which implements some standard methods:

> standard FSK protocols such as Bell103, Bell202, RTTY, TTY/TDD, NOAA SAME, and Caller-ID
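
For readers unfamiliar with FSK (frequency-shift keying), here is a minimal sketch of the idea in Python. It is not minimodem's code. It borrows the Bell 103 originate-side tones (1070 Hz for a 0, 1270 Hz for a 1) at 300 baud, but the sample rate, the bit order, and the absence of start/stop framing are simplifying assumptions, and it only round-trips a clean, noise-free signal in memory.

```python
import math

SAMPLE_RATE = 48000  # samples per second (assumption)
BAUD = 300           # bits per second, as in Bell 103
MARK_HZ = 1270.0     # tone sent for a 1 bit (Bell 103 originate side)
SPACE_HZ = 1070.0    # tone sent for a 0 bit
SAMPLES_PER_BIT = SAMPLE_RATE // BAUD

def bytes_to_bits(data):
    """Unpack bytes into bits, least significant bit first (no framing)."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]

def bits_to_bytes(bits):
    """Pack groups of 8 bits (LSB first) back into bytes."""
    return bytes(
        sum(bit << i for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits) - 7, 8)
    )

def modulate(bits):
    """Emit one tone burst per bit, keeping the phase continuous."""
    phase = 0.0
    samples = []
    for bit in bits:
        step = 2 * math.pi * (MARK_HZ if bit else SPACE_HZ) / SAMPLE_RATE
        for _ in range(SAMPLES_PER_BIT):
            samples.append(math.sin(phase))
            phase += step
    return samples

def tone_energy(chunk, freq):
    """Energy of the chunk at one frequency (a single-bin DFT)."""
    w = 2 * math.pi * freq / SAMPLE_RATE
    re = sum(s * math.cos(w * n) for n, s in enumerate(chunk))
    im = sum(s * math.sin(w * n) for n, s in enumerate(chunk))
    return re * re + im * im

def demodulate(samples):
    """Decide each bit by comparing energy at the mark and space tones."""
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        is_mark = tone_energy(chunk, MARK_HZ) > tone_energy(chunk, SPACE_HZ)
        bits.append(1 if is_mark else 0)
    return bits

if __name__ == "__main__":
    audio = modulate(bytes_to_bits(b"Hello, world\n"))
    print(bits_to_bytes(demodulate(audio)))
```

A real receiver also has to find where each bit starts and cope with noise, echo, and clock drift, none of which this sketch attempts.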

deathanatos 10 hours ago

I've never gotten minimodem to actually work.

E.g.,

  printf 'Hello, world\n' | minimodem --tx 440
  minimodem --rx 440

(you can choose any freq.) results in a lot of,

  ### CARRIER 440 @ 800.0 Hz ###
  �
  ### NOCARRIER ndata=1 confidence=1.507 ampl=0.060 bps=439.96 (0.0% slow) ###
  ### CARRIER 440 @ 800.0 Hz ###
  �
  ### NOCARRIER ndata=1 confidence=1.858 ampl=0.053 bps=439.96 (0.0% slow) ###
  ### CARRIER 440 @ 800.0 Hz ###
  �
  ### NOCARRIER ndata=1 confidence=1.832 ampl=0.063 bps=439.96 (0.0% slow) ###

and even when it does hit,

  ### CARRIER 440 @ 800.0 Hz ###
  Helln, world�
  ### NOCARRIER ndata=14 confidence=2.939 ampl=1.167 bps=438.67 (0.3% slow) ###

If I try something like the example where he cats a man page:

  ### CARRIER 1200 @ 1200.0 Hz ###
  ��-O���܇����������������������=����~`���|�����������������������������_��������=����??�����?�����oﯰ������������������|���������������������߿��������������������������������������~�����`�|�w������������-Ӱ��>��み����>�����

… I'm in a quiet room.

pdh 11 hours ago

Cool to see this done with webaudio. Reminded me of https://github.com/ggerganov/ggwave

HelloUsername 10 hours ago

Discussed on 24-feb-2025, 69 comments
https://news.ycombinator.com/item?id=43162793

vbekkerm 11 hours ago

I thought the modem days were behind us...

xnx 10 hours ago

How much greater is the capacity over open air vs POTS lines that maxed out at 56K?
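
One hedged way to frame this question is the Shannon-Hartley bound, C = B * log2(1 + SNR), which gives an upper limit on capacity for a given bandwidth and signal-to-noise ratio. The sketch below only evaluates that formula. Every bandwidth and SNR figure in it is an illustrative assumption, not a measurement of any phone line, speaker, or microphone.

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon-Hartley upper bound on capacity, in bits per second."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

# Assumed scenarios (hypothetical numbers, for illustration only).
SCENARIOS = {
    "voice-band phone line, ~3.1 kHz at 35 dB": (3100, 35),
    "open air, ~20 kHz audible band at 20 dB": (20000, 20),
}

if __name__ == "__main__":
    for name, (bandwidth_hz, snr_db) in SCENARIOS.items():
        kbps = shannon_capacity_bps(bandwidth_hz, snr_db) / 1000
        print(f"{name}: about {kbps:.0f} kbit/s at most")
```

The real answer depends heavily on room noise, echo, and the frequency response of the speaker and microphone, so the open-air figure here is best read as a ceiling under generous assumptions.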

karmakaze 9 hours ago

Sending ascending/descending ascii punctuation is fun.

knorker 10 hours ago

Turning data into audio is a big thing nowadays with amateur radio.

Ironic that the author overlaps so much with that field, without noticing that they chose the same name as probably the most used amateur radio programmer in the world.

If you're interested, the state of the art is VARA. It's closed source though, so NinoTNC may be a more interesting choice.

jedimastert 9 hours ago

I'm struggling to find the protocol for VARA, although maybe my Google abilities are just failing me. The protocol at least should be openly available according to the FCC.
- knorker 9 hours ago
  
  It's unclear to me too.
  I'm not a lawyer, nor is my ham license even in the US, but perhaps "you can decode it by using our software" satisfies the legal requirements?
  It's not, to my knowledge, deliberately obscured. That would be a legal no-no, I think.
  But yes, people have fought over VARA's state here.

1970-01-01 10 hours ago

What's the baud?

pdh 10 hours ago

  const CHARACTER_DURATION = 0.07; // seconds - balanced for accuracy while still fast (up from 0.055s)
  const CHARACTER_GAP = 0.03; // seconds - balanced for accuracy while still fast (up from 0.025s)
10 symbols per second
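
The 10 symbols per second figure follows from the two constants quoted above: each character occupies its duration plus the gap. A small sketch of that arithmetic, where the bits-per-symbol value is a hypothetical assumption (the thread does not say how many bits each tone carries):

```python
def symbols_per_second(character_duration, character_gap):
    """Symbol rate when each symbol takes its tone time plus a silent gap."""
    return 1 / (character_duration + character_gap)

CURRENT = symbols_per_second(0.07, 0.03)     # values from the quoted constants
PREVIOUS = symbols_per_second(0.055, 0.025)  # the "up from" values in the comments

BITS_PER_SYMBOL = 8  # assumption: one 8-bit character per symbol

if __name__ == "__main__":
    print("current:", round(CURRENT, 2), "symbols/s")
    print("previous:", round(PREVIOUS, 2), "symbols/s")
    print("approx bit rate:", round(CURRENT * BITS_PER_SYMBOL), "bit/s")
```

If one symbol really does carry one 8-bit character, that is on the order of 80 bit/s, but that figure rests entirely on the assumed bits per symbol.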

eigenblake 3 hours ago

What's so special about this? Homo sapiens have been doing this for hundreds of thousands of years /s