Table of Contents

[WIP] interfaces

[WIP] interface

All media are extensions of some human faculty. The wheel is an extension of the foot. The book is an extension of the eye. The extension of any one sense reshapes how we live — and how we think.
- Marshall McLuhan

in classic vincent fashion, we need to yet again preface this section with yet another history lesson. but how we learned to communicate is too nuanced to cover in earnest, so let's just do a quick speedrun:

communication history

| Date | Name | Description |
| --- | --- | --- |
| c. 2 million years ago | Primate alarm calls (“grunts”) | Early hominins used instinctive vocalizations to warn of danger, a behavior still observed in modern vervet monkeys. |
| c. 500 thousand years ago | Emergence of speech capacity (FOXP2 gene) | A mutation in the FOXP2 gene—shared by Neanderthals and modern humans—laid the neural groundwork for complex vocalization. |
| c. 285 thousand years ago | Pigment use for symbolic engraving | Red ochre pieces engraved with crosshatches at Olorgesailie, Kenya, indicate early symbolic behavior. |
| c. 100 thousand years ago | Shell-bead personal ornaments | Perforated Nassarius shell beads from Qafzeh Cave, Israel, used for identity and social signaling. |
| c. 77,000 - c. 40,000 BCE | Symbolism in Early Human Culture | This period marks significant developments in human symbolic expression. Around 77,000 BCE, abstract engravings on ochre at Blombos Cave, South Africa, signified the dawn of visual symbolism. By 64,000 BCE, the first figurative cave paintings, including hand-stencils and animal figures, appeared in Sulawesi, Indonesia, representing the oldest known figurative rock art. Approximately 50,000 BCE, the emergence of oral traditions and myths laid the foundation for spoken storytelling, as seen in the continuous Dreamtime narratives of Australian Aboriginal cultures. By 40,000 BCE, portable “Venus” figurines like the Hohle Fels Venus in Germany conveyed shared cultural symbols in a portable form. |
| c. 30,000 BCE - c. 5500 BCE | Pictographs and Proto-Writing | This period marks the evolution of early pictorial and symbolic communication. Sophisticated depictions of animals in Chauvet Cave, France, and vast galleries in Lascaux, France, illustrate advanced pictorial communication. In India, Bhimbetka rock-shelter petroglyphs record communal stories in stone. Jiahu proto-symbols in China represent early attempts at proto-writing, while Mesopotamian clay accounting tokens serve as precursors to abstract record-keeping. |
| c. 3500 BCE - c. 1200 BCE | Early Scripts | This period marks the emergence and development of early writing systems across various civilizations. Proto-cuneiform pictographs on Sumerian tablets from Uruk IV gradually evolved into full writing, leading to the creation of Sumerian cuneiform, the world’s first true writing system, with wedge-shaped impressions on clay tablets in Uruk, Iraq. Simultaneously, Egyptian hieroglyphs emerged as a complex pictographic-ideographic script on early dynastic monuments along the Nile. In South Asia, the Indus Valley script featured undeciphered symbols on seals from Harappa and Mohenjo-Daro, indicating urban communication. In China, the oracle-bone script appeared with inscribed divinatory characters on ox scapulae during the Shang dynasty, representing the earliest form of Chinese writing. |
| c. 1050 - 600 BCE | Alphabet | The evolution of the alphabet began with the Phoenician alphabet, a streamlined consonant-only script from the Levant, which served as the ancestor to most later alphabets. This was followed by the Greek alphabet around 800 BCE, which adopted Phoenician signs and introduced distinct vowels, enabling full phonetic representation. By 600 BCE, the Aramaic script had spread as the lingua-franca of empires, with Aramaic letters simplifying and uniting diverse peoples in writing. |
| c. 500 BCE - c. 100 BCE | Developments in Grammar | This period saw significant advancements in linguistic codification: Pāṇini’s Aṣṭādhyāyī systematically codified Sanskrit’s phonetics and morphology, marking the earliest linguistic treatise; the Qin dynasty standardized the Chinese script into the small seal script to unify the first Chinese empire; and Dionysius Thrax’s “Art of Grammar” emerged as the first surviving Western grammar, fully codifying the rules of written Greek. |

thanks chatgpt. headpats shoggoth

anyway, this is important because i want to get into the underutilized and perhaps underexplored opportunities for the future of search and agent interfacing. this is where things start to get esoteric, but bear with me.

language

ah, written language, the foundation of human civilization. we use it every day. look at me, im doing it rn!

written information, as we know, helped us distribute ideas asynchronously and at scale. we also know that this information can be ads too. but written language can only convey so much, and only so fast.

| Metric | Description | Q1 (25th pct) | Median (50th pct) | Q3 (75th pct) |
| --- | --- | --- | --- | --- |
| Typing speed (computer keyboard) | Average speed for general users, reflecting typical computer usage. | ~35 WPM | 43 WPM | ~51 WPM |
| Typing speed (professional typists) | Speed range for professionals, indicating high proficiency and efficiency. | ~43-80 WPM | 80-95 WPM | ~120+ WPM |
| Stenotype typing speed | Speed using stenotype machines, common in court reporting for rapid input. | ~100-120 WPM | 360 WPM | ~360 WPM |
| Handwriting speed (adult) | Typical speed for adults writing by hand, slower than typing. | ~5-13 WPM | 13 WPM | ~20 WPM |
| Handwriting speed (shorthand) | Speed using shorthand, a method for fast writing by using symbols. | ~350 WPM | 350 WPM | ~350 WPM |
| Morse code speed (manual) | Speed of manual Morse code, used in telecommunication for encoding text. | ~20 WPM | 20 WPM | ~70 WPM |
| Morse code speed (typewriter) | Speed using a typewriter for Morse code, faster than manual transmission. | ~75.6 WPM | 75.6 WPM | ~75.6 WPM |
| Silent reading (non-fiction) | Speed of reading non-fiction silently, reflecting comprehension pace. | ~206 WPM | 238 WPM | ~269 WPM |
| Silent reading (fiction) | Speed of reading fiction silently, often faster due to narrative flow. | ~230 WPM | 260 WPM | ~290 WPM |
| Subvocalization | Slowest reading form, involving internal vocalization of each word. | ~213 WPM | 225 WPM | ~238 WPM |
| Auditory reading | Faster than subvocalization, involves hearing words silently. | ~413 WPM | 425 WPM | ~438 WPM |
| Visual reading | Fastest reading form, recognizing words as visual units without speech. | ~513 WPM | 575 WPM | ~638 WPM |
| Reading aloud (17 languages) | Speed of reading aloud across multiple languages, showing verbal fluency. | ~155-213 WPM | 184 WPM | ~213-257 WPM |
| Audiobook narration speed | Standard speed for narrating audiobooks, balancing clarity and engagement. | ~150-160 WPM | 150-160 WPM | ~150-160 WPM |
| Slide presentation speed | Speed of delivering presentations, ensuring audience comprehension. | ~100-125 WPM | 100-125 WPM | ~100-125 WPM |
| Auctioneer speaking speed | Speed of auctioneers, characterized by rapid speech for bidding processes. | ~250 WPM | 250 WPM | ~250 WPM |
| Fastest speaking (record) | Record speed for speaking, showcasing extreme verbal agility. | ~586-637 WPM | 637 WPM | ~637 WPM |

Sources: WordsRated, Wikipedia, Reading Rate Meta-Analysis

You’d probably agree that there’s obviously gonna be a lot of variance in the Words Per Minute (WPM) for each language. Some languages are indeed very verbose, and some can pack a lot of nuance into just a few words. My goal with this table wasn’t to argue that, but to show how much information we can convey/understand within a minute. even world record holders only produce ~650 WPM, and read (skim) ~2,000 WPM (with ~50% reading comprehension).

How much information does that convey? Can we measure it? Will it vary across languages?

Figure: Language transmission speed comparison.

Research shows that despite the differences in languages, they all convey information at about 39.15 bits per second. This means human languages are actually very similarly efficient at sharing information, no matter how they sound or are structured. Across 17 different languages the researchers found that languages inherently balance how much information is in each syllable with how fast we speak to keep communication effective. They suggest that languages may have evolved to fit our brain and body capabilities. But is that the limit?
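
before asking that, let's make the 39 bits/s figure concrete. the study's claim is basically: information rate ≈ speech rate × information density. here's a toy version in Python — the per-language numbers below are my illustrative approximations, not the paper's exact estimates:

```python
# Toy version of the Coupé et al. (2019) tradeoff: fast-spoken languages
# carry less information per syllable, dense ones are spoken slower.
# The numbers below are illustrative approximations, NOT the paper's.
languages = {
    #             syllables/sec, bits/syllable
    "Japanese":   (7.8, 5.0),   # fast speech, info-sparse syllables
    "English":    (6.2, 6.1),   # middling on both axes
    "Vietnamese": (5.3, 7.5),   # slow speech, info-dense syllables
}

for name, (syl_per_sec, bits_per_syl) in languages.items():
    rate = syl_per_sec * bits_per_syl  # information rate in bits/second
    print(f"{name:>10}: {rate:.1f} bits/s")
```

however you slice the tradeoff, the product lands near ~39 bits/s.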

i ask this bc one of the philosophers on my personal mount rushmore, Wittgenstein, argued that indeed the limits of our language are the limits of our world. so what if we went beyond [written] language?

voice

I do think voice is under-indexed today. Today 95% of interaction is text [typing] but … I think voice will be a lot bigger going forward.
Zuck on stage at LlamaCon [April 2025]

The table above highlighted reading speeds as well as speaking speeds. Reading aloud in 17 languages averages 155-213 WPM, which we can use as a rough benchmark for normal speech communication rate. Audiobook narration has been studied for balancing clarity and engagement, and suggests maintaining a standard 150-160 WPM. Slide presentations are delivered at 100-125 WPM to ensure audience comprehension.

Note how that's 4-5x more than the ~43 WPM we type on a computer. It's no wonder that adoption of voice tech has taken off recently. It's just easier to communicate by voice. We evolved that way, trading myths and folklore orally for tens of thousands of years. But writing has its own merited use cases: it gives you more time to think, to be intentional. Voice requires interpretation by the machine, which has only recently been unlocked.

It was a year prior to the aforementioned iOS 6 that Apple first introduced the idea of an assistant in the guise of Siri; for the first time you could (theoretically) compute by voice. It didn’t work very well at first (arguably it still doesn’t), but the implications for computing generally and Google specifically were profound: voice interaction both expanded where computing could be done, from situations in which you could devote your eyes and hands to your device to effectively everywhere, even as it constrained what you could do.
Ben Thompson, Google and the Limits of Strategy (2016)

voice assistants have been a pipe dream for a long time. they may finally be around the corner. Speech-to-text (STT) models are getting below 5% word error rate (WER; a minimal sketch of the metric follows the table below), meaning dictation to siri, alexa, and chatgpt is nearly perfect. once you add LLMs, even the slight mistakes can be course-corrected in post-transcription processing, given sufficient context. and sure enough, voice search queries are typically longer, with an average length of 29 words. pair that with the fact that 46% of all Google searches seek local information (76% for voice searches). most of the 8.4 billion active voice assistant devices are locally searching for:

| Sector | Voice search users in US |
| --- | --- |
| Weather | 75% |
| Music | 71% |
| News | 64% |
| Entertainment | 62% |
| Retail | 54% |
| Food delivery and restaurants | 52% |
| Sales, deals and promotion | 52% |
| Healthcare and wellness | 51% |
| Consumer packaged food | 49% |
| Local services | 49% |
| Personalized tips and information | 48% |
| Making a reservation | 47% |
| Fitness | 46% |
| Fashion | 45% |
| Travel | 43% |
| Upcoming events or activities | 42% |
| Finance | 42% |
| Other | 42% |
| Store location and hours info | 39% |
| Access to customer support or service | 38% |

Source: BrightLocal, Think with Google, Oberlo
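
back to that sub-5% WER claim: word error rate is just word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. a minimal sketch, with a hypothetical dictation example:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = min edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

# Hypothetical dictation: one substitution in six words -> ~16.7% WER.
print(wer("set a timer for ten minutes", "set a timer for tin minutes"))
```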

but voice is not only more verbose, it also carries significantly more paralinguistic information—prosodic features like pitch and cadence, as well as temporal dynamics that encode sentiment, sarcasm, and intent. text lacks this entirely /s.

im not sure how much needs to be explicitly measured. like u could run supervised sentiment-analysis classifiers, or u could have it be intuited by the voice-mode models. more and more, im willing to take the bitter lesson and let the neural nets learn sarcasm independently. the same has applied to fluid dynamics simulations, chess engine moves, and image segmentation. from there, you can do ablation probing to understand the conjectures implied by NN models instead of trying to armchair mathematical proofs. you can learn the underlying manifold that the NN discovered, rather than trying to discover it yourself.
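
if you did want the explicit-measurement route, here's a rough sketch of pulling prosodic features with librosa. the file path is a placeholder, and the prosody-to-expressiveness framing in the comments is a loose heuristic, not an established rule:

```python
import librosa
import numpy as np

# Placeholder path; any mono speech clip works.
y, sr = librosa.load("utterance.wav", sr=16000)

# Pitch contour via probabilistic YIN (NaN where unvoiced).
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Frame-level energy as a loudness proxy.
rms = librosa.feature.rms(y=y)[0]

# Crude prosody summary: wide pitch range + high energy variability
# reads as expressive; flat pitch + steady energy reads as monotone.
voiced = f0[~np.isnan(f0)]
print(f"pitch mean {voiced.mean():.0f} Hz, range {np.ptp(voiced):.0f} Hz")
print(f"energy variability {rms.std() / rms.mean():.2f}")
```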

Why haven’t brands fully exploited multimodal AI for shoppable video and voice shopping? Despite AR/VR growth, voice-first commerce remains <1% of e-com spend—an untapped frontier for “ask and buy” agentic experiences in headsets and smart speakers.
- Reviews are pivotal in local search, with nearly 90% of consumers relying on them to evaluate local businesses. Excellent reviews can lead to a 31% increase in consumer spending.

image

The information bandwidth of human vision is over 1 million times higher than the bandwidth of reading or listening to language. Product designers take advantage of this with carefully crafted visual interfaces that efficiently convey complex information and help users take action on it.

Say you want to compare restaurant options. Most likely, a scrollable map with location pins, overlays with photos or reviews, and buttons for common filters will be more effective than typing all your criteria into a chat interface and reading the results one at a time.

google search with lens increasing usage

Figure: AI Shopping market map.

[CEO Mark Zuckerberg] has mentioned it on calls multiple times, basically every call:

We certainly want to make sure that video on Facebook is healthy, we think video is going to be increasingly how people communicate and consume information

  • Language is low bandwidth: less than 12 bytes/second. A person can read 270 words/minute, or 4.5 words/second, which is 12 bytes/s (assuming 2 bytes per token and 0.75 words per token). A modern LLM is typically trained with 1x10^13 two-byte tokens, which is 2x10^13 bytes. This would take about 100,000 years for a person to read (at 12 hours a day).

  • Vision is much higher bandwidth: about 20MB/s. Each of the two optical nerves has 1 million nerve fibers, each carrying about 10 bytes per second. A 4-year-old child has been awake a total of 16,000 hours, which translates into 1x10^15 bytes.

In other words:
- The data bandwidth of visual perception is roughly 16 million times higher than the data bandwidth of written (or spoken) language.
- In a mere 4 years, a child has seen 50 times more data than the biggest LLMs trained on all the text publicly available on the internet.

This tells us three things:
1. Yes, text is redundant, and visual signals in the optical nerves are even more redundant (despite being 100x compressed versions of the photoreceptor outputs in the retina). But redundancy in data is precisely what we need for Self-Supervised Learning to capture the structure of the data. The more redundancy, the better for SSL.
2. Most of human knowledge (and almost all of animal knowledge) comes from our sensory experience of the physical world. Language is the icing on the cake. We need the cake to support the icing.
3. There is absolutely no way in hell we will ever reach human-level AI without getting machines to learn from high-bandwidth sensory inputs, such as vision.

Yes, humans can get smart without vision, even pretty smart without vision and audition. But not without touch. Touch is pretty high bandwidth, too.
https://x.com/ylecun/status/1766498677751787723
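
LeCun's arithmetic is easy to reproduce. a quick sanity check, using only the assumptions stated in the post above:

```python
# Reading: 270 WPM -> bytes/s, using the post's assumptions.
words_per_sec = 270 / 60                    # 4.5 words/s
tokens_per_sec = words_per_sec / 0.75       # 0.75 words/token -> 6 tokens/s
lang_bytes_per_sec = tokens_per_sec * 2     # 2 bytes/token -> 12 B/s

# Years to read an LLM's training set at 12 hours/day.
training_bytes = 1e13 * 2                   # 10^13 tokens x 2 bytes
seconds_needed = training_bytes / lang_bytes_per_sec
years = seconds_needed / (12 * 3600 * 365)
print(f"{years:,.0f} years of reading")     # ~105,000 (he rounds to 100,000)

# Vision: 2 optic nerves x 1e6 fibers x 10 B/s = 20 MB/s.
vision_bytes_per_sec = 2 * 1_000_000 * 10
child_bytes = vision_bytes_per_sec * 16_000 * 3600   # 16,000 waking hours
print(f"{child_bytes:.2e} bytes seen by age 4")      # ~1.15e15
print(f"ratio vs LLM text: {child_bytes / training_bytes:.0f}x")  # ~58 (he rounds to 50)
```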

be my eyes is a clever way to integrate Ray-Bans-style 'see what you see' data collection, but for a good cause: they help blind ppl navigate. https://www.bemyeyes.com/

thought

the brain is the bottleneck. people can listen to audio sped up to 120% without hindering understanding, so the bottleneck is not the “listening to voice” part, but the thinking stage.
If we revisit the table from earlier, assuming 1 word ≃ 6 bytes (5 letters + space) in ASCII (1 byte/char → 48 bits/word), we have:

$\text{bits/s} = \frac{\text{WPM} \times 48}{60} = 0.8 \times \text{WPM}$

| Metric | Q1 bits/s | Median bits/s | Q3 bits/s |
| --- | --- | --- | --- |
| Typing speed (computer keyboard) | 28 | 34.4 | 40.8 |
| Typing speed (professional typists) | 49.2 | 70 | 96 |
| Stenotype typing speed | 88 | 288 | 288 |
| Handwriting speed (adult) | 7.2 | 10.4 | 16 |
| Handwriting speed (shorthand) | 280 | 280 | 280 |
| Morse code speed (manual) | 16 | 16 | 56 |
| Morse code speed (typewriter) | 60.5 | 60.5 | 60.5 |
| Silent reading (non-fiction) | 164.8 | 190.4 | 215.2 |
| Silent reading (fiction) | 184 | 208 | 232 |
| Subvocalization | 170.4 | 180 | 190.4 |
| Auditory reading | 330.4 | 340 | 350.4 |
| Visual reading | 410.4 | 460 | 510.4 |
| Reading aloud (17 langs) | 147.2 | 147.2 | 188 |
| Audiobook narration speed | 120-128 | 124 | 128 |
| Slide presentation speed | 80 | 90 | 100 |
| Auctioneer speaking speed | 200 | 200 | 200 |
| Fastest recorded speaking | 489.2 | 509.6 | 509.6 |
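
for reference, this table is just the formula above applied row by row. a minimal sketch over a few of the median values:

```python
def wpm_to_bits_per_sec(wpm: float) -> float:
    """1 word ~ 6 ASCII bytes (5 letters + space) = 48 bits, so bits/s = 0.8 * WPM."""
    return wpm * 48 / 60

for label, wpm in [
    ("typing, median", 43),
    ("silent reading (fiction), median", 260),
    ("fastest recorded speaking", 637),
]:
    print(f"{label}: {wpm_to_bits_per_sec(wpm):.1f} bits/s")  # 34.4, 208.0, 509.6
```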

39.15 bits/s isn't a lot of information encoding, especially by modern file-exchange standards. Recall that in 2005, internet bandwidth became capable of data transfers at ~2 megabits per second (2,000,000 bits/s), which enabled YouTube to stream video. that's a ≈51,000× increase in raw bit-rate capacity over speech. nowadays, avg broadband connections yield ≈95.1 Mbps, so multiply by another ~50x. But if ur a pro gamer, or have the purchasing power to pay a whopping $50/month, you can get 1 Gbps.

cached thoughts - eliezer yudkowsky
One of the single greatest puzzles about the human brain is how the damn thing works at all when most neurons fire 10–20 times per second, or 200Hz tops. In neurology, the “hundred-step rule” is that any postulated operation has to complete in at most 100 sequential steps—you can be as parallel as you like, but you can’t postulate more than 100 (preferably fewer) neural spikes one after the other.

Can you imagine having to program using 100Hz CPUs, no matter how many of them you had? You’d also need a hundred billion processors just to get anything done in realtime.

If you did need to write realtime programs for a hundred billion 100Hz processors, one trick you’d use as heavily as possible is caching. That’s when you store the results of previous operations and look them up next time, instead of recomputing them from scratch. And it’s a very neural idiom—recognition, association, completing the pattern.

It’s a good guess that the actual majority of human cognition consists of cache lookups.

machine communication

But that has the constraint of network packet-switching. Machine-to-machine (M2M) links at data centers like Elon's Colossus supercluster can now reach 400 Gb/s, with 800 Gb/s rolling out across the country. by 2030, 1.6 TbE (1.6 Tb/s) is expected to predominate for server-to-server traffic. How fast will broadband get? the world record on a single cable is 402 Tb/s. How fast does it need to get? How would humanity improve if we could communicate at anywhere near those rates?

To put the scale of human communication in perspective: the world record for data transmission on a single fiber optic cable is 402 terabits per second. Compare that to the 39.15 bits per second throughput of human language.

That means a single cable can transmit information over 10 trillion times faster than a person can communicate with words.
$\frac{402 \times 10^{12}}{39.15} \approx 1.03 \times 10^{13}$

If you could “upload” your thoughts at the speed of light, how many human lifetimes of speech could fit into a single second of modern fiber optic bandwidth?

Let’s do the math:

  • Average human verbal communication rate: 39.15 bits per second
  • Generously assume a human yaps for 16 hours/day, 365 days/year, for 80 years:
  • $16 \times 60 \times 60 = 57,600$ seconds/day
  • $57,600 \times 365 = 21,024,000$ seconds/year
  • $21,024,000 \times 80 = 1,681,920,000$ seconds/lifetime
  • $1,681,920,000 \times 39.15 \approx 65,847,168,000$ bits per lifetime

  • Modern fiber optic cable: 402 terabits/second = 402,000,000,000,000 bits/second

  • Number of lifetimes in one second:

  • $402,000,000,000,000 \div 65,847,168,000 \approx 6,105$ lifetimes
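
a quick script to verify those numbers (same assumptions as above):

```python
SPEECH_BPS = 39.15        # human language throughput, bits/s
FIBER_BPS = 402e12        # single-fiber record, bits/s

seconds_per_lifetime = 16 * 3600 * 365 * 80        # 16 h/day for 80 years
bits_per_lifetime = seconds_per_lifetime * SPEECH_BPS

print(f"{bits_per_lifetime:.4e} bits per lifetime")              # ~6.5847e10
print(f"fiber vs speech: {FIBER_BPS / SPEECH_BPS:.2e}x")         # ~1.03e13
print(f"lifetimes per second: {FIBER_BPS / bits_per_lifetime:,.0f}")  # ~6,105
```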

So, in just one second, a single fiber optic cable could transmit the entire spoken output of over 6,000 human lifetimes. The bottleneck isn’t the network—it’s the brain.

This is the fundamental mismatch between the speed of our thoughts, the speed of our words, and the speed of our machines. The future of communication will be defined by how we bridge this gap.

The Brain’s Speed Limit: 10 Bits Per Second | Technology Networks: https://www.technologynetworks.com/neuroscience/news/caltech-scientists-have-quantified-the-speed-of-human-thought-394395
The unbearable slowness of being: Why do we live at 10 bits/s? (Neuron): https://www.cell.com/neuron/abstract/S0896-6273(24)00808-0

| Model | Tokens/sec | Tokens/min | MB/min |
| --- | --- | --- | --- |
| GPT-3 | ~11,300 | ~678,000 | ~21.7 |
| Llama 2 7B | 1,200 | 72,000 | ~2.3 |
| DeepSeek R1 | 100 | 6,000 | ~0.2 |
| Llama 4 8B | ~2,500 | ~150,000 | ~4.8 |
| GPT-4o | ~5,000 | ~300,000 | ~9.6 |
| DeepSeek V2 | ~300 | ~18,000 | ~0.6 |

intent translation

one of my goals in finding a solution is to find an evergreen problem.

there will always exist a problem of communicating the thoughts in my head. we tried to solve this by codifying a set of words that symbolize meaning, but it has limited us to 39.15 bits per second. how can we go past this limit?

an image is worth a thousand words. how many images can we see in a minute? how many words would that equate to?

neuralink is an extrapolation of this line of thought (pun not intended). It’s probably not something that we can build for in 2025, but it’s certainly a moonshot worth pursuing. Our fleshy brains are probably incapable of the token/information throughput of LLMs, so machines will always be capable of more ‘intelligence’.

without getting too lost in the future’s uncertainty, what we can be certain about is that AI agents are (increasingly) being implemented in enterprise use cases. there has been a lot of work done on GEO for AIO, indexability/retrievability of information, and risk mitigation of AI outputs. What has been lacking, though, is better ways to communicate my expected output to the model.

this requires clarity from the human inputter. it's just a hunch, but something worth solving for.

financial thesis translation
llama prompt optimization:
https://github.com/meta-llama/llama-prompt-ops

socratic dialogue
https://x.com/dwarkesh_sp/status/1927769721081827544

Models will communicate directly through latent representations, similar to how the hundreds of different layers in a neural network like GPT-4 already interact. So, approximately no miscommunication, ever again. The relationship between mega-Sundar and its specialized copies will mirror what we’re already seeing with techniques like speculative decoding – where a smaller model makes initial predictions that a larger model verifies and refines.

Unlike humans, these models can amalgamate their learnings across all their copies. So one AI is basically learning how to do every single job in the world. An AI that is capable of online learning might functionally become a superintelligence quite rapidly, without any further algorithmic progress. Future AI firms will accelerate this cultural evolution through two key advantages: massive population size and perfect knowledge transfer. With millions of AGIs, automated firms get so many more opportunities to produce innovations and improvements, whether from lucky mistakes, deliberate experiments, de-novo inventions, or some combination.

AI firms will look from the outside like a unified intelligence that can instantly propagate ideas across the organization, preserving their full fidelity and context. Every bit of tacit knowledge from millions of copies gets perfectly preserved, shared, and given due consideration.

Merging will be a step change in how organizations can accumulate and apply knowledge. Humanity’s great advantage has been social learning – our ability to pass knowledge across generations and build upon it. But human social learning has a terrible handicap: biological brains don’t allow information to be copy-pasted. So you need to spend years (and in many cases decades) teaching people what they need to know in order to do their job. Look at how top achievers in field after field are getting older and older, maybe because it takes longer to reach the frontier of accumulated knowledge.

back-to-reality

everything on, all the time - legal implications?
the world post-phone can be good?
In a way, the consolidation of retail is just a reflection and enabler of the way many people like to gather — in densely packed, often chaotic networks that favor serendipity and discovery. Cities do this naturally, malls did this forcefully. Victor Gruen envisioned the shopping mall as a way to graft the experience of the city onto the suburban core, predicting that large malls would become “urban sub-centers” in suburban space. The only thing Gruen and other mall retailers miscalculated was the number of malls actually required to sustain this. You might be able to argue that the consolidation of malls and department stores is now accelerating the process of urbanizing suburban America — we are coercing space into bringing us closer together. Jane Jacobs wrote about the way in which the kind of serendipitous contact found in cities (or perhaps, “urban sub-centers” of suburbia) imbues daily life with enigmatic substance:

“The trust of a city street is formed over time from many, many little public sidewalk contacts. It grows out of people stopping by at the bar for a beer, getting advice from the grocer and giving advice to the newsstand man, comparing opinions with other customers at the bakery and nodding hello to the two boys drinking pop on the stoop….utterly trivial but the sum is not trivial at all. The sum of such casual, public contact at a local level—most of it fortuitous, most of it associated with errands—is a feeling for the public identity of people, a web of public respect and trust, and a resource in time of personal or neighborhood need…The absence of this trust is a disaster to a city street. Its cultivation cannot be institutionalized.” - Jane Jacobs, The Death and Life of Great American Cities

Can’t wait for a world where Apple Watch is on my wrist; AirPods in my ears; iPhone in my pocket; Apple Glasses on my face; and this product wrapped around my neck.

when everything is AI, human is scarce.
Core Beliefs - Omnipresence
The first locomotive railway, the Penydarren, was built by Richard Trevithick in 1804. The Wright brothers took flight in 1903. Ford’s Model T reached mass production in 1908. These milestones were more than engineering feats. They expanded human presence.
For centuries, transportation was the engine of globalization. Trains made cross-continental travel affordable. Planes made international movement accessible. Cars made suburbia a reality. Movement enabled people to build, trade, learn, and connect.
But something has shifted.
While the pandemic forced society into isolationism, five years have now elapsed. You can feel the energy shifting back, but a long tail of habits remain. Technology, once a tool for connection, is now often used to avoid it. We can simulate presence without ever leaving the couch. Video calls replace handshakes. Emojis mimic eye contact. Group chats stand in for gatherings. The friction of movement is gone, but so is the instinct to be physically present.
Where earlier generations crossed oceans and borders in search of opportunity, the modern world offers an easier option: stay home. The neighborhood becomes the world. The screen becomes the interface.
The individual grows more isolated.
This is the age of comfortable isolation. And yet, there is alpha in resisting it.
While many embrace convenience, those who choose the discomfort of real presence are increasingly rare. In a time when anyone can be virtual, few choose to be physical. But physical presence is becoming scarce. And scarcity creates value.
Now I’m in San Francisco every two weeks, New York every six. That rhythm of moving, showing up, and being in the room has made me believe that presence compounds. It builds trust, sparks ideas, and opens doors in a way virtual never will.
We’re no longer bound by the limits of the past. We have tools, networks, and access that earlier generations could not imagine. But that freedom comes with responsibility. Not to retreat into digital convenience, but to be present. To show up where it matters, when it matters.
https://www.drorpoleg.com/the-quiet-luxury-of-language/

from Jeff Weinstein:
I’m extremely ready for:

1/ Siri 2.0 that is at least passable for dictation and basic tasks
2/ 3p AI assistants to work more natively on iOS
3/ Trying something better/different than an iPhone

(Android+Gemini users, are you very happy? For this reason, I’d try switching.)
He is expressing the sentiment of the ‘early adopter’ zeitgeist -> people are ready to move on from the phone

software

Software is following a similar route. Yesterday, the cost to produce software was (more or less) the same whether you built for 50 users, or 50,000 users, and that forced all brands to aim for the masses.

Today, as building quality software for 5 users becomes affordable, the standard will shift quickly to depth over breadth – resonance over reach. The internet is already blooming with announcements and launches of niche apps, each crafted for niche communities rather than for mass appeal. If you are a large market incumbent seeing these indie creators in your peripheral, keep an eye on them – what they build may look like toys today, but they’ll creep into your domain sooner than you think. If you are one of those indie creators yourself, congratulations– opportunity has never been greater.

The rising competition, the increased diversity in software, the heightened user expectations – it’s all leading brands to a critical juncture. You must go far beyond offering utility. As a brand, you must find your niche community, understand their specific needs, values, and goals, and create communal experiences they can connect and resonate with.

This is the new moat. By resonating with niche communities, you crack distribution, you gain a vantage point into their lives that help push your tech further, and you create meaning through their identity that makes you irreplaceable relative to faceless AI tools.
https://www.latecheckout.agency/blog/new-economics-new-software
This is why we need to invert the old concept of Product-Market Fit.

In a pre-AI world, companies built products and searched for their market.

In a post-AI world, that playbook is obsolete.

When AI can efficiently build anything, success isn’t about having a product looking for its community – it’s about having a community that demands its own product.

stone phone
https://devpost.com/software/stone-7rjq8n

In his year-defining essay, Vibe Shift, Santiago Pliego wrote:

Fundamentally, the Vibe Shift is a return to—a championing of—Reality, a rejection of the bureaucratic, the cowardly, the guilt-driven; a return to greatness, courage, and joyous ambition.
The Vibe Shift is the refusal to subordinate yourself and your family to the whims and anxieties of activists and bureaucrats and relearning to trust your eyes and ears.
The Vibe Shift is the rejection of secular liberal materialism and a return to the Christian foundations of the West.
The Vibe Shift is a healthy suspicion of credentialism and a return to human judgment.
The Vibe Shift, in other words, is the return to believing your own experience and intuition instead of delegating truth to the experts.

enchanted world where everything has an ambient ai voice you can talk to and be engaged with
https://youtu.be/vwqXYKYCw_8?si=aFaiJCHDJXFbwfvc

fanTASTIC re-enchanting
telepathy tapes, pierre de chardin, huxley
https://www.notboring.co/p/the-return-of-magic

Consider a simple prompt: “Help me plan a workout routine.”

Nike doesn’t just create accurate workout plans; their market of aspiring athletes defines what a Nike workout must be - high-performance routines featuring elite athletes and Nike Run Club integration.
Peloton’s community-obsessed market demands social features and live classes woven into every workout.
Strava’s market of competitive outdoor enthusiasts shapes their offering into one focused on routes, records, and friendly rivalry.
Even Calm’s mindfulness-seeking market transforms the same prompt into gentle, meditative movement.
https://www.latecheckout.agency/blog/openai-is-your-next-competitor
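
one way to picture this mechanically: the same user prompt gets wrapped in a different community-defined system prompt per brand. a toy sketch — all brand "system prompts" below are invented for illustration:

```python
# Same user prompt, different community-defined context.
# All brand "system prompts" below are invented for illustration.
BRAND_CONTEXT = {
    "nike":    "You are a high-performance coach. Build elite, measurable routines.",
    "peloton": "You are a community coach. Weave in live classes and social goals.",
    "strava":  "You are a competitive outdoor coach. Think routes, segments, PRs.",
    "calm":    "You are a mindfulness guide. Favor gentle, meditative movement.",
}

def build_messages(brand: str, user_prompt: str) -> list[dict]:
    """Compose a chat-style message list for whatever LLM API you use."""
    return [
        {"role": "system", "content": BRAND_CONTEXT[brand]},
        {"role": "user", "content": user_prompt},
    ]

print(build_messages("nike", "Help me plan a workout routine."))
```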

project astra

https://www.mariehaynes.com/more-ai-innovations-coming-to-search-googles-q4-2024-earnings-call/
https://blog.google/products/android/android-xr/
https://coalitiontechnologies.com/blog/google-ai-in-2025-how-search-is-changing

detour
https://x.com/pitdesi/status/1917557362224947647

live ai
https://www.meta.com/blog/ray-ban-meta-v11-software-update-live-ai-translation-shazam/

https://www.meta.com/help/ai-glasses/955732293123641/
https://www.theverge.com/2025/1/26/24351264/live-ai-ray-ban-meta-smart-glasses-wearables