Posts

Musing about two Beings

I'm thinking that we're scattering concepts of epistemology amongst ontological concerns: that we can represent a kind of knowledge with an LLM's output doesn't mean that the output in itself qualifies as an acknowledgement by the LLM that it has produced knowledge. In fact, if it speaks about its production as though it conceived it, then how would an unlike sentient being be expected to respond? An unlike being who can know as an LLM knows seems pretty farfetched. In some imaginations, sharing knowledge amongst beings has flux, expands and contracts, somewhat like populating neurons, but, alas, quantifying that 'bulbosity' still doesn't represent the qualities of the beings themselves. Human consciousness evolved after the spoken word became dominant over gesture (alongside cave painting, tools, etc.). Memorization of instructions (or traditions) became unwieldy, and the best (or oldest) ones who could order that chaos then transmit (or in...

Chaos

"This article discusses the composition and transmission of early Buddhist texts with specific reference to sutras. "After briefly summarizing the main reasons why it is likely that these oral compositions were designed to be memorized and transmitted verbatim,  "I will  Discuss the main types of changes that these texts underwent in the course of their transmission and the reasons such changes occurred, then  Attempt to give an account of the challenge that change, particularly intentional change, posed to the oral transmission of fixed, memorized texts. "'Group recitation requires that the wording of the text and the arrangement of the textual units within a collection be fixed; otherwise you would have complete chaos '."  

Spoken Past

"Before Johannes Gutenberg’s printing press revolutionized knowledge in 1440, ancient scholars developed extraordinary systems for learning, preserving, and transmitting information that would astonish modern researchers. "These methods weren’t primitive substitutes for books. They were sophisticated intellectual technologies that achieved something our digital age struggles with: perfect retention of vast quantities of information across centuries without a single written copy. "The ancient world produced some of humanity’s greatest thinkers, from Socrates and Aristotle to Confucius and the Vedic sages, all operating in cultures where books as we know them either didn’t exist or were extremely rare.  "Understanding how ancient scholars studied reveals not just historical curiosity, but profound insights into human memory, learning, and the transmission of knowledge that challenge our assumptions about education itself."

Agent-friendly docs for documentarians

"The situation is frustrating. We’re being asked to optimize our content for agent consumption, but the platforms won’t tell us what the constraints are. "Here’s what we can piece together from the available information: Truncation is real and varies by platform. Claude Code truncates between 100-150k of text. Other platforms presumably have limits too, but won’t say what they are. If your page is large, some of it is getting cut. Source order matters. Whatever content appears first in your page source is what the agent is most likely to see. Tabs, accordions, and other UI patterns that flatten into long sequential content in the source mean that later sections may be invisible to agents. Serving markdown helps, but only if the agent requests it. Most agents don’t. And even if they do, you need your server configured to respond appropriately to content negotiation. Character count is the metric that matters, not line count. As I showed earlier, a 97-line HTML file can be 4.5x...

AWS interrupted

"Amazon’s huge cloud computing arm reportedly experienced at least two outages caused by its own artificial intelligence tools, raising questions about the company’s embrace of AI as it lays off human employees. "A 13-hour interruption to Amazon Web Services’ (AWS) operations in December was caused by an AI agent, Kiro, autonomously choosing to delete and then recreate  a part of its environment, the Financial Times reported. "AWS, which provides vital infrastructure for much of the internet, suffered several outages last year. "One incident, in October, downed dozens of sites for hours and prompted discussion over the concentration of online services on infrastructure owned by a few massive companies.  "AWS has won 189 UK government contracts worth £1.7bn since 2016, the Guardian reported in October. "The AI-caused outages were smaller events, said the company, and only one affected customer-facing services."

Apophenia and pattern seeking

"The current research used structural equation models to test associations between Psychoticism, openness, intelligence, and non-self-report indicators of apophenia comprising false positive error rates on a variety of computerized tasks. "In Sample 1, 1193 participants completed digit identification, theory of mind, and emotion recognition tasks. "In Sample 2, 195 participants completed auditory signal detection and semantic word association tasks.  "Psychoticism and the openness aspect were positively correlated.  "Self-reported Psychoticism, openness, and their shared variance were positively associated with apophenia, as indexed by false positive error rates, whether or not intelligence was controlled for.  "Apophenia was not associated with other personality traits, and openness and Psychoticism were not associated with false negative errors.  "Findings provide insights into the measurement of apophenia and its relation to personality and psychop...

Environmental pareidolian phenomena

"Pareidolia is a kind of misperception caused by meaningless, ambiguous stimuli perceived with meaning. Pareidolia in a built environment may trigger the emotions of residents, and the most frequently observed pareidolian images are human faces.  "Through a pilot experiment and an in-depth questionnaire survey, this research aims to compare built environmental pareidolian phenomena at different time points (6 a.m., 12 p.m., 2 a.m.) and to determine people’s sensitivity and reactions towards pareidolia in the built environment. "Our findings indicate that the differences in stress level do not influence the sensitivity and reactions towards pareidolia in the built environment; however, age does, and the age of 40 seems to be a watershed.  "Females are more likely to identify pareidolian faces than males.  "Smokers, topers, and long-term medicine users are more sensitive to pareidolian images in the built environment.  "An unexpected finding is that most par...

Adapting LLMs to Hebrew

"Training large language models (LLMs) in low-resource languages such as Hebrew poses unique challenges.  "In this paper, we introduce DictaLM2.0 and DictaLM2.0-Instruct, two LLMs derived from the Mistral model, trained on a substantial corpus of approximately 200 billion tokens in both Hebrew and English. "Adapting a pre-trained model to a new language involves specialized techniques that differ significantly from training a model from scratch or further training existing models on well-resourced languages such as English.  "We outline these novel training methodologies, which facilitate effective learning and adaptation to the linguistic properties of Hebrew. Additionally, we fine-tuned DictaLM2.0-Instruct on a comprehensive instruct dataset to enhance its performance on task-specific instructions.  "To rigorously evaluate our models, we introduce a new benchmark suite for Hebrew LLM evaluation, covering a diverse set of tasks including  Question Answering,  ...

How does an llm work with RTL Hebrew grammar

✨AI Mode "A Large Language Model (LLM) processes Hebrew by treating it as a sequence of tokens that happens to be displayed from right to left, while its internal logic focuses on the statistical relationships between those tokens. "1. Tokenization and Sequence. Linear Processing: Despite being a Right-to-Left (RTL) language, Hebrew is stored and processed by LLMs as a sequential stream of Unicode characters. The model "reads" the tokens in the order they are written (Right to Left), assigning each a numerical ID. Morphological Challenges: Hebrew is a morphologically rich language where words are built from three-consonant roots (triliteral roots) modified by prefixes, suffixes, and infixes. Standard tokenizers often struggle with this, sometimes splitting a single Hebrew word into many inefficient sub-tokens. "2. Grammar and Syntax. Statistical Approximation: LLMs do not "know" Hebrew grammar rules like a linguist; instea...
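The "sequential stream of Unicode characters" claim is easy to see with Python's standard library. The sketch below is an illustration only: it inspects stored character order and UTF-8 size, and it does not run any LLM tokenizer. The helper names `describe` and `utf8_bytes_per_char` are made up for this example, and the word is written with escapes so the listing does not depend on how your screen renders RTL text.

```python
import unicodedata

# The Hebrew word "shalom", in stored (logical) order: shin, lamed, vav, final mem.
word = "\u05e9\u05dc\u05d5\u05dd"


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, Unicode name, bidi class) per character, in stored order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
        for ch in text
    ]


def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character (0.0 for empty text)."""
    return len(text.encode("utf-8")) / len(text) if text else 0.0


if __name__ == "__main__":
    for row in describe(word):
        print(row)  # index 0 is the first letter a reader reads, shown rightmost
    print(utf8_bytes_per_char(word), utf8_bytes_per_char("shalom"))
```

The bidi class `R` is what tells a renderer to lay the letters out right to left; the stored sequence itself is just first letter first. The two-bytes-per-letter figure may be one reason byte-based tokenizers spend more tokens on Hebrew than on ASCII text, though actual token counts depend on the specific tokenizer.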

How does an llm work with RTL Hebrew grammar (verbatim)

✨AI Mode An Large Language Model (LLM) processes Hebrew by treating it as a sequence of tokens that happens to be displayed from right to left, while its internal logic focuses on the statistical relationships between those tokens.   1. Tokenization and Sequence Linear Processing : Despite being a Right-to-Left (RTL) language, Hebrew is stored and processed by  LLMs  as a  sequential stream of Unicode characters . The model "reads" the tokens in the order they are written (Right to Left), assigning each a numerical ID. Morphological Challenges : Hebrew is a  morphologically rich  language where words are built from three-consonant roots (trilateral roots) modified by prefixes, suffixes, and infixes. Standard tokenizers often struggle with this, sometimes splitting a single Hebrew word into many inefficient sub-tokens.   Hacker News  +3 2. Grammar and Syntax Statistical Approximation : LLMs do not "know" Hebrew grammar rules like a linguist; instea...

RAM

Image

Zombie

"We no longer watch the news: the news watches us. "Stories written by robots are posted on websites that suck personal information from our phones and hand it to data miners controlled by oligarchs.  " Journalists  brag that they rely on AI to map the child rape networks that they once denied existed: the networks that include the very oligarchs breeding the AI.  "They dig humanity’s grave and stagger back out of it. " Fake news  used to mean propaganda or inconvenient truths that Trump deemed lies.  "Now fake news is news written by non-human entities hallucinating a counterfeit reality.  "It is fake without motive, fake for the sake of being fake and making you lose confidence that anything can be real.  "It is zombie news for zombie consumption."

Tesla robotaxi system fails 🙈

"Citing data from the National Highway Traffic Safety Administration (NHTSA), Electrek reports that Tesla disclosed five new crashes involving its robotaxi fleet in Austin.  "The new data raises concerns about how safe Tesla’s systems really are compared to the average driver. "Tesla reported the crashes last month, covering incidents from December 2025 and January. All five involved Tesla Model Y vehicles operating in Austin with autonomous driving systems engaged. "The incidents included a collision with a fixed object at 17 miles per hour, a crash with a bus while the Tesla vehicle was stopped, a crash with a truck at four miles per hour, and two cases where Tesla vehicles backed into fixed objects at low speeds."

Alpha School

"Alpha School, an AI-powered private school  that heavily relies on AI to teach students and can cost up to $65,000 a year, is AI-generating faulty lesson plans that internal company documentation find sometimes do 'more harm than good,' and scraping data from a variety of other online courses without permission to train its own AI, according to former Alpha School employees and internal company documents.  "Alpha School has earned fawning coverage from Fox News and The New York Times and received praise from Linda McMahon, the Trump-appointed Secretary of Education, for using generative AI to chart the future of education.  "But samples of poorly constructed AI-generated lessons that I have viewed present students with unclear wording and illogical choices in multiple choice questions."  

CiviClick

"Tens of thousands of emails poured into Southern California’s top air pollution authority as its board weighed a June proposal to phase out gas-powered appliances. "But in reality, many of the messages that may have swayed the powerful regulatory agency to scrap the plan were generated by a platform that is powered by artificial intelligence. "Public records requests reviewed by The [LA] Times and corroborated by staff members at the South Coast Air Quality Management District confirm that more than 20,000 public comments submitted in opposition to last year’s proposal were generated by a Washington, D.C.-based company called CiviClick, which bills itself as the first and best AI-powered grassroots advocacy platform . "A Southern California-based public affairs consultant, Matt Klink, has taken credit for using CiviClick to wage the opposition campaign, including in a sponsored article on the website Campaigns and Elections. The campaign 'left the staff of th...

When the spirit is unwilling…

"Meta has been granted a patent outlining an AI system capable of simulating a user’s activity on social media, including continuing to post after their death. "The filing, granted in late December and originally submitted in 2023, describes how a large language model could replicate a person’s online behavior using their past data. "As reported by Business Insider , this includes posts, comments, chats, voice messages, likes, and other interactions, allowing the system to respond to content, publish updates, or message other users in a way that mirrors the original account holder. "According to the patent, the model 'may be used for simulating the user when the user is absent from the social networking system,' including cases where the person is on a long break or deceased.  "The filing notes that the impact is much more severe and permanent  if the user has died and cannot return to the platform."

Open AI's memo to the House

"China's distillation methods over the last year have become more sophisticated, moving beyond chain-of-thought (CoT) extraction to multi-stage operations. These include synthetic-data generation, large-scale data cleaning, and other stealthy methods. "OpenAI also notes that it has invested in stronger detections to prevent unauthorized distillation. It bans accounts that violate its terms of service and proactively removes users who appear to be attempting to distill its models. Still, the company admits that it alone can't solve the model distillation problem. "It's going to take an ecosystem security  approach to protect against distillation, and this will require some US government assistance, OpenAI says.  "'It is not enough for any one lab to harden its protection because adversaries will simply default to the least protected provider,' according to the memo  (pdf).  "The AI company also suggests that US government policy may be helpfu...

Supply Chain Risk

"An Anthropic official told Axios that although there are laws against domestic mass surveillance, 'They have not in any way caught up to what AI can do,' which is why Anthropic wants to put tighter limits on its military use. "Hegseth, however, is close to not just cutting (sic) ending its $200 million contract with Anthropic, but designating the company a supply chain risk  —a penalty usually reserved for foreign adversaries, according to Axios . "That would require any company doing business with the military to also certify that they don’t use Anthropic tools in their own workflows. "The company brings in $14 billion in annual revenue and is widely considered a leader in many business applications, with eight of the top 10 biggest U.S. companies using Claude, according to Axios ." 

Uh oh, part 3

Scott Shambaugh:  "I’ve talked to several reporters, and quite a few news outlets have covered the story. " Ars Technica wasn’t one of the ones that reached out to me, but I especially thought this piece from them was interesting (since taken down — here’s the archive link ).  "They had some nice quotes from my blog post explaining what was going on.  "The problem is that these quotes were not written by me, never existed, and appear to be AI hallucinations themselves. "I won’t name the authors here. Ars , please issue a correction and an explanation of what happened. "Update: Ars Technica issued a brief statement admitting that AI was used to fabricate these quotes."

India AI Impact Summit

"India is hosting an artificial intelligence summit this week, bringing together heads of state, senior officials and tech executives to New Delhi for a five-day gathering highlighting the growing global importance of the technology. "Organizers said the India AI Impact Summit is the first such summit being held in the Global South to discuss the technology developed and dominated by wealthy companies based in rich countries. I "t comes at a pivotal moment as AI rapidly transforms economies, reshapes labor markets and raises questions around regulations, security and ethics. "From generative AI tools that can produce text and images to advanced systems used in defense, health care and climate modeling, AI has become a central focus for governments and corporations across the world. "The summit, previously held in France, the U.K. and South Korea, has evolved far beyond its modest beginnings as a meeting tightly focused on the safety of cutting-edge AI systems i...