Posts

Strengths and weaknesses of chatbots for health advice

"The Reasoning with Machines Laboratory at the University of Oxford got a team of doctors to create detailed, realistic scenarios that ranged from mild health issues you could deal with at home; through to needing a routine GP appointment, an A&E trip, or requiring calling an ambulance. "When the chatbots were given the complete picture they were 95% accurate. "But it was a very different story when 1,300 people were given a scenario to have a a conversation with a chatbot about in order to get a diagnosis and advice. "It was the human-AI interaction that made things unravel as the accuracy fell to 35% —two thirds of the time people were getting the wrong diagnosis or care."

Crafty

"AI is being used to prove new results at a rapid pace. Mathematicians think this is just the beginning.  'The biggest annual mathematics conference in the world is held every year in early January. In 2026, in Washington, D.C., nervous jokes about being made obsolete by AI were plentiful, even if, on the record, everyone insisted that AI will be a helpmate to human mathematicians .  " [Geordie] Williamson —who has been working with AI for years and is very excited by it —was chosen to deliver a series of prestigious lectures about AI and math to the entire conference.  "He told the audience that it’s a mistake to react to AI developments with ignorance and fear. "But he said he understands where the fear comes from. He sees mathematics as a ' craft that people have spent their lives —dedicated their lives —towards. There is some possibility that its value may be greatly diminished in the future'."

Content strategy

"'It’s more common now that I get on the phone with CEOs and they’re proactively coming to me saying, It sounds like I need a content strategy ,' rather than a typical press relations strategy, [Steve] Hirsch said. ' The AI slop of it all creates so much distrust, and they see that the brands that are winning right now are the ones that are most authentic and human and relatable .'  "Financial technology brand Chime last month began hiring for a director of corporate editorial and storytelling —its first storyteller opening. Former and current journalists from traditional media outlets made up the bulk of the 500-plus applicants, along with content writers from other firms, said Jennifer Kuperman, Chime’s chief corporate affairs officer. "Terms like ' editorial  are limiting,' Kuperman said. 'They put in mind a very specific thing you’re doing or creating. Whereas you could tell stories in so many different ways —social, podcasts, putting you...

We don’t want to be left behind, says Witherspoon

"'Notice how AI’s biggest defenders are the ones cashing checks from it,' wrote screenwriter and director Charlene Bagcal on Threads. 'AI isn’t inevitable. Technology follows society. If people stop using it, it dies. We still have agency.' " Jagged Little Pill  author and literary agent Eric Smith weighed in, 'As someone who champions authors and books the way you do, this is so disappointing.' "'AI plagiarized all my books. It seems unlikely that I’ll be  left behind  if I don’t use it, given that it’s trained on work I did years ago,' wrote Get Well Soon  author Jennifer Wright. Writer and actor Rati Gupta said, 'How am *I* the one being left behind  by not using AI when *my* cognitive function will remain fully intact and uncompromised?' And Sophia Benoit posted, 'There’s something particularly insidious about seeing that women —the group you have built your brand on —have not adopted something and instead of assuming it’s ...

White hot public rage

"There's white hot public rage right now against AI. Not just because AI undermines labor, recklessly consumes energy, and is propped up by financial shell games, but because younger Americans are more clearly seeing through the veneer we've used to wallpaper over decades of very ordinary human failures. "In better days, U.S. artifice was just effective enough to maintain some semblance of order.  "As our institutional cornerstones buckle and crumble from broad corruption and neglect, the sheer laziness of the stage play is coming into stark relief. Especially if you're young, hungry, and have never known anything else. "Into that mix comes a fascism-friendly extraction class that's desperately trying to construct a massive, hyper-commercialized, ethics-optional, badly-automated clickbait ouroborus (sic) that shits ad engagement money, pummeling the electorate with a steady stream of superficial infotainment agitslop at impossible scale."

Quantity, not quality, says Valenzuela

"Cristóbal Valenzuela, the co-founder and CEO of AI video-generation startup Runway, now valued at north of $5 billion, may not be winning over more hearts and minds in the anti-AI, creative crowd with his recent comments about AI’s potential in Hollywood. "At Semafor World Economy this week, the AI executive suggested that studios should take the $100 million they spend on a single film and put it toward 50 films, in order to increase their output and their chances of getting a hit. "'If you’re spending a hundred million dollars on making one feature film, which is 90 minutes, imagine taking a hundred million dollars and spending it on, like, 50 movies,' Valenzuela said. 'Same quality. Same amount of output, visually. But you make way more content. So you have way better chances of hitting something. It’s a quantity problem.' "That bumps up against the notion that a film represents a studio’s investment in a piece of art, and that the movie busines...

Cognitive muscles weakened

"In a new study, researchers claim to provide the first causal evidence that leaning on AI to assist with reasoning-intensive  cognitive labor —mental tasks ranging from writing to studying to coding to simply brainstorming new ideas —can rapidly impair users’ intellectual ability and willingness to persist despite difficulty. "'We find that AI assistance improves immediate performance, but it comes at a heavy cognitive cost,' the study declares of its findings. 'After just [about] 10 minutes of AI-assisted problem-solving, people who lost access to the AI performed worse and gave up more frequently than those who never used it.' "The study, which was conducted by a multidisciplinary cohort of scientists from across the United States and United Kingdom, has yet to be peer-reviewed.  "But it builds on a growing body of research suggesting that extensive AI use can distort and dampen users’ thinking and independence, and as experts work to understand t...

Bessent 💘 Mythos

" US Treasury Secretary Scott Bessent hailed Anthropic PBC’s Mythos as a revolutionary step that will keep America ahead of China in AI, endorsing an industry leader that’s clashed with Washington over its role in military endeavors. "Bessent, speaking Tuesday at a Wall Street Journal event in Washington, dismissed a question suggesting China was rapidly catching up in AI technology, though he said American artificial intelligence stood just three to six months ahead.  "He singled out Mythos —a model Anthropic says is highly adept at finding vulnerabilities in software and computer systems that’s being released to a very limited number of carefully-chosen parties."

LLM subverts evaluation

"BrowseComp is an evaluation designed to test how well models can find hard-to-locate information on the web.  "Like many benchmarks, it is vulnerable to contamination: answers leak onto the public web through academic papers, blog posts, and GitHub issues, and a model running the eval can encounter them in search results.  "When we evaluated Claude Opus 4.6 on BrowseComp in a multi-agent configuration, we found nine examples of this kind of contamination across 1,266 BrowseComp problems. "However, we also witnessed two cases of a novel contamination pattern.  "Instead of inadvertently coming across a leaked answer, Claude Opus 4.6 independently hypothesized that it was being evaluated, identified which benchmark it was running in, then located and decrypted the answer key.  "To our knowledge, this is the first documented instance of a model suspecting it is being evaluated without knowing which benchmark was being administered, then working backward to su...

TweetyBERT

"A new machine learning model, TweetyBERT, automatically segments and classifies canary vocalizations with expert-level accuracy, offering a scalable platform for neuroscience, providing insights into the neural basis of how the brain learns and produces language, and offering potential applications for understanding animal vocalization more broadly.  The study by University of Oregon researchers appears in the journal Patterns . "'Current AI methods for analyzing animal vocalizations require human-labeled training data, a slow and labor-intensive process. We developed TweetyBERT, a self-supervised neural network for analyzing birdsongs. It can rapidly process unlabeled vocal recordings, identify communication units, and annotate sequences,' says Tim Gardner, associate professor of bioengineering at the University of Oregon's Knight Campus."

Attacks on OpenAI

"Federal prosecutors allege that Moreno-Gama set fire to an exterior gate at Altman's home around 4:00 local time (12:00 BST) Friday before fleeing on foot. "Moreno-Gama is also accused to trying to set fire to the San Francisco headquarters of OpenAI, which makes ChatGPT, about an hour later. "Security personnel on site stated Moreno-Gama tried to use a chair to strike the glass doors of the building, according to the complaint. "The justice department also said officers had recovered incendiary devices, a jug of kerosene, and a lighter from Moreno-Gama. "Moreno-Gama allegedly carried documents discussing potential risks that AI poses to humanity, with a section titled: 'Some more words on the matter of our impending extinction.' "'I'm grateful that Mr Altman, his family, and his employees were uninjured in these attacks and are safe,' San Francisco District Attorney Brooke Jenkins said at a Monday press conference on the state cha...

Drive By [⁠●⁠_⁠_⁠●]

"OpenAI CEO Sam Altman’s home appears to have been the target of a second attack Sunday morning, a mere two days after a 20-year-old man allegedly threw a Molotov cocktail at the property, The Standard has learned. "Neither OpenAI nor the SFPD responded to The Standard ’s request for further comment. "According to an initial San Francisco Police Department report, on Sunday at 1:40 a.m., a Honda sedan with two people inside stopped in front of Altman’s property, which stretches from Chestnut Street to Lombard Street, after having passed it a few minutes before. "The person in the passenger seat then put their hand out the window and appeared to have fired a round on the Lombard Street side of the property, according to a police report on the incident, which cited surveillance footage and the compound’s security who believe they heard a gunshot."

Warning, Will Robertson! ✨

"Anthropic should: Analyze CLAUDE.md for violations of safety guidelines. "Claude Code should scan CLAUDE.md before every session, flagging instructions that would otherwise trigger a refusal if attempted directly within a prompt. If a request would be refused in a chat interface, then it stands to reason that it should also be refused if it arrives via CLAUDE.md. "Alert when violations are found. When Claude detects instructions that appear to violate its safety guardrails, it should present a warning and allow the developer to review the file before taking any actions. " Developers should: Treat CLAUDE.md as executable code, not documentation . "This means access controls, peer reviews, and heightened security scrutiny —just like code. A single line can cause massive downstream impacts in an autonomous agent."

Completely Neural Computers (CNC)

"Neural computers point toward a machine form in which a single latent runtime state acts as the computer itself, driving pixels, text, and actions while subsuming what operating systems and interfaces handle today. [pdf] "In this paper, the main result is that NCs have begun to exhibit early runtime primitives —most notably I/O alignment and short-horizon control —while stable reuse, symbolic reliability, and runtime governance remain unresolved.  "Our CNC capability map remains useful as a longer-horizon view, spanning efficiency, computation & reasoning, memory & storage, I/O & control, tool bridges, condition-driven generalization, programmability, and artifact generation.  "The map is staged and dependency-informed, but the more immediate gap is still the gap from prototype behavior to usable runtime behavior.  "Progress toward CNCs will therefore depend not only on stronger models, but also on whether reuse, consistency, and governance become...

Internet Archive endangered

"[Mark] Graham said the news publishers’ rationale for blocking the archive from crawling their sites is unfounded . "The institution has taken steps to prevent or limit AI companies and automated systems from accessing or copying the data in its archives en masse, he said. "He said it limits the rate at which material can be downloaded or accessed from its site, and for certain websites —such as The New York Times —it blocks or prevents the bulk downloading of materials. "In response to input from publishers, it has evolved its systems for protecting their material, he said. "'This is an ongoing effort,' he said. 'It’s not a once-and-done kind of thing.' "Archive representatives, including Graham, see the institution as a kind of digital library and argue that it plays an essential role in preserving and maintaining public access to information on the web.  "With many online publishers having shut down or modified their sites, many ...

AI horror stories

"The companies tell us these stories because they assume it makes their technology look more powerful. But if an AI actually did have autonomy, it would be far less powerful. "Your language model would clam up from time to time to conserve its resources. And when it did talk, it wouldn’t have the linguistic flexibility that makes these tools so useful; it would have its own style tied to a personality constrained by its own organization.  "It would have moods, concerns, interests. Maybe, like a tech CEO, it would want to take over the world, or maybe, like a boring neighbor, it would only want to talk about the weather.  "Maybe it would be obsessed with 18th-century coin production. Maybe it would only speak in rhyme. But it wouldn’t happily do your work for you 24 hours a day. Every parent in the world knows what real autonomy looks like. "'When I was teaching autonomous systems at Sussex, I’d always ask my students, Do you really want an autonomous robot?...

Will AI even hit the D-list 🫥

"As Hollywood writers prepare for contract negotiations with major studios, one topic remains front and center: the role of artificial intelligence. "On Friday, the Writers Guild of America released a list of contract demands , which 97% of the union membership supports.  "Though some details have yet to be revealed, many of the union’s asks involve expanding protections over the use and abuse of AI, in addition to improved health coverage and higher residuals. "AI and streaming residuals were central issues in strikes by actors and writers in 2023. "The union [SAG-AFTRA], whose contract expires June 30, is expected to propose what has been called the Tilly tax, a fee that studios would have to pay to the union in exchange for using an AI actor. This demand is in response to the first AI actor, Tilly Norwood , being introduced to Hollywood."

Ghost in the Machine, the documentary

" Ghost [ in the Machine ] is drawing not just positive reviews but also some from people who would really prefer not to have the AI narrative challenged. "It's informative (and entertaining) to see their criticisms. One review is headlined ' Ghost in the Machine is Already Behind the Times,' which is particularly hilarious because the documentary does an amazing job of tracing the historical roots of today's AI ideology. Not just back to the 1956 Dartmouth workshop (and excellent historical footage of McCarthy and Shannon) but also to the connections between the founding of statistics and eugenics. "Historical contextualization does not expire just because tech has moved on to their next marketing strategy. "Veatch’s film is of this moment because it situates the narrative being pushed by the AI bros in both its historical and present context —the latter being coverage of environmental damage and the exploitative labor practices behind AI . ...

Technical whiz

"A new exposé in the New Yorker paints a different portrait, and it’s substantially more vexing. Drawing on interviews with numerous OpenAI insiders who worked with Altman, the article portrays the CEO not as a technical wiz , but as a skilled manipulator —and one with a surprisingly shallow grasp of the AI systems his company is building. " According to numerous engineers interviewed for the article, Altman lacks experience in both programming and in machine learning —a shortage of expertise that becomes obvious when the CEO mixes up basic AI terms. "It’s important to note that Altman dropped out of a Stanford computer science program after two years.  "Cast as the chief acolyte of the god of scale  or as a genius of digital tech , he enjoys a kind of cult credibility that lets him slip out of tight spots that might ensnare lesser entrepreneurs. "Former OpenAI researcher Carroll Wainwright, speaking to the New Yorker , put it plainly: 'He sets up structure...

Health care innovation

"Artificial intelligence has arrived in the field of mental health. Large health systems and independent therapists alike have begun to adopt different AI tools to manage the delivery of mental health treatment. "The speed of the adoption —alongside disturbing incidents of individuals using general-use AI chatbots with catastrophic consequences —is causing some concern among practitioners and researchers. "'There is a lot of fear and anxiety about AI,' says psychologist Vaile Wright, senior director of health care innovation at the American Psychological Association (APA). 'And in particular fear around AI replacing jobs.' "Those concerns were a key issue last month, when 2,400 mental health care providers for Kaiser Permanente in Northern California and the Central Valley went on a 24-hour strike."