More Radar Trends by Mike Loukides

Artificial Intelligence

  • The OpenGPT-X project has released its open large language model, Teuken-7B. This model is significant because it supports 24 European languages and is designed to be compliant with European law. It is available on Hugging Face.
  • OLMo 2 is a newly released, fully open small language model that comes in 7B and 13B sizes. Both claim the best performance among fully open models in their class.
  • NVIDIA has announced Fugatto, a new generative text-to-audio model that can create completely new kinds of sounds. They position it as a tool for creators.
  • Anthropic has announced the developer preview of its Model Context Protocol. MCP allows Claude Desktop to communicate securely with other resources. The MCP server limits the services that are exposed to Claude, filters Claude’s requests, and prevents data from being exposed over the internet. (A minimal server sketch appears after this list.)
  • OpenScholar is an open source language model designed to support scientific research. It’s significantly more accurate than GPT-4o and more economical to run. It uses RAG to access a large database of open-access scientific papers, which helps keep its citations accurate.
  • Meta has partnered with VSParticle to create new materials from instructions generated by AI. They are focusing on nanoporous materials, which could be catalysts for breaking down CO2 into useful products.
  • Perplexity has introduced in-app shopping: Users can search for something, then have Perplexity buy it. It’s the first widely available example of an AI agent that changes the state of the physical world.
  • Research has shown that generative AI models have their own distinctive styles, not unlike human writers. Stylistic analysis can attribute a text to the model that generated it.
  • Mistral has released Pixtral Large, a 124B parameter multimodal model with benchmark performance on a par with the latest versions of other frontier models.
  • Mozilla’s Common Voice project collects speech samples in languages other than Anglo-American English to help developers build voice-enabled applications using other languages and dialects. The project is open source.
  • Mechanistic interpretability is a research area that uses AI to examine what’s happening within each layer of a large language model. It provides a path toward AI interpretability: the ability to understand why an AI produces any output that it generates, and possibly to control that output.
  • Google’s Pixel phones will be able to monitor phone conversations to detect scams in real time. Processing takes place entirely on the phone. The feature is off by default and can be enabled on a per-call basis. Another new feature detects stalkerware, apps that collect data without the user’s consent or knowledge.
  • The Common Corpus dataset for training large language models is now open and available on Hugging Face. The dataset contains over 2T tokens taken from “permissively licensed” sources, and it documents the provenance of every source.
  • OpenAI’s newest model, Orion, is an improvement over GPT-4. But is it a significant improvement? Apparently not. This may be the end of the road for improving LLMs by making them larger. (And is Orion GPT-5?)
  • FrontierMath is a new AI benchmark based on very tough mathematical problems. At this point, no language model scores higher than 2%; Gemini 1.5 Pro has the top score.
  • Separating the instruments in a musical performance is tough, but it’s possible. Here’s an AI-free masterpiece of signal processing that attempts to do so. Can we turn a performance back into sheet music?
  • Standard Intelligence has released hertz-dev, a new model for real-time voice synthesis. It was trained purely on audio and can participate in unscripted conversations without the use of text.
  • Microsoft’s Magentic-One is a generalist agentic system that is capable of performing complex tasks. Magentic-One is open source for researchers and developers. Microsoft has also released AutoGenBench, an open source tool for evaluating the performance of agentic systems.
  • ChainForge is a new visual tool for prompt engineering. It can be used to test prompts against multiple models and evaluate the quality of the response.
  • AI was used to de-age Tom Hanks and Robin Wright in a new film, allowing the actors to play their characters across a 60-year time span.
  • Anthropic has released Claude 3.5 Haiku, a new version of its smallest and fastest model. The company claims that its performance on many benchmarks is superior to Claude 3 Opus, its previous leading model. Anthropic has also significantly increased the price for using Haiku.
  • OpenAI has introduced predicted outputs. If the output to a prompt is largely known ahead of time—for example, if you’re asking GPT to modify a file—you can upload the expected result with the prompt, and GPT will make the necessary changes. Predicted outputs reduce latency; apparently they don’t reduce cost. (A usage sketch appears after this list.)
  • Fortunately, AI Psychiatry has nothing to do with psychoanalyzing human patients. It’s a forensic tool for postmortem analysis of AI failures that allows investigators to recover the exact model that was in use when the failure occurred.
  • SmolLM2 is a new small language model designed to run on-device. It comes in 135M, 360M, and 1.7B parameter versions. Early reports say that its performance is impressive.
  • vLLM is a framework for serving LLMs. It works with most of the language models on Hugging Face. Not only does it claim to be simpler, it also claims significant performance and cost benefits from caching the attention key-value data computed for input tokens. (A brief usage sketch appears after this list.)
  • AI Flame Graphs show developers what their models are doing in detail. If you’re concerned about performance or energy use, they are revolutionary.
  • Google’s Project Jarvis is reported to be the company’s answer to Anthropic’s computer use API. Jarvis takes over a browser (presumably Chrome) to perform tasks on behalf of the user.
  • NotebookLM’s ability to generate a podcast from documents is impressive. Can other models do the same thing? NotebookLlama is an open source project that generates podcasts using the Llama models. 
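
The Model Context Protocol item above mentions an MCP server that controls what is exposed to Claude Desktop. Below is a minimal sketch of such a server, assuming the FastMCP helper in Anthropic’s Python MCP SDK; the lookup_order tool, its name, and its stubbed logic are illustrative assumptions, not anything from the announcement.

```python
# Minimal MCP server sketch (assumes the FastMCP helper in the Python MCP SDK).
# Only the tools registered here are exposed to Claude Desktop; everything
# runs locally over stdio, so no data leaves the machine.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-lookup")  # server name shown to the client

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Return the status of an order (hypothetical, stubbed for illustration)."""
    # A real server would query an internal system here and could filter
    # or validate the request before answering.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport used by Claude Desktop
```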
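
The predicted outputs item describes passing the expected result along with the prompt. Here is a sketch of how that might look with the openai Python client; the model choice and the file-editing prompt are assumptions for illustration, and the prediction parameter is used as documented for chat completions.

```python
# Sketch of OpenAI predicted outputs: most of the response is already known
# (the original file), so it is passed as a prediction to reduce latency.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

original_code = "def add(a, b):\n    return a + b\n"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Rename the function add to sum_two and return the full file:\n"
                   + original_code,
    }],
    prediction={"type": "content", "content": original_code},
)

print(response.choices[0].message.content)
```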
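
Finally, the vLLM item: a short sketch of batch generation with vLLM’s Python API. The model name and sampling settings are placeholders; the point is that vLLM handles batching and attention key-value cache management behind a small interface.

```python
# Batch generation with vLLM. The model and sampling values are illustrative;
# vLLM manages the attention key-value cache (PagedAttention) internally.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain what a KV cache is in one sentence.",
    "What is continuous batching?",
]

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```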
