Meta’s Fundamental AI Research (FAIR)

Meta’s Fundamental AI Research (FAIR) team is releasing several new AI models and tools for researchers. The releases center on audio generation, text-to-vision, and watermarking. “By publicly sharing our early research work, we hope to inspire iterations and ultimately help advance AI in a responsible way,” the company said in a press release.

Meta is releasing a new AI model called JASCO, short for Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation. JASCO can take additional audio inputs, such as a chord or a beat, to refine the final AI-generated sound. According to a paper from FAIR’s researchers, JASCO lets users adjust features of the generated music, such as chords, drums, and melodies, to home in on the final sound they want, all guided by a text prompt. FAIR plans to release the JASCO inference code under an MIT license as part of AudioCraft, its library of AI audio models, and the pre-trained model under a non-commercial Creative Commons license.

The company will also launch AudioSeal, which adds watermarks to AI-generated speech and is one of Meta’s tools for identifying AI-generated content. “We believe [AudioSeal] is the first audio watermarking technique designed specifically for the localized detection of AI-generated speech, making it possible to pinpoint AI-generated segments within a longer audio snippet,” the company said in a press release.

Because AudioSeal can locate AI-generated audio within a longer sound clip, Meta said this localized approach “allows for faster and more efficient detection,” increasing detection speed by 485 times compared with previous methods. Unlike the other releases, AudioSeal will be available under a commercial license.
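To illustrate what "localized detection" means in practice, here is a minimal sketch of one possible post-processing step. It assumes a hypothetical detector has already produced one watermark probability per audio frame, then merges consecutive frames above a threshold into time intervals. The function name, the 0.5 threshold, and the frame rate are illustrative assumptions, not AudioSeal's actual API or method.

```python
def locate_ai_segments(frame_probs, frame_rate_hz, threshold=0.5):
    """Merge consecutive frames whose detection probability is >= threshold
    into (start_seconds, end_seconds) intervals.

    Hypothetical helper: frame_probs is assumed to be one probability per
    frame from some watermark detector; this is not AudioSeal's API.
    """
    segments = []
    start = None  # index of the first frame in the current flagged run
    for i, p in enumerate(frame_probs):
        if p >= threshold and start is None:
            start = i  # a flagged run begins
        elif p < threshold and start is not None:
            # The run ended at the previous frame; frame i is the exclusive end.
            segments.append((start / frame_rate_hz, i / frame_rate_hz))
            start = None
    if start is not None:
        # The clip ended while still inside a flagged run.
        segments.append((start / frame_rate_hz, len(frame_probs) / frame_rate_hz))
    return segments
```

For example, `locate_ai_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_rate_hz=2)` returns `[(0.5, 1.5), (2.0, 2.5)]`, flagging two separate stretches inside one clip rather than a single yes/no verdict for the whole file.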

FAIR will also release two sizes of its multimodal model Chameleon to the public under a research-only license. Chameleon 7B and 34B can be applied to tasks that require both visual and textual understanding, such as image captioning. However, Meta said in a press release that it will not release the Chameleon image generation model “at this time,” so only the models that output text will be available.

The company will also give researchers access to its multi-token prediction approach, which trains language models to predict multiple future words at once rather than one at a time. It will be available only under a non-commercial, research-only license.
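The difference from ordinary next-word training shows up in the training targets. Below is a toy sketch, assuming a setup in which the model has n output heads at each position and head k is trained to predict the token k+1 steps ahead, with the per-head cross-entropy losses summed. The function names are hypothetical and this is not Meta's released code. With n = 1 it reduces to standard next-token prediction.

```python
import math
from typing import List, Sequence, Tuple


def multi_token_targets(tokens: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """For each position t, return the n tokens that follow it:
    (tokens[t+1], ..., tokens[t+n]). Positions without n full future
    tokens are dropped."""
    return [tuple(tokens[t + 1 : t + 1 + n]) for t in range(len(tokens) - n)]


def multi_token_loss(head_probs, targets) -> float:
    """head_probs[t][k] is a probability distribution over the vocabulary
    (a list of floats) from hypothetical output head k at position t;
    targets[t][k] is the true token k+1 steps ahead.

    Returns the cross-entropy summed across heads, averaged over positions."""
    if not targets:
        raise ValueError("need at least one position with n future tokens")
    total = 0.0
    for probs_t, target_t in zip(head_probs, targets):
        for probs_k, tok in zip(probs_t, target_t):
            total += -math.log(probs_k[tok])  # one cross-entropy term per head
    return total / len(targets)
```

For instance, `multi_token_targets([5, 1, 4, 2, 3], 2)` gives `[(1, 4), (4, 2), (2, 3)]`: each position is supervised on the next two tokens instead of only the next one.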
