Latest version: v1.3.2.post1
CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic concepts from natural language supervision and enables zero-shot inference. The model has been extensively evaluated on 26 downstream audio tasks, achieving state-of-the-art (SoTA) results on several of them, including classification, retrieval, and captioning.
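As an illustration of the zero-shot workflow, the sketch below embeds a set of candidate labels and one audio clip into CLAP's shared space, then ranks the labels by similarity. It assumes the `msclap` package's `CLAP` class and its `get_text_embeddings`, `get_audio_embeddings`, and `compute_similarity` helpers; the audio path and label set are hypothetical.

```python
from msclap import CLAP

# Load the pretrained CLAP model (weights are downloaded on first use).
clap_model = CLAP(version='2023', use_cuda=False)

# Hypothetical candidate labels and audio file for zero-shot classification.
class_labels = ['dog barking', 'rain falling', 'car engine', 'human speech']
audio_files = ['example_clip.wav']  # illustrative path

# Embed the text prompts and the audio clip into the shared embedding space.
text_embeddings = clap_model.get_text_embeddings(class_labels)
audio_embeddings = clap_model.get_audio_embeddings(audio_files)

# Similarity between the clip and each label; the highest score is the prediction.
similarity = clap_model.compute_similarity(audio_embeddings, text_embeddings)
predicted = class_labels[int(similarity.argmax(dim=1)[0])]
print(f'Predicted label: {predicted}')
```

Because no task-specific training is involved, swapping in a different label set is enough to repurpose the same model for a new classification task.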
No known vulnerabilities found