

WavLM-Base-Plus
Microsoft’s WavLM
The base model was pre-trained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
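If a recording was captured at a different rate, it has to be resampled before it reaches the model. The following is a minimal sketch of that step, assuming a mono waveform held in a NumPy array. The helper name `resample_linear` is hypothetical, and linear interpolation is used only to keep the example short: it applies no anti-aliasing filter, so a real pipeline would more likely rely on a filter-based resampler such as `librosa.resample` or `torchaudio.functional.resample`.

```python
import numpy as np

TARGET_SAMPLE_RATE = 16_000  # rate the model card says the model expects


def resample_linear(waveform: np.ndarray, orig_rate: int,
                    target_rate: int = TARGET_SAMPLE_RATE) -> np.ndarray:
    """Resample a mono waveform by linear interpolation (illustrative only)."""
    if waveform.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    if orig_rate == target_rate:
        return waveform.astype(np.float32)
    duration = len(waveform) / orig_rate
    n_target = int(round(duration * target_rate))
    # Sample positions, in seconds, on the original and the target grid.
    t_orig = np.arange(len(waveform)) / orig_rate
    t_target = np.arange(n_target) / target_rate
    return np.interp(t_target, t_orig, waveform).astype(np.float32)


# One second of a 440 Hz tone recorded at 44.1 kHz (synthetic stand-in audio).
tone = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)
tone_16k = resample_linear(tone, orig_rate=44_100)
print(tone_16k.shape)  # (16000,)
```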
Note: This model does not have a tokenizer because it was pre-trained on audio alone. To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for a more detailed explanation of how to fine-tune the model.
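As a rough illustration of what "creating a tokenizer" can involve, the sketch below builds a character-level vocabulary from a set of transcripts. It is not taken from the model card: the function name `build_char_vocab`, the sample transcripts, and the special tokens `|`, `[UNK]`, and `[PAD]` are assumptions chosen to resemble a common CTC fine-tuning setup.

```python
import json


def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id."""
    chars = sorted({ch for text in transcripts for ch in text.lower()})
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Use a visible word delimiter instead of a raw space.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Special tokens: one for unknown characters, one for padding,
    # which CTC setups commonly reuse as the blank token.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


# Hypothetical transcripts standing in for a labeled dataset.
transcripts = ["hello world", "good morning"]
vocab = build_char_vocab(transcripts)
print(json.dumps(vocab, indent=2))
```

A dictionary like this could be written to a `vocab.json` file and loaded with the `Wav2Vec2CTCTokenizer` class from `transformers`. Whether that tokenizer is the right choice depends on the fine-tuning recipe being followed.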
The model was pre-trained on:
- 60,000 hours of Libri-Light
- 10,000 hours of GigaSpeech
- 24,000 hours of VoxPopuli
Paper: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Authors: Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei
Abstract
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity, paralinguistics, spoken content, etc., learning universal representations for all speech tasks is challenging. In this paper, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM is built based on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation. We first equip the Transformer structure with gated relative position bias to improve its capability on recognition tasks. For better speaker discrimination, we propose an utterance mixing training strategy, where additional overlapped utterances are created unsupervisely and incorporated during model training. Lastly, we scale up the training dataset from 60k hours to 94k hours. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark, and brings significant improvements for various speech processing tasks on their representative benchmarks.
The original model can be found under https://github.com/microsoft/unilm/tree/master/wavlm.
Usage
This is an English pre-trained speech model that has to be fine-tuned on a downstream task, such as speech recognition or audio classification, before it can be used for inference. The model was pre-trained on English speech and should therefore perform well only on English. The model has been shown to work well on the SUPERB benchmark.
Note: The model was pre-trained on phonemes rather than characters. This means that one should make sure that the input text is converted to a sequence of phonemes before fine-tuning.
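Before any fine-tuning, the pre-trained encoder can be loaded on its own to inspect the frame-level representations it produces. The sketch below assumes the `transformers` and `torch` packages are installed and that the checkpoint `microsoft/wavlm-base-plus` is reachable on the Hugging Face Hub. The function name `extract_hidden_states` is hypothetical.

```python
import numpy as np

MODEL_ID = "microsoft/wavlm-base-plus"
EXPECTED_SAMPLE_RATE = 16_000


def extract_hidden_states(waveform: np.ndarray, sampling_rate: int):
    """Return the encoder's last hidden states for one mono waveform."""
    if sampling_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz"
        )
    # Heavy dependencies are imported lazily so the guard above stays cheap.
    import torch
    from transformers import AutoFeatureExtractor, WavLMModel

    # Both calls download the checkpoint files on first use.
    feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
    model = WavLMModel.from_pretrained(MODEL_ID)
    model.eval()

    inputs = feature_extractor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Tensor of shape (batch, frames, hidden_size).
    return outputs.last_hidden_state
```

Calling `extract_hidden_states(waveform, 16_000)` on a one-dimensional float array returns one vector per audio frame. These vectors are what a task-specific head would consume during fine-tuning. They are not predictions in themselves.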
Speech Recognition
To fine-tune the model for speech recognition, see the official speech recognition example.
Speech Classification
To fine-tune the model for speech classification, see the official audio classification example.
Speaker Verification
TODO
Speaker Diarization
TODO
Contribution
The model was contributed by cywang and patrickvonplaten.
License
The official license can be found here