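Table of contents

  • Learning goal: how to use whisper
  • Part 1: Speech-to-text with whisper
  • 1.1 Downloading and loading a model with whisper.load_model()
  • 1.2 Transcribing a file with the model instance
  • 1.3 Hands-on practice
  • Part 2: Speaker diarization (pyannote.audio), an open-source speaker diarization toolkit published on Hugging Face
  • Step 1: Install the dependencies
  • Step 2: Create a key
  • Step 3: Test pyannote.audio
  • Part 3: Putting it together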

Learning goal: how to use whisper


Part 1: Speech-to-text with whisper

1.3 Hands-on practice

Recommended parameter for load_model:

  • download_root: the root directory for downloaded models; defaults to ~/.cache/whisper

Recommended parameter for the transcribe method:

  • word_timestamps=True

import whisper
import arrow


# Arguments: model name, audio file path, recording start time
def execute(model_name, file_path, start_time):
    model = whisper.load_model(model_name)
    result = model.transcribe(file_path, word_timestamps=True)
    # Parse the start time once; segment offsets are seconds from the start of the audio
    now = arrow.get(start_time)
    for segment in result["segments"]:
        start = now.shift(seconds=segment["start"]).format("YYYY-MM-DD HH:mm:ss")
        end = now.shift(seconds=segment["end"]).format("YYYY-MM-DD HH:mm:ss")
        print("【" + start + "->" + end + "】:" + segment["text"])


if __name__ == '__main__':
    execute("large", "/root/autodl-tmp/no/test.mp3", "2022-10-24 16:23:00")
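
With word_timestamps=True, each segment in the result should also carry a "words" list whose entries have "word", "start" and "end" keys. The sketch below is not part of the original script. It walks such a result and converts the word offsets to wall-clock times using only the standard library instead of arrow. The helper name word_lines and the sample dict are made up for illustration; in real use the dict would come from model.transcribe.

```python
from datetime import datetime, timedelta


def word_lines(result, start_time):
    """Return one 'HH:MM:SS.mmm word' line per word in a whisper-style result."""
    base = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
    lines = []
    for segment in result["segments"]:
        # "words" is only present when transcribe() ran with word_timestamps=True
        for word in segment.get("words", []):
            moment = base + timedelta(seconds=word["start"])
            stamp = moment.strftime("%H:%M:%S.%f")[:-3]  # trim microseconds to milliseconds
            lines.append(f"{stamp} {word['word'].strip()}")
    return lines


# Hand-written sample shaped like whisper output (illustrative values only)
sample = {
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " hello world",
         "words": [
             {"word": " hello", "start": 0.0, "end": 0.5},
             {"word": " world", "start": 0.5, "end": 1.2},
         ]},
    ]
}

for line in word_lines(sample, "2022-10-24 16:23:00"):
    print(line)
```

Segments without a "words" key are simply skipped, so the same helper also accepts a result produced without word_timestamps and returns an empty list for it.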

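Part 3: Putting it together

This step relies on an open-source project to merge the results produced by the two parts above.

If you get the error No module named 'pyannote_whisper', fetch the project and install its requirements with the commands below. If you are using the AutoDL platform, you can enable its academic proxy first to speed up the download.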

source /etc/network_turbo
git clone https://github.com/yinruiqing/pyannote-whisper.git
cd pyannote-whisper
pip install -r requirements.txt
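
If the import still fails after these commands, one possible cause is that the cloned directory is not on Python's import path. The following is a minimal sketch for checking this from Python, under that assumption. The helper name ensure_importable and the clone location /root/pyannote-whisper are hypothetical; adjust the path to wherever you ran git clone.

```python
import importlib.util
import sys


def ensure_importable(module_name, repo_dir):
    """Return True if module_name can be imported, adding repo_dir to sys.path if needed."""
    if importlib.util.find_spec(module_name) is not None:
        return True
    # Fall back to the cloned repository (repo_dir is wherever you ran git clone)
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# "/root/pyannote-whisper" is an assumed clone location; adjust to your own path
if not ensure_importable("pyannote_whisper", "/root/pyannote-whisper"):
    print("pyannote_whisper not found; check the clone path")
```

Running this before the main script makes it clear whether the problem is the import path or a missing dependency.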

import os
import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text
import concurrent.futures
import subprocess
import torch
print("Loading speaker diarization (voiceprint) model")
# Replace the placeholder with the Hugging Face access token (key) you created in Part 2
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1", use_auth_token="YOUR_HF_TOKEN")
output_dir = '/root/autodl-tmp/no/out'
print("Loading whisper model")
model = whisper.load_model("large", device="cuda")

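# Convert an audio file (e.g. MP3) to WAV; returns (path, error_message)
def convert_to_wav(path):
    if path.lower().endswith('.wav'):
        # Already WAV: use the file as-is
        return path, None
    new_path = os.path.splitext(path)[0] + '.wav'
    try:
        # -y overwrites an existing output; options go before the output path.
        # -an (disable audio) is dropped because the audio stream is what we need.
        subprocess.run(['ffmpeg', '-y', '-i', path, new_path], check=True)
    except (OSError, subprocess.CalledProcessError):
        return path, 'Error: Could not convert file to .wav'
    return new_path, None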


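def process_audio(file_path):
    file_path, retmsg = convert_to_wav(file_path)
    if retmsg:
        # Conversion failed: report it and skip this file
        print(f"{file_path}: {retmsg}")
        return
    print(f"===={file_path}=======")
    # initial_prompt gives whisper context text for the first window ("语音转换" means "speech conversion")
    asr_result = model.transcribe(file_path, initial_prompt="语音转换")
    pipeline.to(torch.device('cuda'))
    diarization_result = pipeline(file_path, num_speakers=2)
    final_result = diarize_text(asr_result, diarization_result)
    base_name = os.path.splitext(os.path.basename(file_path))[0]
    output_file = os.path.join(output_dir, base_name + '.txt')
    # Write UTF-8 explicitly so non-ASCII transcripts are saved correctly
    with open(output_file, 'w', encoding='utf-8') as f:
        for seg, spk, sent in final_result:
            line = f'{seg.start:.2f} {seg.end:.2f} {spk} {sent}\n'
            f.write(line)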


os.makedirs(output_dir, exist_ok=True)

wave_dir = '/root/autodl-tmp/no'

# Collect every .mp3 file in wave_dir (each one is converted to WAV before processing)
audio_files = [os.path.join(wave_dir, file) for file in os.listdir(wave_dir) if file.endswith('.mp3')]

# Process the files one by one
# with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
#     executor.map(process_audio, audio_files)
for audio_file in audio_files:
    process_audio(audio_file)
print('Done!')
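
To make the merge step more concrete, here is a simplified, self-contained sketch of one common way to combine the two results: give each transcribed segment the speaker whose turn overlaps it the most. It only illustrates the idea and is not claimed to be how diarize_text is implemented. The function name assign_speakers and the plain tuples used for segments and speaker turns are assumptions for illustration.

```python
def assign_speakers(asr_segments, speaker_turns):
    """Label each (start, end, text) segment with the speaker that overlaps it most.

    asr_segments: list of (start, end, text)
    speaker_turns: list of (start, end, speaker)
    """
    labeled = []
    for seg_start, seg_end, text in asr_segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            # Length of the time range shared by the segment and the turn
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((seg_start, seg_end, best_speaker, text))
    return labeled


# Illustrative values only
asr = [(0.0, 2.0, "hello"), (2.0, 5.0, "hi there"), (5.0, 6.0, "bye")]
turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 6.0, "SPEAKER_01")]

for start, end, speaker, text in assign_speakers(asr, turns):
    print(f"{start:.2f} {end:.2f} {speaker} {text}")
```

The printed lines use the same "start end speaker text" layout as the output files written by the script above. A segment that overlaps no speaker turn keeps None as its speaker.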

Root directory of the git repository