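Building a Read-Aloud Bot with Discord.py and the Google Cloud Text-to-Speech API

This article is the day 23 entry in the NCC Advent Calendar.

At my club's hackathon I built a Discord read-aloud Bot similar to Shovel and 喋太郎, so I am leaving my notes here.

Environment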

Windows 10 Home

Python 3.8.6

Discord.py 1.5.1

Google Cloud Text-to-Speech August 24, 2020

Setup

Guides to setting up a Python environment are everywhere, so I will skip that part.

Installing Discord.py

Run the following command to install it.

py -3 -m pip install -U discord.py[voice]
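After installing, it can help to confirm that everything the later code relies on is actually present. The sketch below is only a convenience check, not part of the original bot. It assumes the Google client was installed from the `google-cloud-texttospeech` package and that the `ffmpeg` executable should be on PATH, since `FFmpegPCMAudio` launches it to decode audio. The helper names are made up for this example.

```python
import importlib.util
import shutil


def missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package (for example "google") also counts as missing
            found = False
        if not found:
            missing.append(name)
    return missing


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


# "nacl" is the import name of PyNaCl, which the [voice] extra installs
print("missing modules:", missing_modules(["discord", "nacl", "google.cloud.texttospeech"]))
print("ffmpeg on PATH:", ffmpeg_available())
```

An empty list and `True` mean the voice-related pieces are in place.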

Creating the Discord Bot

The discord.py documentation explains how to create a Bot. Under Bot Permissions, check "View Channels", "Send Messages", "Read Message History", "Connect", "Speak", and "Use Voice Activity". Also copy the token, since it is needed later.
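The six checkboxes above correspond to bit flags that are combined into a single permissions integer in the invite URL. As a rough sketch, the value can be computed by hand like this. The bit positions are taken from Discord's permission documentation. The URL shape and the `YOUR_CLIENT_ID` placeholder are assumptions for illustration, so compare the result with what the Developer Portal generates.

```python
from urllib.parse import urlencode

# Permission bit flags as listed in Discord's permission documentation
PERMISSION_BITS = {
    "View Channels": 1 << 10,
    "Send Messages": 1 << 11,
    "Read Message History": 1 << 16,
    "Connect": 1 << 20,
    "Speak": 1 << 21,
    "Use Voice Activity": 1 << 25,
}


def permissions_value(names):
    """Combine the named permissions into one integer with bitwise OR."""
    value = 0
    for name in names:
        value |= PERMISSION_BITS[name]
    return value


def invite_url(client_id, permissions):
    """Build an invite URL; the URL format is an assumption, check it against the portal."""
    query = urlencode({"client_id": client_id, "permissions": permissions, "scope": "bot"})
    return "https://discord.com/api/oauth2/authorize?" + query


value = permissions_value(PERMISSION_BITS)
print(value)  # 36768768
print(invite_url("YOUR_CLIENT_ID", value))
```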

Configuring the Google Cloud Text-to-Speech API

This article explains it in detail. (Following it up to the point where the environment variable is set is enough.)
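If the environment variable is missing, the error only shows up later when the Text-to-Speech client is created, so a quick check up front can save some confusion. The sketch below assumes the variable in question is `GOOGLE_APPLICATION_CREDENTIALS`, the one Google's client libraries normally read to find the service account key file. The function name is made up for this example.

```python
import os


def check_credentials(env=None):
    """Return (ok, message) describing whether the credentials variable looks usable."""
    env = os.environ if env is None else env
    # Assumption: the setup guide stores the key file path in this variable
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, "credentials file not found: " + path
    return True, "using credentials file: " + path


ok, message = check_credentials()
print(message)
```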

~~Leaning on external articles too much~~

Writing the code

Starting the Bot

import discord

# Paste your own Bot's token here
TOKEN = "YOUR_TOKEN"

# Create the Bot object
client = discord.Client()

@client.event
async def on_ready():
    print('Login!!!')

client.run(TOKEN)

Provided the Bot has already been added to the server, running just these few lines connects the Bot to the server.
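Pasting the token into the source file works, but it is easy to leak it by accident when sharing the code. One possible alternative, sketched here, is to read it from an environment variable. The variable name `DISCORD_BOT_TOKEN` and the helper `load_token` are hypothetical and not part of discord.py.

```python
import os


def load_token(env=None, name="DISCORD_BOT_TOKEN"):
    """Read the bot token from an environment variable (the variable name is hypothetical)."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(name + " is not set")
    return token


# In the bot script this would replace the hard-coded string:
# TOKEN = load_token()
print(load_token({"DISCORD_BOT_TOKEN": "example-token"}))  # example-token
```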

Joining a voice channel

Here, when someone types !connect, the Bot connects to the voice channel that person is currently in.

import discord
from discord.channel import VoiceChannel

voiceChannel: VoiceChannel 

@client.event
async def on_message(message):
    global voiceChannel

    if message.author.bot:
        return
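    if message.content == '!connect':
        # connect() returns a VoiceClient; play(), stop() and disconnect() are called on it later
        voiceChannel = await VoiceChannel.connect(message.author.voice.channel)
        await message.channel.send('The read-aloud bot has joined')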
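One thing the handler above does not cover: `message.author.voice` is `None` when the sender is not in any voice channel, so `!connect` raises an `AttributeError` in that case. A small guard like the following could be used before connecting. It is a sketch with a made-up helper name, and the `SimpleNamespace` objects only stand in for real `discord.Member` objects.

```python
from types import SimpleNamespace


def author_voice_channel(author):
    """Return the voice channel the author is in, or None if they are not in one."""
    voice_state = getattr(author, "voice", None)
    if voice_state is None:
        return None
    return voice_state.channel


# Stand-ins for discord.Member objects, used only to exercise the helper
in_voice = SimpleNamespace(voice=SimpleNamespace(channel="General"))
not_in_voice = SimpleNamespace(voice=None)

print(author_voice_channel(in_voice))      # General
print(author_voice_channel(not_in_voice))  # None
```

In the handler, a `None` result would be the cue to reply with a short message instead of trying to connect.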

Leaving a voice channel

Here, when someone types !disconnect, the Bot disconnects from the voice channel it is connected to.

import discord
from discord.channel import VoiceChannel

voiceChannel: VoiceChannel 

@client.event
async def on_message(message):
    global voiceChannel

    if message.author.bot:
        return
    if message.content == '!disconnect':
        # Stop any audio that is still playing before leaving
        voiceChannel.stop()
        await message.channel.send('The read-aloud bot has left')
        await voiceChannel.disconnect()
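If !disconnect arrives before !connect was ever used, the global has no value yet and the handler fails. A defensive variant might look like the sketch below. It assumes the global is initialised to `None` at startup, and it uses only `is_connected()`, `is_playing()`, `stop()` and `disconnect()`, which discord.py's `VoiceClient` provides. `FakeVoiceClient` is a stand-in so the example runs without Discord.

```python
import asyncio


async def safe_disconnect(voice_client):
    """Disconnect if connected; return True when a disconnect actually happened."""
    if voice_client is None or not voice_client.is_connected():
        return False
    if voice_client.is_playing():
        voice_client.stop()
    await voice_client.disconnect()
    return True


class FakeVoiceClient:
    """Minimal stand-in for discord.VoiceClient, for demonstration only."""

    def __init__(self):
        self.connected = True
        self.playing = True

    def is_connected(self):
        return self.connected

    def is_playing(self):
        return self.playing

    def stop(self):
        self.playing = False

    async def disconnect(self):
        self.connected = False


fake = FakeVoiceClient()
print(asyncio.run(safe_disconnect(fake)))  # True
print(asyncio.run(safe_disconnect(fake)))  # False
print(asyncio.run(safe_disconnect(None)))  # False
```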

Playing back text typed into the chat

First, convert the input text to SSML (Speech Synthesis Markup Language).

import html

def text_to_ssml(text):
    # Escape &, <, > and quotes so the message cannot break the SSML markup
    escaped_lines = html.escape(text)
    # Wrap the text in <speak> and pause for one second at each line break
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="1s"/>')
    )
    return ssml

Next, pass the SSML to the Google Cloud Text-to-Speech API, convert it to an audio file, and save it.

from google.cloud import texttospeech

def ssml_to_speech(ssml, file, language_code, gender):
    ttsClient = texttospeech.TextToSpeechClient()
    # The input is SSML markup, so pass it as ssml= rather than text=
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml)
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code, ssml_gender=gender
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = ttsClient.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    # Save the returned MP3 bytes to the given file
    with open(file, "wb") as out:
        out.write(response.audio_content)
    return file
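To make the SSML step more concrete, here is a self-contained sketch of what the conversion produces for a two-line message containing special characters. It assumes the text is wrapped in a `<speak>` root element, as SSML expects, and it runs without any Google credentials.

```python
import html


def text_to_ssml(text):
    """Escape the text and wrap it in <speak>, pausing one second at each line break."""
    escaped = html.escape(text)
    return "<speak>{}</speak>".format(escaped.replace("\n", '\n<break time="1s"/>'))


print(text_to_ssml("1 < 2 & 3\nnext line"))
# <speak>1 &lt; 2 &amp; 3
# <break time="1s"/>next line</speak>
```

The escaping matters because a raw `<` or `&` in a chat message would otherwise be parsed as markup.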

Finally, playing the saved audio file reads the text typed into the chat aloud in the voice channel.

import discord
from discord.channel import VoiceChannel
from discord.player import FFmpegPCMAudio

def play_voice(text):
    ssml = text_to_ssml(text)
    file = ssml_to_speech(ssml, "voice.mp3", "ja-JP", texttospeech.SsmlVoiceGender.MALE)
    voiceChannel.play(FFmpegPCMAudio(file))
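One caveat with calling `play()` directly: discord.py raises a `ClientException` if audio is already playing, which happens easily when two messages arrive close together. A simple way around it, sketched below, is to wait until the current clip has finished. `play_when_idle` and `FakeVoiceClient` are names made up for this example, and a proper queue would be the more robust choice for a busy channel.

```python
import asyncio


async def play_when_idle(voice_client, source, poll_interval=0.1):
    """Wait until nothing is playing, then start playing source."""
    while voice_client.is_playing():
        await asyncio.sleep(poll_interval)
    voice_client.play(source)


class FakeVoiceClient:
    """Stand-in for discord.VoiceClient that reports 'playing' a fixed number of times."""

    def __init__(self, busy_checks):
        self.busy_checks = busy_checks  # how many is_playing() calls return True
        self.played = []

    def is_playing(self):
        if self.busy_checks > 0:
            self.busy_checks -= 1
            return True
        return False

    def play(self, source):
        self.played.append(source)


fake = FakeVoiceClient(busy_checks=2)
asyncio.run(play_when_idle(fake, "voice.mp3", poll_interval=0.01))
print(fake.played)  # ['voice.mp3']
```

Using this would mean turning `play_voice` into a coroutine and awaiting it from `on_message`.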

The complete code is shown below. (It is slightly modified from the snippets introduced so far.)

import discord
import html
from discord.channel import VoiceChannel
from discord.player import FFmpegPCMAudio
from google.cloud import texttospeech

TOKEN = 'YOUR_TOKEN'
client = discord.Client()

voiceChannel: VoiceChannel 

@client.event
async def on_ready():
    print('Login!!!')

@client.event
async def on_message(message):
    global voiceChannel

    if message.author.bot:
        return
    if message.content == '!connect':
        voiceChannel = await VoiceChannel.connect(message.author.voice.channel)
        await message.channel.send('The read-aloud bot has joined')
        return
    elif message.content == '!disconnect':
        voiceChannel.stop()
        await message.channel.send('The read-aloud bot has left')
        await voiceChannel.disconnect()
        return

    play_voice(message.content)

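def text_to_ssml(text):
    # Escape &, <, > and quotes so the message cannot break the SSML markup
    escaped_lines = html.escape(text)
    # Wrap the text in <speak> and pause for one second at each line break
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="1s"/>')
    )
    return ssml

def ssml_to_speech(ssml, file, language_code, gender):
    ttsClient = texttospeech.TextToSpeechClient()
    # The input is SSML markup, so pass it as ssml= rather than text=
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml)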
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code, ssml_gender=gender
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = ttsClient.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    with open(file, "wb") as out:
        out.write(response.audio_content)
        print("Audio content written to file " + file)
    return file

def play_voice(text):
    ssml = text_to_ssml(text)
    file = ssml_to_speech(ssml, "voice.mp3", "ja-JP", texttospeech.SsmlVoiceGender.MALE)
    voiceChannel.play(FFmpegPCMAudio(file))

client.run(TOKEN)
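In the full listing, `on_message` calls `play_voice` directly, so the network request to the Text-to-Speech API runs on the event loop and the Bot cannot react to anything else until it returns. One possible refinement is to push the blocking call onto a worker thread. The sketch below shows the idea with a stand-in function. `run_blocking` and `fake_ssml_to_speech` are hypothetical names, and in the Bot the real `ssml_to_speech` would take the stand-in's place.

```python
import asyncio
import time
from functools import partial


async def run_blocking(func, *args, **kwargs):
    """Run a blocking function in the default thread pool and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, partial(func, *args, **kwargs))


def fake_ssml_to_speech(ssml, file):
    """Stand-in for ssml_to_speech: pretends to spend time on a network call."""
    time.sleep(0.05)
    return file


async def main():
    return await run_blocking(fake_ssml_to_speech, "<speak>hello</speak>", "voice.mp3")


print(asyncio.run(main()))  # voice.mp3
```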

That covers the minimum functionality of a read-aloud bot. From here, try adding whatever features you like.
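As one example of such a feature, long URLs and very long messages are tedious to listen to, so the text could be tidied up before it is passed to `play_voice`. This is only an illustrative sketch. The function name, the replacement word, and the 100-character limit are arbitrary choices made for this example.

```python
import re

# Matches http:// or https:// followed by any run of non-whitespace characters
URL_PATTERN = re.compile(r"https?://\S+")


def prepare_for_reading(text, max_chars=100):
    """Replace URLs with a short word and cut off very long messages."""
    text = URL_PATTERN.sub("URL", text)
    if len(text) > max_chars:
        text = text[:max_chars] + " (truncated)"
    return text


print(prepare_for_reading("see https://example.com/a?b=1 now"))  # see URL now
print(prepare_for_reading("a" * 120, max_chars=5))               # aaaaa (truncated)
```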
