📄Response on a Post from AI
This page describes how an answer post audio is processed by AI in Matar.
Audio processing for Matar response posts is divided into parts defined below:
Preprocess Audio using pydub
By using pydub library, matar question audio post files are filtered for removing noise and normalization.
Describing Code:
An AudioSegment class from the pydub library to load the input audio file into the audio variable.
The class applies a low-pass filter to the audio to remove frequencies above 5000 Hz using pydub_effects.low_pass_filter.
The class applies a high-pass filter to the audio to remove frequencies below 200 Hz using pydub_effects.high_pass_filter.
pydub_effects.normalize normalizes the audio with a headroom of 2 dB. Normalization is a process that adjusts the audio's volume level to a target level, and the headroom parameter sets how much additional room should be left in the audio to prevent clipping.
Finally, it exports the processed and normalized audio to the specified output file in FLAC format using normalized_audio.export.
Transcribe using Google Speech-to Text
Google Speech to Text is a speech recognition model for performing multilingual speech recognition, translation, and language identification.
Here is a code snippet of how Matar preprocessed audio post is translated to text using Google Speech-to-Text (SST) :
Describing Code:
Importing the necessary libraries for working with the Google Speech-to-Text API.
Initializing a Google Speech-to-Text client using speech.SpeechClient().
The content is read from the audio file specified by the audio_file argument which creates a speech.RecognitionAudio object with the audio content.
A speech recognition process is configured using a speech.RecognitionConfig object. In this configuration:
The audio encoding is set to FLAC.
The language code is specified based on the
google_transcibe_client.recognize method is called to transcribe the audio, passing in the configuration and audio content.
The method iterates through the results of the transcription and concatenates the transcribed text from each result.
Finally, it returns the transcribed text with a "?".
Get Answer from ChatGPT
After processing question audio post using pydub and converting audio to text using Google Speech to Text. Provide the text to get answer from ChatGPT.
Here is a code snippet of how ChatGPT answers questions provided as text.
Describing Code:
The function initializes an empty dictionary ret_val to store the response data and checks if the errors parameter is provided; if not, it initializes an empty dictionary for errors.
Then defines the URL for the OpenAI GPT-3.5 Turbo API endpoint.
A payload dictionary is created that specifies the model ("gpt-3.5-turbo") and an initial message from the user with an empty content.
Language text is determined on the provided language parameter and adjusts for Urdu if necessary.
Depending on the organization specified (organization parameter), it composes a user message with specific instructions and appends the user's question to it. The message may include language instructions, guidelines for answering, and specific instructions for different categories or organizations.
A POST request is send to the OpenAI API with the payload as JSON data and a timeout of 60 seconds. If the response is successful (status code 200 or 201), it proceeds to parse the response data and the transcribed audio text is extracted from the response.
Answering Guidelines:
Answer only in language text only and give clear instructions in less than 150 words.
If for some reason you don't have an answer, please reply "Ee prasnege uttara nanage tiḷidilla, dayavittu nimma mattar sadasyaru tamma uttaragalannu niduvavarege kayiri"
The response above is in Kannada language and specific to certain organization in code above the organization specific to Kannada Language is Tisser Org.
3. Else if you are answering in general category and not in a specific organization the guidelines work as follows:
Answer limit is 150 words is If you have an answer about a certain question then answer word by word.
If for some reason you don't have an answer, please reply: Mere paas " "iss sawaal ka jawaab nahi hai, kripya Matar pe apne saathiyon ke jawaab ka intezaar karein." which in english means ""The answer to this question is not available, please wait for the answer from your colleagues at Matar.
Convert ChatGPT answer Text-to-Speech.
The answer provided by chatgpt as a text can be converted again into speech using get_text_to_speech()
function defined below:
Describing Code:
A texttospeech.SynthesisInput object is created with the input text.
If the specified language is Urdu (Language.URUDU) and, if so, sets it to Hindi (Language.HINDI) because the code does not handle Urdu directly.
The voice parameters for text-to-speech generation, including the language code and voice gender (female) are set automatically.
The audio output format as MP3 is configured using texttospeech.AudioConfig.
A request to the Google Text-to-Speech API (google_text_to_speech_client.synthesize_speech) with the input text, voice parameters, and audio configuration. This generates an audio response.
The audio content to a temporary MP3 file with a random file name (using get_random_string function).
The preprocess_audio function iedits the generated audio, which includes filtering, normalization, and exporting to a final file.
The final processed audio file is uploaded to an Amazon S3 bucket named 'matar-audio'.
The temporary files created during the process are removed.
Last updated