Response on a Post from AI

This page describes how Matar uses AI to generate an audio response to a question post.

Audio processing for Matar response posts is divided into the parts described below:

Preprocess Audio using pydub

Using the pydub library, Matar question audio files are filtered to remove noise and then normalized.

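from pydub import AudioSegment
from pydub import effects as pydub_effects


def preprocess_audio(input_file, output_file):
    ''' Preprocess audio to remove noise and normalize '''
    audio = AudioSegment.from_file(input_file)
    # normalized_audio = audio.apply_gain(-audio.dBFS)
    # Band-limit the signal: drop content above 5000 Hz and below 200 Hz
    audio = pydub_effects.low_pass_filter(audio, 5000)
    audio = pydub_effects.high_pass_filter(audio, 200)
    # Peak-normalize, leaving 2 dB of headroom below full scale
    normalized_audio = pydub_effects.normalize(audio, headroom=2)
    normalized_audio.export(output_file, format="flac")
    # Duration in seconds (not changed by filtering or normalization)
    return audio.duration_seconds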

Describing Code:

The AudioSegment class from the pydub library loads the input audio file into the audio variable.
The function applies a low-pass filter with pydub_effects.low_pass_filter to attenuate frequencies above 5000 Hz.
The function applies a high-pass filter with pydub_effects.high_pass_filter to attenuate frequencies below 200 Hz.
pydub_effects.normalize normalizes the audio with a headroom of 2 dB. Normalization adjusts the volume so that the loudest peak reaches a target level, and the headroom parameter sets how far below the maximum level that peak should stay, to prevent clipping.
Finally, the function exports the normalized audio to the specified output file in FLAC format using normalized_audio.export and returns the duration of the audio in seconds.
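
The gain applied by peak normalization with headroom can be worked out by hand. The sketch below is a standalone illustration of that arithmetic, not pydub's own code: it assumes the peak sample value and the maximum possible amplitude are already known, and the helper name normalization_gain_db is hypothetical.

```python
import math


def normalization_gain_db(peak, max_amplitude, headroom_db=2.0):
    """Return the gain (in dB) that places `peak` at `headroom_db` below full scale.

    Hypothetical helper for illustration only; it mirrors the idea of
    peak normalization rather than any library's internal implementation.
    """
    if peak <= 0:
        raise ValueError("peak must be positive (silent audio cannot be normalized)")
    # The target peak sits `headroom_db` decibels below the maximum amplitude.
    target_peak = max_amplitude * 10 ** (-headroom_db / 20)
    # Convert the amplitude ratio into decibels.
    return 20 * math.log10(target_peak / peak)


# 16-bit audio has a maximum amplitude of 32768.
quiet_gain = normalization_gain_db(peak=16384, max_amplitude=32768)
loud_gain = normalization_gain_db(peak=32768, max_amplitude=32768)
```

With a headroom of 2 dB, a recording whose loudest sample is at half of full scale is boosted by about 4 dB, while one that already reaches full scale is attenuated by 2 dB.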

Transcribe using Google Speech-to-Text

Google Speech-to-Text is a cloud speech recognition service that converts audio to text and supports many languages.

Here is a code snippet showing how a preprocessed Matar audio post is transcribed to text using Google Speech-to-Text (STT):

import io

from google.cloud import speech


# Language is Matar's own constants class, defined elsewhere in the backend
def transcribe_audio(audio_file, language=Language.HINDI):
    ''' Transcribe audio using google speech to text api '''
    google_transcibe_client = speech.SpeechClient()
    # Read the raw bytes; a separate name avoids shadowing the audio_file path
    with io.open(audio_file, "rb") as audio_stream:
        content = audio_stream.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        # sample_rate_hertz=24000,
        language_code=language
    )
    response = google_transcibe_client.recognize(config=config, audio=audio)
    text = ""
    # Keep the top alternative of each result
    for result in response.results:
        text += result.alternatives[0].transcript
    return text + "?"

Describing Code:

The function relies on the io module and the Google Cloud Speech-to-Text client library (speech), which must be imported.
A Google Speech-to-Text client is initialized using speech.SpeechClient().
The content of the file specified by the audio_file argument is read and wrapped in a speech.RecognitionAudio object.
The speech recognition request is configured using a speech.RecognitionConfig object. In this configuration:
- The audio encoding is set to FLAC.
- The language code is taken from the language parameter, which defaults to Language.HINDI.
The google_transcibe_client.recognize method is called to transcribe the audio, passing in the configuration and audio content.
The function iterates through the results of the transcription and concatenates the top alternative's transcript from each result.
Finally, it returns the transcribed text with a "?" appended.
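
The result-joining step can be tried without calling the API. The following is a minimal sketch under stated assumptions: fake_result is a hypothetical stand-in that only mimics the attribute shape the function reads (results, alternatives, transcript), and the sample transcripts are made up.

```python
from types import SimpleNamespace


def join_transcripts(results):
    """Concatenate the top alternative of each result and append "?"."""
    text = ""
    for result in results:
        if not result.alternatives:
            continue  # skip results that carry no alternatives
        text += result.alternatives[0].transcript
    return text + "?"


def fake_result(transcript):
    """Build a stand-in with the same attribute shape as a recognition result."""
    return SimpleNamespace(alternatives=[SimpleNamespace(transcript=transcript)])


results = [fake_result("gehun ki buvai"), fake_result(" kab karein")]
question = join_transcripts(results)
empty_question = join_transcripts([])
```

Note that an empty result list still yields "?", so a caller may want to check for that case before sending the text onward.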

Get Answer from ChatGPT

After the question audio has been preprocessed with pydub and converted to text with Google Speech-to-Text, the text is sent to ChatGPT to get an answer.

Here is a code snippet showing how ChatGPT answers a question provided as text:

def get_response_from_chatgpt(
    question, source="default", is_audio_text_needed=False,
    level=0, category=None, errors=None, language=Language.HINDI,
    organization=None
):
    try:
        ret_val = {}
        if(not errors):
            errors = {}
        url = "https://api.openai.com/v1/chat/completions"

        
            
        payload = {
            "model": "gpt-3.5-turbo",
            "messages": [
                {
                    "role": "user",
                    "content": ""
                }
            ]
        }

        language_text = Language._text.get(language)

        if(language == Language.URUDU):
            language_text = Language._text.get(Language.HINDI)

        if(organization and organization._id in ["685QVK", "K7QUEV", "XYBT7P"]):  # for Tisser Org
            payload["messages"][0]["content"] = (
                "An artisan in india is asking the below question "
                f"under {category.title if category else 'general'} category. "
                "You have to absolutely ensure to follow the below instructions while answering the question:\n"
                f"1. Answer only in {language_text} and give clear instructions in less than 150 words.\n"
                "2. If for some reason you don't have an answer, please reply: "
                "Ee prasnege uttara nanage tiḷidilla, dayavittu nimma mattar sadasyaru tamma uttaragalannu niduvavarege kayiri.\n"
                f"Here's the question in {language_text}: {question}"
            )
        else:
            payload["messages"][0]["content"] = (
                f"A farmer in india is asking the below question under {category.title if category else 'general'} category. "
                "You have to absolutely ensure to follow the below instructions while answering the question:\n"
                f"1. Answer only in {language_text} and give clear instructions in less than 150 words.\n"
                "2. If the below question is around mandi price, crop price or daily weather information, give the reply word-by-word as : "
                "Agar aap mandi bhaw ya mausam jaankari jaanna chahte hain toh kripya apne Whatsapp pe Bolbhaav "
                "ke number nau teen nau, nau shunya chaar, saat chaar shunya chheh, par jaake mandi bhaw ya mausam type karein, "
                "aage se bhi Mandi bhaw aur mausam ki jaankari bhi aapko Bolbhaav par milegi.\n"
                "3. If for some reason you don't have an answer, please reply: Mere paas "
                "iss sawaal ka jawaab nahi hai, kripya Matar pe apne saathiyon ke jawaab ka intezaar karein.\n"
                f"Here's the question in {language_text}: {question}"
            )

        headers = {
            # The API key is read from the environment rather than hardcoded;
            # the variable name OPENAI_API_KEY is an assumption
            'Authorization': f"Bearer {os.environ['OPENAI_API_KEY']}",
            'Content-Type': 'application/json'
        }

        response = requests.request(
            "POST", url, headers=headers,
            data=json.dumps(payload), timeout=60
        )
        if(response):
            print("time taken for gpt: %d(sec)"%(response.elapsed.total_seconds()))
        if(not response or response.status_code not in [200, 201]):
            errors["message"] = f"{source}- error occurred in getting gpt response"
            print(errors["message"])
            return

        response_data = json.loads(response.text)

        audio_text = response_data.get("choices")[0].get("message").get("content")

        ret_val["audio_text"] = audio_text
        ret_val["id"] = response_data.get("id")

        ret_val["audio"] = get_text_to_speech(text=audio_text, language=language, errors=errors)

        return ret_val

    except Exception as ex:
        print(str(ex))

Describing Code:

The function initializes an empty dictionary ret_val to store the response data and checks whether the errors parameter is provided; if not, it initializes an empty dictionary for errors.
It then defines the URL of the OpenAI chat completions endpoint.
A payload dictionary is created that specifies the model ("gpt-3.5-turbo") and a single user message with empty content.
The language text is looked up from the provided language parameter; for Urdu, the Hindi language text is used instead.
Depending on the organization parameter, the function composes a user message with specific instructions and appends the user's question to it. The message includes the answer language, guidelines for answering, and fallback replies that differ by organization.
A POST request is sent to the OpenAI API with the payload as JSON data and a timeout of 60 seconds. If the response is successful (status code 200 or 201), the function parses the response data and extracts the answer text from the first choice. The answer text is then converted to audio with get_text_to_speech, and the text, the response id, and the audio details are returned in ret_val.
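
The request body and the response parsing can be exercised without a network call. This is a minimal sketch, assuming a response body with the same fields the function above reads (id and choices[0].message.content); build_payload, extract_answer, and the sample response are hypothetical and only for illustration.

```python
import json


def build_payload(prompt, model="gpt-3.5-turbo"):
    """Build a chat-completions request body with a single user message."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def extract_answer(response_text):
    """Return (answer, response_id); answer is None if no choice is present."""
    data = json.loads(response_text)
    choices = data.get("choices") or []
    if not choices:
        return None, data.get("id")
    message = choices[0].get("message") or {}
    return message.get("content"), data.get("id")


payload = build_payload("Here's the question in Hindi: gehun ki buvai kab karein?")

# Made-up response body that follows the shape described above.
sample_response = json.dumps({
    "id": "chatcmpl-123",
    "choices": [{"message": {"role": "assistant", "content": "Namaste"}}],
})
answer, response_id = extract_answer(sample_response)
missing_answer, _ = extract_answer(json.dumps({"id": "chatcmpl-456", "choices": []}))
```

Checking for an empty choices list avoids an exception on the indexing step when the API returns an unexpected body.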

Answering Guidelines:

For the organizations listed in the code above (Tisser Org, where the person asking is described as an artisan), the guidelines are as follows:

Answer only in the selected language and give clear instructions in less than 150 words.
If for some reason you don't have an answer, reply: "Ee prasnege uttara nanage tiḷidilla, dayavittu nimma mattar sadasyaru tamma uttaragalannu niduvavarege kayiri."

This fallback reply is in Kannada and is specific to Tisser Org.

For all other organizations (the general case, where the person asking is described as a farmer), the guidelines are as follows:

Answer only in the selected language and give clear instructions in less than 150 words.
If the question is about mandi prices, crop prices, or daily weather information, reply word for word with the fixed message that directs the user to Bolbhaav on WhatsApp.
If for some reason you don't have an answer, reply: "Mere paas iss sawaal ka jawaab nahi hai, kripya Matar pe apne saathiyon ke jawaab ka intezaar karein.", which in English means "I do not have the answer to this question, please wait for the answers from your colleagues on Matar."
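
The way the guidelines are numbered and wrapped around the question can be sketched as a small helper. This is an illustration under stated assumptions, not Matar's code: render_prompt is a hypothetical function, and "Hindi" is assumed as the language text for the example.

```python
def render_prompt(asker, category, language_text, rules, question):
    """Compose a prompt: context line, numbered rules, then the question."""
    header = (
        f"{asker} in india is asking the below question "
        f"under {category or 'general'} category. "
        "You have to absolutely ensure to follow the below instructions "
        "while answering the question:"
    )
    # Number the rules starting from 1, one rule per line.
    numbered = [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    footer = f"Here's the question in {language_text}: {question}"
    return "\n".join([header, *numbered, footer])


FALLBACK_HINDI = (
    "Mere paas iss sawaal ka jawaab nahi hai, "
    "kripya Matar pe apne saathiyon ke jawaab ka intezaar karein."
)

prompt = render_prompt(
    asker="A farmer",
    category=None,
    language_text="Hindi",  # assumed value for illustration
    rules=[
        "Answer only in Hindi and give clear instructions in less than 150 words.",
        f"If for some reason you don't have an answer, please reply: {FALLBACK_HINDI}",
    ],
    question="gehun ki buvai kab karein?",
)
```

Keeping the rules in a list makes it easy to use a different set per organization while the numbering stays consistent.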

Convert ChatGPT Answer Text to Speech

The text answer provided by ChatGPT is converted back into speech using the get_text_to_speech() function defined below:

def get_text_to_speech(text, errors=None, language=Language.HINDI):
    try:
        if(not errors):
            errors = {}
        input_text = texttospeech.SynthesisInput(text=text)

        if(language == Language.URUDU):
            language = Language.HINDI
        voice = texttospeech.VoiceSelectionParams(
            language_code=language,
            ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
        )
        audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)

        response = google_text_to_speech_client.synthesize_speech(
            input=input_text, voice=voice, audio_config=audio_config)
        audio_content = response.audio_content

        # Write the response to an audio file
        file_name = get_random_string(n=10) + ".mp3"
        file_name_final = get_random_string(n=10) + ".mp3"
        with open(file_name, "wb") as out:
            out.write(response.audio_content)

        audio_duration = preprocess_audio(file_name, file_name_final)

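        # Upload the processed file; the context manager closes it
        # before the temporary files are deleted
        with open(file_name_final, "rb") as processed_file:
            S3_CLIENT.upload_fileobj(
                Fileobj=processed_file,
                Bucket='matar-audio',
                Key=file_name
            )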

        os.remove(file_name)
        os.remove(file_name_final)

        return {
            "url": f"https://matar-audio.s3.ap-south-1.amazonaws.com/{file_name}",
            "duration": audio_duration,
            "language": language
        }

    except Exception as ex:
        errors["message"] = "error in text to speech"
        errors["is_text_to_speech"] = False
        print(str(ex))

Describing Code:

A texttospeech.SynthesisInput object is created with the input text.
The function checks whether the specified language is Urdu (Language.URUDU) and, if so, switches it to Hindi (Language.HINDI), because the code does not handle Urdu directly.
The voice parameters for text-to-speech generation are set: the language code and a female voice.
The audio output format is configured as MP3 using texttospeech.AudioConfig.
A request is sent to the Google Text-to-Speech API (google_text_to_speech_client.synthesize_speech) with the input text, voice parameters, and audio configuration. This returns the synthesized audio.
The audio content is written to a temporary MP3 file with a random file name (using the get_random_string function).
The preprocess_audio function then processes the generated audio (filtering and normalization) and exports it to a second, final file. Because preprocess_audio exports in FLAC format, the final file contains FLAC data even though its name ends in .mp3.
The final processed audio file is uploaded to the Amazon S3 bucket named 'matar-audio', using the first file name as the object key.
The temporary files created during the process are removed, and the function returns the S3 URL, the audio duration, and the language.
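
The write, process, upload, and clean-up sequence can be sketched with the standard library alone. This is a minimal sketch, not the Matar implementation: write_process_upload is hypothetical, and the process and upload callables stand in for preprocess_audio and the S3 client. It uses try/finally so the temporary files are removed even if processing or uploading fails.

```python
import os
import tempfile


def write_process_upload(audio_bytes, process, upload, suffix=".mp3"):
    """Write bytes to a temp file, process into a second one, upload, clean up.

    `process(src, dst)` returns a duration; `upload(fileobj, key)` stores the
    processed file. Both are hypothetical stand-ins for illustration.
    """
    raw_fd, raw_path = tempfile.mkstemp(suffix=suffix)
    os.close(raw_fd)
    final_fd, final_path = tempfile.mkstemp(suffix=suffix)
    os.close(final_fd)
    try:
        with open(raw_path, "wb") as out:
            out.write(audio_bytes)
        duration = process(raw_path, final_path)
        # The file is closed by the context manager before it is deleted.
        with open(final_path, "rb") as processed:
            upload(processed, os.path.basename(raw_path))
        return duration
    finally:
        for path in (raw_path, final_path):
            if os.path.exists(path):
                os.remove(path)


uploaded = {}


def copy_file(src, dst):
    """Stand-in for audio processing: copy the bytes and report a duration."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())
    return 1.5  # pretend duration in seconds


def remember(fileobj, key):
    """Stand-in for an upload: keep the bytes in memory under the key."""
    uploaded[key] = fileobj.read()


duration = write_process_upload(b"fake-mp3-bytes", copy_file, remember)
```

Passing the processing and upload steps in as callables keeps the sketch testable without pydub or AWS credentials.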


Last updated 5 months ago