Python Voice Assistant. (Update version 0.5 alpha)

 


PYTHON VOICE ASSISTANT

09.04.2023

Shrijan Paudel

-LFG

Overview

This is a simple script-based program: you code each task that you want it to perform. It is not an AI (artificial intelligence), but you can connect it to ChatGPT or Google Bard through their APIs.


Simple Draft:


Proposed Plan Of Work

The work started with analyzing the audio commands given by the user through the microphone. A command can be anything, such as requesting information or operating on the computer’s internal files. This is an empirical qualitative study, based on reading the above-mentioned literature and testing its examples. Tests were made by programming according to books and online resources, with the explicit goal of finding best practices and gaining a more advanced understanding of voice assistants.

Fig. 1 shows the basic workflow of the voice assistant. Speech recognition converts the speech input to text. This text is then fed to the central processor, which determines the nature of the command and calls the relevant script for execution.
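The workflow can be sketched as a small pipeline. This is only a minimal sketch, not the project's actual code: the names `HANDLERS`, `handle_command`, and `run_pipeline` are made up for illustration, and the microphone and speaker are replaced by plain functions so the flow can be followed without any audio hardware.

```python
# Hypothetical sketch of the workflow: speech -> text -> central processor -> script.

def tell_time(_text):
    return "I would read the clock here."

def greet(_text):
    return "Hello! How can I help?"

# The "central processor": maps a keyword in the recognized text to a script.
HANDLERS = {
    "time": tell_time,
    "hello": greet,
}

def handle_command(text):
    """Pick the first handler whose keyword appears in the recognized text."""
    lowered = text.lower()
    for keyword, handler in HANDLERS.items():
        if keyword in lowered:
            return handler(lowered)
    return "Sorry, I do not know that command."

def run_pipeline(recognize, respond):
    """recognize() returns the recognized text; respond(reply) delivers the answer."""
    text = recognize()
    reply = handle_command(text)
    respond(reply)
    return reply
```

In a real run, `recognize` would wrap the microphone and `respond` would wrap the text-to-speech engine; here any two functions will do, for example `run_pipeline(lambda: "hello", print)`.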

But the complexities don’t stop there. Even with hundreds of hours of input, other factors can play a huge role in whether or not the software can understand you. Background noise can easily throw a speech recognition device off track, because the device has no inherent ability to distinguish the ambient sounds it “hears”, such as a dog barking or a helicopter flying overhead, from your voice. Engineers have to program that ability into the device: they collect data on these ambient sounds and “tell” the device to filter them out. Another factor is the way humans naturally shift the pitch of their voice to compensate for noisy environments; speech recognition systems can be sensitive to these pitch changes.
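One simple way to keep ambient sound from being treated as speech is an energy threshold calibrated on a short sample of background noise. The sketch below is illustrative only and works on plain lists of sample values; the names `rms`, `calibrate_threshold`, and `is_speech`, and the margin of 1.5, are assumptions and not part of the project. The SpeechRecognition library offers `Recognizer.adjust_for_ambient_noise` for a similar purpose.

```python
import math

def rms(samples):
    """Root-mean-square energy of one frame of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibrate_threshold(ambient_frames, margin=1.5):
    """Place the threshold a bit above the loudest ambient frame (margin is an assumed value)."""
    return margin * max(rms(frame) for frame in ambient_frames)

def is_speech(frame, threshold):
    """Treat a frame as speech only if its energy exceeds the calibrated threshold."""
    return rms(frame) > threshold
```

A fixed margin like this will not cope with sudden loud noises; it only shows the idea of measuring the room first and comparing against it afterwards.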


Methodology of Virtual Assistant Using Python

Fig 2 Detailed workflow

Speech Recognition module

The system uses Google’s online speech recognition system for converting speech input to text. The speech input from the microphone is temporarily stored in the system and then sent to Google cloud for speech recognition. Users can also obtain texts from the special corpora organized on the computer network server at the information center. The equivalent text is then received and fed to the central processor.
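A minimal sketch of this step with the SpeechRecognition library might look like the following. It assumes the `SpeechRecognition` and `PyAudio` packages are installed and that a microphone is available; `normalize_transcript` and `listen_once` are illustrative names, not functions from the project.

```python
def normalize_transcript(text):
    """Lower-case the recognized text and collapse extra whitespace."""
    return " ".join(text.lower().split())

def listen_once():
    """Record one phrase from the microphone and return its text, or None on failure."""
    # Imported here so the helper above still works without the third-party package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)  # the audio is held in memory temporarily
    try:
        # Sends the audio to Google's online recognizer and returns the text.
        return normalize_transcript(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        return None  # the speech could not be understood
    except sr.RequestError:
        return None  # the online service could not be reached
```

Returning `None` instead of raising keeps the caller simple: it can ask the user to repeat the command.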

Python Backend:

The Python backend gets the output from the speech recognition module and then identifies whether the command is an API call or a context extraction task. The result of that step is then sent back to the Python backend, which gives the required output to the user.
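How the backend might tell the two kinds of command apart can be sketched with a keyword check. This is an assumption about one possible approach, not the project's actual logic; `API_KEYWORDS`, `classify_command`, and `backend` are hypothetical names, and the keyword list is only an example.

```python
# Assumed example keywords that mark a command as needing an external API.
API_KEYWORDS = ("weather", "news", "search")

def classify_command(text):
    """Return 'api_call' if the text mentions an API keyword, else 'context_extraction'."""
    lowered = text.lower()
    if any(word in lowered for word in API_KEYWORDS):
        return "api_call"
    return "context_extraction"

def backend(text, on_api_call, on_context_extraction):
    """Route the recognized text to the matching step and return its output."""
    if classify_command(text) == "api_call":
        return on_api_call(text)
    return on_context_extraction(text)
```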

API calls

API stands for Application Programming Interface. An API is a software intermediary that allows two applications to talk to each other. In other words, an API is a messenger that delivers your request to the provider that you’re requesting it from and then delivers the response back to you.
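As a rough illustration of the request and response round trip, the sketch below builds a request URL, fetches it, and turns the JSON reply into a sentence. The endpoint `https://api.example.com/weather`, its `q` and `units` parameters, and the `city` and `temp` fields in the reply are all made up for this example; a real provider will have its own URL, parameters, and usually an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.example.com/weather"  # hypothetical endpoint

def build_request_url(city):
    """The request: encode the parameters into the URL that is sent to the provider."""
    return f"{BASE_URL}?{urlencode({'q': city, 'units': 'metric'})}"

def fetch(url):
    """Deliver the request and bring back the raw response body as text."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")

def parse_response(body):
    """The response: turn the provider's JSON into a sentence the assistant can speak."""
    data = json.loads(body)
    return f"It is {data['temp']} degrees in {data['city']}."
```

Keeping `build_request_url` and `parse_response` separate from `fetch` means the first two can be checked without any network connection.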

Context Extraction

Context extraction (CE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this activity concerns processing human language texts using natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, could also be seen as context extraction.
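For a voice assistant, a very small example of pulling structured information out of free text is a regular expression over the recognized command. The sketch below is an assumption for illustration, far simpler than real NLP; the phrase pattern "remind me to ... at ..." and the name `extract_reminder` are hypothetical.

```python
import re

# Matches e.g. "remind me to call the bank at 5 pm" or "... at 10:30AM".
REMINDER = re.compile(
    r"remind me to (?P<task>.+?) at (?P<time>\d{1,2}(?::\d{2})?\s?(?:am|pm))",
    re.IGNORECASE,
)

def extract_reminder(text):
    """Return {'task': ..., 'time': ...} from a reminder command, or None if it does not match."""
    match = REMINDER.search(text)
    if match is None:
        return None
    return {"task": match.group("task"), "time": match.group("time").lower()}
```

The unstructured sentence goes in and a small dictionary comes out, which the backend can then act on.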

Text-to-speech module

Text-to-Speech (TTS) refers to the ability of computers to read text aloud. A TTS Engine converts written text to a phonemic representation, then converts the phonemic representation to waveforms that can be output as sound. TTS engines with different languages, dialects and specialized vocabularies are available through third-party publishers.
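With the pyttsx3 library, speaking a reply takes only a few calls. The sketch below assumes `pyttsx3` is installed; the wrapper name `speak` and its optional `engine` argument are illustrative choices, not the project's code.

```python
def speak(text, engine=None):
    """Read the text aloud; create a pyttsx3 engine if none is passed in."""
    if engine is None:
        import pyttsx3  # third-party package, imported only when needed
        engine = pyttsx3.init()
    engine.say(text)       # queue the sentence
    engine.runAndWait()    # block until it has been spoken
    return text
```

Passing the engine in lets one engine be reused for every reply instead of creating a new one each time.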


Goals Achieved:

1. Voice Activation
2. Website Opening
3. Program Opening
4. Repeat After You
5. Basic Conversation
6. Interactive Loop
7. Voice Output
8. Error Handling
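Several of these goals (repeat after you, the interactive loop, and error handling) can be sketched together. This is a simplified illustration under my own assumptions: `repeat_after` and `interactive_loop` are hypothetical names, "exit" is an assumed stop word, and `listen` is expected to return the recognized text or `None` when recognition fails.

```python
def repeat_after(text):
    """'Repeat after you': return everything after the word 'repeat', or None."""
    if text.lower().startswith("repeat "):
        return text[len("repeat "):]
    return None

def interactive_loop(listen, speak):
    """Keep listening until the user says 'exit'; return the number of turns taken."""
    turns = 0
    while True:
        text = listen()
        turns += 1
        if text is None:  # error handling: recognition failed, so ask again
            speak("Sorry, I did not catch that.")
            continue
        if text.lower().strip() == "exit":
            speak("Goodbye.")
            return turns
        echoed = repeat_after(text)
        speak(echoed if echoed is not None else "I heard: " + text)
```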


Specifications

Uses the pyttsx3 Python library to speak. Uses the SpeechRecognition Python library to recognize the user's instructions. Uses the webbrowser Python library to open any website that is stored in its library. Uses the os Python library to open the system's programs and more.
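The website and program opening parts can be sketched with the standard library alone. The `SITES` table, the function names, and the fallback to `subprocess` on systems without `os.startfile` are assumptions for illustration; the project's own list of sites is not shown here.

```python
import os
import subprocess
import webbrowser

# Assumed example of the stored websites the assistant knows about.
SITES = {
    "youtube": "https://www.youtube.com",
    "google": "https://www.google.com",
    "wikipedia": "https://www.wikipedia.org",
}

def resolve_site(command):
    """Return the URL of the first stored site named in the command, or None."""
    lowered = command.lower()
    for name, url in SITES.items():
        if name in lowered:
            return url
    return None

def open_site(command):
    """Open the matching site in the default browser; return False if none matches."""
    url = resolve_site(command)
    if url is None:
        return False
    return webbrowser.open(url)

def open_program(path):
    """Launch a program by path."""
    if hasattr(os, "startfile"):  # os.startfile exists on Windows only
        os.startfile(path)
    else:
        subprocess.Popen([path])
```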

Future functions

It is still in development, so I expect much more from it. It has been 2 months of development, and the project has improved a lot in comparison to the prototype. The following are some future expectations I have for it:


  • Have a GUI

  • Should be able to detect the user identity

  • Should talk humanly and have some generative artificial emotions in it

  • Should be able to talk back quickly 

  • Should be able to work offline (i.e., without opening websites or doing any web-related tasks). It should also have backup functions that prevent it from crashing and causing problems on restart: if the computer is offline, it should redirect the program to the previously saved snapshot of the program that works offline.
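The offline behaviour described in the last point could start from a simple connectivity check. The sketch below is one possible approach and not something the project already does; the probe address `8.8.8.8:53` (a public DNS server), the `ONLINE_ONLY` set, and the action names are assumptions.

```python
import socket

def is_online(host="8.8.8.8", port=53, timeout=2.0):
    """Return True if a TCP connection to the probe address succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical names of actions that need the internet.
ONLINE_ONLY = {"open_website", "web_search"}

def allowed_actions(actions, online):
    """When offline, drop web-related actions so the assistant keeps running."""
    if online:
        return list(actions)
    return [action for action in actions if action not in ONLINE_ONLY]
```

The assistant could call `allowed_actions(all_actions, is_online())` at startup and refuse the disabled commands politely instead of crashing.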

Milestones To Be Achieved

I. Can talk back without any instructions.
II. Can wake up when clapping 2 times.
III. Can run in the background (but has some bugs).
IV. Can send e-mails (still not accurate).
V. Can report notifications from social media (not accurate, many bugs).
VI. (And more.)
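The double-clap wake-up in milestone II could be approached by counting short loud spikes in the microphone level. This is only a sketch under assumptions: the levels are taken to be loudness values between 0 and 1, one per short frame, and the threshold of 0.6 and the names `count_claps` and `should_wake` are made up for illustration.

```python
def count_claps(levels, threshold):
    """Count the moments where the loudness rises from below to above the threshold."""
    claps = 0
    above = False
    for level in levels:
        if level > threshold and not above:
            claps += 1
        above = level > threshold
    return claps

def should_wake(levels, threshold=0.6):
    """Wake up only when exactly two claps are heard in the window."""
    return count_claps(levels, threshold) == 2
```

A real version would also need to check the time between the two spikes and ignore long loud sounds such as speech or music.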








