We developed the cutting-edge Seven app, which allows learners to engage in realistic conversations with a virtual companion. The app provides instant feedback and guidance on pronunciation, grammar, and vocabulary usage. Imagine having a dedicated language coach, available 24/7, who listens attentively, offers guidance, and fuels your linguistic journey. That is what our Seven app offers: an intelligent solution for those seeking mastery of a new language.
Background and problem statement
As the world becomes increasingly interconnected, there is a growing need for accessible and effective language learning solutions that cater to diverse learner needs. Traditional methods of language instruction often rely on static materials and one-way communication, which can be limiting and unengaging. Moreover, many learners struggle with pronunciation, grammar, and vocabulary usage because they lack real-time feedback.
To address these challenges, our AMD-powered app provides a conversational partner that simulates realistic interactions. Our aim is to provide learners with immersive learning experiences that foster language proficiency. By leveraging AI-powered technology, we focus on personalised guidance, instant feedback, and engaging conversations that simulate natural human interaction.
Implementation
To this end, we used a combination of cutting-edge hardware and software technologies. In terms of hardware, we used the powerful AMD Radeon Pro W7900 GPU to support the large language models that we trained and deployed.
In terms of software, we relied on several open-source libraries and frameworks. We used the ROCm 6.0 build of PyTorch for building and training our neural networks. Additionally, we utilised several pre-trained language models from publicly available repositories such as Hugging Face's Transformers; these models provide state-of-the-art results on a wide range of natural language processing tasks. We fine-tuned them for our specific task of conversational language learning and integrated them into our conversational AI agent. At this stage, the objective was to provide an MVP for the English language.
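As a rough sketch of this stage, loading a pre-trained checkpoint from the Hugging Face Hub for fine-tuning could look like the following. The base-model name and the `training_record` helper are our own illustrations, not the project's actual code:

```python
# Illustrative sketch: loading a Hub checkpoint under ROCm PyTorch for fine-tuning.
# The checkpoint name and the helper below are assumptions, not the project's code.
from typing import Dict, List


def training_record(user_turn: str, tutor_turn: str) -> Dict[str, List[dict]]:
    """Wrap one learner/tutor exchange in the chat format used for fine-tuning."""
    return {
        "messages": [
            {"role": "user", "content": user_turn},
            {"role": "assistant", "content": tutor_turn},
        ]
    }


def load_for_finetuning(name: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
    """Fetch tokenizer and model weights; on ROCm builds of PyTorch the
    familiar "cuda" device string maps to the AMD GPU."""
    import torch  # imported lazily so the pure helper above works without a GPU stack
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    return tokenizer, model.to("cuda" if torch.cuda.is_available() else "cpu")
```

A convenient property of the ROCm build is that existing CUDA-targeted PyTorch code runs unchanged on AMD GPUs.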
To bring our vision to life, we implemented the following steps:
a) Audio Data Capture: We utilised the sounddevice library to capture audio data from the user's microphone. The learners' voice is captured in real-time, enabling us to analyse their pronunciation and speech patterns.
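A minimal sketch of this capture step, assuming a 16 kHz mono recording (the rate Whisper expects); the function names and duration are our own, not the app's actual code:

```python
# Sketch of step (a): blocking microphone capture with sounddevice.
# The 16 kHz sample rate matches Whisper's expected input; names are our own.
SAMPLE_RATE = 16_000


def frames_for(duration_s: float, sample_rate: int = SAMPLE_RATE) -> int:
    """How many audio frames a recording of duration_s seconds needs."""
    return int(duration_s * sample_rate)


def record_utterance(duration_s: float = 5.0):
    """Record a mono float32 utterance from the default microphone (blocking)."""
    import sounddevice as sd  # lazy import: the module loads without audio hardware

    audio = sd.rec(frames_for(duration_s), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording finishes
    return audio.squeeze()
```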
b) Speech Transcription System: We employed the Whisper model ('base.en') to transcribe recorded audio data into text. This step enables us to process and analyse learners' spoken language, providing insights into their grammar, vocabulary, and pronunciation.
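Using the open-source `whisper` package, this step might look as follows; `clean_transcript` is our own helper, while `load_model` and `transcribe` are Whisper's actual API:

```python
# Sketch of step (b): transcription with the open-source Whisper package
# (pip install openai-whisper). clean_transcript is our own addition.


def clean_transcript(text: str) -> str:
    """Collapse stray whitespace in a raw transcript before further analysis."""
    return " ".join(text.split())


def transcribe(audio) -> str:
    """Transcribe a 16 kHz mono float32 array (or a file path) to English text."""
    import whisper  # lazy import so the helper above works without the model

    model = whisper.load_model("base.en")
    result = model.transcribe(audio, fp16=False)  # fp16=False allows CPU fallback
    return clean_transcript(result["text"])
```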
c) Response Generation: The interaction with the learner hinges on the state-of-the-art open-source Llama-3 model. Depending on the specific conversation context, this model generates a response or a question based on the learner's input. This step simulates realistic conversations by allowing the AI-powered partner to respond appropriately while taking the conversation history into account.
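The history handling in this step can be sketched with the Transformers chat interface; the system prompt, model id, and function names are illustrative assumptions (recent Transformers versions accept chat-format message lists in the `text-generation` pipeline):

```python
# Sketch of step (c): maintaining chat-format history for a Llama-3 instruct model.
# The system prompt and model id are illustrative assumptions.


def build_history(system_prompt: str, turns):
    """Turn (user, assistant) pairs into the message list chat models consume;
    pass assistant=None for the pending user turn awaiting a reply."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages


def generate_reply(messages, model_id: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
    """Generate the tutor's next turn given the full conversation so far."""
    from transformers import pipeline  # lazy import

    chat = pipeline("text-generation", model=model_id)
    out = chat(messages, max_new_tokens=128)
    return out[0]["generated_text"][-1]["content"]  # last message is the new reply
```

Feeding the full message list on every turn is what lets the partner stay consistent with earlier parts of the conversation.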
d) Spoken Output Generation: To generate spoken output, we utilised an open-source multilingual model. The model's advanced capabilities enable us to simulate natural, human-like interactions and provide learners with a seamless conversation experience.
e) Playing Back Response: Finally, sounddevice was employed to play back the generated response to the users, allowing them to engage in a realistic conversation with their conversational partner.
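The playback step can be sketched as follows; `sine_tone` stands in for the synthesised speech buffer, and the names and sample rate are our own assumptions:

```python
# Sketch of step (e): playing a synthesised waveform back with sounddevice.
# sine_tone is a stand-in for the TTS output buffer; names are our own.
import math

PLAYBACK_RATE = 22_050


def sine_tone(freq_hz: float, duration_s: float, sample_rate: int = PLAYBACK_RATE):
    """Generate a test signal standing in for the synthesised speech buffer."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def play(samples, sample_rate: int = PLAYBACK_RATE) -> None:
    """Blocking playback through the default output device."""
    import sounddevice as sd  # lazy import: module loads without audio hardware

    sd.play(samples, samplerate=sample_rate)
    sd.wait()  # block until playback finishes
```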
The app allows for personalised choices with respect to the following aspects: discussion topic, vocabulary, type of learner-'teacher' interaction, AI voice, etc.
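A hypothetical sketch of how these per-session settings might be grouped; the field names and defaults below are our own, not the app's actual schema:

```python
# Hypothetical sketch of the personalisation settings described above;
# field names and defaults are assumptions, not the app's real schema.
from dataclasses import dataclass


@dataclass
class SessionConfig:
    target_language: str = "en"        # MVP supports English only
    topic: str = "travel"              # discussion topic
    vocabulary_level: str = "B2"       # target vocabulary band
    interaction_style: str = "ielts-examiner"  # learner-'teacher' interaction type
    voice: str = "female-1"            # AI voice choice
```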
Hardware
Our application leverages the AMD Radeon Pro W7900 GPU, integrating it with a Ryzen 9 processor and sufficient RAM and storage in a local workstation. This potent combination enables us to train and deploy our conversational AI agent efficiently on a local machine. As a result, we are able to deliver language learning experiences that captivate and educate.
With AMD's innovative AI hardware at its core, our application is poised to disrupt the status quo in language learning: we provide learners with a personalised, conversational partner that adapts to their unique needs, learns from their strengths and weaknesses, and provides real-time feedback.
User experience
From the users' perspective, interacting with AMD Seven is straightforward and engaging. Here's how they can use the app:
1. Initial Setup: Users begin by setting up their language learning preferences, including selecting the target language, content area, topic, type of AI-interaction, and proficiency level.
2. Conversational Partner Activation: Upon activation, Seven is launched, allowing learners to initiate conversations with the virtual companion.
For this stage of the project (i.e., the MVP), we considered the task of practising speaking skills in preparation for the IELTS exam:
3. Real-time Conversations: Learners engage in real-time conversations with the AI-powered partner, practicing their speaking skills within a welcoming and comfortable virtual environment.
4. Instant Feedback: As learners converse, they receive instant feedback on their pronunciation, grammar, and vocabulary usage, allowing them to refine their language skills.
5. Personalised Guidance: The app provides personalised guidance based on learners' strengths, weaknesses, and learning goals, ensuring that each interaction is tailored to their unique needs and interests.
Future steps
To further upgrade the Seven-app, we plan to:
1. Further fine-tune Whisper for additional languages. We aim to improve voice transcription by fine-tuning the Whisper model for specific target languages.
2. Personalisation based on content and proficiency level. We plan to further tailor the app to each learner's needs and interests. Personalised learning paths that cater to learners' interests, goals, and skill levels are a key feature for strengthening our competitive distinction.
3. Extension towards improving writing and reading skills. We will expand the app's capabilities to include writing and reading comprehension exercises, fostering a comprehensive language learning experience.
4. Feature development based on scientific evidence. We plan to prioritise and implement innovative features that are grounded in scientific evidence, encouraging changes in users' behaviour that support consistent learning.
5. Enhanced personalisation using predictive modelling. Learners want personalisation, so we will leverage state-of-the-art predictive modelling techniques to provide highly personalised recommendations and guidance. We plan to take a computational, data-driven, and unbiased approach to efficiently prevent learners from dropping out of language learning.
We want to empower learners to achieve their goals. By pursuing these future steps, we create an even more engaging, effective, and comprehensive language learning platform.