First publicly available Japanese AI dialogue system can speak and listen simultaneously

July 15, 2025

The Higashinaka Lab is developing AI-human dialogue systems designed to work alongside human operators. As part of their research, a guide robot was deployed at Osaka’s NIFREL Aquarium to answer visitors’ questions about marine life. Human operators could step in to provide help with complex questions. Credit: Higashinaka Lab, Nagoya University. Taken at NIFREL Aquarium, Osaka

How do you develop an AI system that perfectly mimics the way humans speak? Researchers at Nagoya University in Japan have taken a significant step toward that goal. They have created J-Moshi, the first publicly available AI system specifically designed for Japanese conversational patterns.

J-Moshi captures the natural flow of Japanese conversation, which is punctuated by short verbal responses known as “aizuchi” that Japanese speakers use to show they are actively listening and engaged. Responses such as “Sou desu ne” (that’s right) and “Naruhodo” (I see) occur far more often than comparable responses in English.

Traditional AI systems have difficulty producing aizuchi because they cannot speak and listen at the same time, a capability that is especially important for natural-sounding Japanese dialogue. Because J-Moshi can do both at once, it has become very popular with Japanese speakers who recognize and appreciate its natural conversation patterns.

Prof. Higashinaka (right) and his team are collaborating on developing humanoid robots that combine speech, gestures, and movement to communicate naturally with people. Credit: Higashinaka Lab, Nagoya University

Building a Japanese Moshi model

The development team, led by researchers from the Higashinaka Laboratory at the Graduate School of Informatics, built J-Moshi by adapting the English-language Moshi model created by the non-profit laboratory Kyutai. The process took about four months and involved training the system using multiple Japanese speech datasets. The research is published on the arXiv preprint server.

The biggest dataset was J-CHAT, the largest publicly available Japanese dialogue dataset, created and released by the University of Tokyo. It contains approximately 67,000 hours of audio from podcasts and YouTube. Additionally, the team used smaller but higher-quality dialogue datasets, some collected within the lab and others dating back 20–30 years. To increase their training data, the researchers also converted written chat conversations into artificial speech with text-to-speech programs they developed for this purpose.
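The article does not describe that augmentation pipeline in detail, but the idea can be pictured as follows: each turn of a written two-party chat is rendered to audio by a text-to-speech model and placed on that speaker’s channel, producing time-aligned two-channel dialogue audio. In this minimal Python sketch the `synthesize` function, the JSON line format, and the 24 kHz sample rate are all assumptions standing in for the lab’s own tools, not their actual code.

```python
# Minimal sketch: turn a written two-party chat log into synthetic
# two-channel dialogue audio for training. `synthesize` is a placeholder
# for an arbitrary Japanese TTS model, not the lab's system.
import json
import numpy as np
import soundfile as sf

SAMPLE_RATE = 24_000  # assumed sample rate; not specified in the article

def synthesize(text: str, speaker: str) -> np.ndarray:
    """Placeholder TTS: return a mono waveform for `text` in `speaker`'s voice."""
    raise NotImplementedError("plug in any Japanese text-to-speech model here")

def chat_to_dialogue_audio(chat_path: str, out_path: str) -> None:
    # Expect one JSON object per line: {"speaker": "A" or "B", "text": "..."}
    with open(chat_path, encoding="utf-8") as f:
        turns = [json.loads(line) for line in f]

    channels = {"A": [], "B": []}
    for turn in turns:
        wav = synthesize(turn["text"], turn["speaker"])
        # Speech goes on the speaker's own channel and silence on the other,
        # so the two channels stay time-aligned like a recorded conversation.
        other = "B" if turn["speaker"] == "A" else "A"
        channels[turn["speaker"]].append(wav)
        channels[other].append(np.zeros_like(wav))

    stereo = np.stack([np.concatenate(channels["A"]),
                       np.concatenate(channels["B"])], axis=1)
    sf.write(out_path, stereo, SAMPLE_RATE)
```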

  • Ph.D. student Atsumoto Ohashi, the main developer of J-Moshi, demonstrates how the AI system mimics natural Japanese conversation patterns. He has been working on the optimization of task-oriented dialogue systems for his Ph.D. Credit: Merle Naidoo, Nagoya University
  • Ph.D. student Yuki Zenimoto engages with a question-guiding dialogue system that elicits user healthcare information through casual conversation. Credit: Merle Naidoo, Nagoya University

In January 2025, J-Moshi gained significant attention when demonstration videos went viral on social media. Beyond its technical novelty, it has possible practical applications in language learning, such as helping non-native speakers practice and understand natural Japanese conversation patterns.

The research team is also exploring commercial applications in call centers, health care settings, and customer service. They note that adapting the system to specialized fields or industries is challenging due to the limited availability of Japanese speech data compared to resources available for English.

The research team’s leader, Professor Ryuichiro Higashinaka, brings a unique perspective to academic AI research, having spent 19 years as a corporate researcher at NTT Corporation before joining Nagoya University five years ago.

During his industry tenure, he worked on consumer dialogue systems and voice agents, including a project to realize a question-answering function for Shabette Concier, a voice-agent service from NTT DOCOMO. To further pursue research on human communication patterns, he set up his own lab at Nagoya University’s Graduate School of Informatics in 2020.

His 20-member lab now tackles challenges that bridge theoretical research and practical applications, from understanding conversational timing in Japanese to deploying AI guides in public spaces like aquariums.

“Technology like J-Moshi can be applied to systems that work with human operators. For example, our guide robots at the NIFREL Aquarium in Osaka can handle routine interactions independently and easily connect visitors to human operators for complex questions or when specialized assistance is needed,” Professor Higashinaka said. “Our work is part of a national Cabinet Office Moonshot Project that aims to improve service quality through advanced AI-human collaboration systems.”

Ph.D. student Sanae Yamashita (left) works on techniques that summarize conversations to help human operators step in when AI dialogue systems need assistance. Researcher Ao Guo (right) focuses on making mobile guidance robots more user-friendly using speech, gestures, and movement. Credit: Merle Naidoo, Nagoya University

Opportunities and challenges for human-robot interactions

Prof. Higashinaka explained the unique challenges facing Japanese AI research: “Japan suffers from a scarcity of speech resources, limiting researchers’ ability to train AI dialog systems. Privacy concerns also need to be considered.”

This data shortage forced creative solutions, such as using speaker-separation software to split the mixed voices in podcast recordings into the individual speaker tracks needed for training.
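As an illustration of that kind of preprocessing, the sketch below uses an off-the-shelf diarization pipeline (pyannote.audio, an assumption; the article does not name the tools the team actually used) to detect who speaks when and then masks the waveform into one track per speaker. A crude mask like this does not truly separate overlapping speech; that would require a dedicated source-separation model.

```python
# Illustrative sketch only: split a single-channel podcast into per-speaker
# tracks via diarization followed by masking. The pipeline and model name
# are assumptions, not the team's actual toolchain.
import numpy as np
import soundfile as sf
from pyannote.audio import Pipeline

def split_speakers(audio_path: str, hf_token: str):
    """Return a dict of per-speaker waveforms plus the sample rate."""
    wav, sr = sf.read(audio_path)  # mono podcast recording
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token)
    diarization = pipeline(audio_path)  # who speaks when

    # Start each detected speaker with a silent track the length of the
    # episode, then copy in only the regions where that speaker talks.
    tracks = {}
    for segment, _, speaker in diarization.itertracks(yield_label=True):
        track = tracks.setdefault(speaker, np.zeros_like(wav))
        start, end = int(segment.start * sr), int(segment.end * sr)
        track[start:end] = wav[start:end]
    return tracks, sr

# Usage: write one file per speaker for use as a training track.
# tracks, sr = split_speakers("podcast_episode.wav", hf_token="hf_...")
# for name, track in tracks.items():
#     sf.write(f"{name}.wav", track, sr)
```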

Currently, dialogue systems have difficulty with complex social situations, especially when interpersonal relationships and physical surroundings need to be taken into account. Obstructions such as masks or hats can also impair performance because they hide important cues like facial expressions. Testing at Osaka’s NIFREL Aquarium showed that the AI sometimes cannot handle user questions and needs human operators to step in and take over the conversation.

While J-Moshi represents a significant achievement in capturing natural Japanese conversational patterns, with overlapping speech and aizuchi interjections, these limitations mean it currently needs human backup for most practical applications. The researchers are working to strengthen that backup with methods such as dialogue summarization and dialogue breakdown detection, which alert operators to potential problems so they can respond quickly.
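A breakdown detector of this kind can be pictured as a scorer that watches each exchange and pages an operator once the risk crosses a threshold. The sketch below is a minimal illustration of that loop, assuming a placeholder classifier and alert channel; it is not the lab’s implementation.

```python
# Minimal sketch of the operator-alert idea: score each exchange with a
# breakdown classifier and hand over to a human when the score is high.
from dataclasses import dataclass

@dataclass
class Exchange:
    user_utterance: str
    system_response: str

def breakdown_score(context: list[Exchange]) -> float:
    """Placeholder: probability (0..1) that the latest response breaks the dialogue.
    In practice this would be a trained classifier over the conversation so far."""
    raise NotImplementedError

def alert_operator(turn_index: int, context: list[Exchange], score: float) -> None:
    # Placeholder alert channel: a deployment would page a human operator and
    # attach a summary of the conversation so they can take over quickly.
    print(f"[ALERT] possible breakdown at turn {turn_index} (score={score:.2f})")

def monitor(dialogue: list[Exchange], threshold: float = 0.7) -> None:
    for i in range(len(dialogue)):
        score = breakdown_score(dialogue[: i + 1])
        if score >= threshold:
            alert_operator(turn_index=i, context=dialogue[: i + 1], score=score)
```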

The lab’s broader research extends beyond J-Moshi and includes multiple methods for human-robot interaction. In collaboration with colleagues working on realistic humanoid robots, they are developing robot systems that coordinate speech, gestures, and movement for natural communication.

These robots, including those manufactured by Unitree Robotics, represent the latest advances in AI in physical form, where dialog systems must navigate not just conversational nuances but also physical presence and spatial awareness. The team regularly showcases their work during university open campus days, where the public can experience how AI dialog systems are evolving firsthand.

Their paper on J-Moshi has been accepted at Interspeech, the largest international conference in the field of speech technology and research. Professor Higashinaka and his team are looking forward to presenting their J-Moshi research in Rotterdam, the Netherlands, in August 2025.

“In the near future, we will witness the emergence of systems capable of collaborating seamlessly with humans through natural speech and gestures. I aspire to create the foundational technologies that will be essential for such a transformative society,” Professor Higashinaka said.

More information:
Atsumoto Ohashi et al, Towards a Japanese Full-duplex Spoken Dialogue System, arXiv (2025). DOI: 10.48550/arXiv.2506.02979

Listen to audio of J-Moshi here: https://nu-dialogue.github.io/j-moshi/

The codebase used for training J-Moshi is available here: https://github.com/nu-dialogue/moshi-finetune

Journal information:
arXiv

Provided by
Nagoya University

Citation:
First publicly available Japanese AI dialogue system can speak and listen simultaneously (2025, July 15)
retrieved 15 July 2025
from https://techxplore.com/news/2025-07-japanese-ai-dialogue-simultaneously.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
