publications
2025
- CoRI: Synthesizing Communication of Robot Intent for Physical Human-Robot Interaction
  Junxiang Wang, Emek Barış Küçüktabak, Rana Soltani Zarrin, and Zackory Erickson
  arXiv preprint arXiv:2505.20537, 2025
Clear communication of robot intent fosters transparency and interpretability in physical human-robot interaction (pHRI), particularly during assistive tasks involving direct human-robot contact. We introduce CoRI, a pipeline that automatically generates natural language communication of a robot’s upcoming actions directly from its motion plan and visual perception. Our pipeline first processes the robot’s image view to identify human poses and key environmental features. It then encodes the planned 3D spatial trajectory (including velocity and force) onto this view, visually grounding the path and its dynamics. CoRI queries a vision-language model with this visual representation to interpret the planned action within the visual context before generating concise, user-directed statements, without relying on task-specific information. Results from a user study involving robot-assisted feeding, bathing, and shaving tasks across two different robots indicate that CoRI leads to a statistically significant difference in communication clarity compared to a baseline communication strategy. Specifically, CoRI effectively conveys not only the robot’s high-level intentions but also crucial details about its motion and any collaborative user action needed.
@article{wang2025cori,
  title     = {{CoRI}: Synthesizing Communication of Robot Intent for Physical Human-Robot Interaction},
  author    = {Wang, Junxiang and K{\"u}{\c{c}}{\"u}ktabak, Emek Bar{\i}{\c{s}} and Zarrin, Rana Soltani and Erickson, Zackory},
  journal   = {arXiv preprint arXiv:2505.20537},
  year      = {2025},
}
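To make the visual-grounding step in the abstract above concrete, here is a minimal sketch of projecting a planned 3D trajectory onto a camera view with a standard pinhole model. The intrinsics, extrinsics, waypoints, and the `project_trajectory` name are illustrative assumptions, not code from the paper.

import numpy as np

def project_trajectory(points_world, T_cam_world, K):
    """Project planned 3D waypoints (N x 3, world frame) to pixels
    with a pinhole model. T_cam_world: 4x4 world-to-camera transform;
    K: 3x3 intrinsic matrix. Illustrative only."""
    n = points_world.shape[0]
    pts_h = np.hstack([points_world, np.ones((n, 1))])  # homogeneous, N x 4
    pts_cam = (T_cam_world @ pts_h.T)[:3]               # camera frame, 3 x N
    uv_h = K @ pts_cam                                  # onto image plane
    return (uv_h[:2] / uv_h[2]).T                       # N x 2 pixel coords

# Made-up example: three waypoints about a meter ahead of the camera,
# identity extrinsics, typical 640x480 intrinsics.
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
waypoints = np.array([[0.00, 0.00, 1.0],
                      [0.05, 0.00, 1.0],
                      [0.10, 0.02, 0.9]])
pixels = project_trajectory(waypoints, np.eye(4), K)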
- User interaction patterns and breakdowns in conversing with LLM-powered voice assistants
  Amama Mahmood, Junxiang Wang, Bingsheng Yao, Dakuo Wang, and Chien-Ming Huang
  International Journal of Human-Computer Studies, 2025
Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to their queries, leading to interactions that often lack a broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are largely designed for text-based interactions, thus making it unclear how user interactions will evolve if their modality is changed to voice. In this work, we investigate whether LLMs can enrich VA interactions via an exploratory study with participants (N=20) using a ChatGPT-powered VA for three scenarios (medical self-diagnosis, creative planning, and discussion) with varied constraints, stakes, and objectivity. We observe that the LLM-powered VA elicits richer interaction patterns that vary across tasks, showing its versatility. Notably, LLMs absorb the majority of VA intent recognition failures. We additionally discuss the potential of harnessing LLMs for more resilient and fluid user-VA interactions and provide design guidelines for tailoring LLMs for voice assistance.
@article{mahmood2025user,
  title     = {User interaction patterns and breakdowns in conversing with {LLM}-powered voice assistants},
  author    = {Mahmood, Amama and Wang, Junxiang and Yao, Bingsheng and Wang, Dakuo and Huang, Chien-Ming},
  journal   = {International Journal of Human-Computer Studies},
  volume    = {195},
  pages     = {103406},
  year      = {2025},
  publisher = {Elsevier},
  doi       = {10.1016/j.ijhcs.2024.103406},
}
- A digital twin for telesurgery under intermittent communication
  Junxiang Wang*, Juan Antonio Barragan*, Hisashi Ishida*, Jingkai Guo, Yu-Chun Ku, and Peter Kazanzides
  In 2025 International Symposium on Medical Robotics (ISMR), 2025
Telesurgery is an effective way to deliver service from expert surgeons to areas without immediate access to specialized resources. However, many of these areas, such as rural districts or battlefields, may suffer from communication problems, especially latency and intermittent periods of communication outage. This challenge motivates the use of a digital twin for the surgical system, in which a simulation mirrors the robot hardware and surgical environment in the real world. The surgeon can then interact with the digital twin during a communication outage, followed by a recovery strategy on the real robot once communication is reestablished. This paper builds a digital twin for the da Vinci surgical robot, with a buffering and replay strategy that reduces mean task completion time by 23% compared to the baseline for a peg transfer task subject to intermittent communication outage.
@inproceedings{wang2025digital,
  title        = {A digital twin for telesurgery under intermittent communication},
  author       = {Wang, Junxiang and Barragan, Juan Antonio and Ishida, Hisashi and Guo, Jingkai and Ku, Yu-Chun and Kazanzides, Peter},
  booktitle    = {2025 International Symposium on Medical Robotics (ISMR)},
  pages        = {218--224},
  year         = {2025},
  organization = {IEEE},
  doi          = {10.1109/ISMR67322.2025.11025988},
}
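The buffering and replay strategy in the abstract above can be pictured as a small queue: commands issued against the digital twin during an outage are stored and replayed in order once the link returns. The sketch below is a deliberately simplified FIFO version of that idea; all names are hypothetical, and the paper's actual recovery strategy on the da Vinci system is more involved.

from collections import deque

class BufferAndReplay:
    """Toy buffer-and-replay loop: while the link is down, operator
    commands (executed against the digital twin) are queued; on
    reconnect they are replayed to the real robot in order before
    live teleoperation resumes. Names are hypothetical."""

    def __init__(self, send_fn):
        self.send = send_fn          # callable that transmits one command
        self.buffer = deque()
        self.connected = True

    def on_command(self, cmd):
        if self.connected:
            self.send(cmd)           # normal teleoperation path
        else:
            self.buffer.append(cmd)  # outage: the twin absorbs the command

    def on_link_change(self, up):
        self.connected = up
        if up:
            while self.buffer:       # replay everything from the outage
                self.send(self.buffer.popleft())

sent = []
relay = BufferAndReplay(send_fn=sent.append)
relay.on_command("move A")           # link up: sent immediately
relay.on_link_change(up=False)
relay.on_command("move B")           # outage: buffered against the twin
relay.on_link_change(up=True)        # reconnect: "move B" replayed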
2024
- Situated Understanding of Older Adults’ Interactions with Voice Assistants: A Month-long In-home Study
  Amama Mahmood, Junxiang Wang, and Chien-Ming Huang
  arXiv e-prints, 2024
Our work addresses the challenges older adults face with commercial Voice Assistants (VAs), notably in conversation breakdowns and error handling. Traditional methods of collecting user experiences, such as usage logs and post-hoc interviews, do not fully capture the intricacies of older adults’ interactions with VAs, particularly regarding their reactions to errors. To bridge this gap, we equipped 15 older adults’ homes with smart speakers integrated with custom audio recorders to collect "in-the-wild" audio interaction data for detailed error analysis. Recognizing the conversational limitations of current VAs, our study also explored the capabilities of Large Language Models (LLMs) to handle natural and imperfect text for improving VAs. Midway through our study, we deployed a ChatGPT-powered VA to investigate its efficacy for older adults. Our research suggests leveraging vocal and verbal responses combined with LLMs’ contextual capabilities for enhanced error prevention and management in VAs, and proposes design considerations to align VA capabilities with older adults’ expectations.
@article{mahmood2024situated,
  title   = {Situated Understanding of Older Adults' Interactions with Voice Assistants: A Month-long In-home Study},
  author  = {Mahmood, Amama and Wang, Junxiang and Huang, Chien-Ming},
  journal = {arXiv e-prints},
  pages   = {arXiv--2403},
  year    = {2024},
}
2023
- Method for robotic motion compensation during PET imaging of mobile subjects
  Junxiang Wang, Iulian Iordachita, and Peter Kazanzides
  In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Studies of the human brain during natural activities, such as locomotion, would benefit from the ability to image deep brain structures during these activities. While Positron Emission Tomography (PET) can image these structures, the bulk and weight of current scanners are not compatible with the desire for a wearable device. This has motivated the design of a robotic system to support a PET imaging system around the subject’s head and to move the system to accommodate natural motion. We report here the design and experimental evaluation of a prototype robotic system that senses motion of a subject’s head, using parallel string encoders connected between the robot-supported imaging ring and a helmet worn by the subject. This measurement is used to robotically move the imaging ring (coarse motion correction) and to compensate for residual motion during image reconstruction (fine motion correction). Minimizing latency and minimizing measurement error are the key design goals for coarse and fine motion correction, respectively. The system is evaluated using recorded human head motions during locomotion, with a mock imaging system consisting of lasers and cameras, and is shown to provide an overall system latency of about 80 ms, which is sufficient for coarse motion correction and collision avoidance, as well as a measurement accuracy of about 0.5 mm for fine motion correction.
@inproceedings{wang2023method,
  title        = {Method for robotic motion compensation during {PET} imaging of mobile subjects},
  author       = {Wang, Junxiang and Iordachita, Iulian and Kazanzides, Peter},
  booktitle    = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages        = {4648--4654},
  year         = {2023},
  organization = {IEEE},
  doi          = {10.1109/IROS55552.2023.10341444},
}
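The parallel string-encoder measurement described above can be viewed as trilateration: each encoder reports the length of a string from a known anchor on the imaging ring to a point on the helmet, and the point is recovered from several such lengths. The sketch below solves one such point by least squares; anchor coordinates and readings are fabricated for illustration, and the real system estimates full head pose, not a single point.

import numpy as np
from scipy.optimize import least_squares

# Illustrative anchor positions on the imaging ring (meters, made up).
anchors = np.array([[ 0.3,  0.0, 0.0],
                    [ 0.0,  0.3, 0.0],
                    [-0.3,  0.0, 0.0],
                    [ 0.0, -0.3, 0.1]])

def residuals(p, anchors, lengths):
    # Mismatch between measured string lengths and the distances from
    # candidate helmet point p to each ring anchor.
    return np.linalg.norm(anchors - p, axis=1) - lengths

true_point = np.array([0.02, -0.01, 0.15])              # simulated helmet point
lengths = np.linalg.norm(anchors - true_point, axis=1)  # simulated readings
sol = least_squares(residuals, x0=np.zeros(3), args=(anchors, lengths))
# sol.x recovers true_point (to numerical precision)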
- Calibration and evaluation of a motion measurement system for PET imaging studies
  Junxiang Wang, Ti Wu, Iulian Iordachita, and Peter Kazanzides
  Journal of Medical Robotics Research, 2023
Positron Emission Tomography (PET) enables functional imaging of deep brain structures, but the bulk and weight of current systems preclude their use during many natural human activities, such as locomotion. The proposed long-term solution is to construct a robotic system that can support an imaging system surrounding the subject’s head, and then move the system to accommodate natural motion. This requires a system to measure the motion of the head with respect to the imaging ring, for use by both the robotic system and the image reconstruction software. We report here the design, calibration, and experimental evaluation of a parallel string encoder mechanism for sensing this motion. Our results indicate that with kinematic calibration, the measurement system can achieve accuracy within 0.5 mm, especially for small motions.
@article{wang2023calibration,
  title     = {Calibration and evaluation of a motion measurement system for {PET} imaging studies},
  author    = {Wang, Junxiang and Wu, Ti and Iordachita, Iulian and Kazanzides, Peter},
  journal   = {Journal of Medical Robotics Research},
  volume    = {8},
  number    = {01n02},
  pages     = {2340003},
  year      = {2023},
  publisher = {World Scientific},
  doi       = {10.1142/S2424905X23400032},
}
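As a rough picture of what the kinematic calibration could look like, the sketch below estimates a constant length offset per string encoder by comparing raw readings against distances computed from ground-truth point positions (e.g., from an external tracker). The offset-only error model and all values are assumptions for illustration; the paper's calibration refines the actual mechanism parameters.

import numpy as np
from scipy.optimize import least_squares

def calib_residuals(offsets, anchors, gt_points, raw_readings):
    # For each calibration pose, compare offset-corrected readings with
    # true anchor-to-point distances; stack all residuals.
    res = [raw + offsets - np.linalg.norm(anchors - p, axis=1)
           for p, raw in zip(gt_points, raw_readings)]
    return np.concatenate(res)

anchors = np.array([[0.3, 0.0, 0.0], [0.0, 0.3, 0.0],
                    [-0.3, 0.0, 0.0], [0.0, -0.3, 0.1]])  # made-up anchors
true_offsets = np.array([0.004, -0.002, 0.003, 0.001])    # simulated errors
gt_points = [np.array([0.02, -0.01, 0.15]),
             np.array([-0.01, 0.03, 0.12]),
             np.array([0.00, 0.00, 0.18])]
raw = [np.linalg.norm(anchors - p, axis=1) - true_offsets for p in gt_points]
fit = least_squares(calib_residuals, x0=np.zeros(4),
                    args=(anchors, gt_points, raw))
# fit.x recovers true_offsets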
2022
- Evaluation of a motion measurement system for PET imaging studies
  Junxiang Wang, Ti Wu, Iulian Iordachita, and Peter Kazanzides
  In 2022 International Symposium on Medical Robotics (ISMR), 2022
Awarded to the best paper with a student first author (4%)
Positron Emission Tomography (PET) enables functional imaging of deep brain structures, but the bulk and weight of current systems preclude their use during many natural human activities, such as locomotion. The proposed long-term solution is to construct a robotic system that can support an imaging system surrounding the subject’s head, and then move the system to accommodate natural motion. This requires a system to measure the motion of the head with respect to the imaging ring, for use by both the robotic system and the image reconstruction software. We report here the design and experimental evaluation of a parallel string encoder mechanism for sensing this motion. Our preliminary results indicate that the measurement system may achieve accuracy within 0.5 mm, especially for small motions, with improved accuracy possible through kinematic calibration.
@inproceedings{wang2022evaluation,
  title        = {Evaluation of a motion measurement system for {PET} imaging studies},
  author       = {Wang, Junxiang and Wu, Ti and Iordachita, Iulian and Kazanzides, Peter},
  booktitle    = {2022 International Symposium on Medical Robotics (ISMR)},
  pages        = {1--6},
  year         = {2022},
  organization = {IEEE},
  doi          = {10.1109/ISMR48347.2022.9807554},
}