Automatic speaker recognition: current trends and challenges
by Prof. Tomoko Matsui, The Institute of Statistical Mathematics, Tokyo, Japan
Abstract: Speaker recognition (SR) is a technique to automatically identify or verify a speaker using speech features extracted from voice data. In the past decades, interest in SR has increased, both as a technique for biometric authentication and for user-friendly interface design. SR research started in the 1930s and has since applied statistical machine learning methods including Gaussian mixture models, factor analysis models, kernel machines and so on. Recently, deep neural networks have been successfully utilized. In this talk, the current trends and challenges in SR are introduced, as well as some future perspectives.
Bio: Tomoko Matsui received the Ph.D. degree from the Computer Science Department, Tokyo Institute of Technology, Tokyo, Japan, in 1997. From 1988 to 2002, she was with NTT, where she worked on speaker and speech recognition. From 1998 to 2002, she was with the Spoken Language Translation Research Laboratory, ATR, Kyoto, Japan, as a Senior Researcher and worked on speech recognition. From January to June 2001, she was an Invited Researcher in the Acoustic and Speech Research Department, Bell Laboratories, Murray Hill, NJ, working on finding effective confidence measures for verifying speech recognition results. She is currently a Professor at the Institute of Statistical Mathematics, Tokyo, working on statistical modeling for speech and speaker recognition applications. Prof. Matsui received the paper award of the Institute of Electronics, Information, and Communication Engineers of Japan (IEICE) in 1993.
Proximity-based Federation of Smart Objects: Its Formal Modelling and Application Framework
by Prof. Yuzuru Tanaka, Hokkaido University and National Institute of Informatics
Abstract: In the age of smartphones, IC cards, and IoT, we are surrounded by a huge number of smart objects, i.e., intelligent devices with wireless communication capabilities ranging from peer-to-peer to cellphone communications. Some of them are wearable or in-vehicle devices. However, both theoreticians and practitioners often point out that the field lacks a formal computation model and an application framework capable of context modeling and complex application scenario description to cover the application diversity of smart objects and their federations. This lack is cited as the main reason why most existing applications essentially remain within the scope of three stereotyped scenarios: location-transparent service continuation; location- and situation-aware service provision; and dynamic federation among smart objects through the Internet, i.e., their web-based federation. The first focuses on the ubiquity of services, while the second focuses on context-dependent services. This talk reviews the speaker’s formal modeling of complex application scenarios using autonomic proximity-based federation among smart objects, together with an application framework for developing such scenarios, and tries to open a new vista on smart objects and their federation.
Bio: Yuzuru Tanaka has been a professor emeritus of Hokkaido University (2013- ), the research supervisor of the JST CREST Program on Big Data Applications (2013-2021), an MI research advisor of the Research and Services Division of Materials Data and Integrated System (MaDIS) at the National Institute for Materials Science (NIMS) (2017- ), and a visiting professor at the National Institute of Informatics (NII) (2004- ), the Institute of Catalysis at Hokkaido University (2017- ), and Hokkaigakuen University (2017- ). He was a full professor of computer architecture at the Department of Electrical Engineering (1990-2003), then of knowledge media architecture at the Department of Computer Science (2004-2017), Hokkaido University, and the founding director of the Meme Media Laboratory (1995-2013), Hokkaido University. In parallel, he was also a full professor of Digital Library at the Graduate School of Informatics, Kyoto University (1998-2000). His past research areas covered multiprocessor architectures, database schema-design theory, database machine architectures, full text search of document image files, and automatic cut detection in movies and full video search. His current research areas cover meme media architectures, knowledge federation frameworks, proximity-based federation of smart objects, and their application to digital libraries, e-Science, clinical trials, materials informatics, and social cyber-physical systems. He has worked as a visiting research fellow at IBM T.J. Watson Research Center (1985-1986), an affiliated scientist of FORTH in Crete (2010- ), and a series editor of Springer’s LNAI (Lecture Notes in Artificial Intelligence). He has been involved in the EU’s FP6 Integrated Project ACGT (Advancing Clinico-Genomic Trials on Cancer), FP7 Best Practice Network Project ASSETS (Advanced Search Services and Enhanced Technological Solutions for the European Digital Library), and FP7 Large Integration Project p-medicine (personalized medicine).
Audio/speech information hiding based on human auditory characteristics
by Prof. Masashi Unoki, the Japan Advanced Institute of Science and Technology (JAIST)
Abstract: Audio information hiding (AIH) has recently attracted attention as a state-of-the-art technique for protecting copyrights and defending audio/speech content against attacks and tampering. The technique aims at embedding codes as watermarks in audio/speech content, in a way that is inaudible to and inseparable by users, and at detecting the embedded codes from watermarked signals. AIH methods are also characterized by whether they can robustly detect embedded codes from watermarked signals (robust or fragile), whether they can detect embedded codes blindly (blind or non-blind), whether they can completely restore watermarked signals to the originals by removing the embedded codes (reversible or irreversible), and whether they remain secure when the algorithms employed are made public (public or private methods). AIH methods, therefore, must satisfy some of the following five requirements to provide a useful and reliable form of watermarking: (a) inaudibility (inaudible to humans, with no sound distortion caused by the embedded data), (b) robustness (not affected when subjected to processing such as data compression or to malicious attacks), (c) blind detectability (high possibility of detecting the embedded data without using the original or reference signal), (d) confidentiality (secure and undetectable concealment of embedded data), and (e) reversibility (the embedded data can be removed from the watermarked signal and/or the watermark can be re-edited). In this talk, historical and typical AIH methods are introduced and their drawbacks are pointed out. Then our proposed methods based on human auditory characteristics (cochlear delay, adaptive phase modulation, singular spectrum analysis with a psychoacoustic model, and formant enhancement) are introduced.
Bio: Masashi Unoki received his M.S. and Ph.D. in Information Science from the Japan Advanced Institute of Science and Technology (JAIST) in 1996 and 1999, respectively. His main research interests are in auditory-motivated signal processing and the modeling of auditory systems. He was a Japan Society for the Promotion of Science (JSPS) research fellow from 1998 to 2001. He was associated with the ATR Human Information Processing Laboratories as a visiting researcher from 1999 to 2000, and he was a visiting research associate at the Centre for the Neural Basis of Hearing (CNBH) in the Department of Physiology at the University of Cambridge from 2000 to 2001. He has been on the faculty of the School of Information Science at JAIST since 2001 and is now a full professor. Dr. Unoki received the Sato Prize from the Acoustical Society of Japan (ASJ) in 1999, 2010, and 2013 for Outstanding Papers, and the Best Paper Award from the Institute of Electronics, Information and Communication Engineers in 2017. Currently, he is an associate editor of Applied Acoustics and the editor-in-chief of the ASJ journal Acoustical Science and Technology.