ATMS

Intelligent systems applied to medicine

Speech recognition
Speech analysis
Speech Denoising
Speech Algorithms for Hearing Prosthetics
Speaker Recognition
Document analysis and recognition
Visual detection of objects
Audio-visual recognition
Face Recognition

Integrating visual information with acoustic information for bimodal automatic speech or speaker recognition remains a scientific topic that has been studied for years. Although this approach is very attractive, the problems it raises are far from trivial. First, there is the question of the level of integration: should the modalities be fused at the data level or at the results (decision) level? Second, there is the time lag between the auditory realization and the visual realization of a phoneme, for example. Third, the contributions of the acoustic and visual modalities must be adapted according to their relative reliability. Finally, there is the question of what is relevant to use for processing the visual speech signal.
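As an illustration of results-level integration with reliability-based adaptation, the following is a minimal sketch and not the unit's actual method. It assumes each modality outputs a log-likelihood per candidate class, and that the acoustic weight is derived from an estimated signal-to-noise ratio. The function names, the logistic mapping and its parameters (midpoint_db, slope) are hypothetical choices made for this example.

```python
import math


def snr_to_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Map an estimated audio SNR (dB) to an acoustic weight in (0, 1).

    Hypothetical logistic mapping: clean audio gives a weight near 1,
    very noisy audio gives a weight near 0.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse_scores(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of per-class log-likelihoods from both modalities.

    The visual stream receives the complementary weight 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_loglik.keys() != visual_loglik.keys():
        raise ValueError("both modalities must score the same classes")
    return {
        label: audio_weight * audio_loglik[label]
        + (1.0 - audio_weight) * visual_loglik[label]
        for label in audio_loglik
    }


def recognize(audio_loglik, visual_loglik, snr_db):
    """Return the class with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, snr_to_weight(snr_db))
    return max(fused, key=fused.get)


# Toy scores: the audio stream favours "ga", the visual stream favours "ba".
audio = {"ba": -2.0, "ga": -1.0}
visual = {"ba": -0.5, "ga": -3.0}
print(recognize(audio, visual, snr_db=30.0))   # clean audio: acoustic stream dominates
print(recognize(audio, visual, snr_db=-10.0))  # noisy audio: visual stream dominates
```

With these toy scores, the decision follows the acoustic stream when the audio is clean and shifts to the visual stream when it is noisy. Data-level integration would instead combine the acoustic and visual feature vectors before any classification takes place.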

Many issues in this field remain open, and current research focuses on exploring and presenting new techniques for analyzing information related to the presence of a human in a digital audiovisual recording. The related application areas include security (identification of people, video surveillance, secure access systems, home monitoring), media (time control, automatic indexing) and the digital entertainment industry ("smart" cameras, for example), as well as assistance to people in distress and human-machine communication.

In the literature available in this field, most approaches treat the problem by dividing it into two synthesis sub-problems. The first is the synthesis of acoustic speech and the second is the generation of the corresponding facial animation. However, this does not guarantee perfect synchronization and coherence of the audio-visual speech. Several researchers have attempted to overcome this disadvantage by proposing acoustic-visual speech synthesis approaches based on the selection of natural, synchronous bimodal units. The main idea behind these synthesis techniques is to keep the natural association between the acoustic and visual modalities intact.

Other major topics in the literature focus on the analysis of face, speaker and gesture information. The acquisition of audio-visual corpora and the preparation of databases for recognition systems also remain hot topics. In fact, the different aspects of bimodal unit selection that must be optimized for good synthesis need to be described in detail in order to synthesize speech dynamics reasonably well, and thus to design systems that outperform standard systems based on a single modality.

In addition, a research program on speech signal processing has been under way within the ATMS research unit since the early 2000s. Four lines of research have been opened concerning the processing of the speech signal: analysis, recognition, synthesis and denoising.

These research themes open up new perspectives, given the considerable difference between manual control and voice control. We have also taken a strong interest in biomedical applications such as vocal audiometry, which is an essential diagnostic test for deafness screening as well as for clinical rehabilitation.

In this area of research, we have made significant progress and acquired the required expertise through the supervision of final-year projects (PFE), Master's projects and doctoral theses. Various results of this work have been published in national and international conferences as well as in scientific journals.

The ATMS research unit has also given a lot of importance to the field of document recognition, with the goal of designing systems that migrate paper to electronic media. This migration represents a great revolution and has brought out new actors and new features. Huge bodies of digitized documents in various forms (manuscript, print, graphics, images, composite documents, etc.) are made available to integrated document retrieval systems in digital or virtual libraries. But digitization alone is no longer enough. It must go hand in hand with the development of computer tools that improve the conditions of access and search. This is the subject of the iBook project, which aims to define an intelligent document retrieval system for better exploitation of archived documents. The iBook project is composed of two sub-projects: i-Library and i-Bag.

The ATMS research unit has also investigated the field of image analysis applied to object and surface detection. This work finds its applications in video surveillance, which has expanded very strongly in recent years. In CCTV applications, the multi-camera aspect is beginning to play an important role. Not only must moving objects be segmented and tracked, but the system must also be able to recognize the same object when it leaves and then re-enters the field of view of a camera, or passes from the field of view of one camera to that of another.
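The re-identification step described above can be sketched as follows. This is an illustrative example under stated assumptions, not the unit's implementation: each tracked object is assumed to be summarized by an appearance descriptor (for instance a normalized color histogram), and an object seen again is matched to known objects by cosine similarity. The function names, the gallery structure and the 0.9 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty descriptor matches nothing
    return dot / (norm_a * norm_b)


def reidentify(descriptor, gallery, threshold=0.9):
    """Return the id of the best-matching known object, or None.

    gallery maps object ids to the descriptors stored when each object
    was last seen. None means the object is treated as new.
    """
    best_id, best_score = None, threshold
    for object_id, known in gallery.items():
        score = cosine_similarity(descriptor, known)
        if score >= best_score:
            best_id, best_score = object_id, score
    return best_id


# Toy 3-bin descriptors for objects already seen by one camera.
gallery = {
    "person_1": [0.8, 0.1, 0.1],
    "car_1": [0.1, 0.1, 0.8],
}
print(reidentify([0.7, 0.2, 0.1], gallery))  # close to person_1
print(reidentify([0.1, 0.8, 0.1], gallery))  # unlike any known object
```

In a real multi-camera setting, appearance alone is rarely sufficient, since lighting and viewpoint differ between cameras. A practical system would typically combine it with timing and camera-topology constraints.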

At the same time, research has recently been launched in the areas of speaker recognition, face recognition and audiovisual recognition of speech in noisy environments. This work is ongoing and at varying stages of progress.