Vol 29 No 1
A Methodology for Modelling and Interactively Visualising the Human Vocal-tract in 3D Space
M. Barlow, F. Clermont & P. Mokhtari
Sound Separation with a Cochlear Implant and a Hearing Aid in Opposite Ears
P. Blamey, C. James & L. Martin
Coding Wideband Speech at Narrowband Bit Rates
J. Epps & W. Holmes
Rapid Channel Compensation for Speaker Verification in the NIST 2000 Speaker Recognition Evaluation
J. Pelecanos & S. Sridharan
Adaptive Dynamic Range Optimisation of Hearing Aids
L. Martin, P. Blamey, C. James, K. Galvin & D. Macfarlane
Prospects for Speech Technology in the Oceania Region
J. B. Millar
A Comparison of Two Acoustic Methods for Forensic Speaker Discrimination
P. Rose & F. Clermont
Auditory and F-Pattern Variations in Australian Okay: A Forensic Investigation
J.R. Elliott
Acoustics Australia Information
Australian Acoustical Society Information
Michael Barlow*, Frantz Clermont* and Parham Mokhtari**
*School of Computer Science, University College, University of NSW
**Electrotechnical Laboratory, Tsukuba, Japan
Vol. 29, No. 1 pp 5-8
ABSTRACT: A system is described for constructing and visualising three-dimensional (3D) images of the human vocal-tract (VT), either from directly-measured articulatory data or from acoustic measurements of the speech waveform. The system comprises the following three major components: (1) a method of inversion for mapping acoustic parameters of speech into VT area-functions, (2) a suite of algorithms which transform the VT area-function into a 3D model of the VT airway, and (3) solutions for immersing the 3D model in an interactive visual environment. The emphasis in all stages of modelling is to achieve a balance between computational simplicity as imposed by the constraint of real-time operation, and visual plausibility of the reconstructed 3D images of the human vocal-tract.
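The second component, turning a one-dimensional area-function into a 3D airway model, can be illustrated with a minimal sketch. It assumes each cross-section is circular and the tract midline is a straight axis (both simplifications of the real geometry; the function and parameter names below are invented for illustration):

```python
import math

def areas_to_radii(area_function_cm2):
    """Equivalent circular radii (cm) for each cross-section,
    assuming circular sections: A = pi * r**2  =>  r = sqrt(A / pi)."""
    return [math.sqrt(a / math.pi) for a in area_function_cm2]

def ring_vertices(radius, z, n=16):
    """Vertices of one cross-sectional ring of the 3D tube model,
    centred on a straight z-axis standing in for the tract midline."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n), z)
            for i in range(n)]

# A coarse area-function (cm^2) from glottis to lips, one ring per section:
areas = [2.0, 3.5, 5.0, 4.0, 1.5, 0.8]
rings = [ring_vertices(r, z=i * 0.5)
         for i, r in enumerate(areas_to_radii(areas))]
```

Stitching adjacent rings into triangles then yields a mesh coarse enough for the real-time constraint the abstract emphasises.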
Peter J. Blamey*, Christopher J. James* and Lois F.A. Martin**
*Department of Otolaryngology, University of Melbourne
**Bionic Ear Institute
Vol. 29, No. 1 pp 9-12
ABSTRACT: Two experiments were conducted to investigate the perception of speech and noise presented simultaneously to three subjects with impaired hearing in five monaural and binaural conditions. A broadband noise was found to have no effect on speech perception when the two signals were presented to opposite ears. When speech and noise were presented to the same ear(s), speech perception scores on a closed-set test fell from above 95% at high signal-to-noise ratios (SNR) to 71% at an SNR of about -5 dB. When two speech signals were presented simultaneously at equal intensities (0 dB SNR) speech perception scores fell to 75% or lower, regardless of the ear(s) to which the signals were presented. Thus dichotic presentation helped these listeners to separate speech from a broadband noise, but not to separate two simultaneous speech signals produced by different speakers.
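The SNR figures quoted above follow the usual power-ratio definition, SNR = 10·log10(P_signal / P_noise) dB. A minimal sketch (using NumPy, with synthetic waveforms standing in for the actual speech and noise stimuli):

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from the mean power of each waveform."""
    p_signal = np.mean(np.square(np.asarray(signal, dtype=float)))
    p_noise = np.mean(np.square(np.asarray(noise, dtype=float)))
    return 10.0 * np.log10(p_signal / p_noise)

rng = np.random.default_rng(0)
speech = 2.0 * rng.standard_normal(16000)  # stand-in "speech", power ~4
noise = rng.standard_normal(16000)         # broadband noise, power ~1
print(f"{snr_db(speech, noise):.1f} dB")   # close to 10*log10(4), ~6 dB
```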
J.R. Epps and W.H. Holmes
School of Electrical Engineering and Telecommunications,
The University of New South Wales
Vol. 29, No. 1 pp 13-16
ABSTRACT: The 'muffled' quality of coded speech, which arises from the bandlimiting of speech to 4 kHz, can be reduced either by coding speech with a wider bandwidth or by wideband enhancement of the narrowband coded speech. This paper investigates the limitations of wideband enhancement and possibilities for its improvement. A new wideband coding scheme is proposed that is based on any narrowband coder, but augmented by wideband enhancement plus a few bits per frame of highband information. The scheme thus has a bit rate only slightly greater than that of the narrowband coder. Subjective listening tests show that this scheme can produce wideband speech of significantly higher quality than the narrowband coded speech.
J. Pelecanos and S. Sridharan
Speech Research Lab, RCSAVT
School of Electrical and Electronic Systems Engineering
Queensland University of Technology
Vol. 29, No. 1 pp 17-20
ABSTRACT: A technique is proposed for rapidly compensating for the channel effects of telephone speech in speaker verification. The method is generic and can be applied to both the one- and two-speaker detection tasks without re-training the separate systems. The technique can be performed in real time (apart from a small initial buffering delay), does not suffer from the relatively long settling time of certain RASTA processing techniques, and is computationally efficient to apply. Results of applying this technique to the NIST 2000 Speaker Recognition Evaluation are reported.
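The abstract does not detail the compensation algorithm itself, but the class of problem it addresses can be illustrated with the standard baseline, cepstral mean subtraction (CMS) — a sketch of that baseline only, not the authors' method:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over all frames. A stationary
    linear channel is convolutional in time, hence an additive offset in
    the cepstral domain, so subtracting the mean frame removes it."""
    c = np.asarray(cepstra, dtype=float)
    return c - c.mean(axis=0)

# A fixed telephone-channel offset added to every frame...
frames = np.random.default_rng(1).standard_normal((200, 12))
channel = np.linspace(0.5, -0.2, 12)
compensated = cepstral_mean_subtraction(frames + channel)
# ...is removed, up to the features' own mean:
assert np.allclose(compensated, frames - frames.mean(axis=0))
```

Note that CMS needs the whole utterance before it can be applied at all, which is exactly the kind of latency a real-time technique such as the one proposed here must avoid.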
Lois F.A. Martin* **, Peter J. Blamey** ***, Christopher J. James** ***, Karyn L. Galvin* ** and David Macfarlane* **
*Bionic Ear Institute
**CRC for Cochlear Implant and Hearing Aid Innovation
***Department of Otolaryngology, University of Melbourne
Vol. 29, No. 1 pp 21-24
ABSTRACT: ADRO (Adaptive Dynamic Range Optimisation) is a slowly-adapting digital signal processor that controls the output levels of a set of narrow frequency bands so that the levels fall within a specified dynamic range. ADRO is suitable for a variety of applications, including control of a hearing aid. In the case of a hearing aid the output dynamic range is defined by the threshold of hearing (T) and a comfortable level (C) at each frequency for the individual listener. A set of rules is used to control the output levels, with each rule directly addressing a requirement for a functional hearing aid. For example, the audibility rule specifies that the output level should be greater than a fixed level between T and C at least 70% of the time. The discomfort rule specifies that the output level should be below C at least 90% of the time. In this study, open-set sentence perception scores for 15 listeners were compared for ADRO and a linear hearing aid fit. Speech was presented at three levels. ADRO improved scores by 1.9% at 75 dB SPL (NS), 15.9% at 65 dB SPL (p = 0.014) and 36% at 55 dB SPL (p < 0.001).
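The audibility and discomfort rules quoted above can be expressed as a slow per-band gain update. The sketch below is an illustrative reading of those rules, not the published ADRO implementation; the step sizes and limits are invented:

```python
def adro_gain_update(level_db, gain_db, c_db, target_db,
                     step_db=0.1, max_gain_db=30.0):
    """One slow adaptation step for a single frequency band.

    level_db  -- current band output level (dB SPL) with gain applied
    c_db      -- listener's comfortable level C for this band
    target_db -- audibility target, a fixed level between T and C
    """
    if level_db > c_db:
        # Discomfort rule: output should stay below C, so back off
        # (faster than the upward step, so discomfort is brief).
        gain_db -= 3.0 * step_db
    elif level_db < target_db:
        # Audibility rule: output should exceed the target most of
        # the time, so nudge the gain up slowly.
        gain_db += step_db
    # Keep the gain within sensible bounds for a functional aid.
    return max(0.0, min(gain_db, max_gain_db))
```

Run once per analysis frame in each band, this drifts quiet bands up toward audibility and pulls loud bands back below C, which matches the pattern in the results: the largest benefit at the quietest (55 dB SPL) presentation level.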
J. Bruce Millar
Computer Sciences Laboratory
Research School of Information Sciences and Engineering
Australian National University
Vol. 29, No. 1 pp 25-29
ABSTRACT: The development of speech technology in the Oceania region is an issue for Australian speech scientists and technologists. In this paper we examine the issues that govern the development of speech technology anywhere, the specific opportunities and inhibiting factors of the Oceania region, and the role that Australia, as the largest and most prosperous nation of the region, can play in the process. The scientific resources required to establish both basic and more sophisticated speech technology are reviewed and mapped against the characteristics of the Oceania region. It is concluded that the most productive approach is likely to be one of creative partnership with the many island communities, such that technology may be developed in a cost-effective and culturally sensitive manner.
Phil Rose* and Frantz Clermont**
*Phonetics Laboratory, Linguistics Program, Australian National University
**School of Computer Science, University of New South Wales (ADFA)
Vol. 29, No. 1 pp 31-35
ABSTRACT: A pilot forensic-phonetic experiment is described which compares the performance of formant- and cepstrally-based analyses on forensically realistic speech: intonationally varying tokens of the word hello said by six demonstrably similar-sounding speakers in recording sessions separated by at least a year. The two approaches are compared with respect to F-ratios and overall discrimination performance utilising a novel band-selective cepstral analysis. It is shown that at the second diphthongal target in hello the cepstrum-based analysis outperforms the formant analysis by about 5%, compared to its 10% superiority for same-session data.
Jennifer R. Elliott
School of Language Studies
The Australian National University
Vol. 29, No. 1 pp 37-41
ABSTRACT: An understanding of the acoustic properties, as well as the nature of within- and between-speaker variation, of words which occur with high frequency in natural discourse is of great importance in forensic phonetic analyses. One word which occurs with relatively high frequency in natural discourse, including telephone conversations, which are often a source of data in forensic comparisons, is okay. This paper presents the initial findings of a study of auditory and F-pattern variations in okay in natural telephone conversation spoken by six male speakers of general Australian English. Seven pre-defined sampling points are measured within each token to determine the most efficient sampling points and formants for distinguishing between-speaker variation from within-speaker variation in okay. F-ratios at these seven sampling points are calculated as a mean of ratios of between- to within-speaker variation. The greatest F-ratio is shown to be for F4 at voice onset of the second vowel. Forensic implications are discussed.
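The F-ratio used in studies like this is the usual analysis-of-variance ratio of between-speaker to within-speaker variance. A minimal sketch for one measurement per token (say, F4 in Hz at voice onset of the second vowel; the values below are invented):

```python
import numpy as np

def f_ratio(samples_by_speaker):
    """ANOVA-style F-ratio: between-speaker mean square divided by
    within-speaker mean square, for one acoustic measurement."""
    groups = [np.asarray(g, dtype=float) for g in samples_by_speaker]
    k = len(groups)                          # number of speakers
    n = sum(len(g) for g in groups)          # total tokens
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented F4 measurements (Hz) for three speakers, three tokens each:
f4 = [[3510, 3530, 3490], [3800, 3780, 3820], [3300, 3320, 3290]]
print(f_ratio(f4))  # large: speakers differ far more than their own tokens
```

A sampling point with a high F-ratio is one where speakers differ much more from each other than each speaker differs from token to token, which is what makes it forensically useful.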