Search results

Number of results: 35

Abstract

Poznan Supercomputing and Networking Center (PSNC) developed an ambisonic installation and workflow as part of audio-visual 8K VR 360° immersive media experiments. This work aimed to investigate the performance quality of the PSNC setup through both subjective tests and simulations providing objective parameters of interaural characteristics in the real-life scenario of the PSNC studio. For the objective part, an algorithm for angle estimation was proposed and computations were performed.

Authors and Affiliations

Marcin Dąbrowski (1)
Jan Skorupa (1)
Wojciech Raszewski (1)
Maciej Głowiak (1)

  1. Institute of Bioorganic Chemistry of the Polish Academy of Sciences, Poznan Supercomputing and Networking Center, Poland

Abstract

This article presents an analysis of an innovative system for filtering signals in the audible range (16 Hz - 20 kHz) on programmable logic devices using finite impulse response (FIR) filters. The system combines a software and a hardware platform; in the software layer, several languages, including VHDL, JavaScript, Matlab, and HTML, were used to create a complete, usable application. The Matlab Filter Design & Analysis Tool was used to determine the coefficients of the polynomial filters. The developed graphical layer provides a user-friendly interface that makes it easy to transfer the required coefficients from the computer to the executing system. The practical implementation on an FPGA platform, specifically the Altera DE2-115 development kit with a Cyclone IV FPGA, was compared with a Matlab simulation of the FIR filters. The research confirms the effectiveness of real-time filtering with filter orders of up to 128 for both audio channels simultaneously in the FPGA-based system.
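
The filtering operation described in this abstract can be illustrated in software. The sketch below is a minimal direct-form FIR filter in plain Python; it is not the authors' VHDL implementation, and the function name `fir_filter` and the three-tap coefficients are illustrative assumptions rather than values taken from the article.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k], with x[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        output.append(acc)
    return output


# Hypothetical 2nd-order (3-tap) low-pass coefficients, chosen for illustration.
# A filter of order N uses N + 1 coefficients, so order 128 would use 129.
coeffs = [0.25, 0.5, 0.25]

# Feeding a unit impulse returns the coefficients themselves (the impulse response).
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
response = fir_filter(impulse, coeffs)
```

In a real design the coefficients would come from a filter design tool, as the abstract describes; only the multiply-accumulate structure is shown here.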

Authors and Affiliations

Adrian Lipowski (1)
Paweł Majewski (1)
Sławomir Pluta (1)

  1. Opole University of Technology, Opole, Poland

Abstract

In October 2018, local digital radio was launched to cover the agglomeration of Wroclaw. Implementing this undertaking required many tests, including qualitative ones, covering both music and speech. This paper presents the results of subjective tests evaluating the speech quality of signals recorded at various points in Wroclaw. Measurements were carried out in accordance with the recommendations of the International Telecommunication Union as well as in ordinary acoustic conditions in listeners’ flats. The rating was made for male and female voices. The most important conclusion is that, for speech quality assessment, the test conditions do not influence the obtained results. The experiment also confirmed that the location at which the DAB+ signal is received within the Single-Frequency Network does not affect the perceived voice quality.

Authors and Affiliations

Stefan Brachmański (1)
Maurycy Kin (1)
Patrycja Zemankiewicz (1)

  1. Wroclaw University of Science and Technology, Poland

Abstract

This study assessed sound localisation definition in ambisonic systems using two non-parametric and three parametric decoders, in a two-dimensional format. The sound samples were played back through eight loudspeakers arranged in a circle. The participants compared pairs of sound samples to determine which sample offered a more precise perception of the sound source’s location. The data analysis, using a Bradley-Terry probability model, revealed that parametric decoders were preferred with a 60–83% probability. Among the parametric decoders, the COMPASS method, which utilizes the Multiple Signal Classification algorithm for sound source direction estimation, received the highest scores for sound localisation judgements.
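
For readers unfamiliar with the Bradley-Terry model used in this analysis, the following is a minimal sketch of how preference probabilities can be estimated from paired-comparison counts. It uses the standard minorization-maximization update; the function names and the win counts are hypothetical, and the study's own fitting procedure is not described in the abstract.

```python
def fit_bradley_terry(wins, iterations=100):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else strengths[i])
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


def preference_probability(strength_a, strength_b):
    """Probability that item a is preferred over item b under the model."""
    return strength_a / (strength_a + strength_b)


# Hypothetical data: decoder A preferred over decoder B in 3 of 4 comparisons.
wins = [[0, 3], [1, 0]]
strengths = fit_bradley_terry(wins)
p_a_over_b = preference_probability(strengths[0], strengths[1])
```

With only two items the fitted probability equals the observed win fraction; the model becomes informative when several decoders are compared in overlapping pairs.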

Authors and Affiliations

Jacek Majer (1)

  1. Chopin University of Music, Department of Sound Engineering, Chair of Musical Acoustics and Multimedia, Warszawa, Poland

Abstract

This article presents some key events concerning the founding of the Polish Section of the Audio Engineering Society. In addition, the history of the International Symposia on Sound Engineering and Mastering is outlined, and the papers contained in this issue are briefly reviewed.


Authors and Affiliations

Bożena Kostek
Marianna Sankiewicz

Abstract

The paper presents a comparative study of music features derived from audio recordings, i.e. the same music pieces but representing different music genres, excerpts performed by different musicians, and songs performed by a musician whose style evolved over time. First, the origin and the background of the division of music genres are briefly presented. Then, several objective parameters of an audio signal that have an easy interpretation in the context of perceptual relevance are recalled. Within the study, parameter values were extracted from music excerpts, gathered, and compared to determine to what extent they are similar within the songs of the same performer or samples representing the same piece.


Authors and Affiliations

Aleksandra Dorochowicz
Bożena Kostek

Abstract

The 16th International Symposium on Sound Engineering and Tonmeistering (ISSET), organized by the Institute of Radioelectronics and Multimedia Technology (Warsaw University of Technology), the Department of Sound Engineering (Fryderyk Chopin University of Music) and the Polish Radio, under the auspices of the Polish Section of the Audio Engineering Society, was held in Warsaw on October 8-10, 2015. The main topics of the Symposium covered almost all domains of audio engineering, i.e. musical acoustics, noise control, signal processing, room acoustics, radio and television, multimedia, sound engineering and tonmeistering, perception and quality assessment, and many others. Special attention was paid to the problems of loudness of audio programs in radio and TV broadcasting. Over 60 people from different branches of audio technology participated in the Symposium and shared their knowledge and experiences during the paper sessions, technical tours, workshops and special presentations. A selection of abstracts of the papers presented at ISSET’2015 is included below.

Abstract

Recently, the rapid advancement of the IT industry has resulted in significant changes in audio-system configurations; in particular, audio over internet protocol (AoIP) network-based audio-transmission technology has received favourable evaluations in this field. Applying AoIP in a certain section of the multiple-cable zone is advantageous because the installation cost is lower than that for the existing systems, and the original sound is transmitted without any distortion. The existing AoIP-based technology, however, cannot control the audio-signal characteristics of every device and can only transmit multiple audio signals through a network. In this paper, the proposed Audio Network & Control Hierarchy Over peer-to-peer (Anchor) system enables all audio equipment to send and receive signals via a data network, and the receiving device can mix the signals of different IPs. Accordingly, it was possible to improve the system-application flexibility by simplifying the audio-system configuration. The research results confirmed that audio signals from different IPs were received, mixed, and output without errors. It is expected that Anchor will become a standard for audio-network protocols.


Authors and Affiliations

Jaeho Lee
Hyoungjoon Jeon
Pyungho Choi
Soonchul Kwon
Seunghyun Lee

Abstract

This study investigates listeners’ perceptual responses in audio-visual interactions concerning binaural spatial audio. Audio stimuli are coupled with or without visual cues to the listeners. The subjective test participants are tasked to indicate the direction of the incoming sound while listening to the audio stimulus via loudspeakers or headphones with the head-related transfer function (HRTF) plugin. First, the methodology assumptions and the experimental setup are described to the participants. Then, the results are presented and analysed using statistical methods. The results indicate that the headphone trials showed much higher perceptual ambiguity for the listeners than when the sound is delivered via loudspeakers. The influence of the visual modality dominates the audio-visual evaluation when loudspeaker playback is employed. Moreover, when the visual stimulus is present, the headphone playback pattern of behavior is not always in response to the loudspeaker playback.

Authors and Affiliations

Bartłomiej Mróz (1, 2)
Bożena Kostek (2)

  1. Multimedia Systems Department, Gdansk, Poland
  2. Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk, Poland

Abstract

In the early days, consumption of multimedia content related to audio signals was only possible in a stationary manner. The music player was located at home, with a necessary physical drive. An alternative for an individual was to attend a live performance at a concert hall or host a private concert at home. In short, audio-visual effects were reserved for a narrow group of recipients. Today, thanks to portable players, vision and sound are at last available to everyone. Finally, thanks to multimedia streaming platforms, every music piece or video, e.g. from one’s favourite artist or band, can be viewed anytime and anywhere. The background or status of an individual is no longer an issue. Each person who is connected to the global network can have access to the same resources. This paper focuses on the consumption of multimedia content using mobile devices. It describes a year-to-year user case study carried out between 2015 and 2019, and the development of current trends related to the expectations of modern users. The goal of this study is to aid policymakers, as well as providers, when it comes to designing and evaluating systems and services.


Authors and Affiliations

Przemysław Falkowski-Gilski

Abstract

The paper presents the results of research and analysis of voice data transmission quality in IP packet networks. It analyses mechanisms allowing for the assessment of packet telephony data transmission quality. Possible transmission quality levels and adequate quality metrics, applicable in the recommendations of standardisation organisations, as well as suggested limit values conditioning acceptable voice data transmission quality were indicated and discussed. A packet network model was designed and tested, taking into account VoIP architecture supporting various audio codecs used for voice compression. Transmission mechanisms based on audio codecs G.711, G.723, G.726, G.728 and G.729 were investigated. It was shown that for delay-sensitive traffic which fluctuates beyond its nominal rate, selected codecs have an advantage over others and allow for better transmission quality of VoIP traffic with guaranteed bandwidth and delay.
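
To make the codec comparison more concrete, the sketch below estimates the IP-layer bandwidth of a single one-way voice stream for several of the codecs named in this abstract. It is an illustrative calculation, not the paper's network model: the 20 ms packetization interval, the single bit rate chosen for G.726 (which supports several), and the omission of G.723 and of any link-layer overhead are assumptions made here.

```python
# Nominal codec payload bit rates in kbit/s (G.726 also supports 16, 24 and 40).
CODEC_BITRATE_KBPS = {"G.711": 64, "G.726": 32, "G.728": 16, "G.729": 8}

# IPv4 (20 bytes) + UDP (8 bytes) + RTP (12 bytes) headers added to every packet.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def ip_bandwidth_kbps(codec, packet_ms=20):
    """IP-layer bandwidth of one voice stream, assuming packet_ms of audio per packet."""
    payload_bytes = CODEC_BITRATE_KBPS[codec] * packet_ms / 8
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second / 1000


bandwidths = {codec: ip_bandwidth_kbps(codec) for codec in CODEC_BITRATE_KBPS}
```

The calculation shows why low-bit-rate codecs save less bandwidth than their nominal rates suggest: the fixed 40-byte header dominates the packet when the payload is small.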

Authors and Affiliations

Dariusz Strzęciwilk (1)

  1. Institute of Information Technology, University of Life Sciences, Warsaw, Poland

Abstract

Audio data compression is used to reduce the transmission bandwidth and storage requirements of audio data. It is the second stage in the audio mastering process, with audio equalization being the first. Compression algorithms such as BSAC, MP3 and AAC are used as reference standards in this paper. The challenge in audio compression is compressing the signal at low bit rates: earlier algorithms that work well at low bit rates do not perform as well at higher bit rates, and vice versa. This paper proposes a modified form of the vector quantization algorithm that produces a scalable bit stream with a number of fine layers of audio fidelity. The modified algorithm is used to build a scalable perceptual audio coder. The coder relies on the quantization and encoding stages, which are responsible for the psychoacoustic and arithmetical terminations, since practically all the data removed during the prediction phases at the encoder side is restored to the audio signal at the decoder stage. It is therefore the quantization phase that is modified to produce a scalable bit stream. The modified algorithm works well at both lower and higher bit rates. Subjective evaluations were carried out by audio professionals using the MUSHRA test, and the mean normalized scores at various bit rates were noted and compared with those of the previous algorithms.

Authors and Affiliations

Shajin Prince (1)
Bini D (1)
A Alfred Kirubaraj (1)
J Samson Immanuel (1)
Surya M (1)

  1. Karunya Institute of Technology and Sciences, Coimbatore, India

Abstract

Using appropriate signal processing tools to analyze time series data accurately is essential for correctly interpreting the underlying processes. Commonly employed methods include kernel-based transforms that utilize base functions and modifications to depict time series data. This paper addresses the analysis of audio data using two such transforms: the Fourier transform and the wavelet transform, both based on assumptions regarding the signal's linearity and stationarity. However, in audio engineering, these assumptions often do not hold, as the statistical characteristics of most audio signals vary over time, making them unsuitable for treatment as outputs of a Linear Time-Invariant (LTI) system. Consequently, more recent methods have shifted towards breaking down signals into various modes in an adaptive, data-specific manner, potentially offering benefits over traditional kernel-based methods. Techniques like empirical mode decomposition and Holo-Hilbert Spectral Analysis are examples of this. The effectiveness of both kernel-based and adaptive decomposition methods was tested through simulations using speech signals, demonstrating that the adaptive methods are effective for analyzing audio data that is both nonstationary and the output of a nonlinear system.

Authors and Affiliations

Marcin Lewandowski (1)
Qizhang Deng (2)

  1. Warsaw University of Technology
  2. University of New South Wales Sydney

Abstract

Dariah.lab is a research infrastructure created for digital humanities, consisting of state-of-the-art hardware and dedicated software tools. One of the tools developed for digital musicology is Timbra, a web-based application for conducting research on sound timbre. The aim was to create an easy-to-use online tool for non-programmers. The tool can be used to calculate, visualise, and compare different timbre characteristics of uploaded audio files and to export the extracted parameters in CSV format for further processing, e.g. by classification tools. The application offers extraction and visualisation of scalar features such as zero crossing rate, fundamental frequency, spectral centroid, spectral roll-off, spectral flatness, and band energy ratio, as well as feature vectors (e.g. chromagram, spectral contrast, spectrogram, and MFCCs). An interested user can compare selected sound characteristics using various types of plots and run dissimilarity analysis of timbre parameters by means of 2D or 3D multidimensional scaling (MDS). The paper showcases potential applications of the tool based on the presented case studies. In terms of implementation, the calculations are performed on the backend Django server using Librosa and standard Python libraries. The Dash library is used for the frontend. By offering an easy-to-use tool accessible anytime and anywhere through the Internet, we want to facilitate timbre analysis for a broader group of researchers, e.g. sound engineers, luthiers, phoneticians, or musicologists.
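
As an illustration of one of the scalar features listed in this abstract, the sketch below computes a zero crossing rate in plain Python. It follows a simple textbook definition over a whole signal and does not call Librosa, whose framed implementation may differ in detail; the function name and test signals are hypothetical.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


# A signal that flips sign at every sample crosses zero at every step.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]
# A constant signal never crosses zero.
constant = [0.5, 0.5, 0.5, 0.5]

zcr_alternating = zero_crossing_rate(alternating)
zcr_constant = zero_crossing_rate(constant)
```

Noisy or bright sounds tend to give higher values than smooth, low-pitched ones, which is why the feature is often used as a rough timbre descriptor.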

Authors and Affiliations

Filip Szymański (1)
Ewa Łukasik (1)
Magdalena Chudy (2)

  1. Poznan University of Technology, Poznan
  2. Institute of Art, Polish Academy of Sciences, Warsaw, Poland

Abstract

This article presents an efficient method of modelling acoustic phenomena for real-time applications such as computer games. Simplified models of reflections, transmission, and medium attenuation are described along with assessments conducted by a professional sound designer. The article introduces representation of sound phenomena using digital filters for further digital audio processing.

Authors and Affiliations

Bartłomiej Miga
Bartosz Ziółko

Abstract

In this paper, a new lifting wavelet domain audio watermarking algorithm based on the statistical characteristics of sub-band coefficients is proposed. First, the original audio signal was segmented and each segment was divided into two sections. Then, the Barker code was used for synchronization, the LWT (lifting wavelet transform) was performed on each section, and a synchronization code and a watermark were embedded into the first and second sections, respectively, by modifying the statistical average value of the sub-band coefficients. The embedding strength was determined adaptively according to the auditory masking property. Experiments show that the embedded watermark is more robust against common signal processing attacks than existing algorithms based on LWT and, in particular, can resist random cropping.
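
Barker codes are attractive for synchronization because their autocorrelation has a sharp peak and very low sidelobes. The sketch below demonstrates this for the length-13 Barker code; the abstract does not state which code length the authors used, so the choice of 13 and the function name are assumptions made for illustration.

```python
# Length-13 Barker code in +1/-1 form.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def aperiodic_autocorrelation(code):
    """Return R[k] = sum_n code[n] * code[n + k] for lags k = 0 .. len(code) - 1."""
    n = len(code)
    return [sum(code[i] * code[i + k] for i in range(n - k)) for k in range(n)]


acf = aperiodic_autocorrelation(BARKER_13)
peak = acf[0]                                # value at zero lag
max_sidelobe = max(abs(v) for v in acf[1:])  # largest magnitude at non-zero lag
```

A detector that correlates the received signal with the code can therefore locate the synchronization point by looking for the single dominant peak.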


Authors and Affiliations

Zhi Tao
He-ming Zhao
Jun Wu
Ji-hua Gu
Yi-shen Xu
Di Wu

Abstract

In building speech-recognition-based applications, robustness to different noisy background conditions is an important challenge. In this paper, a bimodal approach is proposed to improve the robustness of a Hindi speech recognition system. The importance of different types of visual features is also studied for an audio-visual automatic speech recognition (AVASR) system under diverse noisy audio conditions. Four sets of visual features are reported, based on the Two-Dimensional Discrete Cosine Transform (2D-DCT), Principal Component Analysis (PCA), the Two-Dimensional Discrete Wavelet Transform followed by DCT (2D-DWT-DCT), and the Two-Dimensional Discrete Wavelet Transform followed by PCA (2D-DWT-PCA). The audio features are extracted using Mel Frequency Cepstral Coefficients (MFCC) followed by static and dynamic features. Overall, 48 features, i.e. 39 audio features and 9 visual features, are used for measuring the performance of the AVASR system. The performance of the AVASR system is also evaluated on noisy speech signals generated using the NOISEX database at different signal-to-noise ratios (SNR: 30 dB to −10 dB) using the Aligarh Muslim University Audio Visual (AMUAV) Hindi corpus. The AMUAV corpus is a high-quality audio-visual database of continuous Hindi speech, consisting of Hindi sentences spoken by different subjects.

Authors and Affiliations

Prashant Upadhyaya
Omar Farooq
M.R. Abidi
Priyanka Varshney

Abstract

Biography and scientific achievements of Professors Marianna Sankiewicz-Budzyński and Gustaw K.E. Budzyński - Founders of the Polish Audio Engineering.


Authors and Affiliations

Andrzej Czyżewski
Bożena Kostek

Abstract

The MDCT and IntMDCT algorithms are widely used in audio coding. The Integer MDCT is derived from the Modified Discrete Cosine Transform by means of a lifting scheme or rounding operations. It inherits the properties of the MDCT and offers excellent invertibility and a good spectral mean. In this paper, we discuss audio codecs such as AAC and FLAC that use the MDCT and Integer MDCT algorithms, and determine which algorithm yields the better compression ratio (CR). The aim of this work is to hybridize lossy and lossless audio codecs so as to achieve a reduced bit rate with finer sound quality. The audio quality is determined by subjective and objective testing in terms of MOS (Mean Opinion Score) and ABX, and some hearing aid testing methodologies such as PEAQ (Perceptual Evaluation of Audio Quality) and ODG (Objective Difference Grade) are followed. The performance measures, namely the compression ratio (CR) and sound pressure level (SPL), are estimated.


Authors and Affiliations

M. Davidson Kamala Dhas
R. Priyadharsini

Abstract

Field programmable analog arrays (FPAA), thanks to their flexibility and reconfigurability, give designers new possibilities in analog circuit design. The number of both academic projects on FPAA and applications of commercially available programmable devices is still growing. This paper explores the properties and parameters of the two most popular FPAA circuits: the AnadigmVortex AN221E04 and the AnadigmApex AN231E04 from the Anadigm company. The research conducted by the authors led to the discovery of some undocumented features of these devices. Several applications for audio processing were built and tested. The results show that these circuits can be used in medium-demanding audio applications. Thanks to dynamic reconfigurability, they also make it possible to build a universal analog audio signal processor. These circuits can also act as a versatile platform for rapid prototyping and educational purposes.


Authors and Affiliations

Piotr Falkowski
Andrzej Malcher

Abstract

The aim of the study was to examine how the wording of a question about audio, visual and audiovisual stimuli can affect the assessment of the environment. The participants of the psychophysical experiments were asked to rate, on a numerical scale, audio and visual information both separately and together, combined into mixes. A set of questions was used for all the investigated audio, visual, and audio-visual stimuli. The participants were asked about the comfort or the discomfort caused by the perceived stimuli presented at three different sound levels.
The results show that there are no statistically significant differences between the assessment of comfort and discomfort associated with visual samples. In fact, the comfort and discomfort ratings are equivalent to the extent that a discomfort rating can be represented as the opposite of the comfort rating, i.e. the discomfort rating is equal to 10 minus the comfort rating.
In general, the results obtained for audio and audio-visual samples were the same, with only a few exceptions that depended on sound level. No statistically significant differences were found for the loudest stimuli, but there were some exceptions for the softer ones. Based on the results, we show that the two scales are fully interchangeable only for visual stimuli. When presenting audio and audio-visual samples, only one scale should be applied, either discomfort or comfort, depending on the context and the character of the stimuli.

Authors and Affiliations

Jan Felcyn (1)
Anna Preis (1)
Marcin Praszkowski (1)
Małgorzata Wrzosek (2)

  1. Department of Acoustics, Faculty of Physics, Adam Mickiewicz University, Poznan, Poland
  2. Institute of Philosophy, Szczecin University, Szczecin, Poland

Abstract

As the virtual reality (VR) market is growing at a fast pace, numerous users and producers are emerging with the hope of steering VR towards mainstream adoption. Although most solutions focus on providing high-resolution and high-quality videos, acoustics in VR are as important as visual cues for maintaining consistency with the natural world. We therefore investigate one of the most important audio solutions for VR applications: ambisonics. Several VR producers such as Google, HTC, and Facebook support the ambisonic audio format. Binaural ambisonics builds a virtual loudspeaker array over a VR headset, providing immersive sound. The configuration of the virtual loudspeakers influences the listening perception, as has been widely discussed in the literature. However, few studies have investigated the influence of the orientation of the virtual loudspeaker array. That is, the same loudspeaker array in different orientations can produce different spatial effects. This paper introduces a VR audio technique with optimal design and proposes a dual-mode audio solution. Both an objective measurement and a subjective listening test show that the proposed solution effectively enhances spatial audio quality.
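To make the idea of orientation concrete, the following minimal sketch encodes a mono sample into first-order ambisonics and rotates the sound field about the vertical axis. It is not the technique proposed in the paper: the SN3D normalisation, the angle convention (radians, azimuth counterclockwise from the front), and the function names are assumptions chosen for illustration.

```python
import math

def encode_foa(sample, azimuth, elevation=0.0):
    """Encode one mono sample as first-order ambisonics (W, X, Y, Z).

    Assumptions: SN3D normalisation, angles in radians, azimuth measured
    counterclockwise from the front.
    """
    w = sample
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return (w, x, y, z)

def rotate_yaw(bformat, angle):
    """Rotate the sound field counterclockwise about the vertical axis."""
    w, x, y, z = bformat
    c, s = math.cos(angle), math.sin(angle)
    # W and Z are unaffected by a rotation about the vertical axis.
    return (w, x * c - y * s, x * s + y * c, z)
```

Turning a virtual loudspeaker array by some angle relative to the sound field is geometrically equivalent to rotating the field by the opposite angle before decoding, which is why a rotation like `rotate_yaw` is a convenient way to experiment with array orientation.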

Authors and Affiliations

Shu-Nung Yao
1

  1. Department of Electrical Engineering, National Taipei University, No. 151, University Rd., Sanxia Dist., New Taipei City 237303, Taiwan

Abstract

The paper examines the use of a Convolutional Bidirectional Recurrent Neural Network (CBRNN) for quality measurement in music content. The key contribution of this approach, compared to existing research, is that the model is evaluated on detecting acoustic anomalies without requiring a reference (clean) signal. Since real music content may include various instrumental sounds, speech, singing voice, and different audio effects, it is more complex to analyze than clean speech or artificial signals, especially without comparison to known reference content. The presented results may be treated as a proof of concept, since only some specific types of artefacts are covered in this paper (examples of quantization defect, missing sound, distortion of gain characteristics, extra noise sound). However, the described model can be easily extended to detect other impairments or used as a pre-trained model for transfer learning. To examine the model's efficiency, several experiments were performed and are reported in the paper. The raw audio samples were transformed into Mel-scaled spectrograms and fed to the model as input, first on their own, then along with additional features (Zero Crossing Rate, Spectral Contrast). According to the obtained results, overall accuracy increases significantly (by 10.1%) when Spectral Contrast information is provided together with the Mel-scaled spectrograms. The paper also examines the influence of recurrent layers on the effectiveness of the artefact classification task.
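Of the additional features named in the abstract, Zero Crossing Rate is the simplest to illustrate. The sketch below is a plain-Python illustration, not the authors' implementation; it assumes a frame is a sequence of samples, treats zero as non-negative, and normalises by the number of adjacent pairs. Conventions for both choices vary between tools.

```python
def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ.

    Assumptions: zero counts as non-negative, and the count is
    normalised by the number of adjacent pairs in the frame.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)
```

A noisy or high-frequency frame changes sign often and yields a value near 1, while a slowly varying frame yields a value near 0.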

Authors and Affiliations

Kamila Organiściak
Józef Borkowski

Abstract

Most existing algorithms for objective audio quality assessment are intrusive, as they require access to both an unimpaired reference recording and the evaluated signal. This requirement excludes them from many practical applications. In this paper, we introduce a non-intrusive audio quality assessment method. The proposed method is intended to account for audio artefacts arising from the lossy compression of music signals. During its development, 250 high-quality uncompressed music recordings were collated. They were subsequently processed using a selection of five popular audio codecs, resulting in a repository of 13,000 audio excerpts representing various levels of audio quality. The proposed non-intrusive method was trained on data obtained with a well-established intrusive model (ViSQOL v3). Next, the performance of the trained model was evaluated using the quality scores obtained in subjective listening tests undertaken remotely over the Internet. The listening tests were carried out in compliance with the MUSHRA recommendation (ITU-R BS.1534-3). In this study, the following three convolutional neural networks were compared: (1) a model employing 1D convolutional filters, (2) an Inception-based model, and (3) a VGG-based model. The VGG-based model outperformed the model employing 1D convolutional filters in predicting the scores from the listening tests, reaching a correlation value of 0.893. The performance of the Inception-based model was similar to that of the VGG-based model. Moreover, the VGG-based model outperformed the method employing a stacked gated-recurrent-unit-based deep learning framework, recently introduced by Mumtaz et al. (2022).
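The evaluation step, comparing model predictions with listening-test scores, can be sketched as a correlation computation. The abstract does not say which correlation coefficient was used; the example below assumes Pearson's coefficient, and the function name is illustrative.

```python
import math

def pearson(predicted, subjective):
    """Pearson correlation between two equal-length score lists.

    Assumption: Pearson's coefficient is used here for illustration;
    the abstract does not name the coefficient behind its 0.893 value.
    """
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_p = sum(predicted) / len(predicted)
    mean_s = sum(subjective) / len(subjective)
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_s)
```

A value close to 1 means the predicted scores rise and fall together with the subjective scores, which is the sense in which one model "outperforms" another in this kind of comparison.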

Authors and Affiliations

Aleksandra Kasperuk
1
Sławomir Krzysztof Zieliński
1

  1. Faculty of Computer Science, Białystok University of Technology, Poland
