Feature Extraction and Analysis Method of Trombone Timbre Based on CNN Model
Wang, Yanjun
2022-09-23
Hindawi
Journal of Robotics
Volume 2022, Article ID 9460208, 12 pages
https://doi.org/10.1155/2022/9460208

Research Article

Yanjun Wang
School of Music, Shandong College of Arts, Jinan 250014, China
Correspondence should be addressed to Yanjun Wang; z00922@sdca.edu.cn
Received 10 August 2022; Revised 30 August 2022; Accepted 13 September 2022; Published 23 September 2022
Academic Editor: Shahid Hussain
Copyright © 2022 Yanjun Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To improve the accuracy of trombone timbre feature extraction, this paper combines the CNN model to construct a trombone timbre feature extraction model and summarizes the principle of the trombone timbre signal. Moreover, this paper derives the parameters of the trombone timbre signal and the corresponding network model and models the trombone timbre signal mathematically, which facilitates its theoretical analysis and processing. In addition, this paper discusses time-frequency analysis techniques in detail, including their advantages and limitations, which provide an algorithmic basis for working with trombone timbre signals; time-frequency analysis still has great advantages in trombone timbre signal processing. Finally, the simulation results show that the proposed CNN-based trombone timbre feature extraction method can effectively identify the trombone timbre in various musical performances.

1. Introduction

When it was first invented, the trombone was divided into the treble, alto, tenor, and bass trombone. However, with the continued invention and improvement of musical instruments, the treble and alto trombones were gradually replaced by other instruments, and only the tenor and bass trombones remained. Over time, the trombone has evolved into a relatively mature instrument that is widely loved by music lovers for its unique timbre. Moreover, it plays an important and irreplaceable role in wind instrument performance [1].

Timbre refers to the feeling that a sound gives people and can also be understood as the quality of the sound. Timbre can be used to judge the performance of a musical instrument, covering factors such as the smoothness, strength, and emotion of the sound produced when the instrument is played. Just as the eye forms an impression of color, the ear forms a corresponding impression when hearing a sound, and this impression is "timbre." It should be noted that a beautiful timbre cannot be judged from a single angle, such as the thickness or brightness of the sound, but must be interpreted comprehensively in combination with different contexts [2]. This is like painting: a perfect painting cannot be achieved with a single color and line, but needs different colors to show its layering and artistic charm. Likewise, in trombone performance, different timbres should be used for different types of pieces instead of a single timbre throughout, which is recognized as the way to achieve a beautiful timbre [3]. For example, classical repertoire calls for a calm and vigorous tone, whereas romantic music requires a bright and cheerful tone. Modern musical instruments also use different timbre interpretations in performance, and these must still be combined with the style of the music to be correct. Therefore, a beautiful trombone timbre can only be realized by meeting the requirements of the music and understanding the concept of timbre [4].

In trombone performance, the performer's individual level and ability have the most direct impact, so whether the trombone produces a beautiful timbre depends on the performer. Although the ability to play the trombone can be acquired through long, hard training, the correct playing method must be mastered to achieve a perfect interpretation [5]. Learning the trombone requires the correct mouth shape and breathing, and further study builds on this basis. Under normal circumstances, a smile-pressed mouth shape produces a relatively bright and thin tone with poor control over intonation and volume, resulting in a high overall tone. Especially in classical or romantic music, the sound becomes sharp, insufficiently round, and difficult to blend with other instruments [6]. An overly concentrated mouth shape produces a relatively dull tone that is not bright enough and is too high-pitched compared with the smile-pressed mouth shape. Because such a mouth shape cannot be used flexibly, it lacks sufficient expression for brilliant music or music that demands advanced technique [7]. Therefore, studies have shown that a mouth shape balancing the smile and the tuck is by far the correct mouth shape for trombone playing: because of its flexibility and power, it fully meets the requirements of various pieces and achieves a more ideal trombone performance [8].

In addition to the mouth shape, lip vibration and the speed of the airflow produced by breathing also have a major impact on timbre. Wind instrument players commonly use one of two breathing methods: chest breathing or abdominal breathing. Chest breathing tires the performer easily because too little air is inhaled, and it makes good breath regulation and support difficult in the high or low passages of the music [9]. Abdominal breathing relies mainly on the strength of the abdomen. Although the performer can actively generate breath during performance, this method cannot effectively increase or maximize the amount of air inhaled. At present, thoracic-abdominal breathing is the most scientific method in wind instrument performance, especially for the trombone: it lets the breathing muscles both maximize intake and control the airflow smoothly without fatiguing the performer [10].

To develop bass trombone playing further, players must learn and master bass trombone technique. The following discussion focuses on factors that the player can control, such as mouth shape, breathing method, tongue movement, and articulation, to make the bass trombone sound more perfect and harmonious [11].

The trombone differs from keyed instruments: it has neither the flexible control of keyed instruments nor the unrestrained boldness of percussion instruments, but it produces a sound with its own personality and shows its own unique charm in performance. Moreover, tongue movement (tonguing) is an important expressive technique for trombone players. Trombone music often has a very fast rhythm, which is played through fast tongue movement. Players therefore need to practice tonguing extensively on top of the basic skills, for example by speeding up the tongue while still articulating every note neatly and clearly [12]. In short, tonguing skill improves only after mastering the method and practicing a lot, and the basic skills are mastered only through hard work. By diligently practicing the basic skills and progressing from simple to complex, the performer can reach a higher level of trombone performance [13].

The lip shape of the bass trombone is very important because it determines the sound quality; the mouth shape plays a decisive role in the sound quality of the bass trombone. Bass trombone performance relies on the coordination of the mouth shape with the teeth and other organs. Mouth shape belongs to the most basic learning stage, and bass trombone players must ensure a correct mouth shape [14]. Because innate conditions differ, players' ability to use and control the mouth shape also differs considerably, so each player must practice patiently, make continuous progress, and analyze his or her own situation [15].

Whether music is pleasing to the ear depends largely on pitch. Pitch (intonation) here refers to whether the sounds produced in singing or instrumental performance match their intended frequencies. Therefore, pitch is a basic condition for playing beautiful and pleasant music. When people sing songs that sound particularly beautiful, the real reason is often accurate pitch [16]. As for instruments, the trombone and pitch are inseparably related. The trombone adjusts pitch mainly by moving its slide (the telescopic tube), so correct pitch control is not easy; it takes hard work and time to practice correctly and explore. Regularly listening to excellent performances can also help improve pitch [17].

This paper combines the CNN model to construct a trombone timbre feature extraction model, so as to improve trombone training and learning and to enhance the effect of trombone playing.

2. Trombone Timbre Frequency Communication Overview

2.1. Trombone Timbre Frequency Communication Principle and Characteristics. As shown in Figure 1, the trombone timbre frequency communication process first generates the original information data at the transmitting end and then performs the first (baseband) modulation in the information modulator. At the same time, the trombone timbre frequency sequence is generated under the control of a pseudorandom sequence, and the frequency hopping table is then synthesized through a specific mapping relationship; this table controls the frequency synthesizer, which selects the local carrier according to the corresponding rules. After that, the baseband signal is multiplied by this carrier in the trombone timbre frequency modulator, which shifts it to the corresponding frequency. Finally, the resulting band signal is radiated into the air through the transmitting antenna.

Figure 1: Block diagram of the working principle of the trombone timbre frequency communication system (blocks: information modulator, frequency hopping modulator, frequency hopping synthesizer, frequency hopping rate table, frequency hopping sequence, and frequency hopping synchronization).

Depending on whether the frequency hopping uses the same time reference, networks are divided into synchronous and asynchronous networks. Depending on whether the trombone timbre frequencies collide at the same time, they are divided into orthogonal and nonorthogonal networks. In general, collisions of trombone timbre frequencies are difficult to avoid in an asynchronous network. Therefore, trombone timbre frequency networking methods are generally divided into three categories according to these situations. A schematic diagram of each network model is shown in Figure 2.

Figure 2: Three types of networking. (a) Synchronous orthogonal networking, (b) synchronous nonorthogonal networking, and (c) asynchronous nonorthogonal networking. (Axes: time (s), 0 to 0.4; frequency (kHz). Curves: net platform 1 and net platform 2.)

2.2. Mathematical Model of Trombone Timbre Frequency Signal. The trombone timbre frequency signal is a nonstationary signal whose carrier frequency varies with time. The variation of its carrier frequency is controlled by pseudorandom sequences, such as m-sequences, Gold sequences, and so on. The regularity of the trombone timbre frequency signal is generally observed through its time-frequency diagram, on which the signal appears as a line that varies in the time-frequency plane. The time-frequency diagram of a single trombone timbre frequency signal is shown in Figure 3; its horizontal axis is time and its vertical axis is frequency. Parameters such as the hop instants, the hopping period, and the frequency set of the trombone timbre frequency signal can therefore be clearly observed on this diagram. We assume that M trombone timbre frequency signal segments are received within the observation time T. Among them, K are complete segments with hopping period T_h and carrier frequencies f_k (k = 1, 2, ..., K). The incomplete segments are the beginning and the end of the observation: the start segment has duration τ_s and carrier frequency f_s, and the end segment has duration τ_e and carrier frequency f_e. Then, the signal over the whole observation time is expressed as follows:

$$s(t) = a(t) \times \left\{ \mathrm{rect}\left(\frac{t}{\tau_s}\right) \exp\left(j2\pi f_s t\right) + \sum_{k=2}^{K-1} \mathrm{rect}\left(\frac{t-(k-1)T_h-\tau_s}{T_h}\right) \exp\left(j2\pi f_k t\right) + \mathrm{rect}\left(\frac{t-(K-1)T_h-\tau_s}{\tau_e}\right) \exp\left(j2\pi f_e t\right) \right\} \tag{1}$$

Among them,

$$\mathrm{rect}\left(\frac{t}{T}\right) = \begin{cases} 1, & 0 < t \le T, \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where a(t) represents the complex envelope of the observed trombone timbre frequency signal.

The signal received in a real communication environment is usually not a single trombone timbre frequency signal but a mixture of several such signals together with interference signals, such as fixed-frequency signals, frequency-sweep signals, and so on.
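The hopping-signal model of Eqs. (1) to (3) can be made concrete with a short simulation. The following is a minimal NumPy sketch under stated assumptions, not the author's implementation: the helper names (dwell, hopping_signal, received), the sampling rate, the dwell durations, and the hop frequencies are hypothetical values chosen only to fit the 0 to 0.4 s time axis of the figures; the complex envelope a(t) is set to 1; the K full dwells are placed back to back after the start dwell (an assumed indexing); and the noise term n(t) is modeled as complex white Gaussian noise.

```python
import numpy as np


def dwell(t, start, stop):
    """rect((t - start) / (stop - start)) per Eq. (2): 1 on start < t <= stop, else 0."""
    return ((t > start) & (t <= stop)).astype(float)


def hopping_signal(t, f_s, f_k, f_e, tau_s, T_h, tau_e):
    """Single hopping signal in the spirit of Eq. (1), with a(t) = 1 (assumption).

    Dwells: a partial start dwell (length tau_s, carrier f_s), K full dwells of
    length T_h (carriers f_k), and a partial end dwell (length tau_e, carrier f_e),
    placed back to back from t = 0 (assumed indexing).
    """
    K = len(f_k)
    # Shared edges make consecutive dwells partition the time axis exactly.
    edges = np.concatenate(([0.0, tau_s], tau_s + T_h * np.arange(1, K + 1)))
    edges = np.append(edges, edges[-1] + tau_e)
    carriers = [f_s, *f_k, f_e]
    s = np.zeros(len(t), dtype=complex)
    for start, stop, f in zip(edges[:-1], edges[1:], carriers):
        s += dwell(t, start, stop) * np.exp(2j * np.pi * f * t)
    return s


def received(signals, noise_std, rng):
    """Eq. (3): sum of the hopping signals plus n(t), here complex white Gaussian noise (assumption)."""
    y = np.sum(signals, axis=0)
    noise = noise_std * (rng.standard_normal(y.shape)
                         + 1j * rng.standard_normal(y.shape)) / np.sqrt(2)
    return y + noise


fs = 8000.0                                # hypothetical sampling rate (Hz)
t = np.arange(int(0.4 * fs)) / fs          # 0 to 0.4 s, as on the figures' time axis
s1 = hopping_signal(t, 500.0, [1500.0, 800.0, 2500.0, 1200.0], 3000.0,
                    tau_s=0.02, T_h=0.08, tau_e=0.06)
s2 = hopping_signal(t, 2000.0, [300.0, 2700.0, 1000.0, 1800.0], 600.0,
                    tau_s=0.05, T_h=0.08, tau_e=0.03)
y = received([s1, s2], noise_std=0.1, rng=np.random.default_rng(0))
```

Inside each dwell the phase advances by 2πf/fs per sample, so the carrier of a dwell can be read back from the phase difference of neighboring samples. Each dwell of s1 therefore shows up as one horizontal line on a time-frequency diagram like Figure 3, and y overlays two such patterns in noise.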
In the single-antenna case, it is assumed that N trombone timbre frequency signals are received from the air and that the interference is additive, so the expression for single-antenna reception of multiple trombone timbre frequency signals is given as follows:

$$y(t) = \sum_{n=1}^{N} s_n(t) + n(t) \tag{3}$$

In formula (3), s_n(t) (n = 1, 2, ..., N) is the nth trombone timbre frequency signal, and n(t) is the sum of the various interferences and noise. Besides single-antenna reception, there is also array-antenna reception, mainly with uniform linear arrays and uniform circular arrays. Multiple antennas use the signal arrival delays to estimate parameters such as the direction-of-arrival angle, which facilitates the subsequent blind source separation. However, this article mainly considers a single-antenna system, so array reception is not discussed here.

Figure 3: Schematic diagram of a single trombone timbre frequency signal. (Axes: time (s), 0 to 0.4; frequency (kHz).)

2.3. Time-Frequency Analysis Technology of Trombone Timbre Frequency Signal. The short-time Fourier transform (STFT) is a classical linear transform proposed by Gabor. It applies a window h(t) before the Fourier transform: the signal inside each window is assumed to be stationary, the window is shifted continuously along the signal, the Fourier transform of each windowed segment is computed, and the results are finally combined into a time-frequency representation. Because the trombone timbre frequency signal s(t) is nonstationary, STFT processing is effective for it. Its continuous-time expression is given as follows:

$$\mathrm{STFT}_s(t,f) = \int_{-\infty}^{+\infty} s(\tau)\, h^*(\tau - t)\, e^{-j2\pi f\tau}\, d\tau \tag{4}$$

The discrete expression is given as follows:

$$\mathrm{STFT}(m,n) = \sum_{k=0}^{N-1} s(m+k)\, h(k)\, e^{-j(2\pi/N)kn}, \quad n = 0, 1, \ldots, N-1 \tag{5}$$

Among them, m is the sampling-point index in the time dimension, n is the sampling-point index in the frequency dimension, h(k) is the discrete window function, and STFT(m, n) is the value at the corresponding grid point, whose magnitude gives the amplitude there. It follows from Eq. (4) that the STFT also has the following properties:

(1) Time shift property:

$$y(t) = s(t - t_0) \tag{6}$$

$$\mathrm{STFT}_y(t,f) = \mathrm{STFT}_s(t - t_0, f)\, e^{-j2\pi f t_0} \tag{7}$$

(2) Frequency shift property:

$$y(t) = s(t)\, e^{j2\pi f_0 t} \tag{8}$$

$$\mathrm{STFT}_y(t,f) = \mathrm{STFT}_s(t, f - f_0) \tag{9}$$

For the STFT, the time-frequency resolution is mainly determined by the window width. A long window gives higher resolution in the frequency domain and lower resolution in the time domain; a short window gives the opposite. The two requirements conflict, a constraint imposed by the uncertainty principle, which (with B and T measured as root-mean-square bandwidth and duration) is expressed as follows:

$$B\,T \geq \frac{1}{4\pi} \tag{10}$$

Among them, B is the bandwidth, and T is the time width. Therefore, the window function parameters must be chosen to suit the situation when analyzing a signal. Figure 4 shows STFT time-frequency diagrams of the trombone timbre frequency signal obtained with different windows. The signal length is N, the window is a Hamming window, and H denotes the window length: H = N/4 + 1 in Figure 4(a) and H = N/10 + 1 in Figure 4(b). With H = N/4 + 1 the frequency resolution is high, and with H = N/10 + 1 the time resolution is high. The window length for analyzing trombone timbre frequency signals therefore depends on the situation: a short window can be selected for fast trombone timbre frequency signals, and a long window for slow ones.

Figure 4: STFTs with different window lengths. (a) H = N/4 + 1. (b) H = N/10 + 1. (Axes: time (s), 0 to 0.4; frequency (kHz), 0 to 3.5.)

When multiple signals are superimposed, that is, when s_1(t) and s_2(t) are combined linearly, the following formula is satisfied:

$$\mathrm{STFT}\left[a s_1(t) + b s_2(t)\right] = a\,\mathrm{STFT}\left[s_1(t)\right] + b\,\mathrm{STFT}\left[s_2(t)\right] \tag{11}$$

It can be seen that the STFT is a linear transformation. Other common linear time-frequency transforms are the Gabor transform and the wavelet transform.

The Gabor transform was proposed by Gabor in 1946 to represent time-frequency information as a grid in the two-dimensional plane. Compared with the time window of the STFT, the Gabor transform uses a joint time-frequency window, and it can be regarded as an STFT whose window function is a Gaussian window:

$$\mathrm{Gabor}(t,f) = \int_{-\infty}^{+\infty} s(\tau)\, g_{\mathrm{Gauss}}^*(\tau - t)\, e^{-j2\pi f\tau}\, d\tau \tag{12}$$

With a Gaussian window, the lower bound of the uncertainty principle in Eq. (10) is attained. Therefore, the Gabor transform can be regarded as the optimal STFT. However, the shape of the Gabor window is fixed, unlike the STFT, which can truncate the signal with a variety of window functions, and its time-frequency window size is also fixed. In practical signal processing, the window length used for signals of different frequencies should be variable. The wavelet transform is a linear transformation with different resolutions at different frequencies. Its expression is given as follows:

$$W(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s(t)\, \psi^*\!\left(\frac{t-b}{a}\right) dt \tag{13}$$

Among them, ψ(t) represents the mother wavelet; a is the scale, which controls the dilation and contraction of the wavelet; and b is the translation, which controls its position on the time axis. Figure 5 shows the analysis of the trombone timbre frequency signal by the Gabor transform and the wavelet transform. The Gabor transform uses a Gaussian window of width N/10 + 1, and the wavelet transform uses a complex-valued Morlet wavelet as its basis. Figure 5 shows that the Gabor transform behaves like the STFT, that is, its resolution is the same at every frequency, whereas the wavelet transform has different resolutions at different frequencies, indicating that its resolution is adaptive. The wavelet transform therefore works well for single-frequency signals, but for multicomponent signals such as the trombone timbre frequency signal its performance is not very good.

Figure 5: Two commonly used linear transformations. (a) Gabor transform and (b) wavelet transform. (Axes: time (s), 0 to 0.4; frequency (kHz), 0 to 3.5.)

The Wigner–Ville distribution (WVD) is given as follows:

$$\mathrm{WVD}(t,f) = \int_{-\infty}^{+\infty} s\!\left(t + \frac{\tau}{2}\right) s^*\!\left(t - \frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\tau \tag{14}$$

It can be seen from Eq. (14) that, unlike the STFT, the WVD does not need a window function to intercept the signal. The product s(t + τ/2)s*(t − τ/2) can be regarded as the instantaneous autocorrelation function of the signal, and the WVD is its Fourier transform with respect to τ; the result is a two-dimensional function over the time-frequency plane.

Windowing the WVD in the time (lag) domain, similarly to the STFT, reduces the influence of the cross terms and yields the pseudo-WVD (PWVD), whose expression is given as follows:

$$\mathrm{PWVD}(t,f) = \int_{-\infty}^{+\infty} h(\tau)\, s\!\left(t + \frac{\tau}{2}\right) s^*\!\left(t - \frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\tau \tag{15}$$
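To make the discrete STFT of Eq. (5) and the PWVD of Eq. (15) concrete, the following minimal NumPy sketch (not the author's code) computes a spectrogram and a discrete pseudo-WVD of a hypothetical two-tone complex test signal. Assumptions: the helper names stft and pwvd are hypothetical; the input is analytic (complex), so the discrete WVD does not alias; in the discrete PWVD an integer lag stands for τ/2, so bin n corresponds to n·fs/(2·n_fft); a periodic Hann window is used for the STFT and a 63-point Hamming lag window for the PWVD.

```python
import numpy as np


def stft(s, h, hop):
    """Discrete STFT following Eq. (5), with L = len(h) in the role of N.

    Row i holds STFT(m, n) for window start m = i * hop; column n is the bin,
    i.e. sum_k s[m + k] * h[k] * exp(-2j*pi*k*n/L).
    """
    L = len(h)
    return np.array([np.fft.fft(s[m:m + L] * h)
                     for m in range(0, len(s) - L + 1, hop)])


def pwvd(s, h, n_fft):
    """Discrete pseudo-WVD in the spirit of Eq. (15).

    h is a real, symmetric lag window of odd length 2M + 1 (centre = zero lag).
    Integer lag tau stands for tau/2 in Eq. (15), so bin n maps to n * fs / (2 * n_fft).
    """
    M = (len(h) - 1) // 2
    N = len(s)
    out = np.empty((N, n_fft))
    for t in range(N):
        # keep only lags where both s[t + tau] and s[t - tau] exist and fit the FFT
        tau_max = min(M, t, N - 1 - t, n_fft // 2 - 1)
        taus = np.arange(-tau_max, tau_max + 1)
        r = np.zeros(n_fft, dtype=complex)
        r[taus % n_fft] = h[M + taus] * s[t + taus] * np.conj(s[t - taus])
        # r is Hermitian in tau for a symmetric real h, so its FFT is real
        out[t] = np.fft.fft(r).real
    return out


fs = 8000.0
i = np.arange(256)
tone1 = np.exp(2j * np.pi * 1000.0 * i / fs)   # hypothetical test components
tone2 = np.exp(2j * np.pi * 3000.0 * i / fs)
two_tone = tone1 + tone2

hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(64) / 64)  # periodic Hann window
spec = np.abs(stft(two_tone, hann, hop=16)) ** 2           # spectrogram |STFT|^2
wv = pwvd(two_tone, np.hamming(63), n_fft=128)
```

With these parameters the spectrogram is concentrated around bins 8 and 24 (1 kHz and 3 kHz), and bin 16 (2 kHz) stays at zero up to rounding. The PWVD places the two self-terms at bins 32 and 96, but it also shows a cross term at bin 64, midway between them, whose value oscillates with a period of four samples: about twice the self-term height at t = 128 and close to zero at t = 129. This is the cross-term interference that the lag window h(τ) only partly suppresses.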
Unlike the linear time-frequency transforms, the WVD does not satisfy linear superposition, so its output contains both self-terms and cross-terms.

In Eq. (15), h(τ) is the added time window function, applied along the lag variable τ; it tapers the instantaneous autocorrelation, which smooths the distribution along the frequency axis. By sacrificing some time-frequency concentration, the PWVD reduces the impact of most of the cross-term interference.

To suppress the cross terms further, a second window is applied on top of the PWVD, which gives the smoothed pseudo-WVD (SPWVD):

$$\mathrm{SPWVD}(t,f) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(\tau)\, g(v)\, s\!\left(t - v + \frac{\tau}{2}\right) s^*\!\left(t - v - \frac{\tau}{2}\right) e^{-j2\pi f\tau}\, dv\, d\tau \tag{16}$$

In the formula, h(τ) is the window applied along the lag variable, as in the PWVD, and g(v) is the second window, applied along the time-shift variable v, so that the distribution is smoothed in both time and frequency. Compared with the PWVD, the additional window gives better cross-term suppression. However, the second windowing further reduces the time-frequency concentration of the SPWVD. Figure 6 shows the time-frequency diagrams of the WVD, PWVD, and SPWVD.

Figure 6: Time-frequency diagram of WVD and its improvement. (a) WVD, (b) PWVD, and (c) SPWVD. (Axes: time (s), 0 to 0.4; frequency (kHz), 0 to 3.5.)

As can be seen from Figure 6(a), after the WVD transformation of the trombone timbre frequency signal, the self-terms and the cross-term interference become indistinguishable, and some cross terms even carry more energy than the self-terms; however, the time-frequency concentration of the WVD is the best. Figure 6(b) shows the PWVD, windowed along the lag (time) variable: it suppresses the cross terms to some extent but does not completely eliminate them, and the added window reduces the time-frequency concentration. Figure 6(c) shows the SPWVD, smoothed in both time and frequency: the cross-term interference has been completely suppressed, but the time-frequency concentration is reduced further. Hence, windowing suppresses the cross terms between signals at the expense of time-frequency concentration.

The origin of the cross terms when the WVD analyzes multiple signals can be demonstrated mathematically. For the signal s = s_1 + s_2,

$$\mathrm{WVD}_s(t,f) = \int_{-\infty}^{+\infty} s\!\left(t+\frac{\tau}{2}\right) s^*\!\left(t-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\tau = \int_{-\infty}^{+\infty} \left[s_1\!\left(t+\frac{\tau}{2}\right) + s_2\!\left(t+\frac{\tau}{2}\right)\right]\left[s_1\!\left(t-\frac{\tau}{2}\right) + s_2\!\left(t-\frac{\tau}{2}\right)\right]^* e^{-j2\pi f\tau}\, d\tau$$

For the spectrogram (the squared STFT magnitude) of s = a s_1 + b s_2, with real a and b,

$$\mathrm{SPEC}_s(t,f) = \left|a\,\mathrm{STFT}_{s_1}(t,f) + b\,\mathrm{STFT}_{s_2}(t,f)\right|^2 = a^2\left|\mathrm{STFT}_{s_1}(t,f)\right|^2 + b^2\left|\mathrm{STFT}_{s_2}(t,f)\right|^2 + 2ab\left|\mathrm{STFT}_{s_1}(t,f)\right|\left|\mathrm{STFT}_{s_2}(t,f)\right|\cos\left(\varphi_{s_1}(t,f) - \varphi_{s_2}(t,f)\right) \tag{19}$$

Among them, φ_{s_1}(t,f) = arg(STFT_{s_1}(t,f)) and φ_{s_2}(t,f) = arg(STFT_{s_2}(t,f)). It can be seen that the spectrogram does not satisfy linear superposition: it contains the phase cross term cos(φ_{s_1} − φ_{s_2}). However, if the two signals do not overlap in the time and frequency domains, the following expressions are satisfied: