### Abstract

Hindawi, Journal of Robotics, Volume 2022, Article ID 9460208, 12 pages, https://doi.org/10.1155/2022/9460208

Research Article: Feature Extraction and Analysis Method of Trombone Timbre Based on CNN Model

Yanjun Wang, School of Music, Shandong College of Arts, Jinan 250014, China. Correspondence should be addressed to Yanjun Wang; z00922@sdca.edu.cn

Received 10 August 2022; Revised 30 August 2022; Accepted 13 September 2022; Published 23 September 2022. Academic Editor: Shahid Hussain.

Copyright © 2022 Yanjun Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To improve the accuracy of trombone timbre feature extraction, this paper combines a CNN model to construct a trombone timbre feature extraction model and summarizes the principle of the trombone timbre signal. Moreover, this paper derives the parameters of the trombone timbre signal and of the corresponding network model and describes the trombone timbre signal with mathematical expressions, which facilitates its theoretical analysis and processing. In addition, this paper discusses time-frequency analysis techniques in detail, including their advantages and limitations, which provides an algorithmic basis for working with trombone timbre signals; time-frequency analysis still has great advantages in trombone timbre signal processing. Finally, the simulation results show that the proposed CNN-based trombone timbre feature extraction method can effectively identify the trombone timbre in various musical performances.
### 1. Introduction

When it was first invented, the trombone was divided into the treble, alto, sub (tenor), and bass trombones. With the continuous invention and improvement of musical instruments, however, the treble and alto trombones were gradually replaced by other instruments, and only the tenor and bass trombones remained. Over time, the trombone has evolved into a relatively mature instrument that is widely loved by music lovers for its unique timbre, and it plays an important and irreplaceable role in wind-instrument performance [1].

Timbre refers to the impression that a sound makes on the listener; it can also be understood as the quality of the sound. Timbre provides a basis for judging how an instrument is played, covering factors such as the smoothness, strength or weakness, and emotion of the sound it produces. Just as the eye forms an impression of color, the ear forms a corresponding impression when hearing a sound, and that impression is "timbre." A beautiful timbre cannot be judged from a single angle, such as the thickness or brightness of the sound; it must be interpreted comprehensively in combination with different contexts [2]. This is like painting: a perfect painting cannot be achieved with a single color and line, but needs different colors to show layering and artistic charm. Likewise, trombone performance should use different timbres for different types of pieces instead of a single timbre throughout, which is recognized as the way to achieve a beautiful timbre [3]. For example, classical repertoire calls for a calm and vigorous tone, whereas romantic music requires a bright and cheerful one. Modern instruments also use different timbre interpretations in performance, and the style of the music must still be taken into account for a correct interpretation. Therefore, a beautiful trombone timbre presupposes meeting the requirements of the music and understanding the concept of timbre [4].

In trombone performance, the individual's level and ability have the most direct impact on the result, so whether the trombone produces a beautiful timbre depends on the performer. Although trombone skill can be acquired through long and hard training, the correct playing method must be mastered to achieve a perfect interpretation [5]. Learning the trombone requires the correct mouth shape and breath, and further study builds on this foundation. Normally, a smile-pressed mouth shape produces a relatively bright and thin tone with poor control over intonation and volume, so the overall tone is high. Especially in classical or romantic music, the sound becomes sharp, not round enough, and difficult to blend with other instruments [6]. A mouth shape that is too concentrated produces a relatively dull tone, which is not bright enough and is too high-pitched compared with the smile-pressed mouth shape. Because such a mouth shape cannot be used flexibly, it lacks sufficient expressiveness for brilliant music or for music that demands advanced technique [7]. Studies have therefore shown that a mouth shape balancing the smile and the tuck is by far the correct one for trombone playing: thanks to its flexibility and power, it meets the requirements of a wide range of pieces and gives a better trombone performance [8].

In addition to the mouth shape, lip vibration and the speed of the airflow produced by breathing also have a major impact on timbre. Wind players commonly use one of two breathing methods: chest breathing or abdominal breathing. Chest breathing tires the performer easily because too little air is inhaled, and it makes good breath regulation and support difficult in the high or low registers [9]. Abdominal breathing relies mainly on the force of the abdomen; although the performer can actively generate breath during the performance, this method cannot effectively increase or maximize inhalation. At present, thoracic-abdominal breathing is the most scientific method used in wind-instrument performance, especially on the trombone: the breathing muscles can both maximize inhalation and control the breath smoothly without tiring the performer [10]. To develop the art of bass trombone playing further, players must learn and master its playing techniques. The following discussion focuses on factors that the player can control, such as mouth shape, breathing method, tongue movement, and pronunciation, to make the bass trombone sound more perfect and harmonious [11].

The trombone differs from keyed instruments: it has neither their flexible control nor the unrestrained boldness of percussion instruments, yet it produces a sound with its own personality and shows a unique charm in performance. Tongue movement is an important expressive technique for trombone players. Trombone music with a very fast rhythm is common, and it is played by means of rapid tongue movement. On the basis of mastering the basic skills, players should therefore practice tongue movement extensively, for example increasing tongue speed while keeping each note neat and clear [12]. In short, tongue-movement skills improve only after the method has been mastered and practiced extensively, and basic skills are achieved only through hard work. By diligently practicing the basic skills and progressing from simple to complex, players can reach a higher level of trombone performance [13].

The lip shape is very important for the bass trombone because it directly determines sound quality; the shape of the mouth plays a decisive role in the tone of the instrument. Bass trombone performance relies on the coordination of the mouth shape, the teeth, and other organs. Mouth shape belongs to the most basic stage of learning, and bass trombone players must ensure that it is correct [14]. Because innate conditions differ between players, so does their ability to use and control the mouth shape; patient practice and continuous progress are necessary, and each case must be analyzed on its own terms [15].

Whether music is pleasing to the ear depends to a large extent on pitch. Pitch refers to how high or low a sound produced in singing or instrumental performance is, which is determined by its vibration rate. Pitch therefore sets the basic conditions for playing beautiful and pleasant music; when someone's singing sounds particularly beautiful, the real reason is often accurate pitch [16]. The trombone and pitch are inseparable: the trombone adjusts pitch mainly by moving its telescopic slide, so pitch is not easy to control correctly, and it takes hard work and time to practice and explore properly. Listening regularly to excellent performances can also help improve pitch [17].

This paper combines a CNN model to construct a trombone timbre feature extraction model, with the aim of improving trombone training and learning and enhancing the effect of trombone performance.

### 2. Trombone Timbre Frequency Communication Overview

#### 2.1. Trombone Timbre Frequency Communication Principle and Characteristics

As shown in Figure 1, in the trombone timbre frequency communication process, the transmitting end first generates the original information data and then performs the first (baseband) modulation in the information modulator. At the same time, the trombone timbre frequency sequence is generated under the control of a pseudorandom sequence, and a frequency hopping table is synthesized through a specific mapping; this table controls the frequency synthesizer so that it selects the local carrier according to the corresponding rules. The baseband signal is then multiplied in the trombone timbre frequency modulator, which shifts it to the corresponding carrier frequency. Finally, the band signal is radiated into the air through the transmitting antenna.

Figure 1: Block diagram of the working principle of the trombone timbre frequency communication system (blocks: information modulator, frequency hopping modulator, frequency hopping synthesizer, frequency hopping rate table, frequency hopping sequence, and frequency hopping synchronization).

Depending on whether the frequency hopping uses the same time reference, networks are divided into synchronous and asynchronous networks; depending on whether the trombone timbre frequencies collide at the same time, they are divided into orthogonal and nonorthogonal networks. Because collisions of trombone timbre frequencies are generally difficult to avoid in an asynchronous network, such a network cannot be orthogonal. Trombone timbre frequency networking methods therefore fall into three categories, whose network models are shown schematically in Figure 2.

Figure 2: Three types of networking: (a) synchronous orthogonal networking, (b) synchronous nonorthogonal networking, and (c) asynchronous nonorthogonal networking. Each panel plots frequency (kHz) against time (s) for net platforms 1 and 2.

#### 2.2. Mathematical Model of Trombone Timbre Frequency Signal

The trombone timbre frequency signal is a nonstationary signal whose carrier frequency varies with time; the variation of its carrier frequency is controlled by pseudorandom sequences such as m-sequences and Gold sequences. Its regularity is generally observed through a time-frequency diagram, on which the signal appears as a line that moves through the time-frequency plane. The time-frequency diagram of a single trombone timbre frequency signal is shown in Figure 3; its horizontal axis is time and its vertical axis is frequency. From such a diagram, parameters such as the hop timing, hop period, and frequency set of the signal can be clearly observed.

Assume that M trombone timbre frequency signal segments are received during the observation time T, of which K are complete segments with hop period $T_h$ and carrier frequencies $f_k$ ($k = 1, 2, \ldots, K$). The incomplete segments are the beginning and the end of the observation: the start segment has duration $\tau_s$ and carrier frequency $f_s$, and the end segment has duration $\tau_e$ and carrier frequency $f_e$. The signal over the whole observation time is then

$$s(t) = a(t)\left[\operatorname{rect}\left(\frac{t}{\tau_s}\right)e^{j2\pi f_s t} + \sum_{k=1}^{K}\operatorname{rect}\left(\frac{t-(k-1)T_h-\tau_s}{T_h}\right)e^{j2\pi f_k t} + \operatorname{rect}\left(\frac{t-KT_h-\tau_s}{\tau_e}\right)e^{j2\pi f_e t}\right], \tag{1}$$

where

$$\operatorname{rect}\left(\frac{t}{T}\right) = \begin{cases} 1, & 0 < t \le T,\\ 0, & \text{otherwise,} \end{cases} \tag{2}$$

and $a(t)$ is the complex envelope of the observed trombone timbre frequency signal.

In a real communication environment, the received signal is not a single trombone timbre frequency signal: it is usually mixed with other trombone timbre frequency signals and with interference such as fixed-frequency and swept-frequency signals.
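The signal model of Eqs. (1) and (2), together with the additive single-antenna mixture of Eq. (3), can be made concrete with a short simulation. The following is a minimal NumPy sketch under stated assumptions, not the author's implementation: the helper names `rect`, `hopping_signal`, and `received_signal` are hypothetical, the complex envelope $a(t)$ defaults to 1, and the noise $n(t)$ is assumed to be complex white Gaussian.

```python
import numpy as np


def rect(t, width):
    """Eq. (2): 1 for 0 < t <= width, 0 otherwise."""
    return ((t > 0) & (t <= width)).astype(float)


def hopping_signal(t, tau_s, f_s, hop_freqs, T_h, tau_e, f_e, envelope=None):
    """Sketch of Eq. (1): partial start segment, K complete hops, partial end segment.

    `envelope` stands in for the complex envelope a(t); a constant 1 is assumed
    when it is omitted.
    """
    a = np.ones_like(t, dtype=complex) if envelope is None else envelope(t)
    # Start segment: duration tau_s at carrier f_s.
    s = rect(t, tau_s) * np.exp(2j * np.pi * f_s * t)
    # Complete hop k occupies ((k - 1) T_h + tau_s, k T_h + tau_s] at carrier f_k.
    for k, f_k in enumerate(hop_freqs, start=1):
        s = s + rect(t - (k - 1) * T_h - tau_s, T_h) * np.exp(2j * np.pi * f_k * t)
    # End segment: duration tau_e at carrier f_e, after the K complete hops.
    K = len(hop_freqs)
    s = s + rect(t - K * T_h - tau_s, tau_e) * np.exp(2j * np.pi * f_e * t)
    return a * s


def received_signal(components, noise_std, rng):
    """Eq. (3): sum of N hopping signals plus additive complex white Gaussian noise."""
    y = np.sum(components, axis=0)
    noise = (rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)) / np.sqrt(2)
    return y + noise_std * noise
```

For example, with a sampling rate of 8 kHz and `t = np.arange(1, 641) / 8000.0` (0.08 s), `hopping_signal(t, 0.01, 500.0, [1200.0, 2400.0, 800.0], 0.02, 0.01, 3000.0)` yields a 10 ms start segment, three complete 20 ms hops, and a 10 ms end segment; all numeric values are illustrative only.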
In the case of single-antenna reception, assume that N trombone timbre frequency signals are received from the air and that the interference is additive. The received single-antenna multi-trombone timbre frequency signal is then

$$y(t) = \sum_{n=1}^{N} s_n(t) + n(t), \tag{3}$$

where $s_n(t)$ ($n = 1, 2, \ldots, N$) are the trombone timbre frequency signals and $n(t)$ is the sum of the various disturbances and noise. Besides single-antenna reception, array-antenna reception is also used, mainly with uniform linear arrays and uniform circular arrays. Multiple antennas exploit the differences in signal arrival delay to estimate parameters such as the angle of arrival, which facilitates later blind source separation. This paper, however, mainly considers a single-antenna system, so array reception is not discussed further.

Figure 3: Schematic diagram of a single trombone timbre frequency signal (frequency in kHz versus time in s).

#### 2.3. Time-Frequency Analysis Technology of Trombone Timbre Frequency Signal

The short-time Fourier transform (STFT) is a classical linear transform proposed by Gabor. It applies a window $h(t)$ before the Fourier transform: each segment cut out by the window is assumed to be stationary, the window is shifted continuously along the signal, the Fourier transform of each segment is computed, and all the transforms are finally assembled into one time-frequency representation. Because the trombone timbre frequency signal $s(t)$ is nonstationary, the STFT is well suited to it. Its continuous-time expression is

$$STFT_s(t,f) = \int_{-\infty}^{+\infty} s(\tau)\,h^*(\tau - t)\,e^{-j2\pi f\tau}\,d\tau, \tag{4}$$

and its discrete expression is

$$STFT(m,n) = \sum_{k=0}^{N-1} s(m+k)\,h(k)\,e^{-j(2\pi/N)kn}, \quad n = 0, 1, \ldots, N-1, \tag{5}$$

where m is the index of the sampling point along the time axis, n is the index along the frequency axis, $h(k)$ is the discrete window function, and $STFT(m,n)$ is the value at the corresponding time-frequency point. From Equation (4), the STFT has the following properties:

(1) Time shift property: if

$$y(t) = s(t - t_0), \tag{6}$$

then

$$STFT_y(t,f) = STFT_s(t - t_0, f)\,e^{-j2\pi f t_0}. \tag{7}$$

(2) Frequency shift property: if

$$y(t) = s(t)\,e^{j2\pi f_0 t}, \tag{8}$$

then

$$STFT_y(t,f) = STFT_s(t, f - f_0). \tag{9}$$

The time-frequency resolution of the STFT is determined mainly by the window width. A long window gives higher frequency resolution and lower time resolution; a short window gives the opposite. The two requirements conflict and are constrained by the uncertainty principle:

$$BT \ge \frac{1}{4\pi}, \tag{10}$$

where B is the bandwidth and T is the time width, both measured as root-mean-square (RMS) spreads. Window parameters must therefore be chosen according to the situation. Figure 4 shows the STFT time-frequency diagrams of the trombone timbre frequency signal obtained with different windows; the signal length is N, the window H is a Hamming window, and the window length is N/4 + 1 in Figure 4(a) and N/10 + 1 in Figure 4(b). With the N/4 + 1 window, the frequency resolution is high; with the N/10 + 1 window, the time resolution is high. The window length therefore depends on the situation: a short window suits fast trombone timbre frequency signals, whereas a long window suits slow ones.

Figure 4: STFTs with different window lengths: (a) H = N/4 + 1 and (b) H = N/10 + 1 (frequency in kHz versus time in s).

When multiple signals are superimposed, that is, when $s_1(t)$ and $s_2(t)$ are added linearly, the STFT satisfies

$$STFT\{a s_1(t) + b s_2(t)\} = a\,STFT\{s_1(t)\} + b\,STFT\{s_2(t)\}, \tag{11}$$

so the STFT is a linear transformation. Other common linear transforms are the Gabor transform and the wavelet transform.

The Gabor transform, proposed by Gabor in 1946, represents time-frequency information as a grid on the two-dimensional plane. Whereas the STFT uses a time window, the Gabor transform uses a joint time-frequency window; it can be seen as an STFT whose window function is a Gaussian window:

$$Gabor(t,f) = \int_{-\infty}^{+\infty} s(\tau)\,g_{\mathrm{Gauss}}^*(\tau - t)\,e^{-j2\pi f\tau}\,d\tau. \tag{12}$$

With a Gaussian window, the lower bound of the uncertainty principle is attained, so the Gabor transform can be regarded as the optimal STFT. However, the Gabor transform is tied to a single window shape, whereas the STFT can truncate the signal with a variety of window functions, and in both cases the size of the time-frequency window is fixed. In practical signal processing, the window length should vary for signal components of different frequencies. The wavelet transform is a linear transformation with different resolutions at different frequencies:

$$wavelet(a,b) = \frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} s(t)\,\psi^*\!\left(\frac{t-b}{a}\right)dt. \tag{13}$$

Figure 5: Two commonly used linear transformations: (a) Gabor transform and (b) wavelet transform (frequency in kHz versus time in s).
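The discrete STFT of Eq. (5) maps directly onto a sliding-window FFT. The sketch below is an illustration under stated assumptions rather than the author's code: it uses NumPy, a Hamming window as in Figure 4, a default hop of one sample, and a DFT length equal to the window length so that N in Eq. (5) is the window length; the function name `stft` is hypothetical.

```python
import numpy as np


def stft(x, win_len, hop=1, n_fft=None):
    """Discrete STFT following Eq. (5), with a Hamming window h(k).

    Row m is the frame starting at sample m * hop; column n is DFT bin n.
    With n_fft == win_len, each row is sum_k x[m + k] h[k] exp(-j 2 pi k n / N).
    """
    x = np.asarray(x)
    n_fft = win_len if n_fft is None else n_fft
    h = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    # Windowed segments s(m + k) h(k), one row per frame start m.
    frames = np.array([x[m:m + win_len] * h for m in starts])
    # FFT along k gives every frequency bin n at once (zero-padded if n_fft > win_len).
    return np.fft.fft(frames, n=n_fft, axis=1)
```

Calling `stft(x, len(x) // 4 + 1)` and `stft(x, len(x) // 10 + 1)` mirrors the two window lengths of Figure 4: the longer window gives finer frequency bins (spacing fs/win_len), while the shorter one localizes hops more sharply in time. Because the FFT is linear, the sketch also satisfies Eq. (11).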
In Equation (13), $\psi(t)$ is the mother wavelet, a is the scale, which controls the dilation and contraction of the wavelet, and b is the translation, which controls the position of the wavelet on the time axis. Figure 5 shows the analysis of the trombone timbre frequency signal by the Gabor transform and the wavelet transform. The Gabor transform uses a Gaussian window of width N/10 + 1, and the wavelet transform uses a complex-valued Morlet wavelet. Figure 5 shows that the Gabor transform behaves like the STFT, with the same resolution at every frequency, whereas the wavelet transform has different resolutions at different frequencies, that is, its resolution is adaptive. Consequently, the wavelet transform analyzes single-frequency signals well, but it performs poorly on multicomponent signals such as the trombone timbre frequency signal.

Unlike the linear time-frequency transforms, the Wigner–Ville distribution (WVD) does not satisfy linear superposition, so its result contains both auto-terms (self-terms) and cross-terms. The WVD is defined as

$$WVD(t,f) = \int_{-\infty}^{+\infty} s\left(t+\frac{\tau}{2}\right)s^*\left(t-\frac{\tau}{2}\right)e^{-j2\pi f\tau}\,d\tau. \tag{14}$$

Equation (14) shows that, unlike the STFT, the WVD needs no window function to cut out the signal. The product $s(t+\tau/2)\,s^*(t-\tau/2)$ can be regarded as the instantaneous autocorrelation function of the signal, and the WVD is its Fourier transform with respect to $\tau$, which yields a two-dimensional function over the time-frequency plane.

To reduce the influence of the cross terms, the WVD is first windowed along $\tau$, similar to the STFT, which gives the pseudo-WVD (PWVD):

$$PWVD(t,f) = \int_{-\infty}^{+\infty} s\left(t+\frac{\tau}{2}\right)s^*\left(t-\frac{\tau}{2}\right)h(\tau)\,e^{-j2\pi f\tau}\,d\tau. \tag{15}$$

Here $h(\tau)$ is the added time window; it limits the range of the lag product and thereby smooths the distribution. The PWVD suppresses most of the cross-term interference at the cost of time-frequency concentration.

To suppress the cross terms more completely, a second windowing operation is applied on top of the PWVD, which gives the smoothed pseudo-WVD (SPWVD):

$$SPWVD(t,f) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} s\left(t-v+\frac{\tau}{2}\right)s^*\left(t-v-\frac{\tau}{2}\right)h(\tau)\,g(v)\,e^{-j2\pi f\tau}\,dv\,d\tau. \tag{16}$$

In this formula, $h(\tau)$ is the lag window already used in the PWVD, and $g(v)$ is a second window that smooths the distribution along the time variable, so that the distribution is filtered along both the time and the frequency axes. Because of this additional smoothing, the SPWVD suppresses the cross terms better than the PWVD.

Figure 6: Time-frequency diagrams of the WVD and its improvements: (a) WVD, (b) PWVD, and (c) SPWVD (frequency in kHz versus time in s).
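A discrete pseudo-WVD makes the cross terms of the quadratic distributions easy to observe. The sketch below is a minimal illustration, not the paper's implementation: it assumes a complex (ideally analytic) input, an odd-length symmetric lag window playing the role of $h(\tau)$ in Eq. (15), and integer lags k, so the lag product spans 2k samples and frequency bin m corresponds to m·fs/(2·n_fft). The function name `pseudo_wvd` is hypothetical; passing an all-ones window gives a plain WVD with truncated lags in the spirit of Eq. (14).

```python
import numpy as np


def pseudo_wvd(x, lag_window, n_fft):
    """Discrete pseudo-WVD in the spirit of Eq. (15).

    x          : complex (ideally analytic) samples.
    lag_window : odd-length symmetric window h, centred on lag 0.
    n_fft      : number of frequency bins; row n, column m corresponds to time
                 sample n and frequency m * fs / (2 * n_fft).
    """
    x = np.asarray(x, dtype=complex)
    half = len(lag_window) // 2
    n_samples = len(x)
    tfr = np.zeros((n_samples, n_fft))
    for n in range(n_samples):
        # Largest usable lag: limited by the window, the signal edges, and n_fft.
        kmax = min(half, n, n_samples - 1 - n, n_fft // 2 - 1)
        k = np.arange(-kmax, kmax + 1)
        r = np.zeros(n_fft, dtype=complex)
        # Instantaneous autocorrelation x(n + k) x*(n - k), weighted by h(k).
        r[k % n_fft] = lag_window[half + k] * x[n + k] * np.conj(x[n - k])
        # r is conjugate-symmetric in k, so its FFT is real up to rounding.
        tfr[n] = np.real(np.fft.fft(r))
    return tfr
```

For two complex tones at 1 kHz and 3 kHz sampled at 8 kHz, with `n_fft=64` and a 31-tap Hamming lag window, the auto-terms sit at bins 16 and 48, while a cross term appears midway at bin 32 and oscillates in time, reaching about twice the auto-term height at some instants and vanishing at others. A lag window alone does not remove this particular cross term, because it smooths along frequency; suppressing it requires the additional time smoothing $g(v)$ of the SPWVD.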
The second windowing step of the SPWVD, however, also reduces its time-frequency concentration. Figure 6 compares the time-frequency diagrams of the WVD, PWVD, and SPWVD. Figure 6(a) shows that after the WVD, the auto-terms and cross-terms of the trombone timbre frequency signal become indistinguishable, and some cross-terms even carry more energy than the auto-terms; the time-frequency concentration of the WVD, however, is indeed the best. Figure 6(b) shows the PWVD, windowed in the time domain: it suppresses the cross terms to some extent but does not eliminate them completely, and the added window reduces its time-frequency concentration. Figure 6(c) shows the SPWVD, smoothed along both time and frequency: the cross-term interference is completely suppressed, but the time-frequency concentration is reduced further. Windowing thus suppresses the cross terms between signals at the expense of time-frequency concentration.

The origin of the cross terms when the WVD analyzes multiple signals can be shown mathematically. For the signal $s = s_1 + s_2$,

$$\begin{aligned}
WVD_s(t,f) &= \int_{-\infty}^{+\infty} \left[s_1+s_2\right]\left(t+\tfrac{\tau}{2}\right)\left[s_1+s_2\right]^*\left(t-\tfrac{\tau}{2}\right)e^{-j2\pi f\tau}\,d\tau \\
&= \int_{-\infty}^{+\infty} s_1\left(t+\tfrac{\tau}{2}\right)s_1^*\left(t-\tfrac{\tau}{2}\right)e^{-j2\pi f\tau}\,d\tau + \int_{-\infty}^{+\infty} s_2\left(t+\tfrac{\tau}{2}\right)s_2^*\left(t-\tfrac{\tau}{2}\right)e^{-j2\pi f\tau}\,d\tau \\
&\quad + \int_{-\infty}^{+\infty} s_1\left(t+\tfrac{\tau}{2}\right)s_2^*\left(t-\tfrac{\tau}{2}\right)e^{-j2\pi f\tau}\,d\tau + \int_{-\infty}^{+\infty} s_2\left(t+\tfrac{\tau}{2}\right)s_1^*\left(t-\tfrac{\tau}{2}\right)e^{-j2\pi f\tau}\,d\tau \\
&= WVD_{s_1} + WVD_{s_2} + 2\operatorname{Re}\left[WVD_{s_1,s_2}\right],
\end{aligned} \tag{17}$$

where $2\operatorname{Re}[WVD_{s_1,s_2}]$ is the cross-interference term; the last two integrals are complex conjugates of each other, which is why they combine into twice a real part. The cross terms between signals are clearly visible in Figure 6(a), and the PWVD and SPWVD merely filter them out by smoothing in time and frequency. Therefore, the trombone timbre frequency signal is generally not analyzed with the WVD itself.

Besides these quadratic time-frequency distributions, the spectrogram (SP) is another important quadratic time-frequency method. It is the squared modulus of the STFT:

$$\begin{aligned}
SPEC(t,f) = |STFT(t,f)|^2 &= \int_{-\infty}^{+\infty} s(\tau)\,h^*(\tau - t)\,e^{-j2\pi f\tau}\,d\tau \cdot \int_{-\infty}^{+\infty} s^*(\tau')\,h(\tau' - t)\,e^{j2\pi f\tau'}\,d\tau' \\
&= \iint W_s(\tau, v)\,W_h(\tau - t, v - f)\,d\tau\,dv,
\end{aligned} \tag{18}$$

where $W_s$ and $W_h$ denote the WVDs of the signal and of the window function. The spectrogram can therefore be regarded as a two-dimensional smoothing of the WVD of the signal by the WVD of the window. Because it is simple to compute, the spectrogram has many practical applications. For multiple signals, such as $s = a s_1 + b s_2$ with real weights a and b, phase information appears because the STFT is complex-valued:

$$SPEC_s(t,f) = \left|a\,STFT_{s_1}(t,f) + b\,STFT_{s_2}(t,f)\right|^2 = a^2\left|STFT_{s_1}(t,f)\right|^2 + b^2\left|STFT_{s_2}(t,f)\right|^2 + 2ab\left|STFT_{s_1}(t,f)\right|\left|STFT_{s_2}(t,f)\right|\cos\left(\varphi_{s_1}(t,f) - \varphi_{s_2}(t,f)\right), \tag{19}$$

where $\varphi_{s_1}(t,f) = \arg\left(STFT_{s_1}(t,f)\right)$ and $\varphi_{s_2}(t,f) = \arg\left(STFT_{s_2}(t,f)\right)$. The spectrogram therefore does not satisfy linear superposition: it contains the phase cross term $\cos\left(\varphi_{s_1} - \varphi_{s_2}\right)$. If the two signals do not overlap in the time-frequency plane, however, the cross term vanishes:

$$SPEC_s(t,f) = a^2\left|STFT_{s_1}(t,f)\right|^2 + b^2\left|STFT_{s_2}(t,f)\right|^2. \tag{20}$$

Figure 7 shows the spectrogram of the trombone timbre frequency signal obtained with a Hamming window of length N/10 + 1. There is no cross-term interference when the spectrogram analyzes a single trombone timbre frequency signal, because the frequencies of the signal in different time periods do not coincide. The spectrogram also has high time-frequency concentration and lower computational complexity than the SPWVD, so it is widely used in engineering practice. Higher-order methods such as fourth-order and eighth-order spectrograms also exist but are not discussed here.

Figure 7: Spectrogram time-frequency analysis diagram (frequency in kHz versus time in s).

Processing the received signal with the STFT gives the time-frequency diagram of the multi-trombone timbre frequency signal. A cut-off value is then set according to the signal-to-noise ratio, and each element of the STFT matrix is compared with it and truncated. Truncating $STFT_s(t,f)$ gives

$$STFT'_s(t,f) = \begin{cases} STFT_s(t,f), & \left|STFT_s(t,f)\right| \ge \varepsilon,\\ 0, & \left|STFT_s(t,f)\right| < \varepsilon, \end{cases} \tag{21}$$

where $\varepsilon$ is the truncation threshold. Discretizing $STFT_s(t,f)$ gives $STFT_s(n,m)$, where n indexes the N sampling points on the time axis and m indexes the M sampling points on the frequency axis. The threshold is defined as

$$\varepsilon = \eta \times \operatorname{mean}\left(\left|STFT_s(n,m)\right|\right) = \frac{\eta}{NM}\sum_{m=1}^{M}\sum_{n=1}^{N}\left|STFT_s(n,m)\right|, \tag{22}$$

where $\eta$ is the threshold factor.

The thresholded result $STFT'_s(t,f)$ and the WVD $WVD_s(t,f)$, which has good time-frequency concentration, are multiplied element-wise (the Hadamard product) to obtain the combined time-frequency distribution:

$$TF_s(t,f) = STFT'_s(t,f) \odot WVD_s(t,f). \tag{23}$$

Estimating the signal parameters from this combined time-frequency distribution gives good performance. The combination of STFT and WVD serves as the example here; the other combinations are processed in the same way.
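The truncation of Eqs. (21) and (22) and the Hadamard combination of Eq. (23) reduce to a few array operations. The following NumPy sketch is illustrative only: the helper names `truncate_stft` and `combined_tf` are hypothetical, and it assumes both matrices are already sampled on the same time-frequency grid, since building that common grid (for example, matching STFT and WVD frequency scales) is outside its scope.

```python
import numpy as np


def truncate_stft(stft_matrix, eta):
    """Eqs. (21)-(22): zero every STFT element whose modulus is below eta * mean(|STFT|)."""
    stft_matrix = np.asarray(stft_matrix)
    magnitude = np.abs(stft_matrix)
    epsilon = eta * magnitude.mean()                             # Eq. (22): mean over all N x M elements
    truncated = np.where(magnitude >= epsilon, stft_matrix, 0)   # Eq. (21)
    return truncated, epsilon


def combined_tf(stft_matrix, wvd_matrix, eta):
    """Eq. (23): Hadamard (element-wise) product of the truncated STFT and the WVD.

    Both inputs must already share one time-frequency grid.
    """
    truncated, _ = truncate_stft(stft_matrix, eta)
    wvd_matrix = np.asarray(wvd_matrix)
    if truncated.shape != wvd_matrix.shape:
        raise ValueError("STFT and WVD matrices must share one time-frequency grid")
    return truncated * wvd_matrix
```

Because the truncated STFT is zero wherever the STFT is weak, the product keeps the sharp WVD values only where the STFT indicates signal energy, which is how the combination suppresses both noise and cross terms; a larger threshold factor `eta` removes more low-level content.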
Figure 8 shows the time-frequency diagrams of three common combinations, STFT-WVD, STFT-PWVD, and STFT-SPWVD, together with the plain STFT, in the same noise environment. The STFT time-frequency matrix is truncated with a fixed threshold so that it contains only zero and nonzero values, which artificially weakens the noise.

Figure 8: Time-frequency distribution of STFT and its combinations: (a) STFT, (b) STFT-WVD, (c) STFT-PWVD, and (d) STFT-SPWVD (frequency in kHz versus time in s).

Figure 8 shows that the combined time-frequency distributions suppress a certain amount of noise and are clearer and more robust than any single time-frequency analysis. A combined time-frequency analysis may therefore be the best option for analyzing multiple trombone timbre frequency signals. STFT-WVD and STFT-PWVD still show a little noise interference at several trombone timbre frequencies, so the STFT-SPWVD time-frequency analysis is used to postprocess the signals.

The quality of a time-frequency analysis method is measured mainly by its time-frequency concentration and its cross-interference terms. However, the degree of concentration and the presence or absence of cross terms can otherwise only be judged by eye, so a performance index based on information entropy is proposed to evaluate the time-frequency map. The steps are as follows:

(1) Obtain the time-frequency diagram of the signal with the chosen time-frequency analysis method and take the modulus of each element of the time-frequency matrix; the resulting matrix is denoted $TF_x(t,f)$.

(2) Find the global maximum $TF_{\max}(t,f)$ and the global minimum $TF_{\min}(t,f)$ of $TF_x(t,f)$, and set the amplitude step to $step = (TF_{\max} - TF_{\min})/N$, where N is the number of intervals. The intervals are

$$\left[TF_{\min}, TF_{\min} + step\right),\ \left[TF_{\min} + step, TF_{\min} + 2\,step\right),\ \ldots,\ \left[TF_{\min} + (N-1)\,step, TF_{\max}\right]. \tag{24}$$

Count the amplitude values that fall into each of these N intervals to obtain the count vector $M = [m_1, m_2, \ldots, m_N]$, and divide it by the total number of elements in the time-frequency matrix to obtain the probabilities $P = [p_1, p_2, \ldots, p_N]$.

(3) Compute the information entropy of the time-frequency diagram:

$$S = -\sum_{n=1}^{N} p_n \ln p_n. \tag{25}$$

The entropy value can be used to compare the quality of different time-frequency analysis methods. The larger the entropy, the worse the time-frequency concentration and the worse the suppression of cross-interference; the smaller the entropy, the better both are.
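Steps (1) to (3) above can be sketched compactly with NumPy. This is a minimal illustration under stated assumptions, not the author's code: `tf_entropy` is a hypothetical name, the input may be any time-frequency matrix (real or complex), and a constant map is assigned entropy 0 because all its values fall into one interval.

```python
import numpy as np


def tf_entropy(tf_matrix, n_bins):
    """Entropy-based quality index of a time-frequency map, following steps (1)-(3).

    Smaller values indicate better time-frequency concentration.
    """
    tf_x = np.abs(np.asarray(tf_matrix)).ravel()       # step (1): modulus TF_x(t, f)
    tf_min, tf_max = tf_x.min(), tf_x.max()            # step (2): global extremes
    if tf_max == tf_min:
        return 0.0                                     # every value lies in one interval
    # np.histogram uses N equal intervals of width (max - min) / N, half-open except
    # the last one, which is closed: the same partition as Eq. (24).
    counts, _ = np.histogram(tf_x, bins=n_bins, range=(tf_min, tf_max))
    p = counts / tf_x.size                             # P = [p_1, ..., p_N]
    p = p[p > 0]                                       # treat 0 * ln 0 as 0
    return float(-np.sum(p * np.log(p)))               # step (3), Eq. (25)
```

A map whose energy sits in a single cell scores far lower than one whose magnitudes are spread evenly; with N intervals the entropy can never exceed ln N, which is reached when every interval holds the same share of elements.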
In this paper, the feature recognition of the music On the basis of the abovementioned research, the performed by the trombone is carried out. Figure 10(a) and trombone timbre feature extraction method based on the Frequency (kHz) Frequency (kHz) Frequency (kHz) Frequency (kHz) Journal of Robotics 11 0.20 1.0 0.15 0.8 0.10 0.05 0.6 0.00 0.4 -0.05 -0.10 0.2 -0.15 -0.20 0.0 01 1.5 3 4.5 6 7.5 9 102 0 2 4 6 8 10 12 Time (s) Time (s) (a) (b) Figure 10: Recognition of musical features of trombone performance. (a) Performance section loudness and (b) schematic diagram of energy change and note onset. Table 1: Accuracy of trombone timbre feature extraction based on It can be seen from the abovementioned research results the CNN model. that the trombone timbre feature extraction method based on the CNN model proposed in this paper can eﬀectively Num Accuracy (%) identify the trombone timbre in a variety of musical 1 96.596 performances. 2 96.894 3 94.641 4. Conclusion 4 92.261 5 95.843 (e trombone, also known as the trombone and the tele- 6 95.197 scopic horn, belongs to the brass playing instrument. It is the 7 94.792 only musical instrument that has not been greatly improved 8 96.391 9 95.713 in shape and structure since its origin. (e trombone 10 92.192 originated in BC and was ﬁrst used in church and opera 11 96.639 performances. In the nineteenth century, the trombone 12 95.870 entered the symphony camp. Because of its unique timbre, it 13 93.771 was mainly used in the performance of military bands, 14 96.725 showing an impassioned, majestic, and powerful momen- 15 93.189 tum. In addition, the trombone is also used in jazz per- 16 93.969 formance, and the trombone is also known as the “king of 17 93.313 jazz.” In this paper, the trombone timbre feature extraction 18 93.680 model is constructed by combining the CNN model to 19 94.964 20 93.144 improve the training and learning eﬀect of the trombone. 
21 92.075 (e simulation results show that the trombone timbre 22 94.132 feature extraction method based on the CNN model pro- 23 95.261 posed in this paper can eﬀectively identify the trombone 24 93.835 timbre in various musical performances. 25 95.069 26 95.382 Data Availability 27 96.601 28 93.989 (e labeled dataset used to support the ﬁndings of this study 29 93.865 is available from the author upon request. 30 95.699 31 94.760 Conflicts of Interest 32 93.008 (e author declares no conﬂicts of interest. CNN model proposed in this paper is performed eﬀectively, Acknowledgments and the accuracy of the trombone timbre extraction is calculated, and the results shown in Table 1 are obtained. (is work was supported by the Shandong College of Arts. amptitude Rate of energy change 12 Journal of Robotics References [1] X. Serra, “(e computational study of a musical culture through its digital traces,” Acta Musicologica, vol. 89, no. 1, pp. 24–44, 2017. [2] I. B. Gorbunova and N. N. Petrova, “Digital sets of instru- ments in the system of contemporary artistic education in music: socio-cultural aspect,” Journal of Critical Reviews, vol. 7, no. 19, pp. 982–989, 2020. [3] E. Partesotti, A. Peñalba, and J. Manzolli, “Digital instruments and their uses in music therapy,” Nordic Journal of Music 4erapy, vol. 27, no. 5, pp. 399–418, 2018. [4] B. Babich, “Musical “covers” and the culture industry: from antiquity to the age of digital reproducibility,” Research in Phenomenology, vol. 48, no. 3, pp. 385–407, 2018. [5] L. L. Gonçalves and F. L. Schiavoni, “Creating digital musical instruments with libmosaic-sound and mosaicode,” Revista de Informatica ´ Teorica ´ e Aplicada, vol. 27, no. 4, pp. 95–107, [6] I. B. Gorbunova, “Music computer technologies in the per- spective of digital humanities, arts, and researches,” Opcion ´ , vol. 35, no. 24, pp. 360–375, 2019. [7] A. Dickens, C. Greenhalgh, and B. 
Koleva, “Facilitating ac- cessibility in performance: participatory design for digital musical instruments,” Journal of the Audio Engineering So- ciety, vol. 66, no. 4, pp. 211–219, 2018. [8] O. Y. Vereshchahina-Biliavska, O. V. Cherkashyna, Y. O. Moskvichova, O. M. Yakymchuk, and O. V. Lys, “Anthropological view on the history of musical art,” Lin- guistics and Culture Review, vol. 5, no. S2, pp. 108–120, 2021. [9] A. C. Tabuena, “Chord-interval, direct-familiarization, mu- sical instrument digital interface, circle of ﬁfths, and functions as basic piano accompaniment transposition techniques,” International Journal of Research Publications, vol. 66, no. 1, pp. 1–11, 2020. [10] L. Turchet and M. Barthet, “An ubiquitous smart guitar system for collaborative musical practice,” Journal of New Music Research, vol. 48, no. 4, pp. 352–365, 2019. [11] R. Khulusi, J. Kusnick, C. Meinecke, C. Gillmann, J. Focht, and S. Janicke, ¨ “A survey on visualizations for musical data,” Computer Graphics Forum, vol. 39, no. 6, pp. 82–110, 2020. [12] E. Cano, D. FitzGerald, A. Liutkus, M. D. Plumbley, and F. R. Stoter, ¨ “Musical source separation: an introduction,” IEEE Signal Processing Magazine, vol. 36, no. 1, pp. 31–40, [13] T. Magnusson, “(e migration of musical instruments: on the socio-technological conditions of musical evolution,” Journal of New Music Research, vol. 50, no. 2, pp. 175–183, 2021. [14] I. B. Gorbunova and N. N. Petrova, “Music computer tech- nologies, supply chain strategy and transformation processes in socio-cultural paradigm of performing art: using digital button accordion,” International Journal of Supply Chain Management, vol. 8, no. 6, pp. 436–445, 2019. [15] J. A. Anaya Amarillas, “David Andres ´ Mart´ın marketing musical: musica, ´ digital knowledge sharing in the age of music, industry and promotion BY-NC-ND 3.O, eBook,” Inter Disciplina, vol. 9, no. 25, pp. 333–335, 2021. [16] G. Scavone and J. O. 
Smith, “A landmark article on nonlinear time-domain modeling in musical acoustics,” Journal of the Acoustical Society of America, vol. 150, no. 2, pp. R3–R4, 2021. [17] L. Turchet, T. West, and M. M. Wanderley, “Touching the audience: musical haptic wearables for augmented and par- ticipatory live music performances,” Personal and Ubiquitous Computing, vol. 25, no. 4, pp. 749–769, 2021.
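The fixed-threshold truncation of the STFT matrix used for Figure 8 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the Hann window, the frame length `nperseg`, the hop size `hop`, and the relative threshold `rel_threshold` (a fraction of the maximum magnitude) are hypothetical choices, since the paper does not give their values.

```python
import numpy as np


def stft_magnitude(x, nperseg=256, hop=128):
    """Magnitude STFT |X(f, t)| with a Hann window; rows are frequency bins.

    nperseg and hop are assumed values, not taken from the paper.
    """
    window = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, hop)
    frames = np.array([x[s:s + nperseg] * window for s in starts])
    # rfft along each frame, then transpose to (frequency, time)
    return np.abs(np.fft.rfft(frames, axis=1)).T


def threshold_tf(tf, rel_threshold=0.2):
    """Zero every entry below rel_threshold * max, leaving a zero/nonzero map.

    rel_threshold is a hypothetical setting for the paper's "fixed threshold".
    """
    out = tf.copy()
    out[out < rel_threshold * out.max()] = 0.0
    return out


# Usage: a 1 kHz tone plus weak white noise, sampled at 8 kHz
fs = 8000
t = np.arange(2000) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 1000 * t) + 0.05 * rng.standard_normal(t.size)
tf = threshold_tf(stft_magnitude(x))
kept_rows = np.unique(np.nonzero(tf)[0])
print(kept_rows * fs / 256)  # frequencies (Hz) of the rows that survive
```

With this test signal only the frequency rows around 1000 Hz survive the threshold, while the low-level noise elsewhere is set to zero; a real trombone recording would need the threshold tuned to its noise floor.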
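Steps (1) to (3) of the information-entropy performance index, ending in Eq. (25), translate into a short NumPy function. This is an illustrative sketch rather than the author's code: the default number of intervals `n_bins` (the paper's $N$) is an assumption, and empty intervals are skipped using the usual convention $0 \ln 0 = 0$.

```python
import numpy as np


def tf_entropy(tf_matrix, n_bins=64):
    """Entropy-based quality index of a time-frequency map, following Eq. (25).

    Lower values indicate better time-frequency focusing.
    n_bins (N) defaults to an assumed value; the paper does not state it.
    """
    # Step (1): modulus of every element of the time-frequency matrix, TF(t, f)
    tf = np.abs(np.asarray(tf_matrix, dtype=complex))
    lo, hi = tf.min(), tf.max()
    if hi == lo:
        raise ValueError("time-frequency matrix has a constant modulus")
    # Step (2): count amplitudes in N equal-width intervals from min to max,
    # then divide by the total number of elements to get probabilities
    counts, _ = np.histogram(tf, bins=n_bins, range=(lo, hi))
    p = counts / tf.size
    # Step (3): S = -sum p_n ln p_n, skipping empty intervals (0 ln 0 := 0)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


# Usage: an evenly spread map versus one concentrated in a single cell
spread = np.arange(8.0).reshape(2, 4)
focused = np.array([[0.0, 0.0, 0.0, 10.0]])
print(tf_entropy(spread, n_bins=4))   # ln 4, about 1.386
print(tf_entropy(focused, n_bins=4))  # about 0.562
```

`np.histogram` with `range=(min, max)` uses equal-width bins and includes the right edge in the last bin, so it reproduces the intervals of step (2), including the final interval ending at $\mathrm{TF}(t,f)_{\max}$. The concentrated map scores lower, matching the rule that a smaller entropy means better focusing.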
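The preprocessing stage of the recognition framework in Figure 9 (de-meaning followed by normalization) can be sketched as below. The paper does not say which normalization it uses, so scaling to unit peak amplitude is an assumption here; z-score scaling would be an equally plausible reading.

```python
import numpy as np


def preprocess(samples):
    """Remove the mean, then scale to unit peak amplitude (assumed normalization)."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()  # de-meaning
    peak = np.max(np.abs(x))
    # guard against an all-constant input, which is all zeros after de-meaning
    return x / peak if peak > 0 else x


y = preprocess([1.0, 2.0, 3.0, 10.0])
print(y)
```

Removing the mean first and then scaling keeps every sample in $[-1, 1]$ with zero mean, which is the stability property the framework asks of the preprocessing stage.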

### Journal

Journal of Robotics
– Hindawi Publishing Corporation

**Published:** Sep 23, 2022