How to Speech analysis?

Receiving one or more images of a person showing at least one face,
receiving a message to be enacted by the person, wherein the message comprises at least a text or an emotional and movement command,
processing the message to extract or receive audio data related to the voice of the person, and facial movement data related to the expression to be carried on the face of the person,
processing the image(s), the audio data, and the facial movement data, and generating an animation of the person enacting the message.

Wherein the emotional and movement command is a GUI-based or multimedia-based instruction to invoke the generation of facial expressions and/or body-part movement.

Producing realistic talking Face with Expression using Images text and voice

Owner:VATS NITIN

Payment authentication method, device thereof and system thereof

Inactive | CN103679452A | Speech analysis | Individual entry/exit registers | Text messaging | Operational costs

The invention discloses a payment authentication method, a device thereof and a system thereof, belonging to the computer technical field. The method comprises the following steps: receiving a payment authentication request sent by a terminal; detecting whether the identification information in the payment authentication request is the same as the prestored identification information; extracting a current voice characteristic if the identification information in the payment authentication request is the same as the prestored identification information; matching the current voice characteristic with a prestored voiceprint model; and sending authentication reply information allowing a payment operation to the terminal if the current voice characteristic is successfully matched with the prestored voiceprint model. According to the payment authentication method, the device thereof and the system thereof, a current voice signal is confirmed through the voiceprint model and, after the confirmation is successful, the subsequent payment operation is allowed. This solves the prior-art problem that a server needs to send an authentication message during the payment operation, which increases the operation cost; payment security can be greatly raised using only voiceprint identification of the voice signal, and the operation cost brought by message authentication is greatly reduced.

Owner:TENCENT TECH (SHENZHEN) CO LTD
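
The two checks described in the payment authentication abstract above (identification match, then voiceprint match) can be sketched as follows. This is only an illustration: the abstract does not say how the voice characteristic is compared with the voiceprint model, so the cosine similarity, the 0.8 threshold, and the names `authenticate_payment` and `voice_feature` are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate_payment(request, stored_id, stored_voiceprint, threshold=0.8):
    """Return True if the payment operation may proceed.

    request: dict with hypothetical keys "id" and "voice_feature".
    """
    # Step 1: the identification information must match the prestored one.
    if request["id"] != stored_id:
        return False
    # Step 2: the current voice feature must match the prestored voiceprint.
    score = cosine_similarity(request["voice_feature"], stored_voiceprint)
    return score >= threshold
```

A production system would use a trained speaker model rather than a raw vector comparison; the sketch only shows the order of the two checks.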

Hybrid speech coding and system

Active | US7139700B1 | Improve performance | Timely maintenance | Speech analysis | Waveform coding | Speech code

Linear predictive speech coding system with classification of frames and a hybrid coder using both waveform coding and parametric coding for different classes of frames. Phase alignment for a parametric coder aligns synthesized speech frames with adjacent waveform coder synthesized frames. Zero phase alignment of speech prior to waveform coding aligns synthesized speech frames of a waveform coder with frames synthesized with a parametric coder. Inter-frame interpolation of LP coefficients suppresses artifacts in resultant synthesized speech frames.

Owner:TEXAS INSTR INC
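
The inter-frame interpolation of LP coefficients mentioned in the hybrid speech coding abstract above can be illustrated with a plain linear interpolation across subframes. This is a sketch under assumptions: the abstract does not give the interpolation rule, the function name `interpolate_lp` and the four-subframe split are hypothetical, and practical codecs often interpolate in a line spectral frequency domain instead, because that keeps the synthesis filter stable.

```python
def interpolate_lp(prev_coeffs, curr_coeffs, subframes=4):
    """Linearly interpolate LP coefficient sets between two frames.

    Returns one coefficient list per subframe; the last one equals curr_coeffs.
    """
    result = []
    for k in range(1, subframes + 1):
        alpha = k / subframes  # weight of the current frame
        result.append([(1 - alpha) * p + alpha * c
                       for p, c in zip(prev_coeffs, curr_coeffs)])
    return result
```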

Device and Method for Graduated Encoding of a Multichannel Audio Signal Based on a Principal Component Analysis

Active | US20090083045A1 | Quality improvement | Code conversion | Speech analysis | Audio signal | Audio frequency

A system and a method for the scalable coding of a multi-channel audio signal comprising a principal component analysis (PCA) transformation of at least two channels (L, R) of the audio signal into a principal component (CP) and at least one residual sub-component (r) by rotation defined by a transformation parameter (θ), comprising the following steps: formation of a frequency subband-based residual structure (Sf_r) on the basis of the at least one residual sub-component (r), and definition of a coded audio signal (SC) comprising the principal component (CP), at least one residual structure (Sf_r) of a frequency subband and the transformation parameter (θ).

Owner:FRANCE TELECOM SA
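
The PCA transformation in the abstract above rotates two channels (L, R) by an angle θ into a principal component (CP) and a residual (r). A minimal sketch of such a rotation is given below. It assumes θ is chosen as the principal-axis angle of the 2x2 matrix of second moments of the two channels (no mean removal) and ignores the subband structure and quantization of the patent; the function names are hypothetical.

```python
import math


def pca_rotate(left, right):
    """Rotate (L, R) into a principal component and a residual."""
    cll = sum(l * l for l in left)
    crr = sum(r * r for r in right)
    clr = sum(l * r for l, r in zip(left, right))
    # Principal-axis angle of the 2x2 second-moment matrix.
    theta = 0.5 * math.atan2(2.0 * clr, cll - crr)
    c, s = math.cos(theta), math.sin(theta)
    principal = [c * l + s * r for l, r in zip(left, right)]
    residual = [-s * l + c * r for l, r in zip(left, right)]
    return principal, residual, theta


def pca_inverse(principal, residual, theta):
    """Undo the rotation given CP, r and the transmitted angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    left = [c * p - s * q for p, q in zip(principal, residual)]
    right = [s * p + c * q for p, q in zip(principal, residual)]
    return left, right
```

When the two channels are identical, the angle comes out as π/4 and the residual vanishes, which is why transmitting CP, a coarse residual structure and θ can be cheaper than transmitting both channels.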

Audio signal adjustment device and audio signal adjustment method

Inactive | US20110255712A1 | High control precision | Gain control | Speech analysis | Short terms | Computer science

To perform volume control matching the actual auditory sensation level of a human so as to improve the control accuracy. As shown in FIG. 1, an audio gain adjustment device 10 includes: an audio signal input section 12 that acquires an audio signal to be adjusted; a long-term gain reflection section 14 that reflects the adjustment of long-term gain in the acquired audio signal; a frequency division section 20 that divides the audio signal from the long-term gain reflection section 14 into three frequency bands; a short-term gain reflection section 30 that reflects the adjustment of short-term gain in each frequency band; a synthesis section 40 that synthesizes audio signals output from the short-term gain reflection section 30; and an audio signal output section 42 that outputs the synthesized audio signal.

Owner:SHARP KK

Method For Communicating and Displaying Interactive Avatar

Inactive | US20120058747A1 | Speech analysis | Special service for subscribers | Multimedia

A method for communication and for displaying an interactive avatar or hologram corresponding to a remote party.

Owner:YIANNIOS JAMES +1

Program endpoint time detection apparatus and method, and program information retrieval system

Active | US20110106531A1 | Rapidly and easily find | Extraction of information | Metadata audio data retrieval | Speech analysis | Adversarial information retrieval | Audio signal

This invention relates to retrieval of multimedia content, and provides a program endpoint time detection apparatus for detecting an endpoint time of a program by performing processing on audio signals of said program, comprising an audio classification unit for classifying said audio signals into a speech signal portion and a non-speech signal portion; a keyword retrieval unit for retrieving, as a candidate endpoint keyword, an endpoint keyword indicating start or end of the program from said speech signal portion; a content analysis unit for performing content analysis on the context of the candidate endpoint keyword retrieved by the keyword retrieval unit to determine whether the candidate endpoint keyword is a valid endpoint keyword; and a program endpoint time determination unit for performing statistical analysis based on the retrieval result of said keyword retrieval unit and the determination result of said content analysis unit, and determining the endpoint time of the program. In addition, this invention also provides a program information retrieval system. With the present invention, program information regarding a program attended by the user can be rapidly obtained.

Owner:SONY CORP +1

Sound source recording apparatus and method adaptable to operating environment

Active | US20110103617A1 | Increase sound source recognition capability | Gain control | Electronic editing digitised analogue information signals | Sound sources | Engineering

Disclosed herein is a sound source recording apparatus and method adaptable to an operating environment, which can record a target sound source at a predetermined level without being affected by characteristics of the sound source or ambient noise. A target sound source is separated from a sound source signal received through an array of microphones and a recording sound pressure level and a gain are estimated using a reference sound pressure level and a reference distance for the target sound source, thereby controlling or adjusting the gain of the microphones.

Owner:SAMSUNG ELECTRONICS CO LTD

Method and an apparatus for processing an audio signal

Active | US20100017002A1 | Speech analysis | Stereophonic systems | Metadata | Audio signal

An apparatus for processing an audio signal and a method thereof are disclosed. The method comprises receiving a downmix signal, object information indicating an attribute of the object and including object number information, preset information to render the downmix signal, external preset information inputted from an external source, and applied object number information indicating the number of objects to which the external preset information is applied; determining whether the applied object number information is identical to the object number information; and rendering the downmix signal by using the external preset information if the applied object number information is identical to the object number information, wherein the external preset rendering parameter renders the object included in the downmix signal and the external preset metadata indicates an attribute of the external preset rendering parameter.

Accordingly, an audio signal can be efficiently reconstructed by individually selecting and applying external preset information by a data region unit or by selecting and applying the same external preset information to a whole downmix signal.

Owner:LG ELECTRONICS INC

Transition from a transform coding/decoding to a predictive coding/decoding

Active | US20160293173A1 | Reduce complexity | Easy to implement | Code conversion | Speech analysis | Linear predictive coding | Digital audio signals

Methods and apparatus are provided for coding and decoding a digital audio signal. Decoding includes: decoding according to an inverse transform decoding of a previous frame of samples of the digital signal, which is received and coded according to a transform coding; and decoding according to a predictive decoding of a current frame of samples of the digital signal, which is received and coded according to a predictive coding. The predictive decoding of the current frame is a transition predictive decoding which does not use any adaptive dictionary arising from the previous frame. At least one state of the predictive decoding is reinitialized to a predetermined default value, and an add-overlap step combines a signal segment synthesized by predictive decoding of the current frame and a signal segment synthesized by inverse transform decoding, corresponding to a stored segment of the decoding of the previous frame.

Owner:ORANGE SA (FR)

Signal analyzing method, signal synthesizing method of complex exponential modulation filter bank, program thereof and recording medium thereof

Inactive | US20050108002A1 | Digital technique network | Speech analysis | Filter bank | Signal synthesis

Provided is a complex exponential modulation filter bank which can reduce quantity of arithmetic operation, and can realize low electric power consumption or speeding-up. This complex exponential modulation filter bank has a step of calculating a first intermediate signal from an input signal, a step of calculating a second intermediate signal from the first intermediate signal, a step of calculating a third intermediate signal from the second intermediate signal with fast Fourier transform, and a step of calculating a complex band output signal from the third intermediate signal.

Owner:GK BRIDGE 1

Method, device and equipment for recording synchronization

Inactive | CN103269374A | Protection security | Improve stability | Speech analysis | Transmission | Data stream | Terminal equipment

The invention discloses a method, a device and equipment for recording synchronization and belongs to the field of terminal equipment. The method comprises the following steps: an audio data stream is obtained in the process of recording; the obtained audio data stream is coded according to a preset coding format and the data generated by coding are written into an audio file; data are read from the audio file while it is still being written; and the data read each time are uploaded to a server, so that synchronization is conducted by the server according to the received data. The device comprises an audio data stream obtaining module, a writing module, a data reading module and an uploading module. Real-time recording synchronization is achieved: data are stored as they are recorded and uploaded as they are stored. When a recording is interrupted or deleted by accident before it is finished, the recorded content can be recovered to the maximum extent from the data already synchronized to the server, so the audio file is protected, its stability is improved and the data can be recovered.

Owner:XIAOMI INC
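
The write-then-read-then-upload loop of the recording synchronization abstract above can be sketched as below. The sketch assumes the chunks are already encoded, uses a temporary file in place of the real audio file, and represents the server by a hypothetical `upload(offset, data)` callback; none of these details come from the patent.

```python
import os
import tempfile


def record_and_sync(encoded_chunks, upload):
    """Append each encoded chunk to a file, then read back and upload new bytes.

    Returns the total number of bytes uploaded.
    """
    fd, path = tempfile.mkstemp(suffix=".audio")
    os.close(fd)
    offset = 0  # number of bytes already uploaded
    try:
        with open(path, "ab") as writer, open(path, "rb", buffering=0) as reader:
            for chunk in encoded_chunks:
                writer.write(chunk)
                writer.flush()           # make the bytes visible to the reader
                reader.seek(offset)      # read only what has not been sent yet
                data = reader.read()
                if data:
                    upload(offset, data)
                    offset += len(data)
    finally:
        os.remove(path)
    return offset
```

Because every upload carries its byte offset, a server could rebuild the file up to the last received chunk even if the recording stops unexpectedly.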

Emotion recognition method, device and computer readable storage medium

Active | CN109859772A | Speech analysis | Neural architectures | Identification rate | Speech spectrum

The scheme relates to artificial intelligence and provides an emotion recognition method, a device and a computer readable storage medium. The method comprises the following steps: receiving audio data; generating a voice spectrogram for the audio data; inputting the voice spectrogram to a first recognition module and inputting the audio data to a second recognition module, wherein the first recognition module extracts a first feature vector by adopting a DCNN network and an RNN network which are sequentially connected, and the second recognition module extracts an MFCC from the audio data and transforms the MFCC into a second feature vector by a non-linear transformation; and connecting the first and second feature vectors output from the first and second recognition modules to form a combined feature, which is input into a fully connected layer and a softmax layer in turn for emotion recognition. By means of the scheme, the combination of the DCNN and the RNN is beneficial for modeling subtle local emotion clues, the MFCC describes voice characteristics over a short time, more emotion-related features are captured through the two feature extraction modes, and the emotion recognition rate is improved.

Owner:PING AN TECH (SHENZHEN) CO LTD
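
The final fusion stage of the emotion recognition abstract above (concatenate two feature vectors, apply a fully connected layer, then softmax) can be sketched in a few lines. The DCNN, RNN and MFCC extraction are not reproduced; the two input vectors, the weight matrix and the function names are placeholders chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def fuse_and_classify(spectrogram_features, mfcc_features, weights, biases):
    """Concatenate both feature vectors, apply one dense layer, then softmax.

    weights: one row per emotion class, each as long as the combined vector.
    """
    combined = list(spectrogram_features) + list(mfcc_features)
    logits = [sum(w * x for w, x in zip(row, combined)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```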

Hearing aid and method of detecting vibration

Active | US20130156208A1 | Signal processing | Speech analysis | Noise | Hearing aid

The present invention provides a hearing aid capable of detecting contact vibration noise from a collected sound signal. A hearing aid (100) is provided with two microphones (110-1, 110-2), a vibration component extracting unit (120) which extracts from collected sound signals respectively obtained by the two microphones (110-1, 110-2) an uncorrelated component between two collected sound signals as a vibration component for each frequency band, a vibration noise identifying unit (130) which determines whether or not a contact noise occurs based on the vibration component for each frequency band extracted by the vibration component extracting unit (120), an acoustic signal processing unit (140) which, when generating an acoustic signal by hearing aid processing of the two collected sound signals, processes the acoustic signal depending on the presence or absence of the occurrence of the contact vibration noise, and a receiver (150) which converts the acoustic signal to sound.

Owner:PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO LTD

Video data fraud detection method and device, computer equipment and storage medium

Pending | CN110781916A | Improve accuracy | Increase diversity | Speech analysis | Acquiring/recognising facial features | Feature vector | Data set

The invention relates to a fraud detection method and device for video data, computer equipment and a storage medium. The method comprises the following steps: acquiring to-be-detected video data; extracting image data of each video frame from the to-be-detected video data, and dividing the image data into a plurality of image data sets according to the time sequence of the video frames, each image data set comprising image data corresponding to continuous video frames; inputting each image data set into a pre-trained image feature extraction model to obtain an image feature vector; extracting voice data from the to-be-detected video data, and obtaining a voice feature vector of the voice data; performing cascade splicing on the image feature vector and the voice feature vector to obtain a multi-modal feature vector; and inputting the multi-modal feature vector into a pre-trained fraud detection model to obtain a fraud detection result corresponding to the to-be-detected video data output by the fraud detection model. By adopting the method, the amount of feature information can be increased, the comprehensiveness and diversity of the feature information are improved, and the accuracy of video data fraud detection is effectively improved.

Owner:PING AN TECH (SHENZHEN) CO LTD

Microphone array-oriented channel attention weighted speech enhancement method

Pending | CN112151059A | Efficient integration | Reduce the amount of parameters | Speech analysis | Time domain | Frequency spectrum

Owner:NANJING INST OF TECH

Method and device for reducing echoes, and communication equipment

Active | CN105810202A | Two-way loud-speaking telephone systems | Speech analysis | Adaptive filter | VIT signals

The invention discloses a method and device for reducing echoes, and communication equipment. The communication equipment is provided with at least two microphones, wherein the distance between the first microphone and a loudspeaker is greater than the distance between the second microphone and the loudspeaker. The method comprises the steps: carrying out the filtering of a first echo signal s1(t) and/or a second echo signal s2(t); obtaining a target signal d(t) corresponding to the first echo signal s1(t), and a reference signal r(t) corresponding to the second echo signal s2(t), wherein the first echo signal s1(t) is an echo signal generated when the loudspeaker plays a downlink signal x(t) picked by the first microphone, and the second echo signal s2(t) is an echo signal generated when the loudspeaker plays the downlink signal x(t) picked by the second microphone; carrying out the filtering of the reference signal r(t) through employing an adaptive filter, and obtaining a filtering signal y(t); solving the difference between the filtering signal y(t) and the target signal d(t), and obtaining and outputting a residual signal e(t). The method makes the most of the signal picked by the microphone closer to the loudspeaker. Compared with the prior art, the method is better in echo inhibition effect.

Owner:芯鑫融资租赁(天津)有限责任公司
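
In the echo reduction abstract above, an adaptive filter turns the reference signal r(t) into y(t), and the output is the residual e(t) between the target d(t) and y(t). The abstract does not name the adaptation rule, so the sketch below assumes a normalized LMS (NLMS) update; the tap count, step size `mu` and function name are illustrative choices only.

```python
def nlms_echo_reduce(reference, target, taps=4, mu=0.5, eps=1e-8):
    """Filter `reference` adaptively to predict `target`; return residual and weights."""
    weights = [0.0] * taps
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for r, d in zip(reference, target):
        history = [r] + history[:-1]
        y = sum(w * x for w, x in zip(weights, history))   # filtered signal y(t)
        e = d - y                                           # residual signal e(t)
        power = sum(x * x for x in history) + eps
        # NLMS update: step along the input, normalized by its power.
        weights = [w + mu * e * x / power for w, x in zip(weights, history)]
        residual.append(e)
    return residual, weights
```

If the target is simply a scaled copy of the reference, the weights converge to that scale factor and the residual decays toward zero.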

Audio frequency bit distribution and quantitative method and audio frequency coding device

Inactive | CN101101755A | Reduce computation | Reduce the number of cycles | Speech analysis | Computer science | Audio frequency

The method includes the following steps: A, obtaining an initial value (IV) of the parameter for bit distribution and quantization (BDQ) of the current frame signal (CFS) from the parameter that was used for BDQ of the previous frame signal and met the coding requirement; B, using the IV to repeatedly perform BDQ on the CFS until a parameter meeting the coding requirement of the current frame is found, and performing BDQ with the found parameter to obtain the audio sample output. The invention also discloses a corresponding audio coding device. The method and device exploit the correlation in energy and frequency components between two neighboring frame signals, and therefore reduce the number of loop iterations and the amount of computation.

Owner:VIMICRO CORP
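
The idea in the bit distribution abstract above, starting the quantization search for the current frame from the parameter found for the previous frame, can be illustrated with a toy loop. Everything concrete here is an assumption made for the sketch: the parameter is a single quantizer step size, the bit cost model in `bits_needed` is invented, and the search only ever coarsens the step.

```python
def bits_needed(frame, step):
    """Toy cost model: magnitude bits plus one sign bit per sample."""
    return sum(int(abs(x) / step).bit_length() + 1 for x in frame)


def search_step(frame, bit_budget, start_step=1.0, growth=2.0):
    """Coarsen the step until the frame fits the budget.

    Returns (step, loops); pass the previous frame's step as start_step
    to warm-start the search.
    """
    step, loops = start_step, 0
    while bits_needed(frame, step) > bit_budget:
        step *= growth
        loops += 1
    return step, loops
```

For the frame `[100.0, 50.0]` and a budget of 8 bits, a cold start from step 1.0 takes four loops, while a warm start from a previous step of 8.0 takes one, which is the saving the abstract describes for similar neighboring frames.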

Method and device for calling

Inactive | CN105448300A | Avoid additional operations such as deselection | Save computing resources | Speech analysis | Personalization | Voice transformation

The invention relates to a method and a device for calling. The method comprises the steps of acquiring a first voice signal of the local party when talking to the other party, transforming the first voice signal by use of a preset voice model to get a second voice signal, and transmitting the second voice signal to the other party. As the voice signal transmitted in the call process is the voice signal after voice transformation, the other party gets the voice after voice transformation. Therefore, the call effect desired by users is achieved, the personalized need of the caller for call voice is satisfied, and the user experience is enhanced.

Owner:XIAOMI INC

Echo cancellation method based on convex combination for M-estimation proportional affine projection

Inactive | CN109102794A | Fast convergence | Small steady state error | Speech analysis | Sound producing devices | Signal cancellation | Affine projection

The invention relates to an echo cancellation method based on convex combination for M-estimation proportional affine projection. The method comprises the following steps: A, far-end signal sampling; B, convex combination, in which a large-step filter value y1(n) and a small-step filter value y2(n) at the current time n are convexly combined using a large-step filter weight lambda(n) of the current time n to obtain a combined filter value y(n) of the current time n; C, echo signal elimination, in which the output value y(n) of the adaptive filter is subtracted from the echoed near-end signal d(n) picked up by a near-end microphone and the result is returned to the far end, the returned signal being the residual signal e(n), with e(n)=d(n)-y(n); D, updating the filter tap weight vector, in which the M-estimation proportional affine projection method based on convex combination is used to calculate the update of the adaptive filter tap weight vector; E, updating the filter weight; F, limiting the filter weight; G, letting n=n+1 and repeating steps A, B, C, D, E, F and G until the end of iteration.

Owner:SOUTHWEST JIAOTONG UNIV
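
Steps B, C and F of the convex combination abstract above reduce to a few lines of arithmetic. The sketch below assumes the usual convex form y(n) = lambda(n)·y1(n) + (1 − lambda(n))·y2(n) and a simple clamp for the weight limit; the M-estimation affine projection update of steps D and E is not shown, and the bounds `lam_min` and `lam_max` are hypothetical.

```python
def combine_and_cancel(y1, y2, lam, d, lam_min=0.0, lam_max=1.0):
    """Combine two filter outputs and subtract the result from d.

    y1: large-step filter output, y2: small-step filter output,
    lam: weight of the large-step filter, d: echoed near-end sample.
    Returns (y, e) with e = d - y.
    """
    lam = min(max(lam, lam_min), lam_max)   # step F: limit the filter weight
    y = lam * y1 + (1.0 - lam) * y2         # step B: convex combination
    e = d - y                               # step C: residual returned to far end
    return y, e
```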

Electro-acoustic adaption in a hearing prosthesis

Active | US20170359661A1 | Electrotherapy | Signal processing | Electro stimulation | Prosthesis

Presented herein are techniques for dynamically setting, in real-time, a ratio of acoustical stimulation signals to electrical stimulation signals delivered by a hearing prosthesis. The ratio of the acoustical stimulation signals to the electrical stimulation signals is set based on one or more characteristics or attributes of the input sound signals that are received and processed by the hearing prosthesis in order to generate the acoustical and electrical stimulation signals.

Owner:COCHLEAR LIMITED

Anti-tonal modification interference audio fingerprint extraction method

Active | CN110767248A | Improve the immunity | Overcome the deficiency of being unable to resist the interference of pitch shifting | Speech analysis | Special data processing applications | Noise | Engineering

The invention discloses an anti-tonal modification interference audio fingerprint extraction method. The method comprises the following steps: framing and windowing an audio signal, performing a Fourier transform, and collecting the Fourier coefficients corresponding to all frames of the signal; calculating the energy segment indexes corresponding to all frames of the signal by using an energy segment index calculation method based on a peak point in the collected Fourier coefficients; and calculating energy segment values by using the energy segment indexes and filtering the energy segments with a two-dimensional filtering kernel to extract fingerprints. By means of the method, the defect that an existing Philips fingerprint cannot resist tonal modification interference can be overcome; and on the premise of keeping the sub-fingerprint continuity of the original Philips fingerprint and its robustness to various noise interferences, the resistance to tonal modification interference is improved.

Owner:TAIYUAN UNIV OF TECH

Method for estimating noise in an audio signal, noise estimator, audio encoder, audio decoder, and system for transmitting audio signals

Active | US20170133031A1 | Efficient processing | Demand on data precision is not | Speech analysis | Noise level | Noise estimation

A method is described that estimates noise in an audio signal. An energy value for the audio signal is estimated and converted into the logarithmic domain. A noise level for the audio signal is estimated based on the converted energy value.

Owner:FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG EV
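
The noise estimation abstract above has two parts: convert a frame energy into the logarithmic domain, then estimate a noise level from the converted value. A rough sketch follows. The base-2 logarithm, the energy floor and the tracking rule (follow drops at once, rise slowly) are assumptions made for illustration and are not taken from the patent.

```python
import math


def log_energy(frame, floor=1e-12):
    """Frame energy in the log2 domain, floored to avoid log(0)."""
    energy = sum(x * x for x in frame)
    return math.log2(max(energy, floor))


def track_noise(log_energies, rise=0.05):
    """Track a noise floor over a sequence of log-domain frame energies."""
    noise = log_energies[0]
    estimates = []
    for value in log_energies:
        if value < noise:
            noise = value                    # quieter frame: follow immediately
        else:
            noise += rise * (value - noise)  # louder frame: rise slowly
        estimates.append(noise)
    return estimates
```

Working on log-domain values keeps the numbers in a small range, which fits the tag line's hint about reduced demands on data precision.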

Method and system for multimedia data recognition, and method for multimedia customization which uses the method for multimedia data recognition

Inactive | US20100324707A1 | Digital data information retrieval | Digital data processing details | Recognition system

A system and method for multimedia data recognition, and a method for multimedia customization that uses the method for multimedia data recognition, are disclosed. The system includes a data capturing unit, a data recognition unit, and a waveform feature database. The data capturing unit captures a set of multimedia data to be recognized. The data recognition unit has a sound waveform conversion unit, a waveform feature capturing unit, and a waveform feature comparison unit, which are respectively used for converting sound data into waveform data, capturing a waveform feature from the waveform data, and comparing the captured waveform feature with at least one known waveform feature. By analyzing the sound data of the multimedia data, the multimedia data can be recognized.

Owner:IPEER MULTIMEDIA INT

Objective examination method of flat-tongue sound and cacuminal in standard Chinese

Inactive | CN101546553A | Good distinction | Speech analysis | Speech sound | Examination method

The invention discloses an objective examination method of flat-tongue sound and cacuminal in standard Chinese, including the steps of: receiving input voice; segmenting the input voice; extracting distinguishing features; and scoring according to an evaluation model to obtain a pronunciation score. By applying the objective examination method and adopting distinguishing features that better reflect the pronunciation essentials separating the flat-tongue sound from the cacuminal, better distinguishing performance can be obtained.

Owner:INST OF ACOUSTICS CHINESE ACAD OF SCI +1

Feeling-based multimedia processing

Inactive | CN105335595A | Speech analysis | Special data processing applications | Computer network | Computer program

The embodiment of the invention relates to feeling-based multimedia processing. The invention discloses a method used for processing multimedia data. The method comprises the following steps: on the basis of multiple clusters, automatically determining user feelings in regard to the fragments of the multimedia data, wherein the multiple clusters and pre-defined user feelings are obtained in an associated way; and processing the fragments of the multimedia data at least partially on the basis of the determined user feelings in regard to the fragments. The invention also discloses a corresponding system and a computer program product.

Owner:DOLBY LAB LICENSING CORP

Signal processing device and signal processing method

Active | US20160104499A1 | Avoid sound quality | Electrical apparatus | Speech analysis | Signal on | Frequency characteristic

A signal processing device comprises: a band detecting means for detecting a frequency band which satisfies a predetermined condition from an audio signal; a reference signal generating means for generating a reference signal in accordance with a detection band by the band detecting means; a reference signal correcting means for correcting the generated reference signal on the basis of a frequency characteristic thereof; a frequency band extending means for extending the corrected reference signal up to a frequency band higher than the detection band; an interpolation signal generating means for generating an interpolation signal by weighting each frequency component within the extended frequency band in accordance with a frequency characteristic of the audio signal; and a signal synthesizing means for synthesizing the generated interpolation signal with the audio signal.

Owner:CLARION CO LTD

Equalization of an Audio Signal

Active | US20130195286A1 | Microphones | Loudspeakers | Phase shifted | Equalization

A method of processing an input audio signal, the method comprising forming a plurality of output audio signals from the input audio signal, wherein each output audio signal is formed by performing respective processing on the input audio signal, wherein for a first output audio signal there is a target audio equalization operation comprising a target filter twice, wherein for the first output audio signal, the respective processing comprises a first audio equalization operation, the first audio equalization operation being the target audio equalization operation modified to compensate for phase shifts that correspond to zeros of the transfer function of the target audio equalization operation, wherein for each output audio signal other than the first output audio signal, the respective processing comprises a compensation filter that compensates for phase shifts that correspond to poles of the transfer function of the target audio equalization operation.

Owner:OXFORD DIGITAL

Abnormal sound detection method based on edge cloud intelligent architecture

Inactive | CN110544489A | Reasonable configuration | Relieve pressure | Speech analysis | Transmission | Pattern recognition | Sound detection

An abnormal sound detection method based on an edge cloud intelligent architecture is provided. The method comprises the following steps: collecting audio data at the edge end; deploying tasks that can be processed by the edge end to an edge device as much as possible; using the Docker container technology to encapsulate the task processing operators on the cloud end to realize migration of computing tasks, and storing an audio detection result; using a deep neural network model to perform abnormal sound determination; and performing message communication between different devices through the MTZT protocol. According to the method provided by the present invention, the pressure on the cloud computing center and the network bandwidth is alleviated, the real-time performance and responsiveness of the system are improved, and data security can be better protected.

Owner:JIANGSU HUIZHONG DATA TECH CO LTD

66 results about "Speech analysis" patented technology