Bånder

MFCC extends BaseAlgorithm
in package

MFCC

Inputs:

[vector_real] spectrum - the audio spectrum

Outputs:

[vector_real] bands - the energies in mel bands [vector_real] mfcc - the mel frequency cepstrum coefficients

Parameters:

dctType: integer ∈ [2,3] (default = 2) the DCT type

highFrequencyBound: real ∈ (0,inf) (default = 11000) the upper bound of the frequency range [Hz]

inputSize: integer ∈ (1,inf) (default = 1025) the size of input spectrum

liftering: integer ∈ [0,inf) (default = 0) the liftering coefficient. Use '0' to bypass it

logType: string ∈ {natural,dbpow,dbamp,log} (default = "dbamp") logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes

lowFrequencyBound: real ∈ [0,inf) (default = 0) the lower bound of the frequency range [Hz]

normalize: string ∈ {unit_sum,unit_tri,unit_max} (default = "unit_sum") spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth

numberBands: integer ∈ [1,inf) (default = 40) the number of mel-bands in the filter

numberCoefficients: integer ∈ [1,inf) (default = 13) the number of output mel coefficients

sampleRate: real ∈ (0,inf) (default = 44100) the sampling rate of the audio signal [Hz]

silenceThreshold: real ∈ (0,inf) (default = 1.00000001335e-10) silence threshold for computing log-energy bands

type: string ∈ {magnitude,power} (default = "power") use magnitude or power spectrum

warpingFormula: string ∈ {slaneyMel,htkMel} (default = "htkMel") The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]

weighting: string ∈ {warping,linear} (default = "warping") type of weighting function for determining triangle area

Description:

This algorithm computes the mel-frequency cepstrum coefficients of a spectrum. As there is no standard implementation, the MFCC-FB40 is used by default:

  • filterbank of 40 bands from 0 to 11000Hz
  • take the log value of the spectrum energy in each mel band. Bands energy values below silence threshold will be clipped to its value before computing log-energies
  • DCT of the 40 bands down to 13 mel coefficients There is a paper describing various MFCC implementations [1].

The parameters of this algorithm can be configured in order to behave like HTK [3] as follows:

  • type = 'magnitude'
  • warpingFormula = 'htkMel'
  • weighting = 'linear'
  • highFrequencyBound = 8000
  • numberBands = 26
  • numberCoefficients = 13
  • normalize = 'unit_max'
  • dctType = 3
  • logType = 'log'
  • liftering = 22

In order to completely behave like HTK the audio signal has to be scaled by 2^15 before the processing and if the Windowing and FrameCutter algorithms are used they should also be configured as follows.

FrameGenerator:

  • frameSize = 1102
  • hopSize = 441
  • startFromZero = True
  • validFrameThresholdRatio = 1

Windowing:

  • type = 'hamming'
  • size = 1102
  • zeroPadding = 946
  • normalized = False

This algorithm depends on the algorithms MelBands and DCT and therefore inherits their parameter restrictions. An exception is thrown if any of these restrictions are not met. The input "spectrum" is passed to the MelBands algorithm and thus imposes MelBands' input requirements. Exceptions are inherited by MelBands as well as by DCT.

IDCT can be used to compute smoothed Mel Bands. In order to do this:

  • compute MFCC
  • smoothedMelBands = 10^(IDCT(MFCC)/20)

Note: The second step assumes that 'logType' = 'dbamp' was used to compute MFCCs, otherwise that formula should be changed in order to be consistent.

References: [1] T. Ganchev, N. Fakotakis, and G. Kokkinakis, "Comparative evaluation of various MFCC implementations on the speaker verification task," in International Conference on Speach and Computer (SPECOM’05), 2005, vol. 1, pp. 191–194.

[2] Mel-frequency cepstrum - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Mel_frequency_cepstral_coefficient

[3] Young, S. J., Evermann, G., Gales, M. J. F., Hain, T., Kershaw, D., Liu, X., … Woodland, P. C. (2009). The HTK Book (for HTK Version 3.4). Construction, (July 2000), 384, https://doi.org/http://htk.eng.cam.ac.uk

[4] Slaney, M. Auditory Toolbox: A MATLAB Toolbox for Auditory Modeling Work. Technical Report, version 2, Interval Research Corporation, 1998.

Category: Spectral Mode: standard

Table of Contents

Properties

$algorithmName  : string
$category  : string
$essentia  : EssentiaFFI
$mode  : string
$parameters  : array<string|int, mixed>
$algorithmHandle  : CData|null
$configured  : bool

Methods

__construct()  : mixed
__destruct()  : mixed
compute()  : array<string|int, mixed>
getAlgorithmName()  : string
getCategory()  : string
getMode()  : string
getParameters()  : array<string|int, mixed>
setParameter()  : self
configure()  : void
isValidParameter()  : bool
validateInput()  : void
cleanupAlgorithm()  : void
configureAlgorithmParameters()  : void
estimateOutputSize()  : int
executeAlgorithm()  : array<string|int, mixed>
executeGenericAlgorithm()  : array<string|int, mixed>
executeSpecificAlgorithm()  : array<string|int, mixed>
getAlgorithmCreateFunction()  : string
getValidParameters()  : array<string|int, mixed>
initializeAlgorithm()  : void
prepareInput()  : mixed
processOutput()  : array<string|int, mixed>
processRhythmOutput()  : array<string|int, mixed>
processSpectralOutput()  : array<string|int, mixed>
processStatsOutput()  : array<string|int, mixed>
processTemporalOutput()  : array<string|int, mixed>
processTonalOutput()  : array<string|int, mixed>
setAlgorithmParameter()  : void
setArrayParameter()  : void
validateAlgorithmInput()  : void

Properties

$algorithmName

protected string $algorithmName = 'MFCC'

$category

protected string $category = 'Spectral'

$mode

protected string $mode = 'standard'

$parameters

protected array<string|int, mixed> $parameters = []

$algorithmHandle

private CData|null $algorithmHandle = null

$configured

private bool $configured = false

Methods

__construct()

public __construct([array<string|int, mixed> $parameters = [] ]) : mixed
Parameters
$parameters : array<string|int, mixed> = []

__destruct()

public __destruct() : mixed

compute()

public compute(mixed $input) : array<string|int, mixed>
Parameters
$input : mixed
Return values
array<string|int, mixed>

getAlgorithmName()

public getAlgorithmName() : string
Return values
string

getCategory()

public getCategory() : string
Return values
string

getParameters()

public getParameters() : array<string|int, mixed>
Return values
array<string|int, mixed>

setParameter()

public setParameter(string $key, mixed $value) : self
Parameters
$key : string
$value : mixed
Return values
self

configure()

protected configure(array<string|int, mixed> $parameters) : void
Parameters
$parameters : array<string|int, mixed>

isValidParameter()

protected isValidParameter(string $parameter) : bool
Parameters
$parameter : string
Return values
bool

validateInput()

protected validateInput(mixed $input, string $expectedType) : void
Parameters
$input : mixed
$expectedType : string

cleanupAlgorithm()

private cleanupAlgorithm() : void

configureAlgorithmParameters()

private configureAlgorithmParameters() : void

estimateOutputSize()

private estimateOutputSize(mixed $input) : int
Parameters
$input : mixed
Return values
int

executeAlgorithm()

private executeAlgorithm(mixed $input) : array<string|int, mixed>
Parameters
$input : mixed
Return values
array<string|int, mixed>

executeGenericAlgorithm()

private executeGenericAlgorithm(FFI $ffi, mixed $input) : array<string|int, mixed>
Parameters
$ffi : FFI
$input : mixed
Return values
array<string|int, mixed>

executeSpecificAlgorithm()

private executeSpecificAlgorithm(FFI $ffi, mixed $input) : array<string|int, mixed>
Parameters
$ffi : FFI
$input : mixed
Return values
array<string|int, mixed>

getAlgorithmCreateFunction()

private getAlgorithmCreateFunction() : string
Return values
string

getValidParameters()

private getValidParameters() : array<string|int, mixed>
Return values
array<string|int, mixed>

initializeAlgorithm()

private initializeAlgorithm() : void

prepareInput()

private prepareInput(mixed $input) : mixed
Parameters
$input : mixed

processOutput()

private processOutput(array<string|int, mixed> $result) : array<string|int, mixed>
Parameters
$result : array<string|int, mixed>
Return values
array<string|int, mixed>

processRhythmOutput()

private processRhythmOutput(array<string|int, mixed> $result) : array<string|int, mixed>
Parameters
$result : array<string|int, mixed>
Return values
array<string|int, mixed>

processSpectralOutput()

private processSpectralOutput(array<string|int, mixed> $result) : array<string|int, mixed>
Parameters
$result : array<string|int, mixed>
Return values
array<string|int, mixed>

processStatsOutput()

private processStatsOutput(array<string|int, mixed> $result) : array<string|int, mixed>
Parameters
$result : array<string|int, mixed>
Return values
array<string|int, mixed>

processTemporalOutput()

private processTemporalOutput(array<string|int, mixed> $result) : array<string|int, mixed>
Parameters
$result : array<string|int, mixed>
Return values
array<string|int, mixed>

processTonalOutput()

private processTonalOutput(array<string|int, mixed> $result) : array<string|int, mixed>
Parameters
$result : array<string|int, mixed>
Return values
array<string|int, mixed>

setAlgorithmParameter()

private setAlgorithmParameter(FFI $ffi, string $key, mixed $value) : void
Parameters
$ffi : FFI
$key : string
$value : mixed

setArrayParameter()

private setArrayParameter(FFI $ffi, string $key, array<string|int, mixed> $value) : void
Parameters
$ffi : FFI
$key : string
$value : array<string|int, mixed>

validateAlgorithmInput()

private validateAlgorithmInput(mixed $input) : void
Parameters
$input : mixed

        
On this page

Search results