
ASR Edge Computing

Below is a high-level explanation of the ASR (Automatic Speech Recognition) implementation for SigSRF software, based on the Kaldi open source speech recognition toolkit. A second implementation, for KubeEdge (the edge computing version of Kubernetes), is in progress.

Contents

Overview

ASR Offloading

Demo Capability

Kaldi Interface

Data Flow

Software Architecture

KubeEdge Integration

Kaldi Info

Run-Time Inference

Kaldi Integration

Kaldi Architecture, DNNs

Overview

SigSRF packet + media processing software

ASR

KubeEdge

diagram showing mobile voice application, telco network, and KubeEdge (Kubernetes edge computing) container with SigSRF libs and packet/media threads and Kaldi libs

1 Linux Foundation


ASR Offloading

diagram showing ASR offloading from mobile app to edge computing node

Demo Capability

ASR based on Kaldi's mini-librispeech model

SigSRF packet + media software

Call groups (one or more endpoints)

Kaldi Interface

Expects wideband audio

Real-time inference is called "online decoding"

diagram showing Kaldi default online decoding dataflow, including RTP audio packet input, GStreamer decoding to raw audio, and Kaldi ASR processing

GStreamer not suitable for telecom / wideband audio
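
To make the wideband expectation concrete, here is a minimal sketch. It assumes "wideband" means 16 kHz sampling (the usual telecom meaning) and that narrowband telephony audio arrives at 8 kHz. The function name `upsample_2x` is hypothetical, and linear interpolation stands in for the low-pass interpolation filter a production resampler would use. It is not SigSRF or Kaldi code.

```python
def upsample_2x(samples):
    """Double the sample rate (e.g. 8 kHz -> 16 kHz) by linear interpolation.

    Illustration only (assumption): a real resampler would apply a low-pass
    interpolation filter instead of straight-line interpolation.
    """
    if not samples:
        return []
    out = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + following) // 2)  # midpoint between neighbours
    # The last input sample has no successor, so repeat it to keep 2x length.
    out.extend([samples[-1], samples[-1]])
    return out


narrowband = [0, 100, 200, 100, 0]   # five 16-bit PCM samples at 8 kHz
wideband = upsample_2x(narrowband)   # ten samples at 16 kHz
```

Doubling the rate this way adds no information above 4 kHz; it only puts narrowband audio into the sample format a wideband model expects.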

Data Flow

SigSRF replaces GStreamer

Inferlib

data flow diagram, showing SigSRF packet and media processing, signal processing, and Kaldi ASR processing
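
The packet stage of this data flow starts from RTP audio packets. As a rough illustration of that first step, the sketch below splits an RTP packet into its fixed-header fields and payload, following the standard RTP version 2 header layout. The function name `parse_rtp_header` and the returned dictionary keys are hypothetical, chosen for illustration. This is not the SigSRF API, and jitter buffering and codec decoding are out of scope.

```python
import struct


def parse_rtp_header(packet):
    """Split an RTP (version 2) packet into header fields and payload bytes."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")

    offset = 12 + 4 * (first & 0x0F)          # skip CSRC identifiers, if any
    if first & 0x10:                          # header extension present
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words           # extension length is in 32-bit words

    end = len(packet)
    if first & 0x20:                          # padding: last byte holds pad length
        end -= packet[-1]
    if offset > end:
        raise ValueError("truncated RTP packet")

    return {
        "payload_type": second & 0x7F,
        "marker": bool(second & 0x80),
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:end],        # still codec-encoded audio
    }


# Build a small example packet: version 2, payload type 96, 4 payload bytes.
header = struct.pack("!BBHII", 0x80, 96, 1234, 160000, 0xDEADBEEF)
fields = parse_rtp_header(header + b"\x01\x02\x03\x04")
```

The payload returned here is still encoded audio. In the data flow above, decoding to raw audio and any signal processing happen before Kaldi sees the samples.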

Software Architecture

software architecture diagram showing SigSRF libs, Kaldi libs, and test as they relate to data flow & measurement I/O

KubeEdge Integration

SigSRF and Kaldi libs inside KubeEdge container

Mobile device app

diagram showing integration of SigSRF libs, Kaldi libs, SigSRF packet/media threads within a KubeEdge (Kubernetes edge computing) container

Run-Time Inference

One end-to-end thread on one Xeon x86 core

Kaldi developers are focused on state-of-the-art R&D

diagram showing Kaldi data flow and which components (libs) act at which data flow stage

Kaldi Integration

Kaldi is its own framework

Integrating Kaldi into production applications takes effort

Acceleration

Kaldi Architecture, DNNs

Architecture

raw audio shown as time domain, or time series, data prior to Kaldi input
time domain (time series)
Sliding FFT
frequency domain data, shown after sliding FFT processing, before formatting as Kaldi DNN input layers
frequency domain
DNN Input Layers (ILn)

Kaldi DNN input layer slices, shown as a series of successive CNNs using frequency domain data images

diagram showing Kaldi DNN followed by HMM and/or GMM

DNN frequency domain data
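
The sliding FFT stage named in the Architecture diagrams can be sketched as follows. This is a minimal illustration, not Kaldi's feature extraction: the function name `sliding_dft`, the Hann window, and the frame and hop sizes are all assumptions, and a plain DFT is used in place of an optimized FFT to keep the example dependency-free.

```python
import cmath
import math


def sliding_dft(samples, frame_len, hop):
    """Return one magnitude spectrum per Hann-windowed frame of the input."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spectra = []
    # Slide the analysis frame across the time series in steps of `hop`.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Real input: bins 0..frame_len/2 carry all the information.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        spectra.append(spectrum)
    return spectra


# Time domain input: a sine with exactly 4 cycles per 16-sample frame.
samples = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
spectra = sliding_dft(samples, frame_len=16, hop=8)
peak_bin = spectra[0].index(max(spectra[0]))
```

Each row of `spectra` is one frequency domain slice. For the test tone, the energy concentrates around bin 4 in every frame. Stacking successive rows gives the image-like frequency domain data that the diagrams show being formatted as DNN input layers.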

Training

1 Deep Neural Network
2 Hidden Markov Model, Gaussian Mixture Model
3 Convolutional Neural Network
4 Finite State Transducer