Examples: 8b QuartzNet for speech recognition
Giuseppe5 authored and volcacius committed Mar 10, 2020
1 parent 904707d commit df05090
Showing 24 changed files with 3,862 additions and 0 deletions.
32 changes: 32 additions & 0 deletions examples/speech_to_text/README.md
@@ -0,0 +1,32 @@
# Examples

The models in this folder showcase how to use the quantized layers provided by Brevitas.
They should not be assumed to map directly to hardware.

The table below lists the pretrained example models made available for reference.

| Name | Cfg | Scaling Type | Inner layers bit width | Outer layers bit width | WER (Word Error Rate) on dev-other | Pretrained model | Retrained from |
|--------------|-----------------------|----------------------------|------------------------|------------------------|------------------------|----------------------|-------------------------------|
| Quartznet 8b | quant_quartznet_pertensorscaling_8b | Floating-point per tensor | 8 bit | 8 bit | 11.03% | [Encoder](https://github.com/Xilinx/brevitas/releases/download/quant_quartznet_8b-r0/quant_quartznet_encoder_8b-fbff9a95.pth) [Decoder](https://github.com/Xilinx/brevitas/releases/download/quant_quartznet_8b-r0/quant_quartznet_decoder_8b-d09ea039.pth) | [link](https://ngc.nvidia.com/catalog/models/nvidia:quartznet_15x5_ls_sp) |
| Quartznet 8b | quant_quartznet_perchannelscaling_8b | Floating-point per channel | 8 bit | 8 bit | 10.98% | [Encoder](https://github.com/Xilinx/brevitas/releases/download/quant_quartznet_8b-r0/quant_quartznet_encoder_8b-fbff9a95.pth) [Decoder](https://github.com/Xilinx/brevitas/releases/download/quant_quartznet_8b-r0/quant_quartznet_decoder_8b-d09ea039.pth) | [link](https://ngc.nvidia.com/catalog/models/nvidia:quartznet_15x5_ls_sp) |
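
For readers unfamiliar with the metric in the table, the sketch below shows the standard definition of WER: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` is illustrative and not part of this example folder. The way the evaluation script aggregates errors over the whole dev-other set is not shown here, so treat this as a per-utterance illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not the function used by the
    evaluation script in this folder.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.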

It is highly recommended to set up a virtual environment.

Download and pre-process the LibriSpeech dataset with the following command:
```
python utilities/get_librispeech_data.py --data_root=/path/to/validation/folder --data_set=DEV_OTHER
```

To evaluate a pretrained quantized model on LibriSpeech:

- Install PyTorch from the [PyTorch website](https://pytorch.org/), and Cython with the following command:
`pip install --upgrade cython`
- Install the Quartznet requirements with `pip install -r requirements.txt`
- Make sure you have Brevitas installed
- Pass the corresponding cfg .ini file as an input to the evaluation script. The required checkpoint will be downloaded automatically.

For example, for the evaluation on GPU 0:

```
python quartznet_val.py --input-folder /path/to/validation/folder --model-cfg cfg/quant_quartznet_pertensorscaling_8b.ini --gpu 0
```
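
The cfg files use `KEY: value` pairs grouped under `[SECTION]` headers, a layout that Python's standard `configparser` can read. The sketch below shows one way such a file could be inspected before running the evaluation. It is an assumption that the evaluation script parses the file in a similar way, and `load_quant_settings` is a hypothetical helper, not part of this folder. The embedded text copies a subset of the keys from the per-tensor cfg.

```python
import configparser

# A subset of the keys from cfg/quant_quartznet_pertensorscaling_8b.ini,
# embedded as a string so the sketch is self-contained.
CFG_TEXT = """
[MODEL]
ARCH: quartznet
TOPOLOGY_FILE: cfg/topology/quartznet15x5.yaml

[QUANT]
OUTER_LAYERS_BIT_WIDTH: 8
INNER_LAYERS_BIT_WIDTH: 8
FUSED_BN: True

[WEIGHT]
ENCODER_SCALING_PER_OUTPUT_CHANNEL: False
DECODER_SCALING_PER_OUTPUT_CHANNEL: False
"""


def load_quant_settings(text: str) -> dict:
    """Hypothetical helper: parse the cfg text and return typed settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "arch": parser.get("MODEL", "ARCH"),
        "topology_file": parser.get("MODEL", "TOPOLOGY_FILE"),
        "outer_bit_width": parser.getint("QUANT", "OUTER_LAYERS_BIT_WIDTH"),
        "inner_bit_width": parser.getint("QUANT", "INNER_LAYERS_BIT_WIDTH"),
        "fused_bn": parser.getboolean("QUANT", "FUSED_BN"),
        "encoder_per_channel": parser.getboolean(
            "WEIGHT", "ENCODER_SCALING_PER_OUTPUT_CHANNEL"),
        "decoder_per_channel": parser.getboolean(
            "WEIGHT", "DECODER_SCALING_PER_OUTPUT_CHANNEL"),
    }


if __name__ == "__main__":
    for key, value in load_quant_settings(CFG_TEXT).items():
        print(f"{key} = {value}")
```

To read a file on disk instead, `parser.read(path)` can replace `parser.read_string(text)`.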
@@ -0,0 +1,22 @@
[MODEL]
ARCH: quartznet
TOPOLOGY_FILE: cfg/topology/quartznet15x5.yaml
PRETRAINED_ENCODER_URL: https://github.com/Xilinx/brevitas/releases/download/quant_quartznet_8b-r0/quant_quartznet_encoder_8b-fbff9a95.pth
PRETRAINED_DECODER_URL: https://github.com/Xilinx/brevitas/releases/download/quant_quartznet_8b-r0/quant_quartznet_decoder_8b-d09ea039.pth

[QUANT]
OUTER_LAYERS_BIT_WIDTH: 8
INNER_LAYERS_BIT_WIDTH: 8
FUSED_BN: True

[WEIGHT]
ENCODER_SCALING_PER_OUTPUT_CHANNEL: False
DECODER_SCALING_PER_OUTPUT_CHANNEL: False

[ACTIVATIONS]
INNER_SCALING_PER_CHANNEL: False
OTHER_SCALING_PER_CHANNEL: False
ABS_ACT_VAL: 1



@@ -0,0 +1,21 @@
[MODEL]
ARCH: quartznet
TOPOLOGY_FILE: cfg/topology/quartznet15x5.yaml
PRETRAINED_ENCODER_URL: https://github.com/Xilinx/brevitas/releases/download/quant_quartznet_8b-r0/quant_quartznet_encoder_8b-fbff9a95.pth
PRETRAINED_DECODER_URL: https://github.com/Xilinx/brevitas/releases/download/quant_quartznet_8b-r0/quant_quartznet_decoder_8b-d09ea039.pth

[QUANT]
OUTER_LAYERS_BIT_WIDTH: 8
INNER_LAYERS_BIT_WIDTH: 8
FUSED_BN: True

[WEIGHT]
ENCODER_SCALING_PER_OUTPUT_CHANNEL: False
DECODER_SCALING_PER_OUTPUT_CHANNEL: False

[ACTIVATIONS]
INNER_SCALING_PER_CHANNEL: False
OTHER_SCALING_PER_CHANNEL: False
ABS_ACT_VAL: 1


199 changes: 199 additions & 0 deletions examples/speech_to_text/cfg/topology/quartznet15x5.yaml
@@ -0,0 +1,199 @@
model: "QuartzNet"
sample_rate: 16000

AudioToTextDataLayer:
    max_duration: 16.7
    trim_silence: true

    train:
        shuffle: true

    eval:
        shuffle: false
        max_duration: null

AudioToMelSpectrogramPreprocessor:
    window_size: 0.02
    window_stride: 0.01
    window: "hann"
    normalize: "per_feature"
    n_fft: 512
    features: 64
    feat_type: "logfbank"
    dither: 0.00001
    pad_to: 16
    stft_conv: true

SpectrogramAugmentation:
    rect_masks: 5
    rect_time: 120
    rect_freq: 50

JasperEncoder:
    activation: "relu"
    conv_mask: true

    jasper:
        -   filters: 256
            repeat: 1
            kernel: [33]
            stride: [2]
            dilation: [1]
            dropout: 0.0
            residual: false
            separable: true

        -   filters: 256
            repeat: 5
            kernel: [33]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 256
            repeat: 5
            kernel: [33]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 256
            repeat: 5
            kernel: [33]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 256
            repeat: 5
            kernel: [39]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 256
            repeat: 5
            kernel: [39]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 256
            repeat: 5
            kernel: [39]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [51]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [51]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [51]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [63]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [63]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [63]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [75]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [75]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 5
            kernel: [75]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: true
            separable: true

        -   filters: 512
            repeat: 1
            kernel: [87]
            stride: [1]
            dilation: [2]
            dropout: 0.0
            residual: false
            separable: true

        -   filters: 1024
            repeat: 1
            kernel: [1]
            stride: [1]
            dilation: [1]
            dropout: 0.0
            residual: false

labels: [" ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
         "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
31 changes: 31 additions & 0 deletions examples/speech_to_text/quartznet/__init__.py
@@ -0,0 +1,31 @@
# Adapted from https://github.com/NVIDIA/NeMo/blob/r0.9/collections/nemo_asr/
# Copyright (C) 2020 Xilinx (Giuseppe Franco)
# Copyright (C) 2019 NVIDIA CORPORATION.
#
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# from .audio_preprocessing import AudioToMelSpectrogramPreprocessor
from .data_layer import (
    AudioToTextDataLayer)
from .greedy_ctc_decoder import GreedyCTCDecoder
from .quartznet import quartznet
from .losses import CTCLossNM

__all__ = ['AudioToTextDataLayer',
           'quartznet']


name = "quarznet_release"