The class implements a layer that performs convolution on a set of binary one-channel images in RLE format.
An RLE image is represented by the background value (GetNonStrokeValue()) and a set of horizontal strokes filled with GetStrokeValue().
static const int MaxRleConvImageWidth = 64;

// A stroke in RLE format
struct CRleStroke {
	short Start;	// starting coordinate
	short End;	// ending coordinate (the first pixel that does NOT belong to the stroke)

	// A special stroke value for the end of the row
	static CRleStroke Sentinel() { return { SHRT_MAX, -1 }; }
};

static const CRleStroke Sentinel = { SHRT_MAX, -1 };

struct CRleImage {
	int StrokesCount;	// the number of strokes in the image
	int Height;		// the image height; may be smaller than the blob height,
				// in which case the top (blob.GetHeight() - Height) / 2
				// and the bottom (blob.GetHeight() - Height + 1) / 2 rows will be filled with GetNonStrokeValue()
	int Width;		// the image width; may be smaller than the blob width,
				// in which case the left (blob.GetWidth() - Width) / 2
				// and the right (blob.GetWidth() - Width + 1) / 2 columns will be filled with GetNonStrokeValue()
	CRleStroke Stub;	// RESERVED
	CRleStroke Lines[1];	// the lines array. As the strokes' coordinates are stored in short variables, it is guaranteed
				// that the size of any RLE image will not exceed the size of the float buffer necessary to store it
};
// We will encode the following image in RLE format:
// 01110
// 00000
// 01010
// 00110
CPtr<CDnnBlob> imageBlob = CDnnBlob::Create2DImageBlob(GetDefaultCpuMathEngine(), CT_Float, 1, 1, 4, 5, 1);
CArray<float> imageBuff;
imageBuff.Add( 0, 4 * 5 ); // a zero-filled buffer of the blob size (4 * 5 floats)
CRleImage* rleImage = reinterpret_cast<CRleImage*>(imageBuff.GetPtr());
rleImage->Height = 4;
rleImage->Width = 3; // the left and the right columns are filled with zeros
rleImage->StrokesCount = 8; // 4 strokes in the image + 4 end-of-row sentinels
rleImage->Lines[0] = CRleStroke{ 0, 3 }; // the 3-pixel stroke in the first row; the stroke coordinates are relative to rleImage->Width and NOT to the blob width
rleImage->Lines[1] = CRleStroke::Sentinel(); // the first row ends
rleImage->Lines[2] = CRleStroke::Sentinel(); // the second row is empty
rleImage->Lines[3] = CRleStroke{ 0, 1 }; // the first single-pixel stroke in the third row
rleImage->Lines[4] = CRleStroke{ 2, 3 }; // the second single-pixel stroke in the third row
rleImage->Lines[5] = CRleStroke::Sentinel(); // the third row ends
rleImage->Lines[6] = CRleStroke{ 1, 3 }; // the 2-pixel stroke in the fourth row
rleImage->Lines[7] = CRleStroke::Sentinel(); // the fourth row ends
imageBlob->CopyFrom(imageBuff.GetPtr());
// If you want to store several images, place the nth image in the imageBuff array
// shifted by the image size * (n-1)
// CRleImage* nthRleImage = reinterpret_cast<CRleImage*>(imageBuff.GetPtr() + (4 * 5) * (n - 1));
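As an illustration of the comment above, here is a minimal sketch (not taken from the library documentation) of packing two 4 x 5 RLE images into one blob; the second image's contents are made up purely for this example:

CPtr<CDnnBlob> batchBlob = CDnnBlob::Create2DImageBlob( GetDefaultCpuMathEngine(), CT_Float, 1, 2, 4, 5, 1 );
CArray<float> batchBuff;
batchBuff.Add( 0, 2 * 4 * 5 ); // each image takes 4 * 5 floats
// The first image starts at offset 0, the second at offset 4 * 5
CRleImage* firstRleImage = reinterpret_cast<CRleImage*>( batchBuff.GetPtr() );
CRleImage* secondRleImage = reinterpret_cast<CRleImage*>( batchBuff.GetPtr() + 4 * 5 );
// Fill firstRleImage exactly as in the example above; for secondRleImage, encode an image
// whose top row is all ones and whose other rows are empty:
secondRleImage->Height = 4;
secondRleImage->Width = 5;
secondRleImage->StrokesCount = 5; // 1 stroke + 4 end-of-row sentinels
secondRleImage->Lines[0] = CRleStroke{ 0, 5 }; // the 5-pixel stroke in the first row
secondRleImage->Lines[1] = CRleStroke::Sentinel(); // the first row ends
secondRleImage->Lines[2] = CRleStroke::Sentinel(); // the second row is empty
secondRleImage->Lines[3] = CRleStroke::Sentinel(); // the third row is empty
secondRleImage->Lines[4] = CRleStroke::Sentinel(); // the fourth row is empty
batchBlob->CopyFrom( batchBuff.GetPtr() );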
void SetFilterHeight( int filterHeight );
void SetFilterWidth( int filterWidth );
void SetFilterCount( int filterCount );
Sets the filters' size and number.
void SetStrideHeight( int strideHeight );
void SetStrideWidth( int strideWidth );
Sets the convolution stride. By default, the stride is 1.
void SetStrokeValue( float _strokeValue );
void SetNonStrokeValue( float _nonStrokeValue );
The values to be used for the pixels inside the RLE strokes and outside them. See the format description above.
void SetZeroFreeTerm( bool isZeroFreeTerm );
Specifies if the free terms should be used. If you set this value to true, the free terms vector will be set to all zeros and won't be trained. By default, this value is set to false.
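As a usage sketch, the settings above could be applied as follows. This assumes the layer class is CRleConvLayer and that its constructor takes a math engine reference, as is the usual NeoML convention; the layer name and parameter values are arbitrary.

CPtr<CRleConvLayer> rleConv = new CRleConvLayer( GetDefaultCpuMathEngine() ); // constructor signature assumed
rleConv->SetName( "rleConv" );
rleConv->SetFilterCount( 16 );
rleConv->SetFilterHeight( 3 );
rleConv->SetFilterWidth( 3 );
rleConv->SetStrideHeight( 1 ); // 1 is already the default stride
rleConv->SetStrideWidth( 1 );
rleConv->SetStrokeValue( 1.f ); // the value used for pixels inside strokes
rleConv->SetNonStrokeValue( 0.f ); // the background value
rleConv->SetZeroFreeTerm( false ); // keep trainable free terms (the default)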
CPtr<CDnnBlob> GetFilterData() const;
The filters are represented by a blob of the following dimensions:
- BatchLength * BatchWidth * ListSize is equal to the number of filters used (GetFilterCount()).
- Height is equal to GetFilterHeight().
- Width is equal to GetFilterWidth().
- Depth and Channels are equal to 1.
CPtr<CDnnBlob> GetFreeTermData() const;
The free terms are represented by a blob of the total size equal to the number of filters used (GetFilterCount()).
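For example, continuing the settings sketch above, once the network containing the layer has been initialized, the parameter blobs could be checked against the dimensions described here (a sketch; the CDnnBlob accessors used are the standard NeoML ones):

CPtr<CDnnBlob> filters = rleConv->GetFilterData();
NeoAssert( filters->GetObjectCount() == rleConv->GetFilterCount() ); // BatchLength * BatchWidth * ListSize
NeoAssert( filters->GetHeight() == rleConv->GetFilterHeight() );
NeoAssert( filters->GetWidth() == rleConv->GetFilterWidth() );
NeoAssert( filters->GetDepth() == 1 && filters->GetChannelsCount() == 1 );

CPtr<CDnnBlob> freeTerms = rleConv->GetFreeTermData();
NeoAssert( freeTerms->GetDataSize() == rleConv->GetFilterCount() ); // one free term per filter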
Each input accepts a blob with several images in RLE format. The dimensions of all inputs should be the same:
- BatchLength * BatchWidth * ListSize - the number of images in the set.
- Height - the images' height.
- Width - the images' width, not more than 64.
- Depth - the images' depth, should be equal to 1.
- Channels - the number of channels, should be equal to 1.
For each input, the layer has one output. It contains a blob with the result of the convolution, in the same format as the output of the regular CConvLayer.
The output blob dimensions are:
- BatchLength is equal to the input BatchLength.
- BatchWidth is equal to the input BatchWidth.
- ListSize is equal to the input ListSize.
- Height can be calculated from the input Height as (Height - FilterHeight)/StrideHeight + 1 (see the worked example after this list).
- Width can be calculated from the input Width as (Width - FilterWidth)/StrideWidth + 1.
- Depth is equal to 1.
- Channels is equal to GetFilterCount().
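As a worked example: for the 4 x 5 image from the RLE example above, a 3 x 3 filter, and stride 1, the output is 2 x 3. A small sketch of the arithmetic (the helper name is illustrative, not part of the API):

// Output size along one dimension for a convolution without padding
int convOutputSize( int inputSize, int filterSize, int stride )
{
	return ( inputSize - filterSize ) / stride + 1;
}
// convOutputSize( 4, 3, 1 ) == 2 -> the output Height
// convOutputSize( 5, 3, 1 ) == 3 -> the output Width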