From 5f974dec6a180c7065e7131f44ff9784df454822 Mon Sep 17 00:00:00 2001
From: hit9 <hit9@icloud.com>
Date: Mon, 1 Feb 2021 20:25:24 +0800
Subject: [PATCH] Add performance.rst and fix some typos

---
 README.rst            |   6 +-
 docs/c-guide.rst      |  28 +++++-----
 docs/go-guide.rst     |  14 ++---
 docs/index.rst        |  14 ++++-
 docs/language.rst     |  26 +++++----
 docs/performance.rst  | 124 ++++++++++++++++++++++++++++++++++++++++++
 docs/python-guide.rst |  13 ++---
 docs/quickstart.rst   |   2 +-
 8 files changed, 182 insertions(+), 45 deletions(-)
 create mode 100644 docs/performance.rst

diff --git a/README.rst b/README.rst
index 10221e4..e92f80f 100644
--- a/README.rst
+++ b/README.rst
@@ -1,10 +1,10 @@
 bitproto
 ========
 
-Bitproto is a lightweight, easy-to-use and production-proven bit level data
+Bitproto is a lightweight, easy-to-use and fast bit level data
 interchange data format for serializing data structures.
 
-Website: TODO
+Website: https://bitproto.readthedocs.io
 
 Features
 ---------
@@ -21,3 +21,5 @@ Features
   - C - No dynamic memory allocation.
   - Go - No reflection or type assertions.
   - Python - No magic :)
+
+- Blazing fast encoding/decoding.
diff --git a/docs/c-guide.rst b/docs/c-guide.rst
index a253259..6912660 100644
--- a/docs/c-guide.rst
+++ b/docs/c-guide.rst
@@ -14,14 +14,14 @@ Firstly, run the bitproto compiler to generate code for C:
 
    $ bitproto c pen.bitproto
 
-Where the `pen.bitproto` is introduced in earlier section :ref:`quickstart-example-bitproto`.
+The ``pen.bitproto`` file is introduced in the earlier section :ref:`quickstart-example-bitproto`.
 
 We will find that bitproto generates us two files in current directory:
 
-- `pen_bp.h`: Contains the declarations of structs, macros and api functions etc.
-- `pen_bp.c`: Contains the function implementations.
+- ``pen_bp.h``: Contains the declarations of structs, macros and api functions etc.
+- ``pen_bp.c``: Contains the function implementations.
 
-It's recommended to open this two generated files to have a look. In the generated file `pen_bp.h`:
+It's recommended to open these two generated files to have a look. In the generated file ``pen_bp.h``:
 
 * The ``enum Color`` in bitproto is mapped to a ``typedef`` statement in C, and the enum
   values are mapped to macros:
@@ -71,7 +71,7 @@ encoder and decoder depends on the bitproto C library underlying.
 
 Download the bitproto library for C language from
 `this github link <https://github.com/hit9/bitproto/tree/master/lib/c>`_,
-and put them (the `bitproto.c` and `bitproto.h`) to current working directory.
+and put them (the ``bitproto.c`` and ``bitproto.h``) to current working directory.
 
 Run the code
 ^^^^^^^^^^^^
@@ -102,17 +102,17 @@ Now, we create a file named ``main.c`` and put the following code in it:
      return 0;
    }
 
-In the code above, we firstly creates a ``p`` of type ``struct Pen`` with data initilization,
+In the code above, we first create a ``p`` of type ``struct Pen`` with data initialization,
 then call a function ``EncodePen`` to encode ``p`` into buffer ``s``. The length of buffer ``s``
 is generated by compiler as a macro defined as ``BYTES_LENGTH_PEN``.
 
-In the decoding part, we constructs another ``p1`` instance of type ``struct Pen`` with zero
+In the decoding part, we construct another ``p1`` instance of type ``struct Pen`` with zero
 initilization, then call a function ``DecodePen`` to decode bytes from buffer ``s`` into ``p1``.
 
-Finally, uses a function ``JsonPen`` generated by the compiler to format the structure ``p1``
+Finally, use a function ``JsonPen`` generated by the compiler to format the structure ``p1``
 to json string to checkout if the decoding works ok.
 
-Let's compile it with the C library `bitproto.c` and generated `pen_bp.c`, and run:
+Let's compile it with the C library ``bitproto.c`` and generated ``pen_bp.c``, and run:
 
 .. sourcecode:: bash
 
@@ -129,15 +129,15 @@ Naming Prefix
 ^^^^^^^^^^^^^
 
 As we know, there's no namespace mechanism to scope definition names across including header files in C.
-Bitproto provide an option to add a name prefix to all generated types. To use it, define an ``option``
+Bitproto provides an option to add a name prefix to all generated types. To use it, define an ``option``
 at the global scope of the bitproto file:
 
 .. sourcecode:: bitproto
 
    option c.name_prefix = "my_prefix_"
 
-Run the bitproto compiler again, we will that names in `pen_bp.h` are changed:
+Run the bitproto compiler again, and we will find that the names in ``pen_bp.h`` are changed:
 
-* The ``enum Color`` is mapped to ``MyPrefixColor``.
-* The ``Timestamp`` is mapped to ``MyPrefixTimestamp``.
-* The ``message Pen`` is mapped to ``struct MyPrefixPen``.
+* The ``enum Color`` is now mapped to ``MyPrefixColor``.
+* The ``Timestamp`` is now mapped to ``MyPrefixTimestamp``.
+* The ``message Pen`` is now mapped to ``struct MyPrefixPen``.
diff --git a/docs/go-guide.rst b/docs/go-guide.rst
index a6794ed..d7c2c1c 100644
--- a/docs/go-guide.rst
+++ b/docs/go-guide.rst
@@ -20,12 +20,12 @@ Then run the bitproto compiler to generate code for Go:
 
    $ bitproto go pen.bitproto bp/
 
-Where the `pen.bitproto` is introduced in earlier section :ref:`quickstart-example-bitproto`.
+The ``pen.bitproto`` file is introduced in the earlier section :ref:`quickstart-example-bitproto`.
 
-We will find that bitproto generates us a file named `pen_bp.go` in the output directory,
+We will find that bitproto generates us a file named ``pen_bp.go`` in the output directory,
 which contains the mapped structs, constants and api methods etc.
 
-In the generated `pen_bp.go`:
+In the generated ``pen_bp.go``:
 
 * The ``enum Color`` in bitproto is mapped to a ``type`` definition on unsigned integer
   statement in Go, and the enum values are mapped to constants:
@@ -85,7 +85,7 @@ If you wish to install bitproto go library to local vendor directory via ``go mo
 Run the code
 ^^^^^^^^^^^^
 
-Now, we create a file named  `main.go` and put the following code in it:
+Now, we create a file named ``main.go`` and put the following code in it:
 
 .. sourcecode:: go
 
@@ -109,13 +109,13 @@ Now, we create a file named  `main.go` and put the following code in it:
    	fmt.Printf("%v", p1)
    }
 
-Notes to replace the import path of the generated `pen_bp.go` to yours.
+Remember to replace the import path of the generated ``pen_bp.go`` with yours.
 
-In the code above, we firstly creates a ``p`` of type ``Pen`` with data initilization,
+In the code above, we first create a ``p`` of type ``Pen`` with data initialization,
 then call a method ``p.Encode()`` to encode ``p`` and return the encoded buffer ``s``, which
 is a slice of bytes.
 
-In the decoding part, we constructs another ``p1`` instance of type ``Pen`` with zero initilization,
+In the decoding part, we construct another ``p1`` instance of type ``Pen`` with zero initialization,
 then call a method ``p1.Decode()`` to decode bytes from buffer ``s`` into ``p1``.
 
 The compiler also generates json tags on the generated struct's fields. And generates a method ``String()``
diff --git a/docs/index.rst b/docs/index.rst
index 44e57fd..57f3647 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -10,7 +10,7 @@ The bit level data interchange format
 Introduction
 ------------
 
-Bitproto is a lightweight, easy-to-use and production-proven bit level data
+Bitproto is a fast, lightweight and easy-to-use bit level data
 interchange data format for serializing data structures.
 
 The protocol describing syntax looks like the great
@@ -50,6 +50,7 @@ Features
    - :ref:`C (ANSI C)<quickstart-c-guide>` - No dynamic memory allocation.
    - :ref:`Go <quickstart-go-guide>` - No reflection or type assertions.
    - :ref:`Python <quickstart-python-guide>` - No magic :)
+- Blazing fast encoding/decoding (:ref:`benchmark <performance-benchmark>`).
 
 Code Example
 ------------
@@ -113,7 +114,7 @@ The differences between bitproto and protobuf are:
 
 * bitproto doesn't use any dynamic memory allocations. Few of
   `protobuf C implementations <https://github.com/protocolbuffers/protobuf/blob/master/docs/third_party.md>`_
-  support this except `nanopb <https://jpa.kapsi.fi/nanopb>`_.
+  support this, except `nanopb <https://jpa.kapsi.fi/nanopb>`_.
 
 * bitproto doesn't support varying sized data, all types are fixed sized.
 
@@ -149,6 +150,14 @@ Known shortcomes of bitproto:
   tight and compact. Consider to wrap a compression mechanism like `zlib <https://zlib.net/>`_
   on the encoded buffer if you really care.
 
+* bitproto can't provide :ref:`best encoding performance <performance-optimization-mode>`
+  with :ref:`extensibility <language-guide-extensibility>`.
+
+  There's an :ref:`optimization mode <performance-optimization-mode>` designed in bitproto
+  to generate plain encoding/decoding statements directly at code-generation time. Since all
+  types in bitproto are fixed-sized, how to encode them can be determined at code-generation
+  time. This mode gives a huge performance improvement, but I still haven't found a way to
+  make it work together with bitproto's extensibility mechanism.
 
 Content list
 ------------
@@ -162,5 +171,6 @@ Content list
     python-guide
     compiler
     language
+    performance
     changelog
     license
diff --git a/docs/language.rst b/docs/language.rst
index d6278bc..818c31a 100644
--- a/docs/language.rst
+++ b/docs/language.rst
@@ -314,8 +314,8 @@ Nested types can also be referenced across message scopes:
        Outer.Color color = 1;
    }
 
-A bitproto message opens a scope, bitproto will lookup a type from local scope first
-and then the outer scope. In the following example, the type of field ``color`` is
+A bitproto message opens a scope; bitproto will look up a type from local scopes first
+and then the outer scopes. In the following example, the type of field ``color`` is
 enum ``Color`` in local ``B``:
 
 .. sourcecode:: bitproto
@@ -326,10 +326,10 @@ enum ``Color`` in local ``B``:
 
    message A {
        message B {
-           enum Color : uint3 {}  // Local first
+           enum Color : uint3 {}
        }
 
-       B.Color color = 1
+       B.Color color = 1   // Local `B.Color` wins
    }
 
 In bitproto, only messages and enums can be nested declared.
@@ -416,7 +416,7 @@ However it is sometimes desirable to bind to a different name, to avoid name cla
 
    import lib "path/to/shared.bitproto"
 
-The statement above import `shared.bitproto` as a name ``lib`` in current bitproto, the reference
+The statement above imports ``shared.bitproto`` as a name ``lib`` in current bitproto, the reference
 now starts with ``lib.``:
 
 .. sourcecode:: bitproto
@@ -432,12 +432,12 @@ now starts with ``lib.``:
 Extensibility
 ^^^^^^^^^^^^^
 
-Bitproto knows exactly how many bits a message occupy at compile time, because all types
+Bitproto knows exactly how many bits a message will occupy at compile time, because all types
 are fix-sized. This may make backwards-compatibility hard.
 
 It seems ok to add new fields to the end of a message in use, because the structures of
 existing fields are unchanged, the decoding end won't scan the encoded bytes of new fields,
-then the backward-compatibility achieved:
+then backward compatibility is achieved:
 
 .. sourcecode:: bitproto
 
@@ -449,10 +449,11 @@ then the backward-compatibility achieved:
 
 But this mechanism works only if there's no data after this message, that's to say, to make
 this mechanism work, this message should be a top-level message, none of other messages can
-refer it, for instance, it can be a communication packet itself.
+refer to it; for instance, it can only be a communication packet itself.
 
 This mechanism fails with in-middle messages, for instance, we can't add new fields to the
-following message ``Middle``, it affects the decoding of other old fields like ``following_field``:
+following message ``Middle``, it affects the decoding of other old fields, like the
+``following_field``:
 
 .. sourcecode:: bitproto
 
@@ -489,7 +490,7 @@ Bitproto introduces a symbol ``'`` to mark a message to be extensible:
 In the code above, ``ExtensibleMessage`` occupies ``1+16`` bits, and ``TraditionalMessage`` still
 occupies ``1`` bit.
 
-By marking a message to be extensible via a single quote, we increases buffer size by two bytes
+By marking a message to be extensible via a single quote, we increase buffer size by two bytes
 in exchange for the possibility of adding new fields in the future. You should balance buffer size
 and extensibility when declaring a message, mark the messages those will be extended in the future.
 
@@ -524,7 +525,7 @@ Back to the example of message ``Middle``, if this message in use is marked to b
        uint7 following_field = 2
    }
 
-Decoding will goes wrong if you exchange data between two ends, of which one marks this message as extensible,
+But decoding will go wrong if you exchange data between two ends, of which one marks this message as extensible,
 and the other marks it as traditional.
 
 Extensible messages can also be nested declared, in the example below, message ``Outer`` occupies ``2+2`` bytes:
@@ -533,6 +534,7 @@ Extensible messages can also be nested declared, in the example below, message `
 
    message Outer' {
        message Inner' {}
+       // Ha, empty extensible messages still cost bytes ~
    }
 
 In addtion, arrays are also supported to be marked as extensible:
@@ -551,7 +553,7 @@ It is the same with extensible messages, an extensible array gains ``2`` bytes o
    For enums, extensibility is not supported, because enum values are atomic in targeting languages,
    the decoding end holding an older version protocol will get a wrong enum value if the encoder end
    increases the enum's number of bits, the unsigned integer types mapped in languages may cast large
-   values to smaller values unexpected.
+   values to unexpected smaller values.
 
 .. _language-guide-option:
 
diff --git a/docs/performance.rst b/docs/performance.rst
new file mode 100644
index 0000000..ac2ae93
--- /dev/null
+++ b/docs/performance.rst
@@ -0,0 +1,124 @@
+.. _performance:
+
+Performance
+===========
+
+This document will introduce the performance of bitproto encoding and decoding,
+along with the optimization mechanism.
+
+.. _performance-benchmark:
+
+Performance Benchmark
+^^^^^^^^^^^^^^^^^^^^^
+
+Benchmarks of bitproto encoding/decoding show that it runs very fast.
+
+Unix OS
+''''''''
+
+On Unix-like systems (macOS, Ubuntu, etc.), a single encoding/decoding call costs:
+
+* ``< 2μs`` in C
+* ``< 10μs`` in Go
+* ``< 1ms`` in Python
+
+You can check out `the detailed benchmark results for unix on github <https://github.com/hit9/bitproto/tree/master/benchmark/unix>`_.
+
+Embedded
+'''''''''
+
+I have run the benchmark on a `stm32 board <https://www.st.com/content/st_com/en/products/microcontrollers-microprocessors/stm32-32-bit-arm-cortex-mcus/stm32-mainstream-mcus/stm32f1-series/stm32f103/stm32f103ze.html>`_
+(ARM Cortex-M3 72MHz CPU): a single encoding/decoding call costs around ``160 μs``, and can be optimized to around ``9 μs``
+in :ref:`the optimization mode <performance-optimization-mode>`.
+
+You can check out `the detailed benchmark results for stm32 on github <https://github.com/hit9/bitproto/tree/master/benchmark/stm32>`_.
+
+.. _performance-optimization-mode:
+
+The Optimization Mode
+^^^^^^^^^^^^^^^^^^^^^^
+
+In most cases, the performance should meet the requirements. But if you are not satisfied with it,
+there's still a way to go: the so-called "optimization mode" in bitproto, enabled by adding the option ``-O`` to the
+bitproto compiler:
+
+.. sourcecode:: bash
+
+   $ bitproto c example.bitproto -O
+
+This way, bitproto will generate code for you in optimization mode.
+
+The mechanism behind optimization mode is to generate plain encoding/decoding code statements directly
+at code-generation time. We know that all types are fixed-sized in bitproto, so the encoding and decoding
+process can be fully determined at code-generation time: bitproto just iterates over all the fields of a message
+and generates bit-copying statements.
+
+.. note::
+
+   The optimization mode doesn't work for :ref:`extensible messages <language-guide-extensibility>`, because
+   decoding extensible messages requires dynamic calculation.
+
+For instance, in C, the generated code in optimization mode looks like this:
+
+.. sourcecode:: c
+
+   int EncodeDrone(struct Drone *m, unsigned char *s) {
+       s[0] |= (((unsigned char *)&((*m).position.latitude))[0] << 3) & 248;
+       s[1] = (((unsigned char *)&((*m).position.latitude))[0] >> 5) & 7;
+       ...
+   }
+
+   int DecodeDrone(struct Drone *m, unsigned char *s) {
+       ((unsigned char *)&((*m).position.latitude))[0] = (s[0] >> 3) & 31;
+       ((unsigned char *)&((*m).position.latitude))[0] |= (s[1] << 5) & 224;
+       ...
+   }
+
+As the generated code example above shows, there are no loops and no if-else branches; all statements are plain bit operations.
+In this way, bitproto's optimization mode gives us a maximum performance improvement on encoding/decoding.
+
+Of course, it's fine to use optimization mode on one end and non-optimization mode (the standard mode) on the other end
+in message communication. The optimization mode only changes how the encoder and decoder are executed,
+without changing the format of the message encoding.
+
+In fact, using the optimization mode is also a trade-off sometimes. In this mode, we have to drop the benefits of
+:ref:`extensibility <language-guide-extensibility>`, which is not friendly to the compatibility design of the protocol.
+Optimization mode is designed for performance-sensitive scenarios, such as low-power embedded boards and
+compute-intensive microcontrollers. I recommend using the optimization mode when:
+
+* The scenario is performance-sensitive, where ``100μs`` is totally different from ``10μs``.
+* The firmware of the communication ends is always upgraded together, so backward compatibility is not so important.
+* Firmware updates are infrequent, perhaps only once in a long time.
+
+In particular, for scenarios where the firmware of the communication ends has to be upgraded partially,
+such as the typical one-to-many `client-server architecture <https://en.wikipedia.org/wiki/Client%E2%80%93server_model>`_,
+I recommend sticking to the standard mode rather than the optimization mode.
+
+The optimization mode is currently supported for C and Go (not yet Python).
+
+Another benefit of optimization mode is that the bitproto libraries no longer need to be dropped in.
+The bitproto compiler in optimization mode already emits the final encoding and decoding statements,
+so the bitproto libraries aren't required. The libraries are designed to be used with standard mode, where
+protocol extensibility is a feature.
+
+Smaller Code Size
+''''''''''''''''''
+
+Embedded firmware may be limited in program size. Bitproto provides another compiler option ``-F`` to filter
+messages to generate in optimization mode:
+
+.. sourcecode:: bash
+
+   $ bitproto c example.bitproto -O -F "Packet"
+
+The command above tells bitproto to generate encoder and decoder functions only for message ``Packet``; other messages'
+encoder and decoder functions will be skipped.
+
+The ``-F`` trick is useful because in most scenarios we just exchange a single "top-level" bitproto message
+in communication. This option can also be used with multiple message names:
+
+.. sourcecode:: bash
+
+   $ bitproto c example.bitproto -O -F "PacketA,PacketB"
+
+Finally, note that the ``-F`` option can only be used together with option ``-O``.
diff --git a/docs/python-guide.rst b/docs/python-guide.rst
index 2d558ed..6853e4f 100644
--- a/docs/python-guide.rst
+++ b/docs/python-guide.rst
@@ -12,7 +12,6 @@ Prerequisites
 The python file generated by bitproto file is in Python 3, uses the
 `typing hint <https://docs.python.org/3/library/typing.html>`_ and
 `dataclasses <https://docs.python.org/3/library/dataclasses.html>`_.
-
 So make sure you are using `Python3.7+ <https://www.python.org/downloads/>`_ to use bitproto in Python.
 
 Compile bitproto for Python
@@ -24,12 +23,12 @@ Firstly, run the bitproto compiler to generate code for Python:
 
    $ bitproto py pen.bitproto
 
-Where the `pen.bitproto` is introduced in earlier section :ref:`quickstart-example-bitproto`.
+The ``pen.bitproto`` file is introduced in the earlier section :ref:`quickstart-example-bitproto`.
 
-We will find that bitproto generates us a file named `pen_bp.py`, which contains
+We will find that bitproto generates us a file named ``pen_bp.py``, which contains
 the mapped classes, constants and api methods etc.
 
-In the generated `pen_bp.py`:
+In the generated ``pen_bp.py``:
 
 * The ``enum Color`` in bitproto is mapped to a typing hint alias, and the enum values are mapped
   to constants:
@@ -85,7 +84,7 @@ The source code of the bitproto Python library is hosted on `Github <https://git
 Run the code
 ^^^^^^^^^^^^
 
-Now, we create a file named `main.py` and put the following code in it:
+Now, we create a file named ``main.py`` and put the following code in it:
 
 .. sourcecode:: python
 
@@ -103,11 +102,11 @@ Now, we create a file named `main.py` and put the following code in it:
    print(p1.to_json())
 
 
-In the code above, we firstly creates a ``p`` instance of type ``Pen`` with data initilization,
+In the code above, we first create a ``p`` instance of type ``Pen`` with data initialization,
 then call a method ``p.encode()`` to encode ``p`` and return the encoded buffer ``s``, which is
 an ``bytearray``.
 
-In the decoding part, we constructs another ``p1`` instance of type ``Pen`` with zero initilization,
+In the decoding part, we construct another ``p1`` instance of type ``Pen`` with zero initialization,
 then call a method ``p1.decode()`` to decode bytes from buffer ``s`` into ``p1``.
 
 The compiler also generates a method ``to_json()`` to return the json string format of the structure.
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
index 0c246f3..6ca8ab3 100644
--- a/docs/quickstart.rst
+++ b/docs/quickstart.rst
@@ -18,7 +18,7 @@ This document will introduce how to start with using bitproto.
 An example bitproto
 -------------------
 
-Suppose that we have a bitproto named `pen.bitproto`, with the following content:
+Suppose that we have a bitproto named ``pen.bitproto``, with the following content:
 
 .. sourcecode:: bitproto