diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 0000000..3a8519e --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,49 @@ +--- +name: Bug 報告 +about: 建立報告以協助我們改善。請提供盡可能多的資訊以助於我們理解並重現此 bug。 +title: '[BUG] 簡短描述問題' +labels: 'bug' +assignees: '' + +--- + +**描述 Bug** +提供清晰詳細的 bug 描述。如果有的話,請包括任何錯誤訊息、截圖或日誌。 + +**重現步驟** +提供重現此行為的詳細步驟: +1. 導航至 '...' +2. 點擊 '....' +3. 滾動至 '....' +4. 觀察錯誤 + +**預期行為** +提供你預期會發生什麼的清晰簡潔描述。 + +**截圖** +如有必要,添加截圖以幫助解釋你的問題。你可以在這裡拖放圖片。 + +**日誌** +如有必要,提供與問題相關的任何錯誤日誌或輸出。使用代碼區塊(```)格式化日誌。 + +**桌面(請提供以下信息):** + - OS: [例如:Windows 10, MacOS 11, Ubuntu 20.04] + - 瀏覽器: [例如:Chrome, Safari, Firefox] + - 版本: [例如:22] + +**智能手機(請提供以下信息):** + - 設備: [例如:iPhone 12, Samsung Galaxy S21] + - OS: [例如:iOS 14, Android 11] + - 瀏覽器: [例如:Safari, Chrome] + - 版本: [例如:22] + +**環境(請提供以下信息):** +- 應用版本: [例如:1.0.0] +- 環境名稱: [例如:staging, production] +- 部署方法: [例如:AWS, Heroku] + +**其他背景資訊** +在這裡提供有關此問題的任何其他上下文。例如:此問題以前是否發生過?這是一個一致的問題嗎?何時開始發生? + +**可能的修復** +如果你對如何修復此問題有想法,請在這裡提供詳細資訊。 diff --git a/.github/ISSUE_TEMPLATE/bug_report_en.md b/.github/ISSUE_TEMPLATE/bug_report_en.md new file mode 100644 index 0000000..f7506a0 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report_en.md @@ -0,0 +1,49 @@ +--- +name: Bug report +about: Create a report to help us improve. Please provide as much information as possible to help us understand and reproduce the bug. +title: '[BUG] Brief description of the issue' +labels: 'bug' +assignees: '' + +--- + +**Describe the bug** +Provide a clear and detailed description of the bug. Include any error messages, screenshots, or logs if available. + +**To Reproduce** +Provide detailed steps to reproduce the behavior: +1. Navigate to '...' +2. Click on '....' +3. Scroll down to '....' +4. Observe error + +**Expected behavior** +Provide a clear and concise description of what you expected to happen. + +**Screenshots** +If applicable, add screenshots to help explain your problem. You can drag and drop images here. + +**Logs** +If applicable, provide any error logs or output related to the issue. Use code blocks (```) to format logs. + +**Desktop (please complete the following information):** + - OS: [e.g., Windows 10, MacOS 11, Ubuntu 20.04] + - Browser: [e.g., Chrome, Safari, Firefox] + - Version: [e.g., 22] + +**Smartphone (please complete the following information):** + - Device: [e.g., iPhone 12, Samsung Galaxy S21] + - OS: [e.g., iOS 14, Android 11] + - Browser: [e.g., Safari, Chrome] + - Version: [e.g., 22] + +**Environment (please complete the following information):** +- App version: [e.g., 1.0.0] +- Environment name: [e.g., staging, production] +- Deployment method: [e.g., AWS, Heroku] + +**Additional context** +Provide any other context about the problem here. For example, has this issue happened before? Is it a consistent issue? When did it start occurring? + +**Possible Fix** +If you have an idea on how to fix this issue, please provide details here. diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 0000000..3470c1e --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,29 @@ +--- +name: 功能建議 +about: 為此項目提出一個新的功能或改進建議。 +title: '[功能建議] 簡短描述功能' +labels: '功能建議' +assignees: '' + +--- + +**您的功能請求是否與問題相關?請描述。** +提供清晰簡潔的問題描述。例如:我總是很困擾當 [...] 
+ +**描述您想要的解決方案** +提供清晰簡潔的想要實施的功能或改進描述。 + +**描述您考慮過的替代方案** +提供您已探索的任何替代解決方案或功能的清晰簡潔描述。 + +**利益與風險** +描述此功能將帶來的好處,以及可能伴隨的任何潛在風險。 + +**優先級和時間表** +提供對此功能的優先級的建議,如果可能,提供實施的估計時間表。 + +**截圖 / Mockups** +如有必要,添加截圖或模擬圖以幫助可視化所提議的功能。 + +**額外背景資訊** +在這裡添加有關功能請求的任何其他上下文,參考資料或截圖。 diff --git a/.github/ISSUE_TEMPLATE/feature_request_en.md b/.github/ISSUE_TEMPLATE/feature_request_en.md new file mode 100644 index 0000000..a1cec88 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request_en.md @@ -0,0 +1,29 @@ +--- +name: Feature Request +about: Propose a new feature or improvement for this project. +title: '[FEATURE REQUEST] Brief description of the feature' +labels: 'feature request' +assignees: '' + +--- + +**Is your feature request related to a problem? Please describe.** +Provide a clear and concise description of the problem. E.g., It's always frustrating when [...] + +**Describe the solution you'd like** +Provide a clear and concise description of the feature or improvement you want to see implemented. + +**Describe alternatives you've considered** +Provide a clear and concise description of any alternative solutions or features you've explored. + +**Benefits and Risks** +Describe the benefits this feature will bring, and any potential risks that might come with it. + +**Prioritization and Timeline** +Provide your suggestions on the priority of this feature, and if possible, an estimated timeline for implementation. + +**Screenshots / Mockups** +If applicable, add screenshots or mockups to help visualize the proposed feature. + +**Additional context** +Add any other context, references, or screenshots about the feature request here. diff --git a/.github/workflows/.pylintrc b/.github/workflows/.pylintrc new file mode 100644 index 0000000..12ccc22 --- /dev/null +++ b/.github/workflows/.pylintrc @@ -0,0 +1,539 @@ +[MASTER] + +# A comma-separated list of package or module names from where C extensions may +# be loaded. Extensions are loading into the active Python interpreter and may +# run arbitrary code. +extension-pkg-whitelist= + +# Specify a score threshold to be exceeded before program exits with error. +fail-under=10.0 + +# Add files or directories to the blacklist. They should be base names, not +# paths. +ignore=CVS,cpuinfo.py,system_info.py + +# Add files or directories matching the regex patterns to the blacklist. The +# regex matches against base names, not paths. +ignore-patterns= + +# Python code to execute, usually for sys.path manipulation such as +# pygtk.require(). +init-hook='import sys; import os; sys.path.append(os.getcwd())' +#init-hook='import sys; import os' + +# Use multiple processes to speed up Pylint. Specifying 0 will auto-detect the +# number of processors available to use. +jobs=1 + +# Control the amount of potential inferred values when inferring a single +# object. This can help the performance when dealing with large functions or +# complex, nested conditions. +limit-inference-results=100 + +# List of plugins (as comma separated values of python module names) to load, +# usually to register additional checkers. +load-plugins= + +# Pickle collected data for later comparisons. +persistent=yes + +# When enabled, pylint would attempt to guess common misconfiguration and emit +# user-friendly hints instead of false-positive error messages. +suggestion-mode=yes + +# Allow loading of arbitrary C extensions. Extensions are imported into the +# active Python interpreter and may run arbitrary code. 
+unsafe-load-any-extension=no + + +[MESSAGES CONTROL] + +# Only show warnings with the listed confidence levels. Leave empty to show +# all. Valid levels: HIGH, INFERENCE, INFERENCE_FAILURE, UNDEFINED. +confidence= + +# Disable the message, report, category or checker with the given id(s). You +# can either give multiple identifiers separated by comma (,) or put this +# option multiple times (only on the command line, not in the configuration +# file where it should appear only once). You can also use "--disable=all" to +# disable everything first and then reenable specific checks. For example, if +# you want to run only the similarities checker, you can use "--disable=all +# --enable=similarities". If you want to run only the classes checker, but have +# no Warning level messages displayed, use "--disable=all --enable=classes +# --disable=W". +disable=all + +# Enable the message, report, category or checker with the given id(s). You can +# either give multiple identifier separated by comma (,) or put this option +# multiple time (only on the command line, not in the configuration file where +# it should appear only once). See also the "--disable" option for examples. +enable= + F, + unreachable, + duplicate-key, + unnecessary-semicolon, + global-variable-not-assigned, + unused-variable, + unused-wildcard-import, + binary-op-exception, + bad-format-string, + anomalous-backslash-in-string, + bad-open-mode, + E0001,E0011,E0012,E0100,E0101, + E0102,E0103,E0104,E0105,E0107, + E0108,E0110,E0111,E0112,E0113, + E0114,E0115,E0116,E0117,E0118, + E0202,E0203,E0211,E0213,E0236, + E0237,E0238,E0239,E0240,E0241, + E0301,E0302,E0303,E0401,E0402, + E0601,E0602,E0603,E0604,E0611, + E0632,E0633,E0701,E0702,E0703, + E0704,E0710,E0711,E0712,E1003, + #E1102, + E1111,E1120,E1121,E1123, + E1124,E1125,E1126,E1127,E1128, + E1129,E1130,E1131,E1132,E1133, + E1134,E1135,E1136,E1137,E1138, + E1139,E1200,E1201,E1205,E1206, + E1300,E1301,E1302,E1303,E1304, + E1305,E1306,E1310,E1700,E1701, + +[REPORTS] + +# Python expression which should return a score less than or equal to 10. You +# have access to the variables 'error', 'warning', 'refactor', and 'convention' +# which contain the number of messages in each category, as well as 'statement' +# which is the total number of statements analyzed. This score is used by the +# global evaluation report (RP0004). +evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10) + +# Template used to display messages. This is a python new-style format string +# used to format the message information. See doc for all details. +#msg-template= + +# Set the output format. Available formats are text, parseable, colorized, json +# and msvs (visual studio). You can also give a reporter class, e.g. +# mypackage.mymodule.MyReporterClass. +output-format=text + +# Tells whether to display a full report or only the messages. +reports=no + +# Activate the evaluation score. +score=yes + + +[REFACTORING] + +# Maximum number of nested blocks for function / method body +max-nested-blocks=5 + +# Complete name of functions that never returns. When checking for +# inconsistent-return-statements if a never returning function is called then +# it will be considered as an explicit return statement and no message will be +# printed. +never-returning-functions=sys.exit + + +[STRING] + +# This flag controls whether inconsistent-quotes generates a warning when the +# character used as a quote delimiter is used inconsistently within a module. 
+check-quote-consistency=no + +# This flag controls whether the implicit-str-concat should generate a warning +# on implicit string concatenation in sequences defined over several lines. +check-str-concat-over-line-jumps=no + + +[BASIC] + +# Naming style matching correct argument names. +argument-naming-style=snake_case + +# Regular expression matching correct argument names. Overrides argument- +# naming-style. +#argument-rgx= + +# Naming style matching correct attribute names. +attr-naming-style=snake_case + +# Regular expression matching correct attribute names. Overrides attr-naming- +# style. +#attr-rgx= + +# Bad variable names which should always be refused, separated by a comma. +bad-names=foo, + bar, + baz, + toto, + tutu, + tata + +# Bad variable names regexes, separated by a comma. If names match any regex, +# they will always be refused +bad-names-rgxs= + +# Naming style matching correct class attribute names. +class-attribute-naming-style=any + +# Regular expression matching correct class attribute names. Overrides class- +# attribute-naming-style. +#class-attribute-rgx= + +# Naming style matching correct class names. +class-naming-style=PascalCase + +# Regular expression matching correct class names. Overrides class-naming- +# style. +#class-rgx= + +# Naming style matching correct constant names. +const-naming-style=UPPER_CASE + +# Regular expression matching correct constant names. Overrides const-naming- +# style. +#const-rgx= + +# Minimum line length for functions/classes that require docstrings, shorter +# ones are exempt. +docstring-min-length=-1 + +# Naming style matching correct function names. +function-naming-style=snake_case + +# Regular expression matching correct function names. Overrides function- +# naming-style. +#function-rgx= + +# Good variable names which should always be accepted, separated by a comma. +good-names=i, + j, + k, + ex, + Run, + _ + +# Good variable names regexes, separated by a comma. If names match any regex, +# they will always be accepted +good-names-rgxs= + +# Include a hint for the correct naming format with invalid-name. +include-naming-hint=no + +# Naming style matching correct inline iteration names. +inlinevar-naming-style=any + +# Regular expression matching correct inline iteration names. Overrides +# inlinevar-naming-style. +#inlinevar-rgx= + +# Naming style matching correct method names. +method-naming-style=snake_case + +# Regular expression matching correct method names. Overrides method-naming- +# style. +#method-rgx= + +# Naming style matching correct module names. +module-naming-style=snake_case + +# Regular expression matching correct module names. Overrides module-naming- +# style. +#module-rgx= + +# Colon-delimited sets of names that determine each other's naming style when +# the name regexes allow several styles. +name-group= + +# Regular expression which should only match function or class names that do +# not require a docstring. +no-docstring-rgx=^_ + +# List of decorators that produce properties, such as abc.abstractproperty. Add +# to this list to register other decorators that produce valid properties. +# These decorators are taken in consideration only for invalid-name. +property-classes=abc.abstractproperty + +# Naming style matching correct variable names. +variable-naming-style=snake_case + +# Regular expression matching correct variable names. Overrides variable- +# naming-style. +#variable-rgx= + + +[TYPECHECK] + +# List of decorators that produce context managers, such as +# contextlib.contextmanager. 
Add to this list to register other decorators that +# produce valid context managers. +contextmanager-decorators=contextlib.contextmanager + +# List of members which are set dynamically and missed by pylint inference +# system, and so shouldn't trigger E1101 when accessed. Python regular +# expressions are accepted. +generated-members=torch.*,numpy.* + +# Tells whether missing members accessed in mixin class should be ignored. A +# mixin class is detected if its name ends with "mixin" (case insensitive). +ignore-mixin-members=yes + +# Tells whether to warn about missing members when the owner of the attribute +# is inferred to be None. +ignore-none=yes + +# This flag controls whether pylint should warn about no-member and similar +# checks whenever an opaque object is returned when inferring. The inference +# can return multiple potential results while evaluating a Python object, but +# some branches might not be evaluated, which results in partial inference. In +# that case, it might be useful to still emit no-member and other checks for +# the rest of the inferred objects. +ignore-on-opaque-inference=yes + +# List of class names for which member attributes should not be checked (useful +# for classes with dynamically set attributes). This supports the use of +# qualified names. +ignored-classes=optparse.Values,thread._local,_thread._local + +# List of module names for which member attributes should not be checked +# (useful for modules/projects where namespaces are manipulated during runtime +# and thus existing member attributes cannot be deduced by static analysis). It +# supports qualified module names, as well as Unix pattern matching. +ignored-modules=gpu_nms,cpu_nms + +# Show a hint with possible names when a member name was not found. The aspect +# of finding the hint is based on edit distance. +missing-member-hint=yes + +# The minimum edit distance a name should have in order to be considered a +# similar match for a missing member name. +missing-member-hint-distance=1 + +# The total number of similar names that should be taken in consideration when +# showing a hint for a missing member. +missing-member-max-choices=1 + +# List of decorators that change the signature of a decorated function. +signature-mutators= + + +[SPELLING] + +# Limits count of emitted suggestions for spelling mistakes. +max-spelling-suggestions=4 + +# Spelling dictionary name. Available dictionaries: none. To make it work, +# install the python-enchant package. +spelling-dict= + +# List of comma separated words that should not be checked. +spelling-ignore-words= + +# A path to a file that contains the private dictionary; one word per line. +spelling-private-dict-file= + +# Tells whether to store unknown words to the private dictionary (see the +# --spelling-private-dict-file option) instead of raising a message. +spelling-store-unknown-words=no + + +[MISCELLANEOUS] + +# List of note tags to take in consideration, separated by a comma. +notes=FIXME, + XXX, + TODO + +# Regular expression of note tags to take in consideration. +#notes-rgx= + + +[SIMILARITIES] + +# Ignore comments when computing similarities. +ignore-comments=yes + +# Ignore docstrings when computing similarities. +ignore-docstrings=yes + +# Ignore imports when computing similarities. +ignore-imports=no + +# Minimum lines number of a similarity. +min-similarity-lines=4 + + +[VARIABLES] + +# List of additional names supposed to be defined in builtins. Remember that +# you should avoid defining new builtins when possible. 
+additional-builtins= + +# Tells whether unused global variables should be treated as a violation. +allow-global-unused-variables=yes + +# List of strings which can identify a callback function by name. A callback +# name must start or end with one of those strings. +callbacks=cb_, + _cb + +# A regular expression matching the name of dummy variables (i.e. expected to +# not be used). +dummy-variables-rgx=_+$|(_[a-zA-Z0-9_]*[a-zA-Z0-9]+?$)|dummy|^ignored_|^unused_ + +# Argument names that match this expression will be ignored. Default to name +# with leading underscore. +ignored-argument-names=_.*|^ignored_|^unused_ + +# Tells whether we should check for unused import in __init__ files. +init-import=no + +# List of qualified module names which can have objects that can redefine +# builtins. +redefining-builtins-modules=six.moves,past.builtins,future.builtins,builtins,io + + +[LOGGING] + +# The type of string formatting that logging methods do. `old` means using % +# formatting, `new` is for `{}` formatting. +logging-format-style=old + +# Logging modules to check that the string format arguments are in logging +# function parameter format. +logging-modules=logging + + +[FORMAT] + +# Expected format of line ending, e.g. empty (any line ending), LF or CRLF. +expected-line-ending-format= + +# Regexp for a line that is allowed to be longer than the limit. +ignore-long-lines=^\s*(# )??$ + +# Number of spaces of indent required inside a hanging or continued line. +indent-after-paren=4 + +# String used as indentation unit. This is usually " " (4 spaces) or "\t" (1 +# tab). +indent-string=' ' + +# Maximum number of characters on a single line. +max-line-length=100 + +# Maximum number of lines in a module. +max-module-lines=1000 + +# Allow the body of a class to be on the same line as the declaration if body +# contains single statement. +single-line-class-stmt=no + +# Allow the body of an if to be on the same line as the test if there is no +# else. +single-line-if-stmt=no + + +[DESIGN] + +# Maximum number of arguments for function / method. +max-args=5 + +# Maximum number of attributes for a class (see R0902). +max-attributes=7 + +# Maximum number of boolean expressions in an if statement (see R0916). +max-bool-expr=5 + +# Maximum number of branch for function / method body. +max-branches=12 + +# Maximum number of locals for function / method body. +max-locals=15 + +# Maximum number of parents for a class (see R0901). +max-parents=7 + +# Maximum number of public methods for a class (see R0904). +max-public-methods=20 + +# Maximum number of return / yield for function / method body. +max-returns=6 + +# Maximum number of statements in function / method body. +max-statements=50 + +# Minimum number of public methods for a class (see R0903). +min-public-methods=2 + + +[CLASSES] + +# List of method names used to declare (i.e. assign) instance attributes. +defining-attr-methods=__init__, + __new__, + setUp, + __post_init__ + +# List of member names, which should be excluded from the protected access +# warning. +exclude-protected=_asdict, + _fields, + _replace, + _source, + _make + +# List of valid names for the first argument in a class method. +valid-classmethod-first-arg=cls + +# List of valid names for the first argument in a metaclass class method. +valid-metaclass-classmethod-first-arg=cls + + +[IMPORTS] + +# List of modules that can be imported at any level, not just the top level +# one. +allow-any-import-level= + +# Allow wildcard imports from modules that define __all__. 
+allow-wildcard-with-all=no
+
+# Analyse import fallback blocks. This can be used to support both Python 2 and
+# 3 compatible code, which means that the block might have code that exists
+# only in one or another interpreter, leading to false positives when analysed.
+analyse-fallback-blocks=no
+
+# Deprecated modules which should not be used, separated by a comma.
+deprecated-modules=optparse,tkinter.tix
+
+# Create a graph of external dependencies in the given file (report RP0402 must
+# not be disabled).
+ext-import-graph=
+
+# Create a graph of every (i.e. internal and external) dependencies in the
+# given file (report RP0402 must not be disabled).
+import-graph=
+
+# Create a graph of internal dependencies in the given file (report RP0402 must
+# not be disabled).
+int-import-graph=
+
+# Force import order to recognize a module as part of the standard
+# compatibility libraries.
+known-standard-library=
+
+# Force import order to recognize a module as part of a third party library.
+known-third-party=enchant
+
+# Couples of modules and preferred modules, separated by a comma.
+preferred-modules=
+
+
+[EXCEPTIONS]
+
+# Exceptions that will emit a warning when being caught. Defaults to
+# "BaseException, Exception".
+overgeneral-exceptions=BaseException,
+                       Exception
diff --git a/.github/workflows/pull_request.yml b/.github/workflows/pull_request.yml
new file mode 100644
index 0000000..6f36453
--- /dev/null
+++ b/.github/workflows/pull_request.yml
@@ -0,0 +1,57 @@
+name: Pull Request
+
+on:
+  workflow_dispatch:
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  test:
+    name: Run Tests
+    runs-on: [self-hosted, unicorn]
+    strategy:
+      matrix:
+        python-version:
+          - "3.10"
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install packages
+        run: |
+          python -m pip install pytest wheel pylint pylint-flask
+          python setup.py bdist_wheel
+          wheel_file=$(echo dist/*.whl)
+          python -m pip install "${wheel_file}[torch]" --force-reinstall
+
+      - name: Lint with pylint
+        run: |
+          python -m pylint ${{ github.workspace }}/chameleon \
+            --rcfile=.github/workflows/.pylintrc \
+            --load-plugins pylint_flask
+
+      - name: Test with pytest
+        run: |
+          python -m pip install pytest-cov typeguard
+
+          # Test all
+          python -m pytest tests
+
+          # Report all
+          mkdir -p tests/coverage
+          python -m pytest -x \
+            --junitxml=tests/coverage/cov-junitxml.xml \
+            --cov=chameleon tests | tee tests/coverage/cov.txt
+
+      - name: Pytest coverage comment
+        id: coverageComment
+        uses: MishaKav/pytest-coverage-comment@main
+        with:
+          pytest-coverage-path: tests/coverage/cov.txt
+          junitxml-path: tests/coverage/cov-junitxml.xml
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
new file mode 100644
index 0000000..25b138c
--- /dev/null
+++ b/.github/workflows/release.yml
@@ -0,0 +1,75 @@
+name: Release wheel
+
+on:
+  workflow_dispatch:
+    inputs:
+      branch:
+        description: "Create release with branch or sha1"
+        default: "main"
+        required: true
+      version_tag:
+        description: "Release version (a.b.c)"
+        required: true
+
+jobs:
+  Release:
+    runs-on: [self-hosted, unicorn]
+    strategy:
+      matrix:
+        python-version:
+          - "3.10"
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.inputs.branch }}
+          fetch-depth: 0
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          check-latest:
true + python-version: ${{ matrix.python-version }} + + - name: Update Python version + run: | + sed -i "s/__version__ = '[0-9]\+\(\.[0-9]\+\)\{1,2\}\(rc[0-9]\+\|[ab][0-9]\+\)\?'/__version__ = '${{ github.event.inputs.version_tag }}'/g" chameleon/__init__.py + + - name: Commit changes + run: | + git add . + git commit -m "[C] Update python version" + + - name: Commit & Push changes of Version Updating + uses: ad-m/github-push-action@master + with: + github_token: ${{ secrets.GITHUB_TOKEN }} + branch: ${{ github.event.inputs.branch }} + + - name: Build wheel + run: | + python -m pip install wheel twine + python setup.py bdist_wheel + python -m twine upload -u admin -p ${{ secrets.TWINE_PASSWORD }} \ + --repository-url http://192.168.0.105:18080 \ + ${{ github.workspace }}/dist/*.whl + + - name: Create Release + id: create_release + uses: ncipollo/release-action@v1 + with: + tag: ${{ github.event.inputs.version_tag }} + name: Release ${{ github.event.inputs.version_tag }} + body: | + # Release Note + allowUpdates: true + artifactErrorsFailBuild: true + draft: true + prerelease: false + generateReleaseNotes: true + discussionCategory: General + artifacts: "${{ github.workspace }}/dist/chameleon-*-none-any.whl" + + - name: Clean wheel + run: | + rm -fr ${{ github.workspace }}/dist/* diff --git a/chameleon/__init__.py b/chameleon/__init__.py new file mode 100644 index 0000000..be1aed3 --- /dev/null +++ b/chameleon/__init__.py @@ -0,0 +1,10 @@ +from .backbone import * +from .efficientdet import * +from .metrics import * +from .neck import * +from .nn import * +from .optim import * +from .tools import * +from .transformers import * + +__version__ = '0.1.0' diff --git a/chameleon/backbone/__init__.py b/chameleon/backbone/__init__.py new file mode 100644 index 0000000..ef12c5c --- /dev/null +++ b/chameleon/backbone/__init__.py @@ -0,0 +1,40 @@ +import fnmatch +from functools import partial + +from timm import create_model, list_models + +from .gpunet import GPUNet + +__all__ = [ + 'BACKBONE', 'build_backbone', 'list_backbones', +] + +GPUNET_NAMES = [ + 'gpunet_0', + 'gpunet_1', + 'gpunet_2', + 'gpunet_p0', + 'gpunet_p1', + 'gpunet_d1', + 'gpunet_d2', +] + + +BACKBONE = { + **{k: partial(create_model, model_name=k) for k in list_models()}, + **{name: partial(GPUNet.build_gpunet, name=name) for name in GPUNET_NAMES}, +} + + +def build_backbone(name: str, **kwargs): + if name not in BACKBONE: + raise ValueError(f'Backbone={name} is not supported.') + return BACKBONE[name](**kwargs) + + +def list_backbones(filter=''): + model_list = list(BACKBONE.keys()) + if len(filter): + return fnmatch.filter(model_list, filter) # include these models + else: + return model_list diff --git a/chameleon/backbone/gpunet.py b/chameleon/backbone/gpunet.py new file mode 100644 index 0000000..3c9f0ff --- /dev/null +++ b/chameleon/backbone/gpunet.py @@ -0,0 +1,125 @@ +from typing import List, Optional + +import torch +import torch.nn as nn + +from ..nn import PowerModule +from ..tools import has_children + +__all__ = ['GPUNet'] + + +class GPUNet(PowerModule): + + MetaParams = { + 'gpunet_0': { + 'name': 'GPUNet-0', + 'stage_index': [3, 5, 8, 11, 13] + }, + 'gpunet_1': { + 'name': 'GPUNet-1', + 'stage_index': [2, 4, 6, 11, 15] + }, + 'gpunet_2': { + 'name': 'GPUNet-2', + 'stage_index': [4, 5, 8, 18, 33] + }, + 'gpunet_p0': { + 'name': 'GPUNet-P0', + 'stage_index': [3, 4, 7, 10, 14] + }, + 'gpunet_p1': { + 'name': 'GPUNet-P1', + 'stage_index': [3, 6, 8, 11, 15] + }, + 'gpunet_d1': { + 'name': 'GPUNet-D1', + 'stage_index': [2, 
5, 9, 17, 23]
+        },
+        'gpunet_d2': {
+            'name': 'GPUNet-D2',
+            'stage_index': [2, 5, 9, 19, 26]
+        },
+    }
+
+    def __init__(
+        self,
+        stages: List[torch.nn.Sequential],
+        out_indices: Optional[List[int]] = None
+    ):
+        super().__init__()
+        for i, stage in enumerate(stages):
+            self.add_module(f'stage_{i}', stage)
+
+        # Probe the output channels of every stage with a dummy forward pass.
+        self.channels = []
+        with torch.no_grad():
+            self.out_indices = None
+            for x in self.forward(torch.rand(1, 3, 224, 224)):
+                self.channels.append(x.shape[1])
+
+        self.out_indices = out_indices
+
+    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
+        outs = []
+        for i in range(5):
+            x = self._modules[f'stage_{i}'](x)
+            outs.append(x)
+
+        if self.out_indices is not None:
+            outs = [outs[i] for i in self.out_indices]
+
+        return outs
+
+    def __repr__(self):
+        return str(self._modules)
+
+    @classmethod
+    def build_gpunet(
+        cls,
+        name: str,
+        pretrained: bool = False,
+        precision: str = 'fp32',
+        out_indices: Optional[List[int]] = None,
+    ):
+
+        def _replace_padding(model):
+            # Force stride-2 convs to use kernel_size // 2 padding so that
+            # every stage halves the spatial resolution cleanly.
+            for m in model.children():
+                if has_children(m):
+                    _replace_padding(m)
+                else:
+                    if isinstance(m, nn.Conv2d) and getattr(m, 'stride') == (2, 2):
+                        ksize = tuple(map(lambda x: int(x // 2),
+                                          getattr(m, 'kernel_size')))
+                        setattr(m, 'padding', ksize)
+
+        allow_model_name = '\n\t'.join(cls.MetaParams.keys())
+        if name not in cls.MetaParams:
+            raise ValueError(
+                f'Input `name`: {name} is invalid.\n'
+                'Try one of the following model names:\n'
+                f'\t{allow_model_name}'
+                f'\nRef: https://pytorch.org/hub/nvidia_deeplearningexamples_gpunet/'
+            )
+
+        model = torch.hub.load(
+            'NVIDIA/DeepLearningExamples:torchhub',
+            'nvidia_gpunet',
+            pretrained=pretrained,
+            model_type=cls.MetaParams[name]['name'],
+            model_math=precision,
+            trust_repo=True
+        )
+
+        # Split the flat layer list into stages at the indices given above.
+        stages = []
+        start_idx = 0
+        for stop_idx in cls.MetaParams[name]['stage_index']:
+            layers = model.network[start_idx: stop_idx]
+            _replace_padding(layers)
+            stages.append(layers)
+            start_idx = stop_idx
+
+        return cls(
+            stages=stages,
+            out_indices=out_indices,
+        )
diff --git a/chameleon/efficientdet/__init__.py b/chameleon/efficientdet/__init__.py
new file mode 100644
index 0000000..3a0503e
--- /dev/null
+++ b/chameleon/efficientdet/__init__.py
@@ -0,0 +1 @@
+from .efficientdet import EfficientDet
diff --git a/chameleon/efficientdet/efficientdet.py b/chameleon/efficientdet/efficientdet.py
new file mode 100644
index 0000000..ac508ba
--- /dev/null
+++ b/chameleon/efficientdet/efficientdet.py
@@ -0,0 +1,74 @@
+from typing import List
+
+import torch
+from timm import create_model
+
+from ..neck import BiFPNs
+from ..nn import PowerModule
+
+__all__ = ['EfficientDet']
+
+
+class EfficientDet(PowerModule):
+
+    def __init__(self, compound_coef: int = 0, pretrained: bool = True, **kwargs):
+        """
+        EfficientDet backbone.
+
+        Args:
+            compound_coef (int, optional):
+                Compound scaling factor for the model architecture. Defaults to 0.
+            pretrained (bool, optional):
+                If True, returns a model pre-trained on ImageNet. Defaults to True.
+        """
+        super().__init__()
+        self.compound_coef = compound_coef
+
+        # Number of filters for each FPN layer at each compound coefficient
+        self.fpn_num_filters = [64, 88, 112, 160, 224, 288, 384, 384, 384]
+
+        # Number of BiFPN repeats for each compound coefficient
+        self.fpn_cell_repeats = [3, 4, 5, 6, 7, 7, 8, 8, 8]
+
+        # Number of channels for each input feature map at each compound coefficient
+        conv_channel_coef = {
+            # the channels of P3/P4/P5.
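+            # (keyed by compound_coef; the values follow the stage widths of
+            # the timm `efficientnet_b{compound_coef}` backbones created below)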
+ 0: [40, 112, 320], + 1: [40, 112, 320], + 2: [48, 120, 352], + 3: [48, 136, 384], + 4: [56, 160, 448], + 5: [64, 176, 512], + 6: [72, 200, 576], + 7: [80, 224, 640], + 8: [88, 248, 704], + } + + self.backbone = create_model( + f'efficientnet_b{compound_coef}', + pretrained=pretrained, + features_only=True, + exportable=True, + ) + + self.bifpn = BiFPNs( + in_channels_list=conv_channel_coef[compound_coef], + out_channels=self.fpn_num_filters[compound_coef], + n_bifpn=self.fpn_cell_repeats[compound_coef], + attention=True if compound_coef < 6 else False, + extra_layers=3 if compound_coef > 7 else 2, + **kwargs, + ) + + def forward(self, x: torch.Tensor) -> List[torch.Tensor]: + """ + Forward pass of the EfficientDet backbone. + + Args: + x (torch.Tensor): Input tensor of shape (batch_size, channels, height, width). + + Returns: + List[torch.Tensor]: A list of feature maps, each with shape (batch_size, channels, height, width), + where the number of feature maps is equal to the number of BiFPN layers. + """ + return self.bifpn(self.backbone(x)[2:]) diff --git a/chameleon/metrics/__init__.py b/chameleon/metrics/__init__.py new file mode 100644 index 0000000..fb5bed3 --- /dev/null +++ b/chameleon/metrics/__init__.py @@ -0,0 +1 @@ +from .normalized_levenshtein_similarity import NormalizedLevenshteinSimilarity diff --git a/chameleon/metrics/normalized_levenshtein_similarity.py b/chameleon/metrics/normalized_levenshtein_similarity.py new file mode 100644 index 0000000..ee47ab4 --- /dev/null +++ b/chameleon/metrics/normalized_levenshtein_similarity.py @@ -0,0 +1,152 @@ +from typing import Any, Literal, Optional, Sequence, Union + +import torch +from torch import Tensor +from torchmetrics.metric import Metric +from torchmetrics.text import EditDistance +from torchmetrics.utilities.data import dim_zero_cat + + +class NormalizedLevenshteinSimilarity(Metric): + """ + Normalized Levenshtein Similarity (NLS) is a metric that computes the + normalized Levenshtein similarity between two sequences. + This metric is calculated as 1 - (levenshtein_distance / max_length), + where `levenshtein_distance` is the Levenshtein distance between the two + sequences and `max_length` is the maximum length of the two sequences. + + NLS aims to provide a similarity measure for character sequences + (such as text), making it useful in areas like text similarity analysis, + Optical Character Recognition (OCR), and Natural Language Processing (NLP). + + This class inherits from `Metric` and uses the `EditDistance` class to + compute the Levenshtein distance. + + Inputs to the ``update`` and ``compute`` methods are as follows: + + - ``preds`` (:class:`~Union[str, Sequence[str]]`): + Predicted text sequences or a collection of sequences. + - ``target`` (:class:`~Union[str, Sequence[str]]`): + Target text sequences or a collection of sequences. + + Output from the ``compute`` method is as follows: + + - ``nls`` (:class:`~torch.Tensor`): A tensor containing the NLS value. + Returns 0.0 when there are no samples; otherwise, it returns the NLS. + + Args: + substitution_cost: + The cost of substituting one character for another. Default is 1. + reduction: + Method to aggregate metric scores. + Default is 'mean', options are 'sum' or None. + + - ``'mean'``: takes the mean over samples, which is ANLS. + - ``'sum'``: takes the sum over samples + - ``None`` or ``'none'``: returns the score per sample + + kwargs: Additional keyword arguments. 
+
+    Example::
+        Multiple strings example:
+
+        >>> metric = NormalizedLevenshteinSimilarity(reduction=None)
+        >>> preds = ["rain", "lnaguaeg"]
+        >>> target = ["shine", "language"]
+        >>> metric(preds, target)
+        tensor([0.4000, 0.5000])
+        >>> metric = NormalizedLevenshteinSimilarity(reduction="mean")
+        >>> metric(preds, target)
+        tensor(0.4500)
+    """
+
+    def __init__(
+        self,
+        substitution_cost: int = 1,
+        reduction: Optional[Literal["mean", "sum", "none"]] = "mean",
+        **kwargs: Any
+    ) -> None:
+        super().__init__(**kwargs)
+        self.edit_distance = EditDistance(
+            substitution_cost=substitution_cost,
+            reduction=None  # Set to None to get distances for all string pairs
+        )
+
+        allowed_reduction = (None, "mean", "sum", "none")
+        if reduction not in allowed_reduction:
+            raise ValueError(
+                f"Expected argument `reduction` to be one of {allowed_reduction}, but got {reduction}")
+        self.reduction = reduction
+
+        if self.reduction == "none" or self.reduction is None:
+            self.add_state(
+                "nls_values_list",
+                default=[],
+                dist_reduce_fx="cat"
+            )
+        else:
+            self.add_state(
+                "nls_score",
+                default=torch.tensor(0.0),
+                dist_reduce_fx="sum"
+            )
+            self.add_state(
+                "num_elements",
+                default=torch.tensor(0),
+                dist_reduce_fx="sum"
+            )
+
+    def update(self, preds: Union[str, Sequence[str]], target: Union[str, Sequence[str]]) -> None:
+        """Update state with predictions and targets."""
+        if isinstance(preds, str):
+            preds = [preds]
+        if isinstance(target, str):
+            target = [target]
+
+        distances = self.edit_distance(preds, target)
+        max_lengths = torch.tensor([
+            max(len(p), len(t))
+            for p, t in zip(preds, target)
+        ], dtype=torch.float)
+
+        # Guard against two empty strings: define their distance ratio as 0.
+        ratio = torch.where(
+            max_lengths == 0,
+            torch.zeros_like(distances).float(),
+            distances.float() / max_lengths
+        )
+
+        nls_values = 1 - ratio
+
+        if self.reduction == "none" or self.reduction is None:
+            self.nls_values_list.append(nls_values)
+        else:
+            self.nls_score += nls_values.sum()
+            self.num_elements += nls_values.shape[0]
+
+    def _compute(
+        self,
+        nls_score: Tensor,
+        num_elements: Union[Tensor, int],
+    ) -> Tensor:
+        """Compute the ANLS over state."""
+        if nls_score.numel() == 0:
+            return torch.tensor(0, dtype=torch.int32)
+        if self.reduction == "mean":
+            return nls_score.sum() / num_elements
+        if self.reduction == "sum":
+            return nls_score.sum()
+        if self.reduction is None or self.reduction == "none":
+            return nls_score
+
+    def compute(self) -> torch.Tensor:
+        """Compute the NLS over state."""
+        if self.reduction == "none" or self.reduction is None:
+            return self._compute(dim_zero_cat(self.nls_values_list), 1)
+        return self._compute(self.nls_score, self.num_elements)
+
+
+if __name__ == "__main__":
+    anls = NormalizedLevenshteinSimilarity(reduction='mean')
+    preds = ["rain", "lnaguaeg"]
+    target = ["shine", "language"]
+    print(anls(preds, target))
diff --git a/chameleon/neck/__init__.py b/chameleon/neck/__init__.py
new file mode 100644
index 0000000..081cfe9
--- /dev/null
+++ b/chameleon/neck/__init__.py
@@ -0,0 +1,31 @@
+import fnmatch
+
+from .bifpn import BiFPN, BiFPNs
+from .fpn import FPN, FPNs
+
+NECK = {
+    'fpn': FPN,
+    'fpns': FPNs,
+    'bifpn': BiFPN,
+    'bifpns': BiFPNs,
+}
+
+__all__ = [
+    'NECK', 'BiFPN', 'BiFPNs', 'FPN', 'FPNs', 'build_neck', 'list_necks',
+]
+
+def build_neck(name: str, **kwargs):
+    if name in NECK:
+        neck = NECK[name](**kwargs)
+    else:
+        raise ValueError(f'Neck={name} is not supported.')
+
+    return neck
+
+
+def list_necks(filter=''):
+    model_list = list(NECK.keys())
+    if len(filter):
+        return fnmatch.filter(model_list, filter) # 
include these models
+    else:
+        return model_list
diff --git a/chameleon/neck/bifpn.py b/chameleon/neck/bifpn.py
new file mode 100644
index 0000000..def0ab6
--- /dev/null
+++ b/chameleon/neck/bifpn.py
@@ -0,0 +1,375 @@
+from copy import deepcopy
+from typing import List, Optional, Union
+
+import torch
+import torch.nn as nn
+
+from ..nn import (CNN2Dcell, PowerModule, SeparableConvBlock, WeightedSum,
+                  build_activation, build_norm)
+
+__all__ = ['BiFPN', 'BiFPNs']
+
+
+class BiFPN(PowerModule):
+
+    def __init__(
+        self,
+        in_channels_list: List[int],
+        out_channels: int,
+        extra_layers: int = 0,
+        out_indices: Optional[List[int]] = None,
+        norm: Optional[Union[dict, nn.Module]] = None,
+        act: Optional[Union[dict, nn.Module]] = None,
+        upsample_mode: str = 'bilinear',
+        use_conv: bool = False,
+        attention: bool = True,
+    ) -> None:
+        """
+        BiFPN (Bidirectional Feature Pyramid Network) is a feature-fusion
+        module commonly used in object detection and instance segmentation
+        tasks. It was introduced in the EfficientDet paper by Mingxing Tan
+        et al. in 2020.
+
+        BiFPN extends the FPN (Feature Pyramid Network) module with
+        bidirectional connections between feature maps of different
+        resolutions. It takes multiple feature maps with different spatial
+        resolutions and merges them into feature maps with a consistent
+        channel width.
+
+        The bidirectional connections enable efficient feature propagation
+        and fusion across multiple scales, improving the quality of the
+        extracted features and ultimately leading to better performance in
+        object detection and instance segmentation.
+
+        A BiFPN consists of several repeated cells, where each cell processes
+        the output of the previous one. Each cell runs a top-down path that
+        upsamples coarse features, followed by a bottom-up path that
+        downsamples them again; every fusion node combines its inputs through
+        a learnable weighted sum. Repeating this across the levels of the
+        feature pyramid creates a hierarchical representation that captures
+        rich and diverse information.
+
+        BiFPN has become a popular choice for many state-of-the-art object
+        detection and instance segmentation architectures due to its
+        efficiency and effectiveness in feature extraction.
+
+        Args:
+            in_channels_list (List[int]):
+                A list of integers representing the number of channels in each
+                input feature map.
+            out_channels (int):
+                The number of output channels for all feature maps.
+            extra_layers (int, optional):
+                The number of extra down-sampling layers to add. Defaults to 0.
+            out_indices (Optional[List[int]], optional):
+                A list of integers indicating the indices of the feature maps
+                to output. If None, all feature maps are output. Defaults to None.
+            norm (Optional[Union[dict, nn.Module]], optional):
+                Optional normalization module or dictionary of its parameters.
+                Defaults to None.
+            act (Optional[Union[dict, nn.Module]], optional):
+                Optional activation function or dictionary of its parameters.
+                Defaults to None.
+            upsample_mode (str, optional):
+                The type of upsampling method to use, which can be 'bilinear'
+                or 'nearest'. Bilinear upsampling is recommended in most cases
+                for its better performance. Nearest neighbor upsampling may be
+                useful when input feature maps have a small spatial resolution.
+                Defaults to 'bilinear'.
+            use_conv (bool, optional):
+                In BiFPN, SeparableConvBlock is used by default to replace CNN.
+                If you want to use a general CNN, set use_conv to True.
+                Defaults to False.
+            attention (bool, optional):
+                Whether to use the attention mechanism in each WeightedSum
+                block. Defaults to True.
+
+        Raises:
+            ValueError: If the number of input feature maps does not match the
+                length of `in_channels_list` or if `extra_layers` is negative.
+        """
+        super().__init__()
+
+        self.attention = attention
+        self.upsample_mode = upsample_mode
+        self.in_channels_list = in_channels_list
+
+        num_in_features = len(in_channels_list)
+        num_out_features = num_in_features + extra_layers
+
+        if extra_layers < 0:
+            raise ValueError('extra_layers < 0, which is invalid.')
+
+        conv2d = CNN2Dcell if use_conv else SeparableConvBlock
+
+        # Lateral 1x1 layers projecting every level to out_channels
+        conv1x1s = []
+        for i in range(num_out_features):
+            in_channels = in_channels_list[i] if i < num_in_features else in_channels_list[-1]
+            if in_channels != out_channels:
+                conv1x1s.append(
+                    CNN2Dcell(
+                        in_channels,
+                        out_channels,
+                        kernel=1,
+                        stride=1,
+                        padding=0,
+                        norm=deepcopy(norm),
+                    )
+                )
+            else:
+                conv1x1s.append(nn.Identity())
+        self.conv1x1s = nn.ModuleList(conv1x1s)
+
+        self.conv_up_3x3s = nn.ModuleList([
+            conv2d(
+                out_channels,
+                out_channels,
+                kernel=3,
+                stride=1,
+                padding=1,
+                norm=deepcopy(norm),
+                act=deepcopy(act),
+            )
+            for _ in range(num_out_features - 1)
+        ])
+
+        self.conv_down_3x3s = nn.ModuleList([
+            conv2d(
+                out_channels,
+                out_channels,
+                kernel=3,
+                stride=1,
+                padding=1,
+                norm=deepcopy(norm),
+                act=deepcopy(act),
+            )
+            for _ in range(num_out_features - 1)
+        ])
+
+        if extra_layers > 0:
+            self.extra_conv_downs = nn.ModuleList([
+                conv2d(
+                    in_channels_list[-1],
+                    in_channels_list[-1],
+                    kernel=3,
+                    stride=2,
+                    padding=1,
+                    norm=nn.BatchNorm2d(
+                        in_channels_list[-1]) if norm is not None else None,
+                    act=deepcopy(act),
+                )
+                for _ in range(extra_layers)
+            ])
+
+        self.upsamples = nn.ModuleList([
+            nn.Upsample(
+                scale_factor=2,
+                mode=upsample_mode,
+                align_corners=False if upsample_mode != 'nearest' else None,
+            )
+            for _ in range(num_out_features - 1)
+        ])
+
+        self.downsamples = nn.ModuleList([
+            nn.MaxPool2d(
+                kernel_size=3,
+                stride=2,
+                padding=1,
+            )
+            for _ in range(num_out_features - 1)
+        ])
+
+        # Learnable fusion weights for the 2- and 3-input weighted sums
+        self.weighted_sum_2_input = nn.ModuleList([
+            WeightedSum(2, act=nn.ReLU(False), requires_grad=attention)
+            for _ in range(num_out_features)
+        ])
+
+        self.weighted_sum_3_input = nn.ModuleList([
+            WeightedSum(3, act=nn.ReLU(False), requires_grad=attention)
+            for _ in range(num_out_features - 2)
+        ])
+
+        self.num_in_features = num_in_features
+        self.out_indices = out_indices
+        self.initialize_weights_()
+
+    def forward(self, xs: List[torch.Tensor]) -> List[torch.Tensor]:
+        """
+        Illustration of a minimal BiFPN unit:
+            P7_0 -------------------------> P7_2 -------->
+               |-------------|                ↑
+                             ↓                |
+            P6_0 ---------> P6_1 ---------> P6_2 -------->
+               |-------------|--------------↑ ↑
+                             ↓                |
+            P5_0 ---------> P5_1 ---------> P5_2 -------->
+               |-------------|--------------↑ ↑
+                             ↓                |
+            P4_0 ---------> P4_1 ---------> P4_2 -------->
+               |-------------|--------------↑ ↑
+                             |--------------↓ |
+            P3_0 -------------------------> P3_2 -------->
+        """
+
+        if len(xs) != self.num_in_features:
+            raise ValueError(
+                'The length of the given xs does not match the length of in_channels_list.'
+ ) + + # make extra levels by conv2d if needed + # for example: P3, P4, P5 -> P3, P4, P5, P6, P7 + if hasattr(self, 'extra_conv_downs'): + extras = [] + x = xs[-1] + for conv in self.extra_conv_downs: + x = conv(x) + extras.append(x) + xs = xs + extras + + # Fixed input channels + out_fixed = [self.conv1x1s[i](xs[i]) for i in range(len(xs))] + + # top-down pathway + outs_top_down = [] + for i in range(len(out_fixed)-1, -1, -1): + out = out_fixed[i] + if i != len(xs)-1: + hidden = self.weighted_sum_2_input[i]( + [out, self.upsamples[i](hidden)]) + out = self.conv_up_3x3s[i](hidden) + hidden = out + outs_top_down.append(out) + outs_top_down = outs_top_down[::-1] + + # down-top pathway + outs_down_top = [] + for i in range(len(outs_top_down)): + out = outs_top_down[i] + residual = out_fixed[i] + if i != 0 and i != len(outs_top_down) - 1: + hidden = self.weighted_sum_3_input[i - 1]( + [out, self.downsamples[i - 1](hidden), residual]) + out = self.conv_down_3x3s[i - 1](hidden) + elif i == len(outs_top_down) - 1: + hidden = self.weighted_sum_2_input[0]( + [self.downsamples[i - 1](hidden), residual]) + out = self.conv_down_3x3s[i - 1](hidden) + + hidden = out + outs_down_top.append(out) + + if self.out_indices is not None: + outs_down_top = [outs_down_top[i] for i in self.out_indices] + + return outs_down_top + + @classmethod + def build_convbifpn( + cls, + in_channels_list: List[int], + out_channels: int, + extra_layers: int = 0, + out_indices: Optional[List[int]] = None, + upsample_mode: str = 'bilinear', + attention: bool = True, + ): + return cls( + in_channels_list=in_channels_list, + out_channels=out_channels, + extra_layers=extra_layers, + out_indices=out_indices, + norm=nn.BatchNorm2d(num_features=out_channels, + momentum=0.003, eps=1e-4), + act=nn.ReLU(False), + upsample_mode=upsample_mode, + use_conv=True, + attention=attention, + ) + + @classmethod + def build_bifpn( + cls, + in_channels_list: List[int], + out_channels: int, + extra_layers: int = 0, + out_indices: Optional[List[int]] = None, + upsample_mode: str = 'bilinear', + attention: bool = True, + ): + return cls( + in_channels_list=in_channels_list, + out_channels=out_channels, + extra_layers=extra_layers, + out_indices=out_indices, + norm=nn.BatchNorm2d(num_features=out_channels, + momentum=0.003, eps=1e-4), + act=nn.ReLU(False), + upsample_mode=upsample_mode, + use_conv=False, + attention=attention, + ) + + +class BiFPNs(PowerModule): + + def __init__( + self, + in_channels_list: List[int], + out_channels: int, + n_bifpn: int, + extra_layers: int = 0, + out_indices: Optional[List[int]] = None, + upsample_mode: str = 'bilinear', + attention: bool = True, + use_conv: bool = False, + ): + """ + Constructor of the BiFPN module. + + Args: + in_channels_list (List[int]): + A list of integers representing the number of input channels for + each feature map. + out_channels (int): + The number of output channels for each feature map. + n_bifpn (int): + The number of BiFPN blocks to be stacked. + extra_layers (int, optional): + The number of additional convolutional layers added after the + BiFPN blocks. Defaults to 0. + out_indices (Optional[List[int]], optional): + A list of integers representing the indices of output feature maps. + Defaults to None. + upsample_mode (str, optional): + The interpolation method used in the upsampling operation. + Defaults to 'bilinear'. + attention (bool, optional): + A boolean flag indicating whether to use attention mechanism. + Defaults to True. 
+ use_conv (bool, optional): + In BiFPN, SeparableConvBlock is used by default to replace CNN. + If you want to use a general CNN, set use_conv to True. + Defaults to False. + + Raises: + ValueError: If the input `cls_method` is not supported. + """ + super().__init__() + cls_method = 'build_bifpn' if not use_conv else 'build_convbifpn' + num_out_features = len(in_channels_list) + extra_layers + self.block = nn.ModuleList([ + getattr(BiFPN, cls_method)( + out_channels=out_channels, + in_channels_list=in_channels_list if i == 0 else [ + out_channels] * num_out_features, + extra_layers=extra_layers if i == 0 else 0, + out_indices=out_indices if i == n_bifpn - 1 else None, + attention=attention, + upsample_mode=upsample_mode, + ) for i in range(n_bifpn) + ]) + + def forward(self, xs: List[torch.Tensor]) -> List[torch.Tensor]: + for bifpn in self.block: + xs = bifpn(xs) + return xs diff --git a/chameleon/neck/fpn.py b/chameleon/neck/fpn.py new file mode 100644 index 0000000..37630ca --- /dev/null +++ b/chameleon/neck/fpn.py @@ -0,0 +1,260 @@ +from copy import deepcopy +from typing import List, Optional, Union + +import torch +import torch.nn as nn + +from ..nn import CNN2Dcell, PowerModule, SeparableConvBlock + +__all__ = ['FPN', 'FPNs'] + + +class FPN(PowerModule): + + def __init__( + self, + in_channels_list: List[int], + out_channels: int, + extra_layers: int = 0, + out_indices: Optional[List[int]] = None, + norm: Optional[Union[dict, nn.Module]] = None, + act: Optional[Union[dict, nn.Module]] = None, + upsample_mode: str = 'bilinear', + use_dwconv: bool = False, + ) -> None: + """ + Feature Pyramid Network (FPN) module. + + Args: + in_channels_list (List[int]): + A list of integers representing the number of channels in each + input feature map. + out_channels (int): + The number of output channels for all feature maps. + extra_layers (int, optional): + The number of extra down-sampling layers to add. Defaults to 0. + out_indices (Optional[List[int]], optional): + A list of integers indicating the indices of the feature maps to + output. If None, all feature maps are output. Defaults to None. + norm Optional[Union[dict, nn.Module]]: + Optional normalization module or dictionary of its parameters. + Defaults to None. + act Optional[Union[dict, nn.Module]]: + Optional activation function or dictionary of its parameters. + Defaults to None. + upsample_mode (str, optional): + The type of upsampling method to use, which can be 'bilinear' or + 'nearest'. Bilinear upsampling is recommended in most cases for + its better performance. Nearest neighbor upsampling may be useful + when input feature maps have a small spatial resolution. + Defaults to 'bilinear'. + use_dwconv (bool, optional): + Whether to use depth-wise convolution in each Conv2d block. + Depth-wise convolution can reduce the number of parameters and + improve computation efficiency. However, it may also degrade the + quality of feature maps due to its low capacity. + Defaults to False. + + Raises: + ValueError: If the number of input feature maps does not match the length of `in_channels_list`. + Or if `extra_layers` is negative. 
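+
+        Example:
+            A minimal usage sketch; the channel counts and feature-map sizes
+            below are illustrative only:
+
+            >>> import torch
+            >>> fpn = FPN.build_fpn(in_channels_list=[40, 112, 320], out_channels=64)
+            >>> feats = [torch.rand(1, 40, 28, 28),
+            ...          torch.rand(1, 112, 14, 14),
+            ...          torch.rand(1, 320, 7, 7)]
+            >>> outs = fpn(feats)
+            >>> [tuple(o.shape) for o in outs]
+            [(1, 64, 28, 28), (1, 64, 14, 14), (1, 64, 7, 7)]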
+ """ + super().__init__() + + self.upsample_mode = upsample_mode + self.in_channels_list = in_channels_list + + num_in_features = len(in_channels_list) + num_out_features = num_in_features + extra_layers + + if extra_layers < 0: + raise ValueError('extra_layers < 0, which is not invalid.') + + conv2d = SeparableConvBlock if use_dwconv else CNN2Dcell + + self.conv1x1s = [] + for i in range(num_out_features): + in_channels = in_channels_list[i] if i < num_in_features else in_channels_list[-1] + if in_channels != out_channels: + self.conv1x1s.append( + CNN2Dcell( + in_channels, + out_channels, + kernel=1, + stride=1, + padding=0, + norm=deepcopy(norm), + ) + ) + else: + self.conv1x1s.append(nn.Identity()) + self.conv1x1s = nn.ModuleList(self.conv1x1s) + + self.smooth3x3s = nn.ModuleList([ + conv2d( + out_channels, + out_channels, + kernel=3, + stride=1, + padding=1, + norm=deepcopy(norm), + act=deepcopy(act), + ) + for _ in range(num_out_features - 1) + ]) + + if extra_layers > 0: + self.extra_conv_downs = nn.ModuleList([ + conv2d( + in_channels_list[-1], + in_channels_list[-1], + kernel=3, + stride=2, + padding=1, + norm=getattr(nn, norm.__class__.__name__)(in_channels_list[-1]) if norm is not None else None, + act=deepcopy(act), + ) + for _ in range(extra_layers) + ]) + + self.upsamples = nn.ModuleList([ + nn.Upsample( + scale_factor=2, + mode=upsample_mode, + align_corners=False if upsample_mode != 'nearest' else None, + ) + for _ in range(num_out_features - 1) + ]) + + self.num_in_features = num_in_features + self.out_indices = out_indices + self.initialize_weights_() + + def forward(self, xs: List[torch.Tensor]) -> List[torch.Tensor]: + + if len(xs) != self.num_in_features: + raise ValueError('Num of feats is not correct.') + + # make downsample if needed + # for example: P3, P4, P5 -> P3, P4, P5, P6, P7 + if hasattr(self, 'extra_conv_downs'): + extras = [] + x = xs[-1] + for conv in self.extra_conv_downs: + x = conv(x) + extras.append(x) + xs = xs + extras + + # top-down pathway + outs = [] + for i in range(len(xs)-1, -1, -1): + out = self.conv1x1s[i](xs[i]) + if i != len(xs)-1: + hidden = out + self.upsamples[i](hidden) + out = self.smooth3x3s[i](hidden) + hidden = out + outs.append(out) + outs = outs[::-1] + + if self.out_indices is not None: + outs = [outs[i] for i in self.out_indices] + + return outs + + @classmethod + def build_dwfpn( + cls, + in_channels_list: List[int], + out_channels: int, + extra_layers: int = 0, + out_indices: Optional[List[int]] = None, + upsample_mode: str = 'bilinear', + ): + return cls( + in_channels_list=in_channels_list, + out_channels=out_channels, + extra_layers=extra_layers, + out_indices=out_indices, + norm=nn.BatchNorm2d(num_features=out_channels, momentum=0.003), + act=nn.ReLU(False), + upsample_mode=upsample_mode, + use_dwconv=True, + ) + + @classmethod + def build_fpn( + cls, + in_channels_list: List[int], + out_channels: int, + extra_layers: int = 0, + out_indices: Optional[List[int]] = None, + upsample_mode: str = 'bilinear', + ): + return cls( + in_channels_list=in_channels_list, + out_channels=out_channels, + extra_layers=extra_layers, + out_indices=out_indices, + norm=nn.BatchNorm2d(num_features=out_channels, momentum=0.003), + act=nn.ReLU(False), + upsample_mode=upsample_mode, + use_dwconv=False, + ) + + +class FPNs(PowerModule): + + def __init__( + self, + in_channels_list: List[int], + out_channels: int, + n_fpn: int, + extra_layers: int = 0, + out_indices: Optional[List[int]] = None, + upsample_mode: str = 'bilinear', + use_dwconv: bool 
= False, + ): + """ + Constructor of the FPN module. + + Args: + + in_channels_list (List[int]): + A list of integers representing the number of channels in each + input feature map. + out_channels (int): + The number of output channels for all feature maps. + n_fpn (int): + The number of FPN blocks to be stacked. + extra_layers (int, optional): + The number of extra down-sampling layers to add. Defaults to 0. + out_indices (Optional[List[int]], optional): + A list of integers indicating the indices of the feature maps to + output. If None, all feature maps are output. Defaults to None. + use_dwconv (bool, optional): + Whether to use depth-wise convolution in each Conv2d block. + Depth-wise convolution can reduce the number of parameters and + improve computation efficiency. However, it may also degrade the + quality of feature maps due to its low capacity. + Defaults to False. + + Raises: + ValueError: If the input `cls_method` is not supported. + """ + super().__init__() + cls_method = 'build_fpn' if not use_dwconv else 'build_dwfpn' + num_out_features = len(in_channels_list) + extra_layers + self.block = nn.ModuleList([ + getattr(FPN, cls_method)( + out_channels=out_channels, + in_channels_list=in_channels_list if i == 0 else [out_channels] * num_out_features, + extra_layers=extra_layers if i == 0 else 0, + out_indices=out_indices if i == n_fpn - 1 else None, + upsample_mode=upsample_mode, + ) for i in range(n_fpn) + ]) + + def forward(self, xs: List[torch.Tensor]) -> List[torch.Tensor]: + for fpn in self.block: + xs = fpn(xs) + return xs diff --git a/chameleon/nn/__init__.py b/chameleon/nn/__init__.py new file mode 100644 index 0000000..6304b80 --- /dev/null +++ b/chameleon/nn/__init__.py @@ -0,0 +1,24 @@ +from torch.nn import * + +from .aspp import * +from .block import * +from .cnn import * +from .components import * +from .dwcnn import * +from .grl import * +from .mbcnn import * +from .positional_encoding import * +from .selayer import * +from .utils import * +from .vae import * + + +def build_nn_cls(name): + cls_ = globals().get(name, None) + if cls_ is None: + raise ImportError(f'name {name} is not in nn.') + return cls_ + + +def build_nn(name, **kwargs): + return build_nn_cls(name)(**kwargs) diff --git a/chameleon/nn/aspp.py b/chameleon/nn/aspp.py new file mode 100644 index 0000000..3e9773f --- /dev/null +++ b/chameleon/nn/aspp.py @@ -0,0 +1,74 @@ +import torch +import torch.nn as nn + +from .cnn import CNN2Dcell +from .components import Hswish +from .utils import PowerModule + +__all__ = ['ASPPLayer'] + +__doc__ = """ + REFERENCES: DeepLab: Semantic Image Segmentation with Deep Convolutional + Nets, Atrous Convolution, and Fully Connected CRFs + URL: https://arxiv.org/pdf/1606.00915.pdf +""" + + +class ASPPLayer(PowerModule): + + ARCHS = { + # ksize, stride, padding, dilation, is_use_hs + 'DILATE1': [3, 1, 1, 1, True], + 'DILATE2': [3, 1, 2, 2, True], + 'DILATE3': [3, 1, 4, 4, True], + 'DILATE4': [3, 1, 8, 8, True], + } + + def __init__( + self, + in_channels: int, + out_channels: int, + output_activate: nn.Module = nn.ReLU(), + ): + """ + Constructor for the ASPPLayer class. + + Args: + in_channels (int): + Number of input channels. + out_channels (int): + Number of output channels. + output_activate (nn.Module, optional): + Activation function to apply to the output. Defaults to nn.ReLU(). 
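+
+        Example:
+            A minimal usage sketch; the sizes below are illustrative only:
+
+            >>> import torch
+            >>> aspp = ASPPLayer(in_channels=32, out_channels=64)
+            >>> aspp(torch.rand(1, 32, 64, 64)).shape
+            torch.Size([1, 64, 64, 64])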
+        """
+        super().__init__()
+        self.layers = nn.ModuleDict()
+        for dilate_name, cfg in self.ARCHS.items():
+            ksize, stride, padding, dilation, use_hs = cfg
+            layer = CNN2Dcell(
+                in_channels=in_channels,
+                out_channels=in_channels,
+                kernel=ksize,
+                stride=stride,
+                padding=padding,
+                dilation=dilation,
+                norm=nn.BatchNorm2d(in_channels),
+                act=Hswish() if use_hs else nn.ReLU(),
+            )
+            self.layers[dilate_name] = layer
+
+        self.output_layer = CNN2Dcell(
+            in_channels=in_channels * len(self.layers),
+            out_channels=out_channels,
+            kernel=1,
+            stride=1,
+            padding=0,
+            norm=nn.BatchNorm2d(out_channels),
+            act=output_activate,
+        )
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        outputs = [layer(x) for layer in self.layers.values()]
+        outputs = torch.cat(outputs, dim=1)
+        outputs = self.output_layer(outputs)
+        return outputs
diff --git a/chameleon/nn/block.py b/chameleon/nn/block.py
new file mode 100644
index 0000000..cf959f9
--- /dev/null
+++ b/chameleon/nn/block.py
@@ -0,0 +1,83 @@
+from typing import Optional, Tuple, Union
+
+import torch
+import torch.nn as nn
+
+from .components import build_activation, build_norm
+from .utils import PowerModule
+
+__all__ = ['SeparableConvBlock']
+
+
+class SeparableConvBlock(PowerModule):
+
+    def __init__(
+        self,
+        in_channels: int,
+        out_channels: Optional[int] = None,
+        kernel: Union[int, Tuple[int, int]] = 3,
+        stride: Union[int, Tuple[int, int]] = 1,
+        padding: Union[int, Tuple[int, int]] = 1,
+        bias: Optional[bool] = None,
+        norm: Optional[Union[dict, nn.Module]] = None,
+        act: Optional[Union[dict, nn.Module]] = None,
+    ):
+        """
+        A separable convolution block consisting of a depthwise convolution and a pointwise convolution.
+
+        Args:
+            in_channels (int):
+                Number of input channels.
+            out_channels (int, optional):
+                Number of output channels. If not provided, defaults to `in_channels`.
+            kernel (int or Tuple[int, int], optional):
+                Size of the convolution kernel. Defaults to 3.
+            stride (int or Tuple[int, int], optional):
+                Stride of the convolution. Defaults to 1.
+            padding (int or Tuple[int, int], optional):
+                Padding added to all four sides of the input. Defaults to 1.
+            bias (bool, optional):
+                Whether to include a bias term in the convolutional layer.
+                If bias is None, it is set to True when no normalization layer
+                is given and to False otherwise. Defaults to None.
+            norm (dict or nn.Module, optional):
+                Configuration of normalization layer. Defaults to None.
+            act (dict or nn.Module, optional):
+                Configuration of activation layer. Defaults to None.
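+
+        Example (a minimal sketch; shapes are illustrative):
+            >>> block = SeparableConvBlock(32, 64, norm=nn.BatchNorm2d(64), act=nn.ReLU())
+            >>> x = torch.rand(1, 32, 16, 16)
+            >>> block(x).shape  # depthwise 3x3 on 32 channels, then 1x1 up to 64
+            torch.Size([1, 64, 16, 16])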
+        """
+        super().__init__()
+        out_channels = in_channels if out_channels is None else out_channels
+
+        if bias is None:
+            bias = True if norm is None else False
+
+        self.depthwise_conv = nn.Conv2d(
+            in_channels,
+            in_channels,
+            kernel_size=kernel,
+            stride=stride,
+            padding=padding,
+            groups=in_channels,
+            bias=bias,
+        )
+
+        self.pointwise_conv = nn.Conv2d(
+            in_channels,
+            out_channels,
+            kernel_size=1,
+            stride=1,
+            padding=0,
+            bias=bias,
+        )
+        self.norm = build_norm(**norm) if isinstance(norm, dict) else norm
+        self.act = build_activation(**act) if isinstance(act, dict) else act
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+        if self.norm is not None:
+            x = self.norm(x)
+        if self.act is not None:
+            x = self.act(x)
+        return x
diff --git a/chameleon/nn/cnn.py b/chameleon/nn/cnn.py
new file mode 100644
index 0000000..62ea897
--- /dev/null
+++ b/chameleon/nn/cnn.py
@@ -0,0 +1,123 @@
+from collections import OrderedDict
+from typing import Optional, Tuple, Union
+
+import torch
+import torch.nn as nn
+
+from .components import build_activation, build_dropout, build_norm, build_pool
+from .utils import PowerModule
+
+__all__ = [
+    'CNN2Dcell',
+]
+
+
+class CNN2Dcell(PowerModule):
+
+    def __init__(
+        self,
+        in_channels: Union[float, int],
+        out_channels: Union[float, int],
+        kernel: Union[int, Tuple[int, int]] = 3,
+        stride: Union[int, Tuple[int, int]] = 1,
+        padding: Union[int, Tuple[int, int]] = 1,
+        dilation: int = 1,
+        groups: int = 1,
+        bias: Optional[bool] = None,
+        padding_mode: str = 'zeros',
+        norm: Union[dict, nn.Module] = None,
+        dropout: Union[dict, nn.Module] = None,
+        act: Union[dict, nn.Module] = None,
+        pool: Union[dict, nn.Module] = None,
+        init_type: str = 'normal',
+    ):
+        """
+        This class is used to build a 2D convolutional neural network cell.
+
+        Args:
+            in_channels (int or float):
+                Number of input channels.
+            out_channels (int or float):
+                Number of output channels.
+            kernel (int or tuple, optional):
+                Size of the convolutional kernel. Defaults to 3.
+            stride (int or tuple, optional):
+                Stride size. Defaults to 1.
+            padding (int or tuple, optional):
+                Padding size. Defaults to 1.
+            dilation (int, optional):
+                Spacing between kernel elements. Defaults to 1.
+            groups (int, optional):
+                Number of blocked connections from input channels to output
+                channels. Defaults to 1.
+            bias (bool, optional):
+                Whether to include a bias term in the convolutional layer.
+                If bias is None, it is set to True when no normalization layer
+                is given and to False otherwise. Defaults to None.
+            padding_mode (str, optional):
+                Options = {'zeros', 'reflect', 'replicate', 'circular'}.
+                Defaults to 'zeros'.
+            norm (Union[dict, nn.Module], optional):
+                Normalization layer or a dictionary of arguments for building a
+                normalization layer. Defaults to None.
+            dropout (Union[dict, nn.Module], optional):
+                Dropout layer or a dictionary of arguments for building a dropout
+                layer. Defaults to None.
+            act (Union[dict, nn.Module], optional):
+                Activation function or a dictionary of arguments for building an
+                activation function. Defaults to None.
+            pool (Union[dict, nn.Module], optional):
+                Pooling layer or a dictionary of arguments for building a pooling
+                layer. Defaults to None.
+            init_type (str):
+                Method for initializing model parameters. Defaults to 'normal'.
+                Options = {'normal', 'uniform'}
+
+        Examples for using norm, act, and pool:
+            1.
cell = CNN2Dcell(in_channels=3, + out_channels=12, + norm=nn.BatchNorm2d(12), + act=nn.ReLU(), + pool=nn.AdaptiveAvgPool2d(1)) + 2. cell = CNN2Dcell(in_channels=3, + out_channels=12, + norm={'name': 'BatchNorm2d', 'num_features': 12}, + act={'name': 'ReLU', 'inplace': True}) + + Attributes: + layer (nn.ModuleDict): a dictionary of layer contained in the cell. + """ + super().__init__() + self.layer = nn.ModuleDict() + + if bias is None: + bias = True if norm is None else False + + self.layer['cnn'] = nn.Conv2d( + int(in_channels), + int(out_channels), + kernel_size=kernel, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + bias=bias, + padding_mode=padding_mode, + ) + + optional_modules = OrderedDict({ + 'norm': build_norm(**norm) if isinstance(norm, dict) else norm, + 'dp': build_dropout(**dropout) if isinstance(dropout, dict) else dropout, + 'act': build_activation(**act) if isinstance(act, dict) else act, + 'pool': build_pool(**pool) if isinstance(pool, dict) else pool, + }) + for name, m in optional_modules.items(): + if m is not None: + self.layer[name] = m + self.initialize_weights_(init_type) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + for _, m in self.layer.items(): + x = m(x) + return x diff --git a/chameleon/nn/components/__init__.py b/chameleon/nn/components/__init__.py new file mode 100644 index 0000000..bfa7884 --- /dev/null +++ b/chameleon/nn/components/__init__.py @@ -0,0 +1,5 @@ +from .activation import * +from .dropout import * +from .loss import * +from .norm import * +from .pooling import * diff --git a/chameleon/nn/components/activation.py b/chameleon/nn/components/activation.py new file mode 100644 index 0000000..eee225d --- /dev/null +++ b/chameleon/nn/components/activation.py @@ -0,0 +1,104 @@ +from typing import Union + +import torch +import torch.nn as nn +import torch.nn.functional as F +from torch.nn.modules.activation import (CELU, ELU, GELU, GLU, Hardsigmoid, + Hardswish, Hardtanh, LeakyReLU, + LogSigmoid, LogSoftmax, + MultiheadAttention, PReLU, ReLU, + ReLU6, RReLU, Sigmoid, SiLU, Softmax, + Softmax2d, Softmin, Softplus, + Softshrink, Softsign, Tanh, + Tanhshrink, Threshold) + +__all__ = [ + 'Swish', 'Hsigmoid', 'Hswish', 'build_activation', 'StarReLU', 'SquaredReLU', +] + +__all__ += ['CELU', 'ELU', 'GELU', 'GLU', 'LeakyReLU', 'LogSigmoid', + 'LogSoftmax', 'MultiheadAttention', 'PReLU', 'ReLU', 'ReLU6', + 'RReLU', 'Sigmoid', 'Softmax', 'Softmax2d', 'Softmin', 'Softplus', + 'Softshrink', 'Softsign', 'Tanh', 'Tanhshrink', 'Threshold', + 'Hardsigmoid', 'Hardswish', 'Hardtanh', 'SiLU',] + + +class Hsigmoid(nn.Module): + def __init__(self, inplace: bool = False): + super().__init__() + self.inplace = inplace + + def forward(self, x: torch.Tensor): + return F.relu6(x + 3., inplace=self.inplace) * 0.16666666667 + + +class Hswish(nn.Module): + def __init__(self, inplace: bool = False): + super().__init__() + self.inplace = inplace + + def forward(self, x: torch.Tensor): + return x * F.relu6(x + 3., inplace=self.inplace) * 0.16666666667 + + +class StarReLU(nn.Module): + + def __init__( + self, + scale: float = 1.0, + bias: float = 0.0, + scale_learnable: bool = True, + bias_learnable: bool = True, + inplace: bool = False + ): + """ + StarReLU: s * relu(x) ** 2 + b + Ref: MetaFormer Baselines for Vision (2022.12) (https://arxiv.org/pdf/2210.13452.pdf) + + Args: + scale (float): + Scale factor for the activation function, defaults to 1.0. + bias (float): + Bias for the activation function, defaults to 0.0. 
+            scale_learnable (bool):
+                Whether the scale factor should be learnable, defaults to True.
+            bias_learnable (bool):
+                Whether the bias should be learnable, defaults to True.
+            inplace (bool):
+                Whether to modify the input in place, defaults to False.
+        """
+        super().__init__()
+        self.inplace = inplace
+        self.scale = nn.Parameter(
+            torch.tensor(scale),
+            requires_grad=scale_learnable
+        )
+        self.bias = nn.Parameter(
+            torch.tensor(bias),
+            requires_grad=bias_learnable
+        )
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        return self.scale * F.relu(x, inplace=self.inplace) ** 2 + self.bias
+
+
+class SquaredReLU(nn.Module):
+
+    def __init__(self, inplace=False):
+        """ Squared ReLU: https://arxiv.org/abs/2109.08668 """
+        super().__init__()
+        self.inplace = inplace
+
+    def forward(self, x):
+        return torch.square(F.relu(x, inplace=self.inplace))
+
+
+# Ref: https://pytorch.org/docs/stable/generated/torch.nn.SiLU.html
+Swish = nn.SiLU
+
+
+def build_activation(name, **options) -> Union[nn.Module, None]:
+    cls = globals().get(name, None)
+    if cls is None:
+        raise ValueError(f'Activation named {name} is not supported.')
+    return cls(**options)
diff --git a/chameleon/nn/components/dropout.py b/chameleon/nn/components/dropout.py
new file mode 100644
index 0000000..ee427a4
--- /dev/null
+++ b/chameleon/nn/components/dropout.py
@@ -0,0 +1,15 @@
+from typing import Union
+
+import torch.nn as nn
+from torch.nn import AlphaDropout, Dropout, Dropout2d, Dropout3d
+
+__all__ = [
+    'Dropout', 'Dropout2d', 'Dropout3d', 'AlphaDropout', 'build_dropout',
+]
+
+
+def build_dropout(name, **options) -> Union[nn.Module, None]:
+    cls = globals().get(name, None)
+    if cls is None:
+        raise ValueError(f'Dropout named {name} is not supported.')
+    return cls(**options)
diff --git a/chameleon/nn/components/loss.py b/chameleon/nn/components/loss.py
new file mode 100644
index 0000000..aac682b
--- /dev/null
+++ b/chameleon/nn/components/loss.py
@@ -0,0 +1,149 @@
+import math
+from typing import Union
+
+import torch
+import torch.nn as nn
+from torch.nn.modules.loss import (BCELoss, BCEWithLogitsLoss,
+                                   CrossEntropyLoss, CTCLoss, KLDivLoss,
+                                   L1Loss, MSELoss, SmoothL1Loss)
+
+__all__ = [
+    'build_loss', 'AWingLoss', 'WeightedAWingLoss',
+    'BCELoss', 'BCEWithLogitsLoss', 'CrossEntropyLoss',
+    'CTCLoss', 'KLDivLoss', 'L1Loss', 'MSELoss', 'SmoothL1Loss',
+    'ArcFace', 'CosFace', 'LogCoshDiceLoss',
+]
+
+
+class AWingLoss(nn.Module):
+
+    def __init__(
+        self,
+        alpha: float = 2.1,
+        omega: float = 14,
+        epsilon: float = 1,
+        theta: float = 0.5
+    ):
+        """
+        Initialize the parameters of the AWingLoss loss function.
+
+        Args:
+            alpha (float, optional):
+                Exponent controlling the curvature of the loss. Defaults to 2.1.
+            omega (float, optional):
+                Scale of the nonlinear (small-error) branch. Defaults to 14.
+            epsilon (float, optional):
+                Normalization term for the error. Defaults to 1.
+            theta (float, optional):
+                Threshold that switches between the nonlinear and linear branches.
+                Defaults to 0.5.
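+
+        As implemented in ``forward`` below, errors smaller than theta are
+        penalized by the nonlinear branch
+        ``omega * ln(1 + (|y - y_hat| / epsilon) ** alpha)``, while larger
+        errors follow a linear branch ``A * |y - y_hat| - C``, with A and C
+        derived (following the Adaptive Wing Loss paper,
+        https://arxiv.org/abs/1904.07399) so that the two branches connect
+        at theta.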
+ """ + super().__init__() + self.alpha = alpha + self.omega = omega + self.epsilon = epsilon + self.theta = theta + + def forward(self, preds, targets): + diff = torch.abs(targets - preds) + case1_mask = diff < self.theta + case2_mask = ~case1_mask + loss_case1 = self.omega * \ + torch.log1p((diff[case1_mask] / self.epsilon) ** self.alpha) + A = self.omega * (1 / (1 + (self.theta / self.epsilon)**(self.alpha - targets))) \ + * (self.alpha - targets) * ((self.theta / self.epsilon)**(self.alpha - targets - 1)) \ + * (1 / self.epsilon) + C = self.theta * A - self.omega * \ + torch.log1p((self.theta / self.epsilon)**(self.alpha - targets)) + loss_case2 = A[case2_mask] * diff[case2_mask] - C[case2_mask] + loss_matrix = torch.zeros_like(preds) + loss_matrix[case1_mask] = loss_case1 + loss_matrix[case2_mask] = loss_case2 + return loss_matrix + + +class WeightedAWingLoss(nn.Module): + + def __init__( + self, + w: float = 10, + alpha: float = 2.1, + omega: float = 14, + epsilon: float = 1, + theta: float = 0.5 + ): + super().__init__() + self.w = w + self.AWingLoss = AWingLoss(alpha, omega, epsilon, theta) + + def forward(self, preds, targets, weight_map=None): + loss = self.AWingLoss(preds, targets) + if weight_map is None: + weight_map = targets > 0 + weighted = loss * (self.w * weight_map.to(loss.dtype) + 1) + return weighted.mean() + + +def build_loss(name: str, **options) -> Union[nn.Module, None]: + """Build a loss func layer given the name and options.""" + cls = globals().get(name, None) + if cls is None: + raise KeyError(f'Unsupported loss func: {name}') + return cls(**options) + + +class ArcFace(nn.Module): + + def __init__(self, s=64.0, m=0.5): + super(ArcFace, self).__init__() + self.s = s + self.margin = m + self.cos_m = math.cos(m) + self.sin_m = math.sin(m) + self.theta = math.cos(math.pi - m) + self.sinmm = math.sin(math.pi - m) * m + self.easy_margin = False + + def forward(self, logits: torch.Tensor, labels: torch.Tensor): + index = torch.where(labels != -1)[0] + target_logit = logits[index, labels[index].view(-1)] + with torch.no_grad(): + target_logit.arccos_() + logits.arccos_() + final_target_logit = target_logit + self.margin + logits[index, labels[index].view(-1)] = final_target_logit + logits.cos_() + logits = logits * self.s + return logits + + +class CosFace(nn.Module): + + def __init__(self, s=64.0, m=0.40): + super(CosFace, self).__init__() + self.s = s + self.m = m + + def forward(self, logits: torch.Tensor, labels: torch.Tensor): + index = torch.where(labels != -1)[0] + logits[index, labels[index].view(-1)] -= self.m + logits *= self.s + return logits + + +class LogCoshDiceLoss(nn.Module): + + def __init__(self, smooth=1): + super().__init__() + self.smooth = smooth + + def dice_loss(self, input, target): + input_flat = input.view(-1) + target_flat = target.view(-1) + intersection = (input_flat * target_flat).sum() + dice_loss = 1 - (2. 
* intersection + self.smooth) / \
+            (input_flat.sum() + target_flat.sum() + self.smooth)
+        return dice_loss
+
+    def forward(self, input, target):
+        dice_loss = self.dice_loss(input, target)
+        return torch.log(torch.cosh(dice_loss))
diff --git a/chameleon/nn/components/norm.py b/chameleon/nn/components/norm.py
new file mode 100644
index 0000000..5fa48e3
--- /dev/null
+++ b/chameleon/nn/components/norm.py
@@ -0,0 +1,52 @@
+from typing import Union
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torch.nn.modules.batchnorm import (BatchNorm1d, BatchNorm2d, BatchNorm3d,
+                                        SyncBatchNorm)
+from torch.nn.modules.instancenorm import (InstanceNorm1d, InstanceNorm2d,
+                                           InstanceNorm3d)
+from torch.nn.modules.normalization import (CrossMapLRN2d, GroupNorm,
+                                            LayerNorm, LocalResponseNorm)
+
+__all__ = [
+    'BatchNorm1d', 'BatchNorm2d', 'BatchNorm3d', 'SyncBatchNorm', 'InstanceNorm1d',
+    'InstanceNorm2d', 'InstanceNorm3d', 'CrossMapLRN2d', 'GroupNorm', 'LayerNorm',
+    'LocalResponseNorm', 'build_norm', 'LayerNorm2d',
+]
+
+
+class LayerNorm2d(nn.LayerNorm):
+
+    def __init__(self, num_channels: int, eps: float = 1e-6, affine: bool = True):
+        r"""
+        LayerNorm for channels_first tensors with 2d spatial dimensions (i.e. N, C, H, W).
+
+        Args:
+            num_channels (int):
+                Number of channels in the input tensor.
+            eps (float, optional):
+                A value added to the denominator for numerical stability.
+                Default: 1e-6
+            affine (bool, optional):
+                A boolean value that when set to `True`, this module has learnable
+                per-element affine parameters initialized to ones (for weights)
+                and zeros (for biases). Defaults to True.
+        """
+        super().__init__(num_channels, eps=eps, elementwise_affine=affine)
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        x = x.permute(0, 2, 3, 1)
+        x = F.layer_norm(x, self.normalized_shape,
+                         self.weight, self.bias, self.eps)
+        x = x.permute(0, 3, 1, 2)
+        return x
+
+
+def build_norm(name: str, **options) -> Union[nn.Module, None]:
+    cls = globals().get(name, None)
+    if cls is None:
+        raise ValueError(
+            f'Normalization named {name} is not supported.
Available options: {__all__}') + return cls(**options) diff --git a/chameleon/nn/components/pooling.py b/chameleon/nn/components/pooling.py new file mode 100644 index 0000000..1c73db1 --- /dev/null +++ b/chameleon/nn/components/pooling.py @@ -0,0 +1,54 @@ +from typing import Union + +import torch +from torch import nn +from torch.nn.modules.pooling import (AdaptiveAvgPool1d, AdaptiveAvgPool2d, + AdaptiveAvgPool3d, AdaptiveMaxPool1d, + AdaptiveMaxPool2d, AdaptiveMaxPool3d, + AvgPool1d, AvgPool2d, AvgPool3d, + MaxPool1d, MaxPool2d, MaxPool3d) + +__all__ = [ + 'build_pool', 'AvgPool1d', 'AvgPool2d', 'AvgPool3d', 'MaxPool1d', + 'MaxPool2d', 'MaxPool3d', 'AdaptiveAvgPool1d', 'AdaptiveAvgPool2d', + 'AdaptiveAvgPool3d', 'AdaptiveMaxPool1d', 'AdaptiveMaxPool2d', + 'AdaptiveMaxPool3d', 'GAP', 'GMP', +] + + +class GAP(nn.Module): + """Global Average Pooling layer.""" + + def __init__(self): + super().__init__() + self.pool = nn.Sequential( + nn.AdaptiveAvgPool2d(1), + nn.Flatten(), + ) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """Apply global average pooling on the input tensor.""" + return self.pool(x) + + +class GMP(nn.Module): + """Global Max Pooling layer.""" + + def __init__(self): + super().__init__() + self.pool = nn.Sequential( + nn.AdaptiveMaxPool2d(1), + nn.Flatten(), + ) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """Apply global max pooling on the input tensor.""" + return self.pool(x) + + +def build_pool(name: str, **options) -> Union[nn.Module, None]: + """Build a pooling layer given the name and options.""" + cls = globals().get(name, None) + if cls is None: + raise KeyError(f'Unsupported pooling layer: {name}') + return cls(**options) diff --git a/chameleon/nn/dwcnn.py b/chameleon/nn/dwcnn.py new file mode 100644 index 0000000..78d2bb1 --- /dev/null +++ b/chameleon/nn/dwcnn.py @@ -0,0 +1,41 @@ +from collections import OrderedDict + +import torch.nn as nn + +__all__ = ['depth_conv2d', 'conv_dw', 'conv_dw_in'] + + +def depth_conv2d(in_channels: int, out_channels: int, kernel: int = 1, stride: int = 1, pad: int = 0): + return nn.Sequential( + OrderedDict([ + ('conv3x3', nn.Conv2d(in_channels, in_channels, kernel_size=kernel, stride=stride, padding=pad, groups=in_channels),), + ('act', nn.ReLU(),), + ('conv1x1', nn.Conv2d(in_channels, out_channels, kernel_size=1)), + ]) + ) + + +def conv_dw(in_channels: int, out_channels: int, stride: int, act: nn.Module = nn.ReLU()): + return nn.Sequential( + OrderedDict([ + ('conv3x3', nn.Conv2d(in_channels, in_channels, 3, stride, 1, groups=in_channels, bias=False)), + ('bn1', nn.BatchNorm2d(in_channels)), + ('act1', nn.ReLU()), + ('conv1x1', nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False)), + ('bn2', nn.BatchNorm2d(out_channels)), + ('act2', act), + ]) + ) + + +def conv_dw_in(in_channels: int, out_channels: int, stride: int, act: nn.Module = nn.ReLU()): + return nn.Sequential( + OrderedDict([ + ('conv3x3', nn.Conv2d(in_channels, in_channels, 3, stride, 1, groups=in_channels, bias=False)), + ('in1', nn.InstanceNorm2d(in_channels)), + ('act1', nn.ReLU()), + ('conv1x1', nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False)), + ('in2', nn.InstanceNorm2d(out_channels)), + ('act2', act), + ]) + ) diff --git a/chameleon/nn/grl.py b/chameleon/nn/grl.py new file mode 100644 index 0000000..22debd7 --- /dev/null +++ b/chameleon/nn/grl.py @@ -0,0 +1,43 @@ +import torch +from torch.autograd import Function + +from .utils import PowerModule + +__all__ = ['GradientReversalLayer'] + + +class RevGrad(Function): + 
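+    """
+    Autograd function behind ``GradientReversalLayer``: the forward pass is the
+    identity, and the backward pass multiplies the incoming gradient by
+    ``-alpha_`` (see ``backward`` below), the usual building block for
+    domain-adversarial training.
+    """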
    @staticmethod
+    def forward(ctx, input_, alpha_):
+        ctx.save_for_backward(alpha_)
+        return input_
+
+    @staticmethod
+    def backward(ctx, grad_output):  # pragma: no cover
+        grad_input = None
+        alpha_, = ctx.saved_tensors
+        if ctx.needs_input_grad[0]:
+            grad_input = -grad_output * alpha_
+        return grad_input, None
+
+
+revgrad = RevGrad.apply
+
+
+class GradientReversalLayer(PowerModule):
+
+    def __init__(self, warm_up=4000):
+        """
+        A gradient reversal layer.
+        This layer has no parameters; it passes the input through unchanged and
+        reverses the gradient in the backward pass, scaled by a factor that
+        ramps linearly from 0 to 1 over the first `warm_up` forward calls.
+        """
+        super().__init__()
+        self.n_iters = 0
+        self.warm_up = warm_up
+
+    def forward(self, input_: torch.Tensor) -> torch.Tensor:
+        self.n_iters += 1
+        alpha = min(self.n_iters / self.warm_up, 1)
+        alpha = torch.tensor(alpha, requires_grad=False)
+        return revgrad(input_, alpha)
diff --git a/chameleon/nn/mbcnn.py b/chameleon/nn/mbcnn.py
new file mode 100644
index 0000000..eba63a6
--- /dev/null
+++ b/chameleon/nn/mbcnn.py
@@ -0,0 +1,211 @@
+from copy import deepcopy
+from typing import Tuple, Union
+
+import torch
+import torch.nn as nn
+
+from .cnn import CNN2Dcell
+from .components import build_norm
+from .selayer import SELayer
+from .utils import PowerModule
+
+__all__ = ['MBCNNcell']
+
+
+class MBCNNcell(PowerModule):
+
+    def __init__(
+        self,
+        in_channels: int,
+        out_channels: int,
+        hid_channels: int = None,
+        kernel: Union[int, Tuple[int, int]] = 3,
+        stride: Union[int, Tuple[int, int]] = 1,
+        use_se: bool = False,
+        se_reduction: int = 4,
+        inner_norm: Union[dict, nn.Module] = None,
+        inner_act: Union[dict, nn.Module] = None,
+        norm: Union[dict, nn.Module] = None,
+    ):
+        """
+        This neural network block is commonly known as the "inverted residual
+        block", used (with variations) in MobileNetV2, MobileNetV3, and EfficientNet.
+        ref: https://arxiv.org/pdf/1905.02244.pdf
+
+        For MobileNetV1, the block consists of a kxk depth-wise (grouped)
+        convolution with batch normalization and ReLU activation, followed
+        by a 1x1 projection with batch normalization.
+
+        mbv1:
+            input ---> kxk depth-wise (group, bn, relu) ---> 1x1 projection (bn)
+
+        For MobileNetV2, the block starts with a 1x1 expansion with batch normalization
+        and ReLU6 activation, followed by a kxk depth-wise (grouped) convolution with
+        batch normalization and ReLU6 activation, and ends with a
+        1x1 projection with batch normalization.
+
+        mbv2:
+            input ---> 1x1 expansion (bn, relu6) ---> kxk depth-wise (group, bn, relu6) ---> 1x1 projection (bn)
+
+        For MobileNetV3, the block starts with a 1x1 expansion with batch normalization
+        and h-swish activation, followed by a kxk depth-wise (grouped) convolution with
+        batch normalization and h-swish activation, and ends with a
+        1x1 projection with batch normalization. In addition, MobileNetV3 uses a
+        squeeze-and-excitation (SE) layer to enhance feature interdependencies.
+
+        mbv3:
+            input ---> 1x1 expansion (bn, hswish) ---> kxk depth-wise (group, bn, hswish) ---> 1x1 projection (bn)
+                                                        |                                  ↑
+                                                        ↓---------> SE layer (v3) ------->|
+
+        Args:
+            in_channels (int):
+                The number of input channels.
+            hid_channels (int):
+                The number of hidden channels for expanding dimensions.
+            out_channels (int):
+                The number of output channels.
+            kernel (Union[int, Tuple[int, int]], optional):
+                The kernel size of the depth-wise convolution. Defaults to 3.
+            stride (int, optional):
+                The stride size of the depth-wise convolution. Defaults to 1.
+            use_se (bool, optional):
+                Whether to use the SE layer.
Defaults to False.
+            se_reduction (int, optional):
+                Reduction ratio for the number of hidden channels in the SE layer.
+                Defaults to 4.
+            inner_norm (Union[dict, nn.Module], optional):
+                Normalization layer, or a dictionary of arguments for building
+                one, used inside the MB block. Defaults to None.
+            inner_act (Union[dict, nn.Module], optional):
+                Activation layer, or a dictionary of arguments for building
+                one, used inside the MB block. Defaults to None.
+            norm (Union[dict, nn.Module], optional):
+                Normalization layer, or a dictionary of arguments for building
+                one, applied at the last stage. Defaults to None.
+        """
+        super().__init__()
+        self.identity = stride == 1 and in_channels == out_channels
+
+        if hid_channels is None:
+            hid_channels = in_channels
+
+        if hid_channels != in_channels:
+            self.expdim = CNN2Dcell(
+                in_channels,
+                hid_channels,
+                kernel=1,
+                stride=1,
+                padding=0,
+                norm=deepcopy(inner_norm),
+                act=deepcopy(inner_act),
+            )
+
+        padding = (kernel - 1) // 2 if isinstance(kernel, int) else \
+            ((kernel[0] - 1) // 2, (kernel[1] - 1) // 2)
+
+        self.dwise = CNN2Dcell(
+            hid_channels,
+            hid_channels,
+            kernel=kernel,
+            stride=stride,
+            padding=padding,
+            groups=hid_channels,
+            norm=deepcopy(inner_norm),
+            act=deepcopy(inner_act),
+        )
+
+        if use_se:
+            self.dwise_se = SELayer(
+                hid_channels,
+                se_reduction,
+            )
+
+        self.pwise_linear = CNN2Dcell(
+            hid_channels,
+            out_channels,
+            kernel=1,
+            stride=1,
+            padding=0,
+        )
+
+        if norm is not None:
+            self.norm = norm if isinstance(norm, nn.Module) else build_norm(**norm)
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        out = x
+        if hasattr(self, 'expdim'):
+            out = self.expdim(out)
+        out = self.dwise(out)
+        if hasattr(self, 'dwise_se'):
+            out = self.dwise_se(out)
+        out = self.pwise_linear(out)
+        if hasattr(self, 'norm'):
+            out = self.norm(out)
+        out = x + out if self.identity else out  # skip connection
+        return out
+
+    @classmethod
+    def build_mbv1block(
+        cls,
+        in_channels: int,
+        out_channels: int,
+        kernel: Union[int, Tuple[int, int]] = 3,
+        stride: Union[int, Tuple[int, int]] = 1,
+    ):
+        return cls(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            hid_channels=in_channels,
+            kernel=kernel,
+            stride=stride,
+            use_se=False,
+            inner_norm=nn.BatchNorm2d(in_channels),
+            inner_act=nn.ReLU(False),
+            norm=nn.BatchNorm2d(out_channels),
+        )
+
+    @classmethod
+    def build_mbv2block(
+        cls,
+        in_channels: int,
+        out_channels: int,
+        expand_ratio: float = 2,
+        kernel: Union[int, Tuple[int, int]] = 3,
+        stride: Union[int, Tuple[int, int]] = 1,
+    ):
+        hid_channels = int(in_channels * expand_ratio)
+        return cls(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            hid_channels=hid_channels,
+            kernel=kernel,
+            stride=stride,
+            use_se=False,
+            inner_norm=nn.BatchNorm2d(hid_channels),
+            inner_act=nn.ReLU6(False),
+            norm=nn.BatchNorm2d(out_channels),
+        )
+
+    @classmethod
+    def build_mbv3block(
+        cls,
+        in_channels: int,
+        out_channels: int,
+        expand_ratio: float = 2,
+        kernel: Union[int, Tuple[int, int]] = 3,
+        stride: Union[int, Tuple[int, int]] = 1,
+    ):
+        hid_channels = int(in_channels * expand_ratio)
+        return cls(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            hid_channels=hid_channels,
+            kernel=kernel,
+            stride=stride,
+            use_se=True,
+            inner_norm=nn.BatchNorm2d(hid_channels),
+            inner_act=nn.Hardswish(False),
+            norm=nn.BatchNorm2d(out_channels),
+        )
diff --git a/chameleon/nn/positional_encoding.py b/chameleon/nn/positional_encoding.py
new file mode 100644
index 0000000..b90acbd
--- /dev/null
+++ b/chameleon/nn/positional_encoding.py
@@ -0,0 +1,26 @@
+import
math + +import torch + +__all__ = ['sinusoidal_positional_encoding_1d'] + + +def sinusoidal_positional_encoding_1d(length, dim): + """ Sinusoidal positional encoding for non-recurrent neural networks. + REFERENCES: Attention Is All You Need + URL: https://arxiv.org/abs/1706.03762 + """ + if dim % 2 != 0: + raise ValueError( + 'Cannot use sin/cos positional encoding with ' + f'odd dim (got dim={dim})') + + # position embedding + pe = torch.zeros(length, dim) + position = torch.arange(0, length).unsqueeze(1) + div_term = torch.exp( + (torch.arange(0, dim, 2, dtype=torch.float) * -(math.log(10000.0) / dim))) + pe[:, 0::2] = torch.sin(position.float() * div_term) + pe[:, 1::2] = torch.cos(position.float() * div_term) + + return pe diff --git a/chameleon/nn/selayer.py b/chameleon/nn/selayer.py new file mode 100644 index 0000000..87b2b3f --- /dev/null +++ b/chameleon/nn/selayer.py @@ -0,0 +1,36 @@ +import torch +import torch.nn as nn + +from .cnn import CNN2Dcell +from .utils import PowerModule + +__all__ = ['SELayer'] + + +class SELayer(PowerModule): + + def __init__(self, in_channels: int, reduction: int = 4): + """ + Initializes the Squeeze-and-Excitation (SE) layer. + + Args: + in_channels (int): Number of input channels. + reduction (int): + Reduction ratio for the number of channels in the SE block. + Default is 4, meaning the output will have 1/4 of the input channels. + """ + super().__init__() + + # Compute the number of channels in the middle layer of the SE block. + # If the number of input channels is less than reduction, set it to 1. + mid_channels = max(1, in_channels // reduction) + + self.avg_pool = nn.AdaptiveAvgPool2d(1) + self.fc1 = CNN2Dcell(in_channels, mid_channels, kernel=1, stride=1, padding=0, act=nn.ReLU(False)) + self.fc2 = CNN2Dcell(mid_channels, in_channels, kernel=1, stride=1, padding=0, act=nn.Sigmoid()) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + y = self.avg_pool(x) + y = self.fc1(y) + y = self.fc2(y) + return x * y diff --git a/chameleon/nn/utils.py b/chameleon/nn/utils.py new file mode 100644 index 0000000..c7d7b70 --- /dev/null +++ b/chameleon/nn/utils.py @@ -0,0 +1,229 @@ +from typing import Any, List, Optional, Union + +import torch +import torch.nn as nn + +from .components import build_activation + +__all__ = [ + 'PowerModule', 'initialize_weights', 'WeightedSum', 'Identity', + 'Transpose', 'Permute', +] + + +def initialize_weights( + model: nn.Module, + init_type: str = 'normal', + recursive: bool = True +) -> None: + """ + Initialize the weights in the given model. + + Args: + model (nn.Module): + The model to initialize. + init_type (str, optional): + The initialization method to use. Supported options are 'uniform' + and 'normal'. Defaults to 'normal'. + recursive (bool, optional): + Whether to recursively initialize child modules. Defaults to True. + + Raises: + TypeError: If init_type is not supported. 
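+
+        Example (a minimal sketch):
+            >>> model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
+            >>> initialize_weights(model, init_type='normal')  # Kaiming-normal convs, ones/zeros norms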
+ """ + if not isinstance(model, nn.Module): + raise TypeError( + f'model must be an instance of nn.Module, but got {type(model)}') + + init_functions = { + 'uniform': nn.init.kaiming_uniform_, + 'normal': nn.init.kaiming_normal_ + } + + if init_type not in init_functions: + raise TypeError(f'init_type {init_type} is not supported.') + nn_init = init_functions[init_type] + + def _recursive_init(m): + for child in m.children(): + if len(list(child.children())) > 0 and recursive: + _recursive_init(child) + else: + if isinstance(child, (nn.Conv2d, nn.Linear)): + nn_init(child.weight) + if child.bias is not None: + nn.init.zeros_(child.bias) + elif isinstance(child, (nn.BatchNorm1d, nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)): + if child.affine: + nn.init.ones_(child.weight) + if child.bias is not None: + nn.init.zeros_(child.bias) + + _recursive_init(model) + + +class PowerModule(nn.Module): + """ + A module that provides additional functionality for weight initialization, + freezing and melting layers. + """ + + def initialize_weights_(self, init_type: str = 'normal') -> None: + """ + Initialize the weights of the module. + + Args: + init_type (str): The type of initialization. Can be 'normal' or 'uniform'. + """ + initialize_weights(self, init_type) + + def freeze(self, part_names: Union[str, List[str]] = 'all', verbose: bool = False) -> None: + """ + Freeze the parameters of specified layers. + + Args: + part_names (Union[str, List[str]]): The names of the layers to freeze. + If 'all', all layers are frozen. + verbose (bool): Whether to print messages indicating which layers were frozen. + """ + if part_names == 'all': + for name, params in self.named_parameters(): + if verbose: + print(f'Freezing layer {name}') + params.requires_grad_(False) + elif part_names is None: + return + else: + part_names = [part_names] if isinstance(part_names, str) \ + else part_names + for layer_name in part_names: + module = self + for attr in layer_name.split('.'): + module = getattr(module, attr) + for name, param in module.named_parameters(): + if verbose: + print(f'Freezing layer {layer_name}.{name}') + param.requires_grad_(False) + + def melt(self, part_names: Union[str, List[str]] = 'all', verbose: bool = False) -> None: + """ + Unfreeze the parameters of specified layers. + + Args: + part_names (Union[str, List[str]]): The names of the layers to unfreeze. + If 'all', all layers are unfrozen. + verbose (bool): Whether to print messages indicating which layers were unfrozen. + """ + if part_names == 'all': + for name, params in self.named_parameters(): + if verbose: + print(f'Unfreezing layer {name}') + params.requires_grad_(True) + elif part_names is None: + return + else: + part_names = [part_names] if isinstance(part_names, str) \ + else part_names + for layer_name in part_names: + module = self + for attr in layer_name.split('.'): + module = getattr(module, attr) + for name, param in module.named_parameters(): + if verbose: + print(f'Unfreezing layer {layer_name}.{name}') + param.requires_grad_(True) + + +class WeightedSum(nn.Module): + + def __init__( + self, + input_size: int, + act: Optional[Union[dict, nn.Module]] = None, + requires_grad: bool = True, + ) -> None: + """ + Initializes a WeightedSum module. + + Args: + input_size (int): + The number of inputs to be summed. + act Optional[Union[dict, nn.Module]]: + Optional activation function or dictionary of its parameters. + Defaults to None. + requires_grad (bool, optional): + Whether to require gradients for the weights. Defaults to True. 
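+
+        Example (illustrative shapes):
+            >>> ws = WeightedSum(input_size=2)
+            >>> feats = [torch.rand(1, 8, 4, 4), torch.rand(1, 8, 4, 4)]
+            >>> ws(feats).shape  # inputs fused with normalized, non-negative weights
+            torch.Size([1, 8, 4, 4])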
+ """ + super().__init__() + self.input_size = input_size + self.weights = nn.Parameter( + torch.ones(input_size, dtype=torch.float32), + requires_grad=requires_grad + ) + self.weights_relu = nn.ReLU() + if act is None: + self.relu = nn.Identity() + else: + self.relu = act if isinstance(act, nn.Module) \ + else build_activation(**act) + self.epsilon = 1e-4 + + def forward(self, x: List[torch.Tensor]) -> torch.Tensor: + if len(x) != self.input_size: + raise ValueError('Invalid input size not equal to weight size.') + weights = self.weights_relu(self.weights) + weights = weights / ( + torch.sum(weights, dim=0, keepdim=True) + self.epsilon) + weighted_x = torch.einsum( + 'i,i...->...', weights, torch.stack(x, dim=0)) + weighted_x = self.relu(weighted_x) + return weighted_x + + +class Identity(PowerModule): + r"""A placeholder identity operator that is argument-insensitive. + + Args: + args: any argument (unused) + kwargs: any keyword argument (unused) + + Shape: + - Input: :math:`(*)`, where :math:`*` means any number of dimensions. + - Output: :math:`(*)`, same shape as the input. + + Examples:: + + >>> m = nn.Identity(54, unused_argument1=0.1, unused_argument2=False) + >>> input = torch.randn(128, 20) + >>> output = m(input) + >>> print(output.size()) + torch.Size([128, 20]) + + """ + + def __init__(self, *args: Any, **kwargs: Any) -> None: + super().__init__() + + def forward(self, input: torch.Tensor) -> torch.Tensor: + return input + + +class Transpose(nn.Module): + + def __init__(self, dim1: int, dim2: int) -> None: + super().__init__() + self.dim1 = dim1 + self.dim2 = dim2 + + def forward(self, x: torch.Tensor) -> torch.Tensor: + return x.transpose(self.dim1, self.dim2) + + +class Permute(nn.Module): + + def __init__(self, dims: List[int]) -> None: + super().__init__() + self.dims = dims + + def forward(self, x: torch.Tensor) -> torch.Tensor: + return x.permute(*self.dims) diff --git a/chameleon/nn/vae.py b/chameleon/nn/vae.py new file mode 100644 index 0000000..aa59897 --- /dev/null +++ b/chameleon/nn/vae.py @@ -0,0 +1,57 @@ +from typing import Tuple + +import torch +import torch.nn as nn + +from .components import GAP +from .utils import PowerModule + +__all__ = ['VAE'] + + +class VAE(PowerModule): + + def __init__(self, in_channels: int, out_channels: int, do_pooling: bool = False): + """ + Variational Autoencoder Module + + Args: + in_channels (int): Number of input channels. + out_channels (int): Number of output channels, which corresponds to the size of the latent space. + do_pooling (bool, optional): Whether to apply global average pooling. Defaults to False. + """ + super().__init__() + self.pool = GAP() if do_pooling else nn.Identity() + self.encoder_mu = nn.Linear(in_channels, out_channels, bias=False) + self.encoder_var = nn.Linear(in_channels, out_channels, bias=False) + + def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]: + """ + Forward pass of the VAE. 
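+
+        Uses the reparameterization trick, ``feat = mu + eps * std`` with
+        ``eps ~ N(0, I)``, so the sampling step remains differentiable with
+        respect to ``mu`` and ``log_var``.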
+ + Args: + x (torch.Tensor): Input tensor of shape (batch_size, in_channels, height, width) + + Returns: + feat (torch.Tensor): Encoded feature tensor of shape (batch_size, out_channels) + kld_loss (torch.Tensor): KL divergence loss tensor of shape (1,) + """ + x = self.pool(x) + + # Compute mean and variance of the encoded features using separate linear layers + log_var = self.encoder_var(x) + mu = self.encoder_mu(x) + + # Compute the standard deviation from the variance + std = torch.exp(0.5 * log_var) + + # Sample random noise from a standard normal distribution + eps = torch.randn_like(std) + + # Compute the encoded feature vector by adding the noise scaled by the standard deviation + feat = mu + eps * std + + # Compute KL divergence loss between the learned distribution and a standard normal distribution + kld_loss = -0.5 * torch.mean(torch.sum(1 + log_var - mu ** 2 - log_var.exp(), dim=1)) + + return feat, kld_loss diff --git a/chameleon/optim/__init__.py b/chameleon/optim/__init__.py new file mode 100644 index 0000000..b3e2e35 --- /dev/null +++ b/chameleon/optim/__init__.py @@ -0,0 +1,23 @@ +from torch.optim import (ASGD, LBFGS, SGD, Adadelta, Adagrad, Adam, Adamax, + AdamW, RMSprop, Rprop, SparseAdam) +from torch.optim.lr_scheduler import (CosineAnnealingLR, + CosineAnnealingWarmRestarts, CyclicLR, + ExponentialLR, LambdaLR, MultiStepLR, + OneCycleLR, ReduceLROnPlateau, StepLR) + +from .polynomial_lr_warmup import PolynomialLRWarmup +from .warm_up import * + + +def build_optimizer(model_params, name, **optim_options): + cls_ = globals().get(name, None) + if cls_ is None: + raise ValueError(f'{name} is not supported optimizer.') + return cls_(model_params, **optim_options) + + +def build_lr_scheduler(optimizer, name, **lr_scheduler_options): + cls_ = globals().get(name, None) + if cls_ is None: + raise ValueError(f'{name} is not supported lr scheduler.') + return cls_(optimizer, **lr_scheduler_options) diff --git a/chameleon/optim/polynomial_lr_warmup.py b/chameleon/optim/polynomial_lr_warmup.py new file mode 100644 index 0000000..70fac78 --- /dev/null +++ b/chameleon/optim/polynomial_lr_warmup.py @@ -0,0 +1,52 @@ +import warnings + +from torch.optim.lr_scheduler import _LRScheduler + + +class PolynomialLRWarmup(_LRScheduler): + + def __init__( + self, + optimizer, + warmup_iters, + total_iters=5, + power=1.0, + last_epoch=-1, + verbose=False + ): + super().__init__(optimizer, last_epoch=last_epoch, verbose=verbose) + self.total_iters = total_iters + self.power = power + self.warmup_iters = warmup_iters + + def get_lr(self): + if not self._get_lr_called_within_step: + warnings.warn("To get the last learning rate computed by the scheduler, " + "please use `get_last_lr()`.", UserWarning) + + if self.last_epoch == 0 or self.last_epoch > self.total_iters: + return [group["lr"] for group in self.optimizer.param_groups] + + if self.last_epoch <= self.warmup_iters: + return [base_lr * self.last_epoch / self.warmup_iters for base_lr in self.base_lrs] + else: + l = self.last_epoch + w = self.warmup_iters + t = self.total_iters + decay_factor = ((1.0 - (l - w) / (t - w)) / + (1.0 - (l - 1 - w) / (t - w))) ** self.power + return [group["lr"] * decay_factor for group in self.optimizer.param_groups] + + def _get_closed_form_lr(self): + + if self.last_epoch <= self.warmup_iters: + return [ + base_lr * self.last_epoch / self.warmup_iters for base_lr in self.base_lrs] + else: + return [ + ( + base_lr * (1.0 - (min(self.total_iters, self.last_epoch) - self.warmup_iters) / ( + self.total_iters - 
self.warmup_iters)) ** self.power
+                )
+                for base_lr in self.base_lrs
+            ]
diff --git a/chameleon/optim/warm_up.py b/chameleon/optim/warm_up.py
new file mode 100644
index 0000000..2e97ddb
--- /dev/null
+++ b/chameleon/optim/warm_up.py
@@ -0,0 +1,81 @@
+from typing import List
+
+from torch.optim import Optimizer
+from torch.optim.lr_scheduler import MultiStepLR, _LRScheduler
+
+__all__ = ['WrappedLRScheduler', 'MultiStepLRWarmUp']
+
+
+class WrappedLRScheduler(_LRScheduler):
+    """
+    Gradually warms up (increases) the learning rate in the optimizer.
+    Proposed in 'Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour'.
+    Args:
+        optimizer (Optimizer): Wrapped optimizer.
+        milestone (int):
+            milestone step for warm-up.
+        multiplier (float):
+            A factor to multiply base_lr.
+            if multiplier > 1.0, learning rate = base lr * multiplier.
+            if multiplier = 1.0, lr starts from 0 and ends up with the base_lr.
+        after_scheduler (lr_scheduler):
+            after the warm-up milestone, use this scheduler (e.g. ReduceLROnPlateau)
+    """
+
+    def __init__(
+        self,
+        optimizer: Optimizer,
+        milestone: int,
+        multiplier: float = 1.0,
+        after_scheduler: _LRScheduler = None,
+        interval='step'
+    ):
+        self.multiplier = multiplier
+        if self.multiplier < 1.:
+            raise ValueError('multiplier should be greater than or equal to 1.')
+        self.milestone = milestone
+        self.after_scheduler = after_scheduler
+        self.finished = False
+        self.interval = interval
+        super().__init__(optimizer)  # must be called at the end of __init__
+
+    def get_lr(self):
+        # do after_scheduler
+        if self.last_epoch > self.milestone:
+            if self.after_scheduler:
+                if not self.finished:
+                    self.after_scheduler.base_lrs = [base_lr * self.multiplier for base_lr in self.base_lrs]
+                    self.finished = True
+                return self.after_scheduler.get_last_lr()
+            return [base_lr * self.multiplier for base_lr in self.base_lrs]
+
+        if self.multiplier == 1.0:
+            return [base_lr * (float(self.last_epoch) / self.milestone) for base_lr in self.base_lrs]
+        else:
+            return [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.milestone + 1.) for base_lr in self.base_lrs]
+
+    def step(self, epoch=None, metrics=None):
+        if self.finished and self.after_scheduler:
+            if epoch is None:
+                self.after_scheduler.step(None)
+            else:
+                self.after_scheduler.step(epoch - self.milestone)
+            self._last_lr = self.after_scheduler.get_last_lr()
+        else:
+            return super().step()
+
+
+def MultiStepLRWarmUp(
+    optimizer: Optimizer,
+    milestones: List[int],
+    warmup_milestone: int,
+    gamma: float = 0.1,
+    last_epoch: int = -1,
+    interval='step',
+    verbose: bool = False,
+):
+    scheduler = MultiStepLR(optimizer, milestones, gamma, last_epoch, verbose)
+    return WrappedLRScheduler(optimizer,
+                              warmup_milestone,
+                              after_scheduler=scheduler,
+                              interval=interval)
diff --git a/chameleon/tools/__init__.py b/chameleon/tools/__init__.py
new file mode 100644
index 0000000..b04de5e
--- /dev/null
+++ b/chameleon/tools/__init__.py
@@ -0,0 +1,4 @@
+from .custom_aug import *
+from .mixin import *
+from .model_profile import *
+from .replace import *
diff --git a/chameleon/tools/cpuinfo.py b/chameleon/tools/cpuinfo.py
new file mode 100644
index 0000000..b6c167d
--- /dev/null
+++ b/chameleon/tools/cpuinfo.py
@@ -0,0 +1,874 @@
+###################################################################
+# cpuinfo - Get information about CPU
+#
+# License: BSD
+# Author: Pearu Peterson
+#
+# See LICENSES/cpuinfo.txt for details about copyright and
+# rights to use.
+#################################################################### +""" +cpuinfo +Copyright 2002 Pearu Peterson all rights reserved, +Pearu Peterson +Permission to use, modify, and distribute this software is given under the +terms of the NumPy (BSD style) license. See LICENSE.txt that came with +this distribution for specifics. +NO WARRANTY IS EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK. +Pearu Peterson + +Ref: https://github.com/pydata/numexpr/blob/master/numexpr/cpuinfo.py + +Usage: + >>> from cpuinfo import cpuinfo + >>> info = cpuinfo() # len(info) equals to num of cpus. + >>> print(list(info[0].keys())) + >>> { + 'processor', + 'vendor_id', + 'cpu family', + 'model', + 'model name', + 'stepping', + 'microcode', + 'cpu MHz', + 'cache size', + 'physical id', + 'siblings', + 'core id', + 'cpu cores', + 'apicid', + 'initial apicid', + 'fpu', + 'fpu_exception', + 'cpuid level', + 'wp', + 'flags', + 'vmx flags', + 'bugs', + 'bogomips', + 'clflush size', + 'cache_alignment', + 'address sizes', + 'power management' + } +""" + +__all__ = ['cpuinfo'] + +import inspect +import os +import platform +import re +import subprocess +import sys +import warnings + +is_cpu_amd_intel = False # DEPRECATION WARNING: WILL BE REMOVED IN FUTURE RELEASE + +def getoutput(cmd, successful_status=(0,), stacklevel=1): + try: + p = subprocess.Popen(cmd, stdout=subprocess.PIPE) + output, _ = p.communicate() + status = p.returncode + except EnvironmentError as e: + warnings.warn(str(e), UserWarning, stacklevel=stacklevel) + return False, '' + if os.WIFEXITED(status) and os.WEXITSTATUS(status) in successful_status: + return True, output + return False, output + + +def command_info(successful_status=(0,), stacklevel=1, **kw): + info = {} + for key in kw: + ok, output = getoutput(kw[key], successful_status=successful_status, + stacklevel=stacklevel + 1) + if ok: + info[key] = output.strip() + return info + + +def command_by_line(cmd, successful_status=(0,), stacklevel=1): + ok, output = getoutput(cmd, successful_status=successful_status, + stacklevel=stacklevel + 1) + if not ok: + return + + # XXX: check + output = output.decode('ascii') + + for line in output.splitlines(): + yield line.strip() + + +def key_value_from_command(cmd, sep, successful_status=(0,), + stacklevel=1): + d = {} + for line in command_by_line(cmd, successful_status=successful_status, + stacklevel=stacklevel + 1): + l = [s.strip() for s in line.split(sep, 1)] + if len(l) == 2: + d[l[0]] = l[1] + return d + + +class CPUInfoBase(object): + """Holds CPU information and provides methods for requiring + the availability of various CPU features. 
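+
+    Attribute access such as ``info.is_Intel()`` is resolved dynamically by
+    ``__getattr__`` to the corresponding ``_is_Intel`` implementation, wrapped
+    by ``_try_call`` so that a failing probe returns None instead of raising.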
+ """ + + def _try_call(self, func): + try: + return func() + except: + pass + + def __getattr__(self, name): + if not name.startswith('_'): + if hasattr(self, '_' + name): + attr = getattr(self, '_' + name) + if inspect.ismethod(attr): + return lambda func=self._try_call, attr=attr: func(attr) + else: + return lambda: None + raise AttributeError(name) + + def _getNCPUs(self): + return 1 + + def __get_nbits(self): + abits = platform.architecture()[0] + nbits = re.compile(r'(\d+)bit').search(abits).group(1) + return nbits + + def _is_32bit(self): + return self.__get_nbits() == '32' + + def _is_64bit(self): + return self.__get_nbits() == '64' + + +class LinuxCPUInfo(CPUInfoBase): + info = None + + def __init__(self): + if self.info is not None: + return + info = [{}] + ok, output = getoutput(['uname', '-m']) + if ok: + info[0]['uname_m'] = output.strip() + try: + fo = open('/proc/cpuinfo') + except EnvironmentError as e: + warnings.warn(str(e), UserWarning) + else: + for line in fo: + name_value = [s.strip() for s in line.split(':', 1)] + if len(name_value) != 2: + continue + name, value = name_value + if not info or name in info[-1]: # next processor + info.append({}) + info[-1][name] = value + fo.close() + self.__class__.info = info + + def _not_impl(self): + pass + + # Athlon + + def _is_AMD(self): + return self.info[0]['vendor_id'] == 'AuthenticAMD' + + def _is_AthlonK6_2(self): + return self._is_AMD() and self.info[0]['model'] == '2' + + def _is_AthlonK6_3(self): + return self._is_AMD() and self.info[0]['model'] == '3' + + def _is_AthlonK6(self): + return re.match(r'.*?AMD-K6', self.info[0]['model name']) is not None + + def _is_AthlonK7(self): + return re.match(r'.*?AMD-K7', self.info[0]['model name']) is not None + + def _is_AthlonMP(self): + return re.match(r'.*?Athlon\(tm\) MP\b', + self.info[0]['model name']) is not None + + def _is_AMD64(self): + return self.is_AMD() and self.info[0]['family'] == '15' + + def _is_Athlon64(self): + return re.match(r'.*?Athlon\(tm\) 64\b', + self.info[0]['model name']) is not None + + def _is_AthlonHX(self): + return re.match(r'.*?Athlon HX\b', + self.info[0]['model name']) is not None + + def _is_Opteron(self): + return re.match(r'.*?Opteron\b', + self.info[0]['model name']) is not None + + def _is_Hammer(self): + return re.match(r'.*?Hammer\b', + self.info[0]['model name']) is not None + + # Alpha + + def _is_Alpha(self): + return self.info[0]['cpu'] == 'Alpha' + + def _is_EV4(self): + return self.is_Alpha() and self.info[0]['cpu model'] == 'EV4' + + def _is_EV5(self): + return self.is_Alpha() and self.info[0]['cpu model'] == 'EV5' + + def _is_EV56(self): + return self.is_Alpha() and self.info[0]['cpu model'] == 'EV56' + + def _is_PCA56(self): + return self.is_Alpha() and self.info[0]['cpu model'] == 'PCA56' + + # Intel + + #XXX + _is_i386 = _not_impl + + def _is_Intel(self): + return self.info[0]['vendor_id'] == 'GenuineIntel' + + def _is_i486(self): + return self.info[0]['cpu'] == 'i486' + + def _is_i586(self): + return self.is_Intel() and self.info[0]['cpu family'] == '5' + + def _is_i686(self): + return self.is_Intel() and self.info[0]['cpu family'] == '6' + + def _is_Celeron(self): + return re.match(r'.*?Celeron', + self.info[0]['model name']) is not None + + def _is_Pentium(self): + return re.match(r'.*?Pentium', + self.info[0]['model name']) is not None + + def _is_PentiumII(self): + return re.match(r'.*?Pentium.*?II\b', + self.info[0]['model name']) is not None + + def _is_PentiumPro(self): + return re.match(r'.*?PentiumPro\b', + 
self.info[0]['model name']) is not None + + def _is_PentiumMMX(self): + return re.match(r'.*?Pentium.*?MMX\b', + self.info[0]['model name']) is not None + + def _is_PentiumIII(self): + return re.match(r'.*?Pentium.*?III\b', + self.info[0]['model name']) is not None + + def _is_PentiumIV(self): + return re.match(r'.*?Pentium.*?(IV|4)\b', + self.info[0]['model name']) is not None + + def _is_PentiumM(self): + return re.match(r'.*?Pentium.*?M\b', + self.info[0]['model name']) is not None + + def _is_Prescott(self): + return self.is_PentiumIV() and self.has_sse3() + + def _is_Nocona(self): + return (self.is_Intel() and + self.info[0]['cpu family'] in ('6', '15') and + # two s sse3; three s ssse3 not the same thing, this is fine + (self.has_sse3() and not self.has_ssse3()) and + re.match(r'.*?\blm\b', self.info[0]['flags']) is not None) + + def _is_Core2(self): + return (self.is_64bit() and self.is_Intel() and + re.match(r'.*?Core\(TM\)2\b', + self.info[0]['model name']) is not None) + + def _is_Itanium(self): + return re.match(r'.*?Itanium\b', + self.info[0]['family']) is not None + + def _is_XEON(self): + return re.match(r'.*?XEON\b', + self.info[0]['model name'], re.IGNORECASE) is not None + + _is_Xeon = _is_XEON + + # Power + def _is_Power(self): + return re.match(r'.*POWER.*', + self.info[0]['cpu']) is not None + + def _is_Power7(self): + return re.match(r'.*POWER7.*', + self.info[0]['cpu']) is not None + + def _is_Power8(self): + return re.match(r'.*POWER8.*', + self.info[0]['cpu']) is not None + + def _is_Power9(self): + return re.match(r'.*POWER9.*', + self.info[0]['cpu']) is not None + + def _has_Altivec(self): + return re.match(r'.*altivec\ supported.*', + self.info[0]['cpu']) is not None + + # Varia + + def _is_singleCPU(self): + return len(self.info) == 1 + + def _getNCPUs(self): + return len(self.info) + + def _has_fdiv_bug(self): + return self.info[0]['fdiv_bug'] == 'yes' + + def _has_f00f_bug(self): + return self.info[0]['f00f_bug'] == 'yes' + + def _has_mmx(self): + return re.match(r'.*?\bmmx\b', self.info[0]['flags']) is not None + + def _has_sse(self): + return re.match(r'.*?\bsse\b', self.info[0]['flags']) is not None + + def _has_sse2(self): + return re.match(r'.*?\bsse2\b', self.info[0]['flags']) is not None + + def _has_sse3(self): + return re.match(r'.*?\bpni\b', self.info[0]['flags']) is not None + + def _has_ssse3(self): + return re.match(r'.*?\bssse3\b', self.info[0]['flags']) is not None + + def _has_3dnow(self): + return re.match(r'.*?\b3dnow\b', self.info[0]['flags']) is not None + + def _has_3dnowext(self): + return re.match(r'.*?\b3dnowext\b', self.info[0]['flags']) is not None + + +class IRIXCPUInfo(CPUInfoBase): + info = None + + def __init__(self): + if self.info is not None: + return + info = key_value_from_command('sysconf', sep=' ', + successful_status=(0, 1)) + self.__class__.info = info + + def _not_impl(self): + pass + + def _is_singleCPU(self): + return self.info.get('NUM_PROCESSORS') == '1' + + def _getNCPUs(self): + return int(self.info.get('NUM_PROCESSORS', 1)) + + def __cputype(self, n): + return self.info.get('PROCESSORS').split()[0].lower() == 'r%s' % (n) + + def _is_r2000(self): + return self.__cputype(2000) + + def _is_r3000(self): + return self.__cputype(3000) + + def _is_r3900(self): + return self.__cputype(3900) + + def _is_r4000(self): + return self.__cputype(4000) + + def _is_r4100(self): + return self.__cputype(4100) + + def _is_r4300(self): + return self.__cputype(4300) + + def _is_r4400(self): + return self.__cputype(4400) + + def 
_is_r4600(self): + return self.__cputype(4600) + + def _is_r4650(self): + return self.__cputype(4650) + + def _is_r5000(self): + return self.__cputype(5000) + + def _is_r6000(self): + return self.__cputype(6000) + + def _is_r8000(self): + return self.__cputype(8000) + + def _is_r10000(self): + return self.__cputype(10000) + + def _is_r12000(self): + return self.__cputype(12000) + + def _is_rorion(self): + return self.__cputype('orion') + + def get_ip(self): + try: + return self.info.get('MACHINE') + except: + pass + + def __machine(self, n): + return self.info.get('MACHINE').lower() == 'ip%s' % (n) + + def _is_IP19(self): + return self.__machine(19) + + def _is_IP20(self): + return self.__machine(20) + + def _is_IP21(self): + return self.__machine(21) + + def _is_IP22(self): + return self.__machine(22) + + def _is_IP22_4k(self): + return self.__machine(22) and self._is_r4000() + + def _is_IP22_5k(self): + return self.__machine(22) and self._is_r5000() + + def _is_IP24(self): + return self.__machine(24) + + def _is_IP25(self): + return self.__machine(25) + + def _is_IP26(self): + return self.__machine(26) + + def _is_IP27(self): + return self.__machine(27) + + def _is_IP28(self): + return self.__machine(28) + + def _is_IP30(self): + return self.__machine(30) + + def _is_IP32(self): + return self.__machine(32) + + def _is_IP32_5k(self): + return self.__machine(32) and self._is_r5000() + + def _is_IP32_10k(self): + return self.__machine(32) and self._is_r10000() + + +class DarwinCPUInfo(CPUInfoBase): + info = None + + def __init__(self): + if self.info is not None: + return + info = command_info(arch='arch', + machine='machine') + info['sysctl_hw'] = key_value_from_command(['sysctl', 'hw'], sep='=') + self.__class__.info = info + + def _not_impl(self): pass + + def _getNCPUs(self): + return int(self.info['sysctl_hw'].get('hw.ncpu', 1)) + + def _is_Power_Macintosh(self): + return self.info['sysctl_hw']['hw.machine'] == 'Power Macintosh' + + def _is_i386(self): + return self.info['arch'] == 'i386' + + def _is_ppc(self): + return self.info['arch'] == 'ppc' + + def __machine(self, n): + return self.info['machine'] == 'ppc%s' % n + + def _is_ppc601(self): return self.__machine(601) + + def _is_ppc602(self): return self.__machine(602) + + def _is_ppc603(self): return self.__machine(603) + + def _is_ppc603e(self): return self.__machine('603e') + + def _is_ppc604(self): return self.__machine(604) + + def _is_ppc604e(self): return self.__machine('604e') + + def _is_ppc620(self): return self.__machine(620) + + def _is_ppc630(self): return self.__machine(630) + + def _is_ppc740(self): return self.__machine(740) + + def _is_ppc7400(self): return self.__machine(7400) + + def _is_ppc7450(self): return self.__machine(7450) + + def _is_ppc750(self): return self.__machine(750) + + def _is_ppc403(self): return self.__machine(403) + + def _is_ppc505(self): return self.__machine(505) + + def _is_ppc801(self): return self.__machine(801) + + def _is_ppc821(self): return self.__machine(821) + + def _is_ppc823(self): return self.__machine(823) + + def _is_ppc860(self): return self.__machine(860) + +class NetBSDCPUInfo(CPUInfoBase): + info = None + + def __init__(self): + if self.info is not None: + return + info = {} + info['sysctl_hw'] = key_value_from_command(['sysctl', 'hw'], sep='=') + info['arch'] = info['sysctl_hw'].get('hw.machine_arch', 1) + info['machine'] = info['sysctl_hw'].get('hw.machine', 1) + self.__class__.info = info + + def _not_impl(self): pass + + def _getNCPUs(self): + return 
int(self.info['sysctl_hw'].get('hw.ncpu', 1))
+
+    def _is_Intel(self):
+        if self.info['sysctl_hw'].get('hw.model', "")[0:5] == 'Intel':
+            return True
+        return False
+
+    def _is_AMD(self):
+        if self.info['sysctl_hw'].get('hw.model', "")[0:3] == 'AMD':
+            return True
+        return False
+
+
+class SunOSCPUInfo(CPUInfoBase):
+    info = None
+
+    def __init__(self):
+        if self.info is not None:
+            return
+        info = command_info(arch='arch',
+                            mach='mach',
+                            uname_i=['uname', '-i'],
+                            isainfo_b=['isainfo', '-b'],
+                            isainfo_n=['isainfo', '-n'],
+                            )
+        info['uname_X'] = key_value_from_command(['uname', '-X'], sep='=')
+        for line in command_by_line(['psrinfo', '-v', '0']):
+            m = re.match(r'\s*The (?P<p>[\w\d]+) processor operates at', line)
+            if m:
+                info['processor'] = m.group('p')
+                break
+        self.__class__.info = info
+
+    def _not_impl(self):
+        pass
+
+    def _is_i386(self):
+        return self.info['isainfo_n'] == 'i386'
+
+    def _is_sparc(self):
+        return self.info['isainfo_n'] == 'sparc'
+
+    def _is_sparcv9(self):
+        return self.info['isainfo_n'] == 'sparcv9'
+
+    def _getNCPUs(self):
+        return int(self.info['uname_X'].get('NumCPU', 1))
+
+    def _is_sun4(self):
+        return self.info['arch'] == 'sun4'
+
+    def _is_SUNW(self):
+        return re.match(r'SUNW', self.info['uname_i']) is not None
+
+    def _is_sparcstation5(self):
+        return re.match(r'.*SPARCstation-5', self.info['uname_i']) is not None
+
+    def _is_ultra1(self):
+        return re.match(r'.*Ultra-1', self.info['uname_i']) is not None
+
+    def _is_ultra250(self):
+        return re.match(r'.*Ultra-250', self.info['uname_i']) is not None
+
+    def _is_ultra2(self):
+        return re.match(r'.*Ultra-2', self.info['uname_i']) is not None
+
+    def _is_ultra30(self):
+        return re.match(r'.*Ultra-30', self.info['uname_i']) is not None
+
+    def _is_ultra4(self):
+        return re.match(r'.*Ultra-4', self.info['uname_i']) is not None
+
+    def _is_ultra5_10(self):
+        return re.match(r'.*Ultra-5_10', self.info['uname_i']) is not None
+
+    def _is_ultra5(self):
+        return re.match(r'.*Ultra-5', self.info['uname_i']) is not None
+
+    def _is_ultra60(self):
+        return re.match(r'.*Ultra-60', self.info['uname_i']) is not None
+
+    def _is_ultra80(self):
+        return re.match(r'.*Ultra-80', self.info['uname_i']) is not None
+
+    def _is_ultraenterprice(self):
+        return re.match(r'.*Ultra-Enterprise', self.info['uname_i']) is not None
+
+    def _is_ultraenterprice10k(self):
+        return re.match(r'.*Ultra-Enterprise-10000', self.info['uname_i']) is not None
+
+    def _is_sunfire(self):
+        return re.match(r'.*Sun-Fire', self.info['uname_i']) is not None
+
+    def _is_ultra(self):
+        return re.match(r'.*Ultra', self.info['uname_i']) is not None
+
+    def _is_cpusparcv7(self):
+        return self.info['processor'] == 'sparcv7'
+
+    def _is_cpusparcv8(self):
+        return self.info['processor'] == 'sparcv8'
+
+    def _is_cpusparcv9(self):
+        return self.info['processor'] == 'sparcv9'
+
+
+class Win32CPUInfo(CPUInfoBase):
+    info = None
+    pkey = r"HARDWARE\DESCRIPTION\System\CentralProcessor"
+    # XXX: what does the value of
+    #   HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor\0
+    # mean?
+
+    def __init__(self):
+        try:
+            import _winreg
+        except ImportError:  # Python 3
+            import winreg as _winreg
+
+        if self.info is not None:
+            return
+        info = []
+        try:
+            #XXX: Bad style to use so long `try:...except:...`. Fix it!
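+
+            # For reference, the "Identifier" registry value parsed below
+            # typically looks like "x86 Family 6 Model 142 Stepping 10"
+            # (illustrative value, not from any specific machine); prgx
+            # extracts the three numbers as named groups.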
+            prgx = re.compile(r"family\s+(?P<FML>\d+)\s+model\s+(?P<MDL>\d+)"
+                              r"\s+stepping\s+(?P<STP>\d+)", re.IGNORECASE)
+            chnd = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, self.pkey)
+            pnum = 0
+            while True:
+                try:
+                    proc = _winreg.EnumKey(chnd, pnum)
+                except _winreg.error:
+                    break
+                else:
+                    pnum += 1
+                    info.append({"Processor": proc})
+                    phnd = _winreg.OpenKey(chnd, proc)
+                    pidx = 0
+                    while True:
+                        try:
+                            name, value, _ = _winreg.EnumValue(phnd, pidx)
+                        except _winreg.error:
+                            break
+                        else:
+                            pidx += 1
+                            info[-1][name] = value
+                            if name == "Identifier":
+                                srch = prgx.search(value)
+                                if srch:
+                                    info[-1]["Family"] = int(srch.group("FML"))
+                                    info[-1]["Model"] = int(srch.group("MDL"))
+                                    info[-1]["Stepping"] = int(srch.group("STP"))
+        except Exception as e:
+            print(e, '(ignoring)')
+        self.__class__.info = info
+
+    def _not_impl(self):
+        pass
+
+    # Athlon
+
+    def _is_AMD(self):
+        return self.info[0]['VendorIdentifier'] == 'AuthenticAMD'
+
+    def _is_Am486(self):
+        return self.is_AMD() and self.info[0]['Family'] == 4
+
+    def _is_Am5x86(self):
+        return self.is_AMD() and self.info[0]['Family'] == 4
+
+    def _is_AMDK5(self):
+        return (self.is_AMD() and self.info[0]['Family'] == 5 and
+                self.info[0]['Model'] in [0, 1, 2, 3])
+
+    def _is_AMDK6(self):
+        return (self.is_AMD() and self.info[0]['Family'] == 5 and
+                self.info[0]['Model'] in [6, 7])
+
+    def _is_AMDK6_2(self):
+        return (self.is_AMD() and self.info[0]['Family'] == 5 and
+                self.info[0]['Model'] == 8)
+
+    def _is_AMDK6_3(self):
+        return (self.is_AMD() and self.info[0]['Family'] == 5 and
+                self.info[0]['Model'] == 9)
+
+    def _is_AMDK7(self):
+        return self.is_AMD() and self.info[0]['Family'] == 6
+
+    # To reliably distinguish between the different types of AMD64 chips
+    # (Athlon64, Opteron, Athlon64 X2, Sempron, Turion 64, etc.) would
+    # require looking at the 'brand' from cpuid
+
+    def _is_AMD64(self):
+        return self.is_AMD() and self.info[0]['Family'] == 15
+
+    # Intel
+
+    def _is_Intel(self):
+        return self.info[0]['VendorIdentifier'] == 'GenuineIntel'
+
+    def _is_i386(self):
+        return self.info[0]['Family'] == 3
+
+    def _is_i486(self):
+        return self.info[0]['Family'] == 4
+
+    def _is_i586(self):
+        return self.is_Intel() and self.info[0]['Family'] == 5
+
+    def _is_i686(self):
+        return self.is_Intel() and self.info[0]['Family'] == 6
+
+    def _is_Pentium(self):
+        return self.is_Intel() and self.info[0]['Family'] == 5
+
+    def _is_PentiumMMX(self):
+        return (self.is_Intel() and self.info[0]['Family'] == 5 and
+                self.info[0]['Model'] == 4)
+
+    def _is_PentiumPro(self):
+        return (self.is_Intel() and self.info[0]['Family'] == 6 and
+                self.info[0]['Model'] == 1)
+
+    def _is_PentiumII(self):
+        return (self.is_Intel() and self.info[0]['Family'] == 6 and
+                self.info[0]['Model'] in [3, 5, 6])
+
+    def _is_PentiumIII(self):
+        return (self.is_Intel() and self.info[0]['Family'] == 6 and
+                self.info[0]['Model'] in [7, 8, 9, 10, 11])
+
+    def _is_PentiumIV(self):
+        return self.is_Intel() and self.info[0]['Family'] == 15
+
+    def _is_PentiumM(self):
+        return (self.is_Intel() and self.info[0]['Family'] == 6 and
+                self.info[0]['Model'] in [9, 13, 14])
+
+    def _is_Core2(self):
+        return (self.is_Intel() and self.info[0]['Family'] == 6 and
+                self.info[0]['Model'] in [15, 16, 17])
+
+    # Varia
+
+    def _is_singleCPU(self):
+        return len(self.info) == 1
+
+    def _getNCPUs(self):
+        return len(self.info)
+
+    def _has_mmx(self):
+        if self.is_Intel():
+            return ((self.info[0]['Family'] == 5 and
+                     self.info[0]['Model'] == 4) or
+                    (self.info[0]['Family'] in [6, 15]))
+        elif self.is_AMD():
+            return self.info[0]['Family'] in [5, 6, 15]
+        else:
+            return False
+
+    def _has_sse(self):
+        if self.is_Intel():
+            return ((self.info[0]['Family'] == 6 and
+                     self.info[0]['Model'] in [7, 8, 9, 10, 11]) or
+                    self.info[0]['Family'] == 15)
+        elif self.is_AMD():
+            return ((self.info[0]['Family'] == 6 and
+                     self.info[0]['Model'] in [6, 7, 8, 10]) or
+                    self.info[0]['Family'] == 15)
+        else:
+            return False
+
+    def _has_sse2(self):
+        if self.is_Intel():
+            return self.is_PentiumIV() or self.is_PentiumM() or self.is_Core2()
+        elif self.is_AMD():
+            return self.is_AMD64()
+        else:
+            return False
+
+    def _has_3dnow(self):
+        return self.is_AMD() and self.info[0]['Family'] in [5, 6, 15]
+
+    def _has_3dnowext(self):
+        return self.is_AMD() and self.info[0]['Family'] in [6, 15]
+
+
+if sys.platform.startswith('linux'):  # variations: linux2,linux-i386 (any others?)
+    cpuinfo = LinuxCPUInfo
+elif sys.platform.startswith('irix'):
+    cpuinfo = IRIXCPUInfo
+elif sys.platform == 'darwin':
+    cpuinfo = DarwinCPUInfo
+elif sys.platform[0:6] == 'netbsd':
+    cpuinfo = NetBSDCPUInfo
+elif sys.platform.startswith('sunos'):
+    cpuinfo = SunOSCPUInfo
+elif sys.platform.startswith('win32'):
+    cpuinfo = Win32CPUInfo
+elif sys.platform.startswith('cygwin'):
+    cpuinfo = LinuxCPUInfo
+#XXX: other OS's. Eg. use _winreg on Win32. Or os.uname on unices.
+else: + cpuinfo = CPUInfoBase diff --git a/chameleon/tools/custom_aug.py b/chameleon/tools/custom_aug.py new file mode 100644 index 0000000..b16cbd5 --- /dev/null +++ b/chameleon/tools/custom_aug.py @@ -0,0 +1,99 @@ +import math +import random +from typing import Tuple + +import albumentations as A +import cv2 +import numpy as np +from PIL import Image + +from .mixin import BorderValueMixin, FillValueMixin + +__all__ = [ + 'RandomSunFlare', 'CoarseDropout', 'ShiftScaleRotate', 'SaftRotate', + 'Perspective', 'Shear', 'Rotate180', +] + + +class RandomSunFlare(A.RandomSunFlare): + + @property + def src_radius(self): + return random.randint(50, 200) + + @src_radius.setter + def src_radius(self, x): + return None + + +class CoarseDropout(FillValueMixin, A.CoarseDropout): + ... + + +class ShiftScaleRotate(BorderValueMixin, A.ShiftScaleRotate): + ... + + +class SaftRotate(BorderValueMixin, A.SafeRotate): + ... + + +class Perspective(BorderValueMixin, A.Perspective): + ... + + +class Shear: + + def __init__(self, max_shear: Tuple[int, int] = (20, 20), p: float = 0.5): + self.p = p + self.max_shear_left, self.max_shear_right = max_shear + + def __call__(self, img): + if np.random.rand() < self.p: + height, width, *_ = img.shape + img = Image.fromarray(img) + + angle_to_shear = int( + np.random.uniform(-self.max_shear_left - 1, self.max_shear_right + 1)) + if angle_to_shear != -1: + angle_to_shear += 1 + + phi = math.tan(math.radians(angle_to_shear)) + shift_in_pixels = phi * height + shift_in_pixels = math.ceil(shift_in_pixels) \ + if shift_in_pixels > 0 else math.floor(shift_in_pixels) + + matrix_offset = shift_in_pixels + if angle_to_shear <= 0: + shift_in_pixels = abs(shift_in_pixels) + matrix_offset = 0 + phi = abs(phi) * -1 + + transform_matrix = (1, phi, -matrix_offset, 0, 1, 0) + img = img.transform((int(round(width + shift_in_pixels)), height), + Image.AFFINE, + transform_matrix, + Image.BICUBIC) + + img = img.crop((abs(shift_in_pixels), 0, width, height)) + img = cv2.resize(np.array(img), (width, height)) + + return img + + +class Rotate180: + + def __init__(self, p: float = 0.5): + self.p = p + self.rotate180 = A.Compose([ + A.HorizontalFlip(p=1), + A.VerticalFlip(p=1), + ], p=1) + + def __call__(self, **kwargs): + is_rotate = 0 + if np.random.rand() < self.p: + results = self.rotate180(**kwargs) + is_rotate = 1 + results.update({'is_rotate': is_rotate}) + return results diff --git a/chameleon/tools/mixin.py b/chameleon/tools/mixin.py new file mode 100644 index 0000000..de4c274 --- /dev/null +++ b/chameleon/tools/mixin.py @@ -0,0 +1,51 @@ +import random + +import cv2 + +__all__ = [ + 'BorderValueMixin', 'FillValueMixin', +] + + +class BorderValueMixin: + + @property + def pad_mode(self): + return random.choice([ + cv2.BORDER_CONSTANT, + cv2.BORDER_REPLICATE, + ]) + + @property + def border_mode(self): + return random.choice([ + cv2.BORDER_CONSTANT, + cv2.BORDER_REPLICATE, + ]) + + @property + def value(self): + return [random.randint(0, 255) for _ in range(3)] + + @pad_mode.setter + def pad_mode(self, x): + return None + + @border_mode.setter + def border_mode(self, x): + return None + + @value.setter + def value(self, x): + return None + + +class FillValueMixin: + + @property + def fill_value(self): + return [random.randint(0, 255) for _ in range(3)] + + @fill_value.setter + def fill_value(self, x): + return None diff --git a/chameleon/tools/model_profile.py b/chameleon/tools/model_profile.py new file mode 100644 index 0000000..2892b9d --- /dev/null +++ 
b/chameleon/tools/model_profile.py
@@ -0,0 +1,80 @@
+from typing import Dict, Union
+
+import torch
+from calflops import calculate_flops
+from ptflops import get_model_complexity_info
+
+from .cpuinfo import cpuinfo
+
+__all__ = ['get_model_complexity_info', 'profile_model',
+           'get_cpu_gflops', 'get_meta_info', 'calculate_flops']
+
+
+def get_cpu_gflops(one_cpu_core: bool = True) -> float:
+    _cpuinfo = cpuinfo()
+    # 'cpu MHz' is reported in MHz; convert to GHz with 1e-3.
+    ghz = float(_cpuinfo.info[0]['cpu MHz']) * 1e-3
+    core = 1 if one_cpu_core else int(_cpuinfo.info[0]['cpu cores'])
+    # Rough FLOP/s estimate, assuming one FLOP per clock cycle per core.
+    gflops = ghz * core * 1e9
+    return gflops
+
+
+def get_meta_info(macs: float, params: int, one_cpu_core: bool = True) -> dict:
+    return {
+        'Params(M)': f"{params/1e6:.3f}",
+        'MACs(G)': f"{macs/1e9:.3f}",
+        'FLOPs(G)': f"{(macs * 2)/1e9:.3f}",
+        'ModelSize_FP32 (MB)': f"{params * 4 / 1e6:.3f}",
+        'CPU infos': {
+            'cpu_model_name': cpuinfo().info[0]['model name'],
+            'cpu_cores': cpuinfo().info[0]['cpu cores'],
+            'infer_time (ms) (*rough estimate*)': f"{(macs * 2) * 1000 / get_cpu_gflops(one_cpu_core):.3f}",
+        }
+    }
+
+
+def profile_model(
+    model: Union[torch.nn.Module, str],
+    input_shape: tuple = (1, 3, 224, 224),
+    output_as_string: bool = False,
+    output_precision: int = 4,
+    print_detailed: bool = False,
+    features_only: bool = True,
+    one_cpu_core: bool = True
+) -> Dict[str, str]:
+    """
+    Profile a model to get its metadata.
+
+    Args:
+        model (Union[torch.nn.Module, str]): Model to be profiled. If a string is given, it is treated as a model name and built with the timm library.
+        input_shape (tuple): Input shape of the model. Default: (1, 3, 224, 224).
+        output_as_string (bool): Whether to output the results as string. Default: False.
+        output_precision (int): Precision of the output. Default: 4.
+        print_detailed (bool): Whether to print detailed information. Default: False.
+        features_only (bool): Whether to calculate only the features. Default: True.
+        one_cpu_core (bool): Whether to use only one CPU core. Default: True.
+
+    Returns:
+        Dict[str, str]: Metadata of the model.
+    """
+
+    if isinstance(model, str):
+        import timm
+        model = timm.create_model(
+            model,
+            pretrained=False,
+            features_only=features_only
+        )
+
+    _, macs, params = calculate_flops(
+        model,
+        input_shape=input_shape,
+        output_as_string=output_as_string,
+        output_precision=output_precision,
+        print_detailed=print_detailed
+    )
+
+    meta_data = get_meta_info(macs, params, one_cpu_core=one_cpu_core)
+
+    return meta_data
diff --git a/chameleon/tools/replace.py b/chameleon/tools/replace.py
new file mode 100644
index 0000000..368f4ff
--- /dev/null
+++ b/chameleon/tools/replace.py
@@ -0,0 +1,66 @@
+from typing import Any, Union
+
+import torch.nn as nn
+
+from ..nn import build_nn, build_nn_cls
+
+__all__ = ['has_children', 'replace_module', 'replace_module_attr_value']
+
+
+def has_children(module):
+    try:
+        next(module.children())
+        return True
+    except StopIteration:
+        return False
+
+
+def replace_module(
+    model: nn.Module,
+    target: Union[type, str],
+    dst_module: Union[nn.Module, dict]
+) -> None:
+    """
+    Function to replace modules.
+
+    Args:
+        model (nn.Module):
+            NN module.
+        target (Union[type, str]):
+            The type of module you want to replace.
+        dst_module (Union[nn.Module, dict]):
+            The module you want to use after replacement.
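+
+    Example (illustrative sketch; `nn.ReLU`/`nn.LeakyReLU` stand in for any
+    source/target pair):
+
+        # swap every ReLU leaf for one shared LeakyReLU instance
+        replace_module(model, nn.ReLU, nn.LeakyReLU(0.1))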
+ """ + if not isinstance(dst_module, (nn.Module, dict)): + raise ValueError(f'dst_module = {dst_module} should be an instance of Module or dict.') + + target = build_nn_cls(target) if isinstance(target, str) else target + dst_module = build_nn(**dst_module) if isinstance(dst_module, dict) else dst_module + + for name, m in model.named_children(): + if has_children(m): + replace_module(m, target, dst_module) + else: + if isinstance(m, target): + setattr(model, name, dst_module) + + +def replace_module_attr_value( + model: nn.Module, + target: Union[type, str], + attr_name: str, + attr_value: Any +) -> None: + """ + Function to replace attr's value in target module + + Args: + model (nn.Module): NN module. + target (Union[type, str]): The type of module you want to modify. + attr_name (str): The name of the attribute you want to modify. + attr_value (Any): The new value of the attribute. + """ + target = build_nn_cls(target) if isinstance(target, str) else target + for module in model.modules(): + if isinstance(module, target): + setattr(module, attr_name, attr_value) diff --git a/chameleon/transformers/__init__.py b/chameleon/transformers/__init__.py new file mode 100644 index 0000000..cbc3b70 --- /dev/null +++ b/chameleon/transformers/__init__.py @@ -0,0 +1,108 @@ +import fnmatch +from functools import partial + +from .basic import ImageEncoder, ImageEncoderLayer +from .efficientformer import EfficientFormer +from .metaformer import MetaFormer, MlpBlock +from .mobilevit import MobileViT +from .poolformer import PoolFormer +from .token_mixer import (Attention, AttentionMixing, PoolMixing, RandomMixing, + SepConvMixing) +from .utils import calculate_patch_size, list_models_transformers +from .vit import ViT + +__all__ = [ + 'ViT', 'calculate_patch_size', 'list_models_transformers', 'MobileViT', + 'PoolFormer', 'MetaFormer', 'MlpBlock', 'build_transformer', 'list_transformer', + 'Attention', 'AttentionMixing', 'PoolMixing', 'RandomMixing', 'SepConvMixing', + 'ImageEncoder', 'ImageEncoderLayer', 'EfficientFormer', +] + +BASE_TRANSFORMER_NAMES = { + 'vit': ViT, + 'mobilevit': MobileViT, + 'poolformer': PoolFormer, + 'metaformer': MetaFormer, + 'efficientformer': EfficientFormer, +} + +VIT_NAMES = [ + 'vit-base-patch16-224-in21k', + 'vit-base-patch16-224', + 'vit-base-patch16-384', + 'vit-base-patch32-224-in21k', + 'vit-base-patch32-384', + 'vit-huge-patch14-224-in21k', + 'vit-large-patch16-224-in21k', + 'vit-large-patch16-224', + 'vit-large-patch16-384', + 'vit-large-patch32-224-in21k', + 'vit-large-patch32-384', + 'vit-hybrid-base-bit-384', +] + +MOBILEVIT_NAMES = [ + 'mobilevit-small', + 'mobilevit-x-small', + 'mobilevit-xx-small', + 'deeplabv3-mobilevit-small', + 'deeplabv3-mobilevit-x-small', + 'deeplabv3-mobilevit-xx-small', +] + +POOLFORMER_NAMES = [ + 'poolformer_m36', + 'poolformer_m48', + 'poolformer_s12', + 'poolformer_s24', + 'poolformer_s36', +] + +METAFORMER_NAMES = [ + 'poolformer_v2_tiny', + 'poolformer_v2_small', + 'poolformer_v2_s12', + 'poolformer_v2_s24', + 'poolformer_v2_s36', + 'poolformer_v2_m36', + 'poolformer_v2_m48', + 'convformer_s18', + 'convformer_s36', + 'convformer_m36', + 'convformer_b36', + 'caformer_tiny', + 'caformer_small', + 'caformer_s18', + 'caformer_s36', + 'caformer_m36', + 'caformer_b36', +] + +EFFICIENTFORMER_NAMES = [ + 'efficientformer-l1-300', + 'efficientformer-l3-300', + 'efficientformer-l7-300', +] + +TRANSFORMER = { + **{name: module for name, module in BASE_TRANSFORMER_NAMES.items()}, + **{name: partial(ViT.from_pretrained, 
name=f'google/{name}') for name in VIT_NAMES}, + **{name: partial(MobileViT.from_pretrained, name=f'apple/{name}') for name in MOBILEVIT_NAMES}, + **{name: partial(PoolFormer.from_pretrained, name=f'sail/{name}') for name in POOLFORMER_NAMES}, + **{name: partial(MetaFormer.from_pretrained, name=name) for name in METAFORMER_NAMES}, + **{name: partial(EfficientFormer.from_pretrained, name=f'snap-research/{name}') for name in EFFICIENTFORMER_NAMES}, +} + + +def build_transformer(name: str, **kwargs): + if name not in TRANSFORMER: + raise ValueError(f'Transformer={name} is not supported.') + return TRANSFORMER[name](**kwargs) + + +def list_transformer(filter=''): + model_list = list(TRANSFORMER.keys()) + if len(filter): + return fnmatch.filter(model_list, filter) # include these models + else: + return model_list diff --git a/chameleon/transformers/basic.py b/chameleon/transformers/basic.py new file mode 100644 index 0000000..f0f656a --- /dev/null +++ b/chameleon/transformers/basic.py @@ -0,0 +1,127 @@ +import math +from typing import Tuple, Union + +import torch +import torch.nn as nn + +from ..nn.components import build_activation +from .token_mixer import SelfAttention + +__all__ = ['ImageEncoder', 'ImageEncoderLayer'] + + +class ImageEncoderLayer(nn.Module): + + def __init__( + self, + d_model: int, + nhead: int = 8, + expand_ratio: float = 2, + norm_first: bool = True, + inner_act: Union[dict, nn.Module] = {'name': 'StarReLU'}, + ) -> None: + """ + Initializes the EncoderLayer. + + Args: + d_model (int): + The number of input dimensions. + nhead (int): + The number of attention heads. + expand_ratio (float, optional): + The expansion ratio for the hidden dimensions. + Defaults to 2. + norm_first (bool, optional): + Whether to apply the normalization before the attention layer. + Defaults to True. + inner_act (Union[dict, nn.Module], optional): + The activation function to use for the inner feedforward layer. + Defaults to {'name': 'StarReLU'}. + """ + super().__init__() + hidden_dims = int(d_model * expand_ratio) + self.ffn = nn.Sequential( + nn.Linear(d_model, hidden_dims), + inner_act if isinstance( + inner_act, nn.Module) else build_activation(**inner_act), + nn.Linear(hidden_dims, d_model), + ) + self.att = SelfAttention(embed_dim=d_model, num_heads=nhead) + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + self.norm_first = norm_first + + def forward(self, x: torch.Tensor) -> torch.Tensor: + if self.norm_first: + norm_x = self.norm1(x) + att, att_weights = self.att(norm_x, norm_x, norm_x) + x = x + att + x = x + self.ffn(self.norm2(x)) + else: + att, att_weights = self.att(x, x, x) + x = self.norm1(x + att) + x = self.norm2(x + self.ffn(x)) + return x, att_weights + + +class ImageEncoder(nn.Module): + + def __init__( + self, + d_model: int, + num_layers: int, + image_size: Union[int, Tuple[int, int]], + patch_size: Union[int, Tuple[int, int]] = 16, + in_c: int = 3, + *args, **kwargs, + ) -> None: + """ + Initialize a ImageEncoder module. + + Args: + d_model (int): + The input dimension of the encoder. + num_layers (int): + The number of layers in the encoder. + image_size (Union[int, Tuple[int, int]]): + The input image size. + patch_size (Union[int, Tuple[int, int]], optional): + The patch size. Defaults to 16. + in_c (int): + The number of input channels. Defaults to 3. 
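+
+        Example (illustrative):
+
+            enc = ImageEncoder(d_model=256, num_layers=4, image_size=224)
+            cls_tok, hidden, att = enc(torch.randn(2, 3, 224, 224))
+            # cls_tok: (2, 256); hidden: (2, 196, 256) for 14x14 patches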
+ """ + super().__init__() + h, w = image_size if isinstance( + image_size, (tuple, list)) else (image_size, image_size) + ph, pw = patch_size if isinstance( + patch_size, (tuple, list)) else (patch_size, patch_size) + nh, nw = h // ph, w // pw + + self.cls_token = nn.Parameter(torch.Tensor(1, 1, d_model)) + self.pos_emb = nn.Parameter(torch.Tensor(1, nh*nw, d_model)) + nn.init.kaiming_uniform_(self.cls_token, a=math.sqrt(5)) + nn.init.kaiming_uniform_(self.pos_emb, a=math.sqrt(5)) + + self.tokenizer = nn.Conv2d( + in_c, d_model, (ph, pw), (ph, pw), bias=False) + self.encoder = nn.ModuleList([ + ImageEncoderLayer(d_model, *args, **kwargs) + for _ in range(num_layers) + ]) + + def forward(self, x: torch.Tensor, cls_token: torch.Tensor = None) -> torch.Tensor: + """ + Forward pass of the ImageEncoder. + """ + x = self.tokenizer(x) + x = x.flatten(2).transpose(1, 2) + x = x + self.pos_emb.expand(x.size(0), -1, -1) + if cls_token is None: + cls_token = self.cls_token.expand(x.size(0), -1, -1) + x = torch.cat((cls_token, x), dim=1) + att_weights = [] + for layer in self.encoder: + x, _att_weights = layer(x) + att_weights.append(_att_weights) + cls_token, hidden = torch.split(x, (1, x.size(1)-1), dim=1) + return cls_token.squeeze(1), hidden, att_weights diff --git a/chameleon/transformers/efficientformer.py b/chameleon/transformers/efficientformer.py new file mode 100644 index 0000000..2a80ff4 --- /dev/null +++ b/chameleon/transformers/efficientformer.py @@ -0,0 +1,176 @@ +from typing import List + +import torch +import torch.nn as nn +from transformers import EfficientFormerConfig, EfficientFormerModel + +from .utils import list_models_transformers + +__all__ = ['EfficientFormer'] + + +class EfficientFormer(nn.Module): + + def __init__( + self, + depths: List[int] = [3, 2, 6, 4], + hidden_sizes: List[int] = [48, 96, 224, 448], + downsamples: List[bool] = [True, True, True, True], + dim: int = 448, + key_dim: int = 32, + attention_ratio: int = 4, + resolution: int = 7, + num_hidden_layers: int = 5, + num_attention_heads: int = 8, + mlp_expansion_ratio: int = 4, + hidden_dropout_prob: float = 0.0, + patch_size: int = 16, + num_channels: int = 3, + pool_size: int = 3, + downsample_patch_size: int = 3, + downsample_stride: int = 2, + downsample_pad: int = 1, + drop_path_rate: float = 0.0, + num_meta3d_blocks: int = 1, + distillation: bool = True, + use_layer_scale: bool = True, + layer_scale_init_value: float = 1e-5, + hidden_act: str = "gelu", + initializer_range: float = 0.02, + layer_norm_eps: float = 1e-12, + **kwargs, + ) -> None: + r""" + This is the configuration class to store the configuration of an + [`EfficientFormerModel`]. It is used to instantiate an EfficientFormer + model according to the specified arguments, defining the model architecture. + + Instantiating a configuration with the defaults will yield a similar + configuration to that of the EfficientFormer [snap-research/efficientformer-l1] + (https://huggingface.co/snap-research/efficientformer-l1) architecture. + + Configuration objects inherit from [`PretrainedConfig`] and can be used + to control the model outputs. Read the documentation from [`PretrainedConfig`] + for more information. + + Args: + depths (`List(int)`, *optional*, defaults to `[3, 2, 6, 4]`) + Depth of each stage. + hidden_sizes (`List(int)`, *optional*, defaults to `[48, 96, 224, 448]`) + Dimensionality of each stage. + downsamples (`List(bool)`, *optional*, defaults to `[True, True, True, True]`) + Whether or not to downsample inputs between two stages. 
+            dim (`int`, *optional*, defaults to 448):
+                Number of channels in Meta3D layers
+            key_dim (`int`, *optional*, defaults to 32):
+                The size of the key in meta3D block.
+            attention_ratio (`int`, *optional*, defaults to 4):
+                Ratio of the dimension of the query and value to the dimension
+                of the key in MSHA block
+            resolution (`int`, *optional*, defaults to 7):
+                Size of each patch
+            num_hidden_layers (`int`, *optional*, defaults to 5):
+                Number of hidden layers in the Transformer encoder.
+            num_attention_heads (`int`, *optional*, defaults to 8):
+                Number of attention heads for each attention layer in the 3D
+                MetaBlock.
+            mlp_expansion_ratio (`int`, *optional*, defaults to 4):
+                Ratio of size of the hidden dimensionality of an MLP to the
+                dimensionality of its input.
+            hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
+                The dropout probability for all fully connected layers in the
+                embeddings and encoder.
+            patch_size (`int`, *optional*, defaults to 16):
+                The size (resolution) of each patch.
+            num_channels (`int`, *optional*, defaults to 3):
+                The number of input channels.
+            pool_size (`int`, *optional*, defaults to 3):
+                Kernel size of pooling layers.
+            downsample_patch_size (`int`, *optional*, defaults to 3):
+                The size of patches in downsampling layers.
+            downsample_stride (`int`, *optional*, defaults to 2):
+                The stride of convolution kernels in downsampling layers.
+            downsample_pad (`int`, *optional*, defaults to 1):
+                Padding in downsampling layers.
+            drop_path_rate (`float`, *optional*, defaults to 0.0):
+                Rate at which to increase dropout probability in DropPath.
+            num_meta3d_blocks (`int`, *optional*, defaults to 1):
+                The number of 3D MetaBlocks in the last stage.
+            distillation (`bool`, *optional*, defaults to `True`):
+                Whether to add a distillation head.
+            use_layer_scale (`bool`, *optional*, defaults to `True`):
+                Whether to scale outputs from token mixers.
+            layer_scale_init_value (`float`, *optional*, defaults to 1e-5):
+                Factor by which outputs from token mixers are scaled.
+            hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
+                The non-linear activation function (function or string) in the
+                encoder and pooler. If string, `"gelu"`, `"relu"`, `"selu"` and
+                `"gelu_new"` are supported.
+            initializer_range (`float`, *optional*, defaults to 0.02):
+                The standard deviation of the truncated_normal_initializer for
+                initializing all weight matrices.
+            layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+                The epsilon used by the layer normalization layers.
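+
+        Example (illustrative):
+
+            model = EfficientFormer()  # EfficientFormer-L1-like defaults
+            feats = model(torch.randn(1, 3, 224, 224))
+            # `feats` is a list of four intermediate feature maps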
+ """ + super().__init__() + self.config = EfficientFormerConfig( + depths=depths, + hidden_sizes=hidden_sizes, + downsamples=downsamples, + dim=dim, + key_dim=key_dim, + attention_ratio=attention_ratio, + resolution=resolution, + num_hidden_layers=num_hidden_layers, + num_attention_heads=num_attention_heads, + mlp_expansion_ratio=mlp_expansion_ratio, + hidden_dropout_prob=hidden_dropout_prob, + patch_size=patch_size, + num_channels=num_channels, + pool_size=pool_size, + downsample_patch_size=downsample_patch_size, + downsample_stride=downsample_stride, + downsample_pad=downsample_pad, + drop_path_rate=drop_path_rate, + num_meta3d_blocks=num_meta3d_blocks, + distillation=distillation, + use_layer_scale=use_layer_scale, + layer_scale_init_value=layer_scale_init_value, + hidden_act=hidden_act, + initializer_range=initializer_range, + layer_norm_eps=layer_norm_eps, + **kwargs, + ) + model = EfficientFormerModel(self.config) + self.model = self._model_clip(model) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + _, all_hidden_state = self.model(x, output_hidden_states=True, return_dict=False) + all_hidden_state = [all_hidden_state[i] for i in [1, 3, 5, 6]] + return all_hidden_state + + @staticmethod + def list_models(author='snap-research', search='efficientformer') -> List[str]: + return list_models_transformers(author=author, search=search) + + @classmethod + def from_pretrained(cls, name, **kwargs) -> 'EfficientFormer': + model = cls(**kwargs) + _model = EfficientFormerModel.from_pretrained(name, **kwargs) + model.model = cls._model_clip(_model) + return model + + @staticmethod + def _model_clip(m) -> None: + + class _Identity(nn.Module): + def __init__(self): + super().__init__() + def forward(self, x, **kwargs): + return x + + m.flat = nn.Identity() + m.meta3D_layers = nn.Identity() + m.layernorm = nn.Identity() + m.encoder.last_stage = _Identity() + return m diff --git a/chameleon/transformers/metaformer.py b/chameleon/transformers/metaformer.py new file mode 100644 index 0000000..1d03751 --- /dev/null +++ b/chameleon/transformers/metaformer.py @@ -0,0 +1,361 @@ +from typing import List, Optional, Union +from warnings import warn + +import torch +import torch.nn as nn + +from ..nn import CNN2Dcell, LayerNorm2d, PowerModule, StarReLU +from .token_mixer import (AttentionMixing, PoolMixing, RandomMixing, + SepConvMixing) + +__all__ = ['MetaFormer', 'MetaFormerBlock', 'MlpBlock'] + + +MODEL_SETTINGS = { + 'poolformer_v2_tiny': { + 'depths': [1, 1, 3, 1], + 'hidden_sizes': [16, 32, 96, 128], + 'token_mixers': 'PoolMixing', + 'mlp_forwards': {'name': 'MlpBlock', 'expand_ratio': 1.5} + }, + 'poolformer_v2_small': { + 'depths': [2, 2, 4, 2], + 'hidden_sizes': [32, 64, 128, 256], + 'token_mixers': 'PoolMixing', + 'mlp_forwards': {'name': 'MlpBlock', 'expand_ratio': 1.5} + }, + 'poolformer_v2_s12': { + 'depths': [2, 2, 6, 2], + 'hidden_sizes': [64, 128, 320, 512], + 'token_mixers': 'PoolMixing', + }, + 'poolformer_v2_s24': { + 'depths': [4, 4, 12, 4], + 'hidden_sizes': [64, 128, 320, 512], + 'token_mixers': 'PoolMixing', + }, + 'poolformer_v2_s36': { + 'depths': [6, 6, 18, 6], + 'hidden_sizes': [64, 128, 320, 512], + 'token_mixers': 'PoolMixing', + }, + 'poolformer_v2_m36': { + 'depths': [6, 6, 18, 6], + 'hidden_sizes': [96, 192, 384, 768], + 'token_mixers': 'PoolMixing', + }, + 'poolformer_v2_m48': { + 'depths': [8, 8, 24, 8], + 'hidden_sizes': [96, 192, 384, 768], + 'token_mixers': 'PoolMixing', + }, + 'convformer_s18': { + 'depths': [3, 3, 9, 3], + 'hidden_sizes': [64, 128, 320, 512], + 
'token_mixers': 'SepConvMixing', + }, + 'convformer_s36': { + 'depths': [3, 12, 18, 3], + 'hidden_sizes': [64, 128, 320, 512], + 'token_mixers': 'SepConvMixing', + }, + 'convformer_m36': { + 'depths': [3, 12, 18, 3], + 'hidden_sizes': [96, 192, 384, 576], + 'token_mixers': 'SepConvMixing', + }, + 'convformer_b36': { + 'depths': [3, 12, 18, 3], + 'hidden_sizes': [128, 256, 512, 768], + 'token_mixers': 'SepConvMixing', + }, + 'caformer_tiny': { + 'depths': [1, 1, 2, 1], + 'hidden_sizes': [16, 32, 64, 128], + 'token_mixers': ['SepConvMixing', 'SepConvMixing', 'AttentionMixing', 'AttentionMixing'], + 'mlp_forwards': {'name': 'MlpBlock', 'expand_ratio': 1.5} + }, + 'caformer_small': { + 'depths': [1, 1, 4, 2], + 'hidden_sizes': [16, 48, 128, 160], + 'token_mixers': ['SepConvMixing', 'SepConvMixing', 'AttentionMixing', 'AttentionMixing'], + 'mlp_forwards': {'name': 'MlpBlock', 'expand_ratio': 1.5} + }, + 'caformer_s18': { + 'depths': [3, 3, 9, 3], + 'hidden_sizes': [64, 128, 320, 512], + 'token_mixers': ['SepConvMixing', 'SepConvMixing', 'AttentionMixing', 'AttentionMixing'], + }, + 'caformer_s36': { + 'depths': [3, 12, 18, 3], + 'hidden_sizes': [64, 128, 320, 512], + 'token_mixers': ['SepConvMixing', 'SepConvMixing', 'AttentionMixing', 'AttentionMixing'], + }, + 'caformer_m36': { + 'depths': [3, 12, 18, 3], + 'hidden_sizes': [96, 192, 384, 576], + 'token_mixers': ['SepConvMixing', 'SepConvMixing', 'AttentionMixing', 'AttentionMixing'], + }, + 'caformer_b36': { + 'depths': [3, 12, 18, 3], + 'hidden_sizes': [128, 256, 512, 768], + 'token_mixers': ['SepConvMixing', 'SepConvMixing', 'AttentionMixing', 'AttentionMixing'], + }, +} + + +def build_token_mixer(name, **options) -> Union[nn.Module, None]: + cls = globals().get(name, None) + if cls is None: + raise ValueError(f'Token mixer named {name} is not supported.') + return cls(**options) + + +def build_mlps_forward(name, **options) -> Union[nn.Module, None]: + cls = globals().get(name, None) + if cls is None: + raise ValueError(f'MLP forward named {name} is not supported.') + return cls(**options) + + +class MlpBlock(nn.Module): + + def __init__( + self, + in_features: int, + out_features: int, + expand_ratio: float = 4 + ) -> None: + """ + MLP as used in MetaFormer models baslines and related networks. + + Args: + in_features: + The number of input features. + out_features: + The number of output features. + expand_ratio: + The multiplier applied to the number of input features to obtain + the number of hidden features. Defaults to 4. + """ + super().__init__() + hidden_features = int(expand_ratio * in_features) + self.fc1_block = nn.Conv2d(in_features, hidden_features, 1) + self.fc2_block = nn.Conv2d(hidden_features, out_features, 1) + self.act = StarReLU() + + def forward(self, x: torch.Tensor) -> torch.Tensor: + x = self.fc1_block(x) + x = self.act(x) + x = self.fc2_block(x) + return x + + +class MetaFormerBlock(nn.Module): + + def __init__( + self, + in_features: int, + token_mixer: Union[str, dict] = None, + mlp_forward: Union[str, dict] = None, + ) -> None: + """ + A single block of the MetaFormer model, consisting of a weighted sum of + a token mixing module and an MLP. + + Args: + in_features (int): + The number of input features. + token_mixer (Union[dict, nn.Module], optional): + The token mixing module to use in the block. Can be either an + nn.Module instance or a dictionary specifying the token mixing + module to build using the `build_token_mixer` function. + Defaults to None. 
+ mlp_forward (Union[dict, nn.Module], optional): + The MLP module to use in the block. Can be either an nn.Module + instance or a dictionary specifying the MLP module to build using + the `build_mlps_forward` function. + Defaults to None. + """ + + super().__init__() + self.in_features = in_features + self.token_mixer = self._build_token_mixers(token_mixer) + self.mlp_forward = self._build_mlp_forwars(mlp_forward) + self.norm_mixer = LayerNorm2d(in_features) + self.norm_mlp = LayerNorm2d(in_features) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + x = x + self.token_mixer(self.norm_mixer(x)) + x = x + self.mlp_forward(self.norm_mlp(x)) + return x + + def _build_mlp_forwars(self, param: Union[str, dict]) -> nn.Module: + if param is None: + return nn.Identity() + if isinstance(param, str): + if param == 'Identity': + return nn.Identity() + elif param == 'MlpBlock': + return MlpBlock( + in_features=self.in_features, + out_features=self.in_features, + ) + else: + raise ValueError(f'Unsupport mlp_forwards settings: {param}') + elif isinstance(param, dict): + if param['name'] in ['MlpBlock']: + param.update({ + 'in_features': self.in_features, + 'out_features': self.in_features + }) + return build_mlps_forward(**param) + + def _build_token_mixers(self, param: Union[str, dict]) -> nn.Module: + if param is None: + return nn.Identity() + if isinstance(param, str): + if param == 'AttentionMixing': + return AttentionMixing(self.in_features) + elif param == 'SepConvMixing': + return SepConvMixing(self.in_features) + elif param == 'PoolMixing': + return PoolMixing() + elif param == 'RandomMixing': + warn( + 'Do not use RandomMixing in MetaFormer by pass string name,' + 'to token_mixer, use `token_mixer={"name": "RandomMixing", "num_tokens": N}` instead.' + 'Set token_mixer to nn.Identity() instead.' + ) + return nn.Identity() + elif param == 'Identity': + return nn.Identity() + else: + raise ValueError(f'Unsupport token mixer settings: {param}') + elif isinstance(param, dict): + if param['name'] in ['AttentionMixing', 'SepConvMixing']: + param.update({'in_features': self.in_features}) + return build_token_mixer(**param) + +class MetaFormer(PowerModule): + + def __init__( + self, + num_channels: int = 3, + depths: List[int] = [2, 2, 6, 2], + hidden_sizes: List[int] = [64, 128, 320, 512], + patch_sizes: List[int] = [7, 3, 3, 3], + strides: List[int] = [4, 2, 2, 2], + padding: List[int] = [2, 1, 1, 1], + token_mixers: Union[dict, str, List[Union[dict, str]]] = 'PoolMixing', + mlp_forwards: Union[dict, str, List[Union[dict, str]]] = 'MlpBlock', + out_indices: Optional[List[int]] = None, + ) -> None: + """ + Initializes the MetaFormer model. + + Args: + num_channels (int, optional): + The number of channels in the input image. Defaults to 3. + depths (List[int], optional): + The number of blocks in each stage of the MetaFormer. + Defaults to [2, 2, 6, 2]. + hidden_sizes (List[int], optional): + The number of channels in each stage of the MetaFormer. + Defaults to [64, 128, 320, 512]. + patch_sizes (List[int], optional): + The patch size used in each stage of the MetaFormer. + Defaults to [7, 3, 3, 3]. + strides (List[int], optional): + The stride used in each stage of the MetaFormer. + Defaults to [4, 2, 2, 2]. + padding (List[int], optional): + The padding used in each stage of the MetaFormer. + Defaults to [2, 1, 1, 1]. + token_mixers (Union[dict, str, List[Union[dict, str]]], optional): + The token mixing modules used in the model. + Defaults to 'PoolMixing'. 
+            mlp_forwards (Union[dict, str, List[Union[dict, str]]], optional):
+                The MLP modules used in the model.
+                Defaults to 'MlpBlock'.
+            out_indices (Optional[List[int]], optional):
+                The indices of the output feature maps.
+                Defaults to None.
+        """
+        super().__init__()
+
+        if not isinstance(depths, (list, tuple)):
+            raise ValueError('depths must be either list or tuple.')
+
+        if not isinstance(hidden_sizes, (list, tuple)):
+            raise ValueError('hidden_sizes must be either list or tuple.')
+
+        self.num_stage = len(depths)
+
+        self.downsamples = nn.ModuleList([
+            nn.Sequential(
+                LayerNorm2d(hidden_sizes[i - 1]) if i > 0 else nn.Identity(),
+                nn.Conv2d(
+                    in_channels=num_channels if i == 0 else hidden_sizes[i-1],
+                    out_channels=hidden_sizes[i],
+                    kernel_size=ksize,
+                    stride=s,
+                    padding=p
+                ),
+                LayerNorm2d(hidden_sizes[i]) if i == 0 else nn.Identity(),
+            ) for i, (ksize, s, p) in enumerate(zip(patch_sizes, strides, padding))
+        ])
+
+        token_mixers = [token_mixers] * self.num_stage \
+            if not isinstance(token_mixers, (list, tuple)) else token_mixers
+        mlp_forwards = [mlp_forwards] * self.num_stage \
+            if not isinstance(mlp_forwards, (list, tuple)) else mlp_forwards
+
+        self.stages = nn.ModuleList([
+            nn.Sequential(*[
+                MetaFormerBlock(
+                    in_features=hidden_sizes[i],
+                    token_mixer=token_mixers[i],
+                    mlp_forward=mlp_forwards[i],
+                )
+                for _ in range(depth)
+            ])
+            for i, depth in enumerate(depths)
+        ])
+
+        self.out_indices = out_indices
+        self.initialize_weights_()
+
+    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
+        outs = []
+        for i in range(self.num_stage):
+            x = self.downsamples[i](x)
+            x = self.stages[i](x)
+            outs.append(x)
+
+        if self.out_indices is not None:
+            outs = [outs[i] for i in self.out_indices]
+
+        return outs
+
+    @classmethod
+    def from_pretrained(cls, name: str, **kwargs) -> 'MetaFormer':
+        """
+        Initializes the MetaFormer model from the pretrained model.
+
+        Args:
+            name (str):
+                The name of the pretrained model.
+            **kwargs:
+                The other arguments of the model.
+
+        Returns:
+            MetaFormer:
+                The MetaFormer model.
+        """
+        if name not in MODEL_SETTINGS:
+            raise ValueError(f'Unsupported model name: {name}')
+
+        # Copy the settings so caller kwargs never mutate the shared
+        # MODEL_SETTINGS table.
+        model_settings = dict(MODEL_SETTINGS[name])
+        model_settings.update(kwargs)
+        return cls(**model_settings)
diff --git a/chameleon/transformers/mobilevit.py b/chameleon/transformers/mobilevit.py
new file mode 100644
index 0000000..c429655
--- /dev/null
+++ b/chameleon/transformers/mobilevit.py
@@ -0,0 +1,146 @@
+from typing import List
+
+import torch
+import torch.nn as nn
+from transformers import MobileViTConfig, MobileViTModel
+
+from .utils import list_models_transformers
+
+__all__ = ['MobileViT']
+
+
+class MobileViT(nn.Module):
+
+    def __init__(
+        self,
+        num_channels: int = 3,
+        image_size: int = 256,
+        patch_size: int = 2,
+        hidden_sizes: List[int] = [144, 192, 240],
+        neck_hidden_sizes: List[int] = [16, 32, 64, 96, 128, 160, 640],
+        num_attention_heads: int = 4,
+        mlp_ratio: float = 2.0,
+        expand_ratio: float = 4.0,
+        hidden_act: str = "relu",
+        conv_kernel_size: int = 3,
+        output_stride: int = 32,
+        hidden_dropout_prob: float = 0.1,
+        attention_probs_dropout_prob: float = 0.0,
+        classifier_dropout_prob: float = 0.1,
+        initializer_range: float = 0.02,
+        layer_norm_eps: float = 1e-5,
+        qkv_bias: bool = True,
+        aspp_out_channels: int = 256,
+        atrous_rates: List[int] = [6, 12, 18],
+        aspp_dropout_prob: float = 0.1,
+        semantic_loss_ignore_index: int = 255,
+        **kwargs,
+    ) -> None:
+        """
+        This is the configuration of a `MobileViTModel`. 
It is used to instantiate + a MobileViT model according to the specified arguments, defining the model + architecture. Instantiating a configuration with the defaults will yield + a similar configuration to that of the MobileViT architecture. + + [apple/mobilevit-small](https://huggingface.co/apple/mobilevit-small) + + Args: + num_channels (int, optional): + The number of input channels. Defaults to 3. + image_size (int, optional): + The size (resolution) of each image. Defaults to 256. + patch_size (int, optional): + The size (resolution) of each patch. Defaults to 2. + hidden_sizes (List[int], optional): + Dimensionality (hidden size) of the Transformer encoders at each + stage. Defaults to [144, 192, 240] + neck_hidden_sizes (List[int], optional): + The number of channels for the feature maps of the backbone. + Defaults to [16, 32, 64, 96, 128, 160, 640] + num_attention_heads (int, optional): + Number of attention heads for each attention layer in the + Transformer encoder. Defaults to 4 + mlp_ratio (float, optional): + The ratio of the number of channels in the output of the MLP to + the number of channels in the input. Defaults to 2.0 + expand_ratio (float, optional): + Expansion factor for the MobileNetv2 layers. Defaults to 4.0. + hidden_act (str or function, optional): + The non-linear activation function (function or string) in the + Transformer encoder and convolution layers. Defaults to "relu". + conv_kernel_size (int, optional): + The size of the convolutional kernel in the MobileViT layer. + Defaults to 3. + output_stride (int, optional): + The ratio of the spatial resolution of the output to the + resolution of the input image. Defaults to 32. + hidden_dropout_prob (float, optional): + The dropout probabilitiy for all fully connected layers in the + Transformer encoder. Defaults to 0.1. + attention_probs_dropout_prob (float, optional): + The dropout ratio for the attention probabilities. Defaults to 0.0 + classifier_dropout_prob (float, optional): + The dropout ratio for attached classifiers. Defaults to 0.1. + initializer_range (float, optional): + The standard deviation of the truncated_normal_initializer for + initializing all weight matrices. Defaults to 0.02. + layer_norm_eps (float, optional): + The epsilon used by the layer normalization layers. + Defaults to 1e-5. + qkv_bias (bool, optional): + Whether to add a bias to the queries, keys and values. + Defaults to True. + aspp_out_channels (int, optional): + Number of output channels used in the ASPP layer for semantic + segmentation. Defaults to 256. + atrous_rates (List[int], optional): + Dilation (atrous) factors used in the ASPP layer for semantic + segmentation. Defaults to [6, 12, 18]. + aspp_dropout_prob (float, optional): + The dropout ratio for the ASPP layer for semantic segmentation. + Defaults to 0.1. + semantic_loss_ignore_index (int, optional): + The index that is ignored by the loss function of the semantic + segmentation model. Defaults to 255. 
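+
+        Example (illustrative):
+
+            model = MobileViT()  # mobilevit-small-like defaults
+            feats = model(torch.randn(1, 3, 256, 256))
+            # `feats` holds the hidden states of every stage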
+ """ + super().__init__() + self.config = MobileViTConfig( + num_channels=num_channels, + image_size=image_size, + patch_size=patch_size, + hidden_sizes=hidden_sizes, + neck_hidden_sizes=neck_hidden_sizes, + num_attention_heads=num_attention_heads, + mlp_ratio=mlp_ratio, + expand_ratio=expand_ratio, + hidden_act=hidden_act, + conv_kernel_size=conv_kernel_size, + output_stride=output_stride, + hidden_dropout_prob=hidden_dropout_prob, + attention_probs_dropout_prob=attention_probs_dropout_prob, + classifier_dropout_prob=classifier_dropout_prob, + initializer_range=initializer_range, + layer_norm_eps=layer_norm_eps, + qkv_bias=qkv_bias, + aspp_out_channels=aspp_out_channels, + atrous_rates=atrous_rates, + aspp_dropout_prob=aspp_dropout_prob, + semantic_loss_ignore_index=semantic_loss_ignore_index, + **kwargs, + ) + self.model = MobileViTModel(self.config, expand_output=False) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + _, all_hidden_state = self.model(x, output_hidden_states=True, return_dict=False) + return all_hidden_state + + @staticmethod + def list_models(author='apple', search='mobilevit') -> List[str]: + return list_models_transformers(author=author, search=search) + + @classmethod + def from_pretrained(cls, name, **kwargs) -> 'MobileViT': + model = cls(**kwargs) + kwargs.update({'expand_output': False}) + model.model = MobileViTModel.from_pretrained(name, **kwargs) + return model diff --git a/chameleon/transformers/poolformer.py b/chameleon/transformers/poolformer.py new file mode 100644 index 0000000..ec9b321 --- /dev/null +++ b/chameleon/transformers/poolformer.py @@ -0,0 +1,148 @@ +from typing import List + +import torch +import torch.nn as nn +from transformers import PoolFormerConfig, PoolFormerModel + +from .utils import list_models_transformers + +__all__ = ["PoolFormer"] + + +class PoolFormer(nn.Module): + + def __init__( + self, + num_channels: int = 3, + patch_size: int = 16, + stride: int = 16, + pool_size: int = 3, + mlp_ratio: float = 4.0, + depths: List[int] = [2, 2, 6, 2], + hidden_sizes: List[int] = [64, 128, 320, 512], + patch_sizes: List[int] = [7, 3, 3, 3], + strides: List[int] = [4, 2, 2, 2], + padding: List[int] = [2, 1, 1, 1], + num_encoder_blocks: int = 4, + drop_path_rate: float = 0.0, + hidden_act: str = 'relu', + use_layer_scale: bool = True, + layer_scale_init_value: float = 1e-5, + initializer_range: float = 0.02, + **kwargs: dict, + ) -> None: + """ + PoolFormer is a model that replaces attention token mixer in transfomrers + with extremely simple operator, pooling. + + Transformers have shown great potential in computer vision tasks. A common + belief is their attention-based token mixer module contributes most to + their competence. However, recent works show the attention-based module + in transformers can be replaced by spatial MLPs and the resulted models + still perform quite well. Based on this observation, we hypothesize that + the general architecture of the transformers, instead of the specific + token mixer module, is more essential to the model's performance. + + To verify this, we deliberately replace the attention module in transformers + with an embarrassingly simple spatial pooling operator to conduct only + the most basic token mixing. Surprisingly, we observe that the derived + model, termed as PoolFormer, achieves competitive performance on multiple + computer vision tasks. 
For example, on ImageNet-1K, PoolFormer achieves + 82.1% top-1 accuracy, surpassing well-tuned vision transformer/MLP-like + baselines DeiT-B/ResMLP-B24 by 0.3%/1.1% accuracy with 35%/52% fewer + parameters and 48%/60% fewer MACs. The effectiveness of PoolFormer + verifies our hypothesis and urges us to initiate the concept of "MetaFormer", + a general architecture abstracted from transformers without specifying + the token mixer. Based on the extensive experiments, we argue that + MetaFormer is the key player in achieving superior results for recent + transformer and MLP-like models on vision tasks. + + This work calls for more future research dedicated to improving MetaFormer + instead of focusing on the token mixer modules. Additionally, our proposed + PoolFormer could serve as a starting baseline for future MetaFormer + architecture design. + + Args: + num_channels (int, optional): + The number of channels in the input data. Defaults to 3. + patch_size (int, optional): + The size of the patches extracted from the input data. + Defaults to 16. + stride (int, optional): + The stride of the convolutional layer used to extract patches + from the input data. Defaults to 16. + pool_size (int, optional): + The size of the pooling kernel used in the PoolFormer encoder + layers. Defaults to 3. + mlp_ratio (float, optional): + The ratio of the hidden size in the feedforward layer of the + PoolFormer encoder to the input size. Defaults to 4.0. + depths (List[int], optional): + The number of blocks in each stage of the PoolFormer encoder. + Defaults to [2, 2, 6, 2]. + hidden_sizes (List[int], optional): + The size of the hidden layer in each block of the PoolFormer + encoder. Defaults to [64, 128, 320, 512]. + patch_sizes (List[int], optional): + The size of the convolutional kernel in each block of the + PoolFormer encoder. Defaults to [7, 3, 3, 3]. + strides (List[int], optional): + The stride of the convolutional layer in each block of the + PoolFormer encoder. Defaults to [4, 2, 2, 2]. + padding (List[int], optional): + The padding size of the convolutional layer in each block of the + PoolFormer encoder. Defaults to [2, 1, 1, 1]. + num_encoder_blocks (int, optional): + The number of encoder blocks in the PoolFormer encoder. + Defaults to 4. + drop_path_rate (float, optional): + The drop path rate used in the PoolFormer encoder. + Defaults to 0.0. + hidden_act (str, optional): + The activation function used in the PoolFormer encoder. + Defaults to "relu". + use_layer_scale (bool, optional): + Whether to use layer scaling in the PoolFormer encoder. + Defaults to True. + layer_scale_init_value (float, optional): + The initial value of the layer scale in the PoolFormer encoder. + Defaults to 1e-5. + initializer_range (float, optional): + The range of the uniform distribution used to initialize the + weights in the PoolFormer encoder. Defaults to 0.02. 
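+
+        Example (illustrative):
+
+            model = PoolFormer()  # poolformer_s12-like defaults
+            feats = model(torch.randn(1, 3, 224, 224))
+            # `feats` holds the hidden states of every encoder block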
+ """ + super().__init__() + self.config = PoolFormerConfig( + num_channels=num_channels, + patch_size=patch_size, + stride=stride, + pool_size=pool_size, + mlp_ratio=mlp_ratio, + depths=depths, + hidden_sizes=hidden_sizes, + patch_sizes=patch_sizes, + strides=strides, + padding=padding, + num_encoder_blocks=num_encoder_blocks, + drop_path_rate=drop_path_rate, + hidden_act=hidden_act, + use_layer_scale=use_layer_scale, + layer_scale_init_value=layer_scale_init_value, + initializer_range=initializer_range, + **kwargs, + ) + self.model = PoolFormerModel(self.config) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + *_, all_hidden_state = self.model(x, output_hidden_states=True, return_dict=False) + return all_hidden_state + + @staticmethod + def list_models(author='sail', search='poolformer') -> List[str]: + return list_models_transformers(author=author, search=search) + + @classmethod + def from_pretrained(cls, name, **kwargs) -> 'PoolFormer': + model = cls(**kwargs) + model.model = PoolFormerModel.from_pretrained(name, **kwargs) + return model diff --git a/chameleon/transformers/token_mixer.py b/chameleon/transformers/token_mixer.py new file mode 100644 index 0000000..744a89a --- /dev/null +++ b/chameleon/transformers/token_mixer.py @@ -0,0 +1,327 @@ +from typing import Tuple, Union + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..nn.components import LayerNorm2d, build_activation +from ..nn.mbcnn import MBCNNcell + +__all__ = [ + 'Attention', 'AttentionMixing', 'RandomMixing', 'SepConvMixing', + 'PoolMixing', 'SelfAttention', +] + + +class SelfAttention(nn.Module): + + def __init__( + self, + embed_dim: int, + num_heads: int = 8, + dropout: float = 0., + bias: bool = True, + ) -> None: + """ + Initialize the multi-head attention mechanism. + + Args: + embed_dim (int): + Dimensionality of the input and output feature vectors. + num_heads (int, optional): + Number of attention heads, defaults to 8. + dropout (float, optional): + Dropout rate, defaults to 0. + bias (bool, optional): + Whether to include bias in the projection layers, defaults to True. + """ + super().__init__() + self.embed_dim = embed_dim + self.num_heads = num_heads + self.dropout = dropout + self.head_dim = embed_dim // num_heads + assert self.head_dim * \ + num_heads == self.embed_dim, "embed_dim must be divisible by num_heads." + self.in_proj_query = nn.Linear(embed_dim, embed_dim, bias=bias) + self.in_proj_key = nn.Linear(embed_dim, embed_dim, bias=bias) + self.in_proj_value = nn.Linear(embed_dim, embed_dim, bias=bias) + self.out_proj = nn.Linear(embed_dim, embed_dim, bias=bias) + self.dropout_layer = nn.Dropout(dropout) + + def forward(self, query, key, value, attn_mask=None, key_padding_mask=None): + """ + Forward pass for the multi-head attention mechanism. + + Args: + query (Tensor): + Query tensor of shape (batch_size, seq_len, embed_dim). + key (Tensor): + Key tensor of shape (batch_size, seq_len, embed_dim). + value (Tensor): + Value tensor of shape (batch_size, seq_len, embed_dim). + attn_mask (Optional[Tensor]): + Mask to be added to attention scores before softmax. + Default: None. + key_padding_mask (Optional[Tensor]): + Mask indicating which elements in the key sequence should be ignored. + Default: None. 
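+
+        Returns:
+            A tuple `(output, attn_weights)`: `output` has shape
+            (batch_size, seq_len, embed_dim) and `attn_weights` has shape
+            (batch_size, num_heads, seq_len, seq_len).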
+ """ + Q = self.in_proj_query(query) + K = self.in_proj_key(key) + V = self.in_proj_value(value) + + # Split into multiple heads + Q = Q.view(Q.size(0), Q.size(1), self.num_heads, + self.head_dim).transpose(1, 2) + K = K.view(K.size(0), K.size(1), self.num_heads, + self.head_dim).transpose(1, 2) + V = V.view(V.size(0), V.size(1), self.num_heads, + self.head_dim).transpose(1, 2) + + # Scaled dot-product attention + attn_output_weights = torch.matmul( + Q, K.transpose(-2, -1)) / (self.head_dim ** 0.5) + + # Apply the key padding mask + if key_padding_mask is not None: + attn_output_weights.masked_fill_( + key_padding_mask.unsqueeze(1).unsqueeze(2), float('-inf')) + + if attn_mask is not None: + attn_output_weights += attn_mask + attn_output_weights = F.softmax(attn_output_weights, dim=-1) + attn_output_weights = self.dropout_layer(attn_output_weights) + + # Get final output + attn_output = torch.matmul(attn_output_weights, V) + attn_output = attn_output.transpose(1, 2).contiguous().view( + attn_output.size(0), -1, self.embed_dim) + + return self.out_proj(attn_output), attn_output_weights + + +class Attention(nn.Module): + + def __init__( + self, + in_features: int, + num_heads: int = 8, + qkv_bias: bool = True, + return_attn: bool = False, + add_output_layer: bool = True, + is_cross_attention: bool = False, + ) -> None: + """ + Vanilla self-attention from Transformer: https://arxiv.org/abs/1706.03762. + Modified from timm. + + Args: + dim (int): + Number of input channels. + head_dim (int, optional): + Dimensionality of the output of each head, defaults to 32. + num_heads (int, optional): + Number of attention heads, defaults to None (uses `dim` divided by `head_dim`). + qkv_bias (bool, optional): + Whether to include bias in the projection layers, defaults to False. + return_attn (bool, optional): + Whether to return the attention map, defaults to False. + add_output_layer (bool, optional): + Whether to add an output layer, defaults to True. + is_cross_attention (bool, optional): + Whether this is cross-attention, defaults to False. + """ + super().__init__() + assert in_features % num_heads == 0, 'dim should be divisible by num_heads' + self.num_heads = num_heads + self.head_dim = in_features // num_heads + self.scale = self.head_dim ** -0.5 + self.return_attn = return_attn + self.is_cross_attention = is_cross_attention + + self.proj = nn.Linear(in_features, in_features) \ + if add_output_layer else nn.Identity() + + if self.is_cross_attention: + self.q = nn.Linear(in_features, in_features, bias=qkv_bias) + self.kv = nn.Linear(in_features, in_features * 2, bias=qkv_bias) + else: + self.qkv = nn.Linear(in_features, in_features * 3, bias=qkv_bias) + + def forward(self, x: torch.Tensor, hidden_state: torch.Tensor = None) -> torch.Tensor: + """ + Applies self-attention to the input tensor. + + Args: + x: + Input tensor of shape (batch_size, seq_len, dim). + hidden_state: + Hidden state of the previous token, used for cross-attention. + + Returns: + A tuple containing the output tensor of shape (batch_size, seq_len, dim) and + the attention tensor of shape (batch_size, num_heads, seq_len, seq_len). 
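+
+        Example (illustrative):
+
+            attn = Attention(in_features=64, num_heads=8)
+            y = attn(torch.randn(2, 49, 64))  # -> (2, 49, 64)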
+ """ + B, N, C = x.shape + + if self.is_cross_attention: + q = self.q(x) + kv = self.kv(hidden_state) + k, v = torch.chunk(kv, 2, dim=-1) + else: + qkv = self.qkv(x) + q, k, v = torch.chunk(qkv, 3, dim=-1) + + q = q.view(B, N, self.num_heads, self.head_dim).transpose(1, 2) + k = k.view(B, -1, self.num_heads, self.head_dim).transpose(1, 2) + v = v.view(B, -1, self.num_heads, self.head_dim).transpose(1, 2) + + q = q * self.scale + attn = q @ k.transpose(-2, -1) + attn = attn.softmax(dim=-1) + + x = attn @ v + x = x.transpose(1, 2).reshape(B, N, C) + x = self.proj(x) + + if self.return_attn: + return x, attn + + return x + + +class AttentionMixing(Attention): + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """ + Applies self-attention to the input tensor. + + Args: + x (torch.Tensor): + Input tensor of shape (batch_size, in_channels, height, width). + + Returns: + A tensor of the same shape as input after applying the self-attention. + """ + B, C, H, W = x.shape + x = x.reshape(B, C, H * W).permute(0, 2, 1) + x = super().forward(x) + if self.return_attn: + x, attn = x + x = x.permute(0, 2, 1).reshape(B, C, H, W) + if self.return_attn: + return x, attn + return x + + +class RandomMixing(nn.Module): + + def __init__(self, num_tokens: int): + """ Random mixing of tokens. + Args: + num_tokens (int): + Number of tokens. + """ + super().__init__() + self.random_matrix = nn.parameter.Parameter( + torch.softmax(torch.rand(num_tokens, num_tokens), dim=-1), + requires_grad=False) + + def forward(self, x): + """ + Applies random-attention to the input tensor. + + Args: + x (torch.Tensor): + Input tensor of shape (batch_size, in_channels, height, width). + + Returns: + A tensor of the same shape as input after applying the random-attention. + """ + B, C, H, W = x.shape + x = x.reshape(B, C, H * W) + x = torch.einsum('mn, bcn -> bcm', self.random_matrix, x) + x = x.reshape(B, C, H, W) + return x + + +class SepConvMixing(nn.Module): + + def __init__( + self, + in_features: int, + expand_ratio: float = 2, + kernel_size: Union[int, Tuple[int, int]] = 7, + inner_act: Union[dict, nn.Module] = {'name': 'StarReLU'}, + ) -> None: + """ + SepConvMixing is an inverted separable convolution block from MobileNetV2. + It performs a depthwise convolution followed by a pointwise convolution. + Ref: https://arxiv.org/abs/1801.04381. + + Args: + in_channels (int): + Number of input channels. + expand_ratio (float): + Expansion ratio of the hidden channels. Defaults to 2. + kernel_size (Union[int, Tuple[int, int]]): + Size of the depthwise convolution kernel. Defaults to 7. + inner_act (Union[dict, nn.Module]): + Activation function to be used internally. Defaults to StarReLU. + """ + super().__init__() + hid_channels = int(in_features * expand_ratio) + self.mbcnn_v2 = MBCNNcell( + in_channels=in_features, + out_channels=in_features, + hid_channels=hid_channels, + kernel=kernel_size, + norm=LayerNorm2d(in_features), + inner_norm=LayerNorm2d(hid_channels), + inner_act=inner_act if isinstance( + inner_act, nn.Module) else build_activation(**inner_act), + ) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """ + Perform a SepConvMixing operation on the input tensor. + + Args: + x (torch.Tensor): + Input tensor of shape (batch_size, in_channels, height, width). + + Returns: + A tensor of the same shape as the input after applying mbcnn_v2 module. 
+ """ + return self.mbcnn_v2(x) + + +class PoolMixing(nn.Module): + + def __init__(self, pool_size: int = 3): + """ + Implementation of pooling for PoolFormer: https://arxiv.org/abs/2111.11418 + + Args: + pool_size (int): Size of the pooling window. + """ + super().__init__() + self.pool = nn.AvgPool2d( + pool_size, + stride=1, + padding=pool_size//2, + count_include_pad=False + ) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """ + Apply pooling and subtract the result from the input. + + Args: + x (torch.Tensor): + Input tensor of shape (batch_size, in_channels, height, width). + + Returns: + A tensor of the same shape as input after applying the pooling and subtraction. + """ + return self.pool(x) - x diff --git a/chameleon/transformers/utils.py b/chameleon/transformers/utils.py new file mode 100644 index 0000000..21bf339 --- /dev/null +++ b/chameleon/transformers/utils.py @@ -0,0 +1,41 @@ +from typing import Tuple, Union + +from huggingface_hub import list_models + +__all__ = ['list_models_transformers', 'calculate_patch_size'] + + +def list_models_transformers(*args, **kwargs): + models = list(iter(list_models(*args, **kwargs))) + return [m.modelId for m in models] + + +def calculate_patch_size( + image_size: Union[int, Tuple[int, int]], + num_patches: Union[int, Tuple[int, int]], +) -> Tuple[int, int]: + ''' + Calculate the number of patches that can fit into an image. + + Args: + image_size (Union[int, Tuple[int, int]]): The size of the image. + num_patches (Union[int, Tuple[int, int]]): The number of the patch. + + Returns: + Tuple[int, int]: The number of patches that can fit into the image. + ''' + if isinstance(image_size, int): + image_size = (image_size, image_size) + if isinstance(num_patches, int): + num_patches = (num_patches, num_patches) + if image_size[0] % num_patches[0]: + raise ValueError( + f'`image_size` {image_size[0]} can not divided with `{num_patches[0]}`.') + if image_size[1] % num_patches[1]: + raise ValueError( + f'`image_size` {image_size[1]} can not divided with `{num_patches[1]}`.') + patch_size = ( + image_size[0] // num_patches[0], + image_size[1] // num_patches[1] + ) + return patch_size diff --git a/chameleon/transformers/vit.py b/chameleon/transformers/vit.py new file mode 100644 index 0000000..faef4c6 --- /dev/null +++ b/chameleon/transformers/vit.py @@ -0,0 +1,121 @@ +from typing import List, Tuple, Union + +import torch +import torch.nn as nn +from transformers import ViTConfig, ViTModel + +from .utils import list_models_transformers + +__all__ = ['ViT'] + + +class ViT(nn.Module): + + def __init__( + self, + hidden_size: int = 768, + num_hidden_layers: int = 12, + num_attention_heads: int = 12, + intermediate_size: int = 3072, + hidden_act: str = 'relu', + hidden_dropout_prob: float = 0.0, + attention_probs_dropout_prob: float = 0.0, + initializer_range: float = 0.02, + layer_norm_eps: float = 1e-12, + image_size: Union[int, Tuple[int, int]] = 224, + patch_size: Union[int, Tuple[int, int]] = 16, + num_channels: int = 3, + qkv_bias: bool = True, + encoder_stride: int = 16, + **kwargs, + ) -> None: + """ + ViT: Vision Transformer + A transformer model for image classification + + Args: + hidden_size (int, optional): + Dimensionality of the encoder layers and the pooler layer. + Default is 768. + num_hidden_layers (int, optional): + Number of hidden layers in the Transformer encoder. + Default is 12. + num_attention_heads (int, optional): + Number of attention heads for each attention layer in the + Transformer encoder. + Default is 12. 
+    '''
+    if isinstance(image_size, int):
+        image_size = (image_size, image_size)
+    if isinstance(num_patches, int):
+        num_patches = (num_patches, num_patches)
+    if image_size[0] % num_patches[0]:
+        raise ValueError(
+            f'`image_size` {image_size[0]} is not divisible by `num_patches` {num_patches[0]}.')
+    if image_size[1] % num_patches[1]:
+        raise ValueError(
+            f'`image_size` {image_size[1]} is not divisible by `num_patches` {num_patches[1]}.')
+    patch_size = (
+        image_size[0] // num_patches[0],
+        image_size[1] // num_patches[1]
+    )
+    return patch_size
diff --git a/chameleon/transformers/vit.py b/chameleon/transformers/vit.py
new file mode 100644
index 0000000..faef4c6
--- /dev/null
+++ b/chameleon/transformers/vit.py
@@ -0,0 +1,121 @@
+from typing import List, Tuple, Union
+
+import torch
+import torch.nn as nn
+from transformers import ViTConfig, ViTModel
+
+from .utils import list_models_transformers
+
+__all__ = ['ViT']
+
+
+class ViT(nn.Module):
+
+    def __init__(
+        self,
+        hidden_size: int = 768,
+        num_hidden_layers: int = 12,
+        num_attention_heads: int = 12,
+        intermediate_size: int = 3072,
+        hidden_act: str = 'relu',
+        hidden_dropout_prob: float = 0.0,
+        attention_probs_dropout_prob: float = 0.0,
+        initializer_range: float = 0.02,
+        layer_norm_eps: float = 1e-12,
+        image_size: Union[int, Tuple[int, int]] = 224,
+        patch_size: Union[int, Tuple[int, int]] = 16,
+        num_channels: int = 3,
+        qkv_bias: bool = True,
+        encoder_stride: int = 16,
+        **kwargs,
+    ) -> None:
+        """
+        ViT: Vision Transformer
+        A transformer model for image classification.
+
+        Args:
+            hidden_size (int, optional):
+                Dimensionality of the encoder layers and the pooler layer.
+                Default is 768.
+            num_hidden_layers (int, optional):
+                Number of hidden layers in the Transformer encoder.
+                Default is 12.
+            num_attention_heads (int, optional):
+                Number of attention heads for each attention layer in the
+                Transformer encoder.
+                Default is 12.
+            intermediate_size (int, optional):
+                Dimensionality of the "intermediate" (i.e., feed-forward) layer
+                in the Transformer encoder.
+                Default is 3072.
+            hidden_act (str, optional):
+                The non-linear activation function (function or string) in the
+                encoder and pooler. If string, "gelu", "relu", "selu" and "gelu_new"
+                are supported.
+                Default is "relu".
+            hidden_dropout_prob (float, optional):
+                The dropout probability for all fully connected layers in the
+                embeddings, encoder, and pooler.
+                Default is 0.0.
+            attention_probs_dropout_prob (float, optional):
+                The dropout ratio for the attention probabilities.
+                Default is 0.0.
+            initializer_range (float, optional):
+                The standard deviation of the truncated_normal_initializer for
+                initializing all weight matrices.
+                Default is 0.02.
+            layer_norm_eps (float, optional):
+                The epsilon used by the layer normalization layers.
+                Default is 1e-12.
+            image_size (Union[int, Tuple[int, int]], optional):
+                The size (resolution) of each image.
+                Default is 224.
+            patch_size (Union[int, Tuple[int, int]], optional):
+                The size (resolution) of each patch.
+                Default is 16.
+            num_channels (int, optional):
+                The number of input channels.
+                Default is 3.
+            qkv_bias (bool, optional):
+                Whether to add a bias to the queries, keys and values.
+                Default is True.
+            encoder_stride (int, optional):
+                Factor to increase the spatial resolution by in the decoder head
+                for masked image modeling.
+                Default is 16.
+        """
+        super().__init__()
+        self.config = ViTConfig(
+            hidden_size=hidden_size,
+            num_hidden_layers=num_hidden_layers,
+            num_attention_heads=num_attention_heads,
+            intermediate_size=intermediate_size,
+            hidden_act=hidden_act,
+            hidden_dropout_prob=hidden_dropout_prob,
+            attention_probs_dropout_prob=attention_probs_dropout_prob,
+            initializer_range=initializer_range,
+            layer_norm_eps=layer_norm_eps,
+            image_size=image_size,
+            patch_size=patch_size,
+            num_channels=num_channels,
+            qkv_bias=qkv_bias,
+            encoder_stride=encoder_stride,
+            **kwargs,
+        )
+        self.model = ViTModel(self.config, add_pooling_layer=False)
+
+    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+        hidden_state = self.model(x).last_hidden_state
+        cls_token, hidden_state = torch.split(hidden_state, [1, hidden_state.shape[1] - 1], dim=1)
+        return cls_token.squeeze(dim=1), hidden_state
+
+    @staticmethod
+    def list_models(author='google', search='vit') -> List[str]:
+        return list_models_transformers(author=author, search=search)
+
+    @classmethod
+    def from_pretrained(cls, name, **kwargs) -> 'ViT':
+        model = cls(**kwargs)
+        kwargs.update({'add_pooling_layer': False})
+        model.model = ViTModel.from_pretrained(name, **kwargs)
+        return model
diff --git a/docs/chameleon.md b/docs/chameleon.md
new file mode 100644
index 0000000..d75c605
--- /dev/null
+++ b/docs/chameleon.md
@@ -0,0 +1,247 @@
+# PyTorch
+
+This module contains the PyTorch-related functionality of the project, such as neural network architectures, optimizers, and related utilities.
+
+## Table of Contents
+
+- [Backbone](#backbone)
+  - [Main Features](#main-features)
+  - [Usage Examples](#usage-examples)
+  - [Main Structures and Components](#main-structures-and-components)
+- [EfficientDet](#efficientdet)
+  - [Main Features](#main-features-1)
+  - [Main Structures and Components](#main-structures-and-components-1)
+  - [Usage Examples](#usage-examples-1)
+- [Neck](#neck)
+  - [Main Features](#main-features-2)
+  - [Main Structures and Components](#main-structures-and-components-2)
+  - [Usage Examples](#usage-examples-2)
+- [NN (Neural Networks)](#nn-neural-networks)
+- [Optim](#optim)
+- [Transformers](#transformers)
+- [Utils](#utils)
+  - [Main Features](#main-features-3)
+  - [Main Structures and Components](#main-structures-and-components-3)
+
+---
+
+## [Backbone](../chameleon/backbone/)
+
+- **Purpose**:
+  This module provides fundamental network architectures such as ResNet and VGG, and supports a wide range of pretrained models through the `timm` library.
+
+- **Files**:
+  - `__init__.py`
+
+### Main Features
+
+1. **Model registration**: Uses the `list_models` and `create_model` functions from the `timm` library to enumerate and create the supported model architectures. These models are registered in the `BACKBONE` dictionary.
+
+2. **Building a backbone**:
+
+   - The `build_backbone` function builds a backbone model from a given name.
+   - If the name is not in the list of supported models, an error is raised.
+   - The corresponding entry in the `BACKBONE` dictionary is used to instantiate the model.
+
+3. **Listing supported backbones**:
+   - The `list_backbones` function returns a list of all currently supported models.
+   - By passing a `filter` argument, you can use Unix shell-style wildcards to filter and search for specific models.
+
+### Usage Examples
+
+1. **Build a backbone**:
+
+   ```python
+   model = build_backbone("resnet50")
+   ```
+
+2. **List all supported backbones**:
+
+   ```python
+   all_backbones = list_backbones()
+   print(all_backbones)
+   ```
+
+3. **Search for specific backbones**:
+   ```python
+   resnet_models = list_backbones("resnet*")
+   print(resnet_models)
+   ```
+
+---
+
+## [EfficientDet](../chameleon/efficientdet/)
+
+- **Purpose**:
+  This module implements EfficientDet, a modern and efficient object detection architecture.
+
+- **Files**:
+  - `efficientdet.py`
+
+### Main Features
+
+#### 1. **EfficientDet definition**:
+
+- **Class**: `EfficientDet`
+
+- **Parameters**:
+
+  - `compound_coef`: The compound scaling coefficient, which determines the size and complexity of the model.
+  - `pretrained`: Whether to use weights pretrained on ImageNet.
+
+- **Highlights**:
+  1. Uses a compound scaling strategy, meaning that the depth, width, and resolution of the model are adjusted together according to the given `compound_coef`.
+  2. Relies on the `timm` library to build an `efficientnet` backbone.
+  3. Uses BiFPNs (Bidirectional Feature Pyramid Networks) as its neck to fuse features at different scales. BiFPNs are the core of the EfficientDet architecture and perform upsampling and downsampling of features.
+
+### Main Structures and Components
+
+- **Backbone (`self.backbone`)**:
+  Composed of `efficientnet`, with a depth determined by `compound_coef`. Its purpose is to extract basic features from the input image.
+
+- **BiFPNs (`self.bifpn`)**:
+  A key part of EfficientDet: a feature pyramid structure that fuses features across levels. The structure can be repeated several times (depending on `compound_coef`), and each repetition can upsample and downsample features.
+
+### Usage Examples
+
+```python
+# Build a pretrained EfficientDet model with a compound scaling coefficient of 2
+model = EfficientDet(compound_coef=2, pretrained=True)
+```
+
+---
+
+## [Neck](../chameleon/neck/)
+
+- **Purpose**:
+  This module implements various feature pyramid networks, used in object detection and semantic segmentation tasks to upsample, downsample, and fuse features.
+
+- **Files**:
+  - `bifpn.py`: Contains the BiFPN definition
+  - `fpn.py`: Contains the FPN definition
+
+### Main Features
+
+#### 1. **Feature Pyramid Network (FPN)**:
+
+- **Classes**: `FPN` and `FPNs`
+
+  A feature pyramid network is a model for object detection and semantic segmentation that produces multi-scale feature representations, typically combining detailed low-level features with semantically strong high-level features.
+
+#### 2. **Bidirectional Feature Pyramid Network (BiFPN)**:
+
+- **Classes**: `BiFPN` and `BiFPNs`
+
+  BiFPN is a variant of FPN that lets features flow both up and down across the pyramid levels, yielding richer feature representations. It is the core of the EfficientDet architecture.
+
+### Main Structures and Components
+
+- **Neck structure dictionary (`NECK`)**:
+
+  A lookup table for creating a specific pyramid network structure from its name.
+
+- **Build and list functions**:
+
+  - `build_neck()`: Creates a specific feature pyramid network from the given name and parameters.
+  - `list_necks()`: Lists all supported pyramid networks, optionally filtered by a pattern.
+
+### Usage Examples
+
+```python
+# Build a BiFPN network
+bifpn = build_neck('bifpn', in_channels_list=[64, 128, 256], out_channels=256)
+
+# List all supported feature pyramid networks
+all_necks = list_necks()
+```
+
+---
+
+## [NN (Neural Networks)](../chameleon/nn/)
+
+**Purpose**:
+This module collects commonly used deep learning modules and components.
+
+**Files**:
+
+- `aspp.py`: The ASPP (Atrous Spatial Pyramid Pooling) module, commonly used in semantic segmentation.
+- `block.py`: Basic building blocks, such as convolution blocks (e.g. `SeparableConvBlock`).
+- `cnn.py`: Standard convolutional network modules, such as a basic CNN cell.
+- `dwcnn.py`: Depthwise separable convolutional networks, an efficient CNN variant.
+- `grl.py`: The gradient reversal layer, commonly used in domain adaptation.
+- `mbcnn.py`: MobileNet-style CNN cells (e.g. `MBCNNcell`), i.e. lightweight network blocks.
+- `positional_encoding.py`: Positional encodings, commonly used in Transformer architectures.
+- `selayer.py`: The Squeeze-and-Excitation layer, which strengthens inter-channel relationships.
+- `utils.py`: Practical utilities and helper functions.
+- `vae.py`: The Variational AutoEncoder, an autoencoder used for generative modeling.
+
+**components**:
+
+- `activation.py`: Activation functions such as ReLU and Sigmoid.
+- `dropout.py`: Dropout layers for regularization.
+- `loss.py`: Loss functions such as cross entropy and mean squared error.
+- `norm.py`: Normalization methods such as BatchNorm and LayerNorm.
+- `pooling.py`: Pooling operations such as MaxPooling and AvgPooling.
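+
+The component factories can be chained together. A minimal sketch (import paths follow the test suite; shapes are illustrative):
+
+```python
+import torch
+
+from chameleon import build_norm, build_pool
+from chameleon.nn.components import build_activation
+
+act = build_activation('StarReLU')                          # from activation.py
+norm = build_norm('BatchNorm2d', num_features=8)            # from norm.py
+pool = build_pool('AdaptiveAvgPool2d', output_size=(1, 1))  # from pooling.py
+
+x = torch.randn(2, 8, 16, 16)
+y = pool(norm(act(x)))  # -> (2, 8, 1, 1)
+```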
+
+---
+
+## [Optim](../chameleon/optim/)
+
+**Purpose**:
+
+This module provides optimization strategies for deep learning models, in particular learning rate schedules. A suitable learning rate schedule can help a model converge faster and may improve its final performance.
+
+**Files**:
+
+- `warm_up.py` - Defines a scheduler that applies a warm-up strategy to the learning rate. Warm-up has become a popular technique in the deep learning community in recent years: it gradually increases the learning rate at the beginning of training, which is particularly useful for avoiding the instability that can occur early on when the learning rate is too high. A sketch of the idea follows.
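+
+The exact scheduler interface is defined in `warm_up.py`. Purely to illustrate the warm-up idea, here is a minimal linear warm-up built with PyTorch's own `LambdaLR` (the `warmup_steps` value is an arbitrary example):
+
+```python
+import torch
+from torch.optim.lr_scheduler import LambdaLR
+
+model = torch.nn.Linear(10, 2)
+optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
+
+warmup_steps = 1000  # arbitrary example value
+
+# Ramp the learning rate linearly from ~0 to its target value over
+# `warmup_steps` optimizer steps, then hold it constant.
+scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps))
+
+for _ in range(warmup_steps):
+    optimizer.step()
+    scheduler.step()
+```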
+
+---
+
+## [Transformers](../chameleon/transformers/)
+
+- **Purpose**: This module provides a variety of Transformer-based model architectures for deep learning. These architectures have achieved excellent results in many natural language processing and computer vision applications.
+
+- **Files**:
+  - `basic.py` - Basic Transformer building blocks, such as the Encoder and related layers.
+  - `efficientformer.py` - The EfficientFormer model, a Transformer architecture optimized for efficiency.
+  - `metaformer.py` - The MetaFormer model, a Transformer with a variety of pluggable components.
+  - `mobilevit.py` - The MobileViT model, a Vision Transformer optimized for mobile devices.
+  - `poolformer.py` - The PoolFormer model, which integrates a pooling strategy into the Transformer.
+  - `token_mixer.py` - Token mixer modules, i.e. methods for mixing and manipulating input tokens.
+  - `utils.py` - Various utility and helper functions, such as calculating patch sizes and listing available transformer models.
+  - `vit.py` - The Vision Transformer (ViT) model, a Transformer designed for image processing.
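+
+A minimal usage sketch with the default `ViT` configuration (a 224x224 input split into 16x16 patches; the output shapes below match the test suite):
+
+```python
+import torch
+
+from chameleon import ViT
+
+model = ViT()
+x = torch.rand(1, 3, 224, 224)
+cls_token, hidden_state = model(x)
+# cls_token:    (1, 768)
+# hidden_state: (1, 196, 768), one token per 16x16 patch
+```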
+
+---
+
+## [Utils](../chameleon/utils/)
+
+- **Purpose**: This module provides various utility functions that support the modules above.
+
+- **Files**:
+  - `cpuinfo.py` - Retrieves CPU-related information.
+  - `model_profile.py` - Provides model analysis and performance profiling.
+  - `replace.py` - Provides tools for replacing model components.
+
+### Main Features
+
+1. **CPU information extraction**: With `cpuinfo.py`, users can obtain detailed CPU information such as the vendor, model, and number of cores.
+2. **Model analysis and profiling**: With `model_profile.py`, users can assess a model's complexity, computational cost, and related metrics.
+3. **Model component replacement**: With `replace.py`, developers can conveniently replace or modify the components of a given model.
+
+### Main Structures and Components
+
+1. **CPU information extractor**: The `cpuinfo` utility is based on Pearu Peterson's work from 2002 and extracts detailed CPU information from the system.
+
+   Usage example:
+
+   ```python
+   from cpuinfo import cpuinfo
+   info = cpuinfo()
+   print(list(info[0].keys()))
+   ```
+
+   The returned information includes keys such as 'processor', 'vendor_id', 'cpu family', and so on.
+
+2. **Model complexity and performance analysis**: Provides the `get_model_complexity_info`, `get_cpu_gflops`, and `get_meta_info` functions, which let users estimate a model's complexity, computational cost, and estimated CPU-based inference time.
+
+3. **Model component replacement tools**: Provides the `replace_module` and `replace_module_attr_value` functions. The former replaces a specified module, while the latter modifies an attribute value of a specified module.
diff --git a/setup.cfg b/setup.cfg
new file mode 100644
index 0000000..a0dbf76
--- /dev/null
+++ b/setup.cfg
@@ -0,0 +1,44 @@
+[metadata]
+name = chameleon
+version = attr: chameleon.__version__
+description = Torch based deep learning library.
+long_description = file: README.md
+license = Apache License 2.0
+classifiers =
+    Development Status :: 5 - Production/Stable
+    License :: OSI Approved :: Apache Software License
+    Intended Audience :: Developers
+    Intended Audience :: Science/Research
+    Operating System :: OS Independent
+    Programming Language :: Python :: 3.10
+    Programming Language :: Python :: 3.11
+    Programming Language :: Python :: 3.12
+    Topic :: Software Development :: Libraries
+    Topic :: Software Development :: Libraries :: Python Modules
+python_requires = >=3.10,<3.13
+url = https://github.com/DocsaidLab/Chameleon.git
+
+[options]
+packages = find:
+include_package_data = True
+setup_requires =
+    pip
+    setuptools
+    wheel
+install_requires =
+    timm>=0.5.4
+    scikit-learn
+    transformers
+    torch>=2.4.0
+    torchvision
+    torchmetrics
+    albumentations
+    ptflops==0.7.0
+    calflops
+    rich
+
+[options.packages.find]
+exclude =
+    docker
+    docs
+    tests
\ No newline at end of file
diff --git a/setup.py b/setup.py
new file mode 100644
index 0000000..6b40b52
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,4 @@
+from setuptools import setup
+
+if __name__ == '__main__':
+    setup()
diff --git a/tests/backbone/test_backbone.py b/tests/backbone/test_backbone.py
new file mode 100644
index 0000000..32ee631
--- /dev/null
+++ b/tests/backbone/test_backbone.py
@@ -0,0 +1,126 @@
+import pytest
+import torch
+
+from chameleon import build_backbone, list_backbones
+
+INPUT1 = torch.rand(1, 3, 320, 320)
+INPUT2 = torch.rand(1, 6, 224, 224)
+data = [
+    # gpunet
+    (
+        INPUT1,
+        {'name': 'gpunet_0', },
+        {
+            'out_shapes': [
+                torch.Size([1, 32, 160, 160]),
+                torch.Size([1, 32, 80, 80]),
+                torch.Size([1, 64, 40, 40]),
+                torch.Size([1, 256, 20, 20]),
+                torch.Size([1, 704, 10, 10]),
+            ]
+        }
+    ),
+    (
+        INPUT1,
+        {'name': 'gpunet_1', 'out_indices': [0, 2, 3]},
+        {
+            'out_shapes': [
+                torch.Size([1, 24, 160, 160]),
+                torch.Size([1, 96, 40, 40]),
+                torch.Size([1, 288, 20, 20]),
+            ]
+        }
+    ),
+    (
+        INPUT1,
+        {'name': 'gpunet_2', 'out_indices': [0, 1, 2]},
+        {
+            'out_shapes': [
+                torch.Size([1, 32, 160, 160]),
+                torch.Size([1, 32, 80, 80]),
+                torch.Size([1, 112, 40, 40]),
+            ]
+        }
+    ),
+    (
+        INPUT1,
+        {'name': 'gpunet_p0'},
+        {
+            'out_shapes': [
+                torch.Size([1, 32, 160, 160]),
+                torch.Size([1, 64, 80, 80]),
+                torch.Size([1, 96, 40, 40]),
+                torch.Size([1, 256, 20, 20]),
+                torch.Size([1, 704, 10, 10]),
+            ]
+        }
+    ),
+    (
+        INPUT1,
+        {'name': 'gpunet_p1'},
+        {
+            'out_shapes': [
+                torch.Size([1, 32, 160, 160]),
+                torch.Size([1, 64, 80, 80]),
+                torch.Size([1, 96, 40, 40]),
+                torch.Size([1, 256, 20, 20]),
+                torch.Size([1, 704, 10, 10]),
+            ]
+        }
+    ),
+    (
+        INPUT1,
+        {'name': 'gpunet_d1'},
+        {
+            'out_shapes': [
+                torch.Size([1, 33, 160, 160]),
+                torch.Size([1, 44, 80, 80]),
+                torch.Size([1, 67, 40, 40]),
+                torch.Size([1, 190, 20, 20]),
+                torch.Size([1, 268, 10, 10]),
+            ]
+        }
+    ),
+    (
+        INPUT1,
+        {'name': 'gpunet_d2', 'out_indices': [3, 4]},
+        {
+            'out_shapes': [
+                torch.Size([1, 272, 20, 20]),
+                torch.Size([1, 384, 10, 10]),
+            ]
+        }
+    ),
+    (
+        INPUT1,
+        {'name': 'gpunet_d2', 'out_indices': [-1]},
+        {
+            'out_shapes': [torch.Size([1, 384, 10, 10])]
+        }
+    ),
+]
+
+
+@ pytest.mark.parametrize('in_tensor,build_kwargs,expected', data)
+def test_build_backbone(in_tensor, build_kwargs, expected):
+    model = build_backbone(**build_kwargs)
+    outs = model(in_tensor)
+    if isinstance(outs, (list, tuple)):
+        out_shapes = [x.shape for x in outs]
+    else:
+        out_shapes = outs.shape
+    assert out_shapes == expected['out_shapes']
+
+
+data = [
+    (
+        '*gpunet*',
+        ['gpunet_0', 'gpunet_1',
'gpunet_2', 'gpunet_p0', + 'gpunet_p1', 'gpunet_d1', 'gpunet_d2'] + ), +] + + +@ pytest.mark.parametrize('filter,expected', data) +def test_list_backbones(filter, expected): + assert list_backbones(filter) == expected diff --git a/tests/efficientdet/test_efficientdet.py b/tests/efficientdet/test_efficientdet.py new file mode 100644 index 0000000..3041fec --- /dev/null +++ b/tests/efficientdet/test_efficientdet.py @@ -0,0 +1,58 @@ +import pytest +import torch + +from chameleon import EfficientDet + + +@pytest.fixture +def input_tensor(): + # create a sample input tensor + return torch.rand((1, 3, 512, 512)) + + +@pytest.mark.parametrize("compound_coef, pretrained", [ + (0, True), + (1, True), + (2, True), + (3, True), + (4, True), + (5, True), + (6, False), + (7, False), + (8, False), + (0, False), +]) +def test_efficientdet_backbone(input_tensor, compound_coef, pretrained): + # create the model with the specified compound_coef and pretrained options + model = EfficientDet(compound_coef=compound_coef, pretrained=pretrained) + + # verify that the model is PowerModule and nn.Module + assert isinstance(model, EfficientDet) + assert isinstance(model, torch.nn.Module) + + # verify that the forward pass of the model returns a list of feature maps + output = model(input_tensor) + assert isinstance(output, list) + + # verify that the shape of each feature map in the output list is correct + conv_channel_coef = { + 0: [40, 112, 320], + 1: [40, 112, 320], + 2: [48, 120, 352], + 3: [48, 136, 384], + 4: [56, 160, 448], + 5: [64, 176, 512], + 6: [72, 200, 576], + 7: [80, 224, 640], + 8: [88, 248, 704], + } + + for i in range(len(output)): + expected_shape = ( + 1, + model.fpn_num_filters[compound_coef], + int(input_tensor.shape[2] / 2 ** (i+3)), + int(input_tensor.shape[3] / 2 ** (i+3)) + ) + + assert output[i].shape == expected_shape diff --git a/tests/neck/test_bifpn.py b/tests/neck/test_bifpn.py new file mode 100644 index 0000000..259d3e1 --- /dev/null +++ b/tests/neck/test_bifpn.py @@ -0,0 +1,91 @@ +import torch + +from chameleon.neck import BiFPN, BiFPNs + + +def test_bifpn(): + in_channels_list = [256, 512, 1024, 2048] + out_channels = 256 + bifpn = BiFPN(in_channels_list, out_channels, extra_layers=2, out_indices=[0, 1, 2, 3]) + + x1 = torch.randn(3, in_channels_list[0], 128, 128) + x2 = torch.randn(3, in_channels_list[1], 64, 64) + x3 = torch.randn(3, in_channels_list[2], 32, 32) + x4 = torch.randn(3, in_channels_list[3], 16, 16) + feats = [x1, x2, x3, x4] + outs = bifpn(feats) + assert len(outs) == 4 + assert bifpn.conv1x1s[0].__class__.__name__ == 'Identity' + for out in outs: + assert out.shape[0] == 3 + assert out.shape[1] == out_channels + assert out.shape[2] == out.shape[3] + + +def test_build_bifpn(): + in_channels_list = [256, 512, 1024, 2048] + out_channels = 256 + extra_layers = 2 + upsample_mode = 'bilinear' + out_indices = [0, 1, 2, 3] + bifpn = BiFPN.build_bifpn(in_channels_list, out_channels, extra_layers, out_indices, upsample_mode) + assert isinstance(bifpn, BiFPN) + + +def test_build_convbifpn(): + in_channels_list = [256, 512, 1024, 2048] + out_channels = 256 + extra_layers = 2 + upsample_mode = 'bilinear' + out_indices = [0, 1, 2, 3] + bifpn = BiFPN.build_convbifpn(in_channels_list, out_channels, extra_layers, out_indices, upsample_mode) + assert isinstance(bifpn, BiFPN) + + +def test_bifpns_module(): + # Define test inputs + in_channels_list = [64, 128, 256] + out_channels = 256 + n_bifpn = 3 + extra_layers = 2 + out_indices = [0, 2] + upsample_mode = 'nearest' + attention = 
True + + # Initialize BiFPNs module + bifpns = BiFPNs( + in_channels_list=in_channels_list, + out_channels=out_channels, + n_bifpn=n_bifpn, + extra_layers=extra_layers, + out_indices=out_indices, + upsample_mode=upsample_mode, + attention=attention, + ) + + # Generate test inputs + input_shapes = [(1, in_channels, 32 // 2**i, 32 // 2**i) for i, in_channels in enumerate(in_channels_list)] + inputs = [torch.randn(shape) for shape in input_shapes] + + # Test forward pass + output_shapes = [(1, out_channels, 32 // 2**i, 32 // 2**i) for i in range(len(in_channels_list))] + expected_output_shapes = [shape for i, shape in enumerate(output_shapes) if i in out_indices] + expected_output = [torch.randn(shape) for shape in expected_output_shapes] + output = bifpns(inputs) + assert isinstance(output, list) + assert len(output) == 2 + for i, idx in enumerate(out_indices): + assert output[i].shape == output_shapes[idx] + + # Test if the upsample_mode is correct + for i in range(n_bifpn): + assert bifpns.block[i].upsample_mode == upsample_mode + + # Test if the attention mechanism is applied + for i in range(n_bifpn): + assert bifpns.block[i].attention == attention + + # Test if the input channels are correct + for i in range(n_bifpn): + expected_in_channels = [out_channels] * (len(in_channels_list) + extra_layers) if i != 0 else in_channels_list + assert bifpns.block[i].in_channels_list == expected_in_channels diff --git a/tests/neck/test_fpn.py b/tests/neck/test_fpn.py new file mode 100644 index 0000000..4cdf909 --- /dev/null +++ b/tests/neck/test_fpn.py @@ -0,0 +1,82 @@ +import torch + +from chameleon.neck import FPN, FPNs + + +def test_fpn(): + in_channels_list = [256, 512, 1024, 2048] + out_channels = 256 + fpn = FPN(in_channels_list, out_channels, extra_layers=2, out_indices=[0, 1, 2, 3]) + x1 = torch.randn(1, in_channels_list[0], 128, 128) + x2 = torch.randn(1, in_channels_list[1], 64, 64) + x3 = torch.randn(1, in_channels_list[2], 32, 32) + x4 = torch.randn(1, in_channels_list[3], 16, 16) + feats = [x1, x2, x3, x4] + outs = fpn(feats) + assert len(outs) == 4 + assert fpn.conv1x1s[0].__class__.__name__ == 'Identity' + for out in outs: + assert out.shape[0] == 1 + assert out.shape[1] == out_channels + assert out.shape[2] == out.shape[3] + + +def test_build_fpn(): + in_channels_list = [256, 512, 1024, 2048] + out_channels = 256 + extra_layers = 2 + upsample_mode = 'bilinear' + out_indices = [0, 1, 2, 3] + fpn = FPN.build_fpn(in_channels_list, out_channels, extra_layers, out_indices, upsample_mode) + assert isinstance(fpn, FPN) + + +def test_build_dwfpn(): + in_channels_list = [256, 512, 1024, 2048] + out_channels = 256 + extra_layers = 2 + upsample_mode = 'bilinear' + out_indices = [0, 1, 2, 3] + fpn = FPN.build_dwfpn(in_channels_list, out_channels, extra_layers, out_indices, upsample_mode) + assert isinstance(fpn, FPN) + + +def test_fpns_module(): + # Define test inputs + in_channels_list = [64, 128, 256] + out_channels = 256 + n_fpn = 3 + extra_layers = 2 + out_indices = [0, 2] + upsample_mode = 'nearest' + + # Initialize FPNs module + fpns = FPNs( + in_channels_list=in_channels_list, + out_channels=out_channels, + n_fpn=n_fpn, + extra_layers=extra_layers, + out_indices=out_indices, + upsample_mode=upsample_mode, + ) + + # Generate test inputs + input_shapes = [(1, in_channels, 32 // 2**i, 32 // 2**i) for i, in_channels in enumerate(in_channels_list)] + inputs = [torch.randn(shape) for shape in input_shapes] + + # Test forward pass + output_shapes = [(1, out_channels, 32 // 2**i, 32 // 2**i) for i 
in range(len(in_channels_list))]
+    output = fpns(inputs)
+    assert isinstance(output, list)
+    assert len(output) == 2
+    for i, idx in enumerate(out_indices):
+        assert output[i].shape == output_shapes[idx]
+
+    # Test if the upsample_mode is correct
+    for i in range(n_fpn):
+        assert fpns.block[i].upsample_mode == upsample_mode
+
+    # Test if the input channels are correct
+    for i in range(n_fpn):
+        expected_in_channels = [out_channels] * (len(in_channels_list) + extra_layers) if i != 0 else in_channels_list
+        assert fpns.block[i].in_channels_list == expected_in_channels
diff --git a/tests/neck/test_neck.py b/tests/neck/test_neck.py
new file mode 100644
index 0000000..bfb3b4e
--- /dev/null
+++ b/tests/neck/test_neck.py
@@ -0,0 +1,70 @@
+import pytest
+import torch
+
+from chameleon.neck import build_neck, list_necks
+
+INPUT1 = [
+    torch.rand(1, 16, 80, 80),
+    torch.rand(1, 32, 40, 40),
+    torch.rand(1, 64, 20, 20),
+]
+
+data = [
+    (
+        INPUT1,
+        {'name': 'fpn', 'in_channels_list': [16, 32, 64], 'out_channels': 24},
+        {'out_shapes': [
+            torch.Size((1, 24, 80, 80)),
+            torch.Size((1, 24, 40, 40)),
+            torch.Size((1, 24, 20, 20)),
+        ]}
+    ),
+    (
+        INPUT1,
+        {'name': 'bifpn', 'in_channels_list': [16, 32, 64], 'out_channels': 24, 'extra_layers': 2},
+        {'out_shapes': [
+            torch.Size((1, 24, 80, 80)),
+            torch.Size((1, 24, 40, 40)),
+            torch.Size((1, 24, 20, 20)),
+            torch.Size((1, 24, 10, 10)),
+            torch.Size((1, 24, 5, 5)),
+        ]}
+    ),
+    (
+        INPUT1,
+        {'name': 'bifpn', 'in_channels_list': [16, 32, 64], 'out_channels': 24, 'out_indices': [0, 1, 2]},
+        {'out_shapes': [
+            torch.Size((1, 24, 80, 80)),
+            torch.Size((1, 24, 40, 40)),
+            torch.Size((1, 24, 20, 20)),
+        ]}
+    ),
+]
+
+
+@ pytest.mark.parametrize('in_tensor,build_kwargs,expected', data)
+def test_build_neck(in_tensor, build_kwargs, expected):
+    model = build_neck(**build_kwargs)
+    outs = model(in_tensor)
+    if isinstance(outs, (list, tuple)):
+        out_shapes = [x.shape for x in outs]
+    else:
+        out_shapes = outs.shape
+    assert out_shapes == expected['out_shapes']
+
+
+data = [
+    (
+        '',
+        ['fpn', 'fpns', 'bifpn', 'bifpns']
+    ),
+    (
+        '*bi*',
+        ['bifpn', 'bifpns']
+    ),
+]
+
+
+@ pytest.mark.parametrize('filter,expected', data)
+def test_list_necks(filter, expected):
+    assert list_necks(filter) == expected
diff --git a/tests/nn/component/test_activation.py b/tests/nn/component/test_activation.py
new file mode 100644
index 0000000..3878119
--- /dev/null
+++ b/tests/nn/component/test_activation.py
@@ -0,0 +1,46 @@
+import pytest
+import torch
+
+from chameleon.nn.components import SquaredReLU, StarReLU, build_activation
+
+test_build_activation_data = [
+    ('ReLU', torch.nn.ReLU),
+    ('LeakyReLU', torch.nn.LeakyReLU),
+    ('Swish', torch.nn.SiLU),
+    ('StarReLU', StarReLU),
+    ('SquaredReLU', SquaredReLU),
+    ('FakeActivation', ValueError)
+]
+
+
+@pytest.mark.parametrize('name, expected_output', test_build_activation_data)
+def test_build_activation(name, expected_output):
+    if expected_output == ValueError:
+        with pytest.raises(ValueError):
+            build_activation(name)
+    else:
+        assert isinstance(build_activation(name), expected_output)
+
+
+def test_starrelu():
+    x = torch.tensor([-1, 0, 1], dtype=torch.float32)
+    relu = StarReLU(scale=2.0, bias=1.0)
+    # Test forward pass
+    expected_output = torch.tensor([1, 1, 3], dtype=torch.float32)
+    assert torch.allclose(relu(x), expected_output)
+
+    # Test backward pass with scale and bias learnable
+    optimizer = torch.optim.SGD(relu.parameters(), lr=0.01)
+    loss = (relu(x).sum() - expected_output.sum()) ** 2
+    
loss.backward() + optimizer.step() + assert relu.scale.requires_grad + assert relu.bias.requires_grad + assert torch.allclose(relu(x), expected_output, rtol=1e-3) + + # Test backward pass with fixed scale and bias + relu = StarReLU(scale=2.0, bias=1.0, scale_learnable=False, + bias_learnable=False) + assert not relu.scale.requires_grad + assert not relu.bias.requires_grad + assert torch.allclose(relu(x), expected_output, rtol=1e-3) diff --git a/tests/nn/component/test_loss.py b/tests/nn/component/test_loss.py new file mode 100644 index 0000000..20ab5d4 --- /dev/null +++ b/tests/nn/component/test_loss.py @@ -0,0 +1,41 @@ +import pytest +import torch + +from chameleon import AWingLoss, WeightedAWingLoss + + +@pytest.fixture(scope='module') +def loss_fn(): + return AWingLoss() + + +@pytest.fixture(scope='module') +def weighted_loss_fn(): + return WeightedAWingLoss() + + +def test_AWingLoss(): + loss_fn = AWingLoss(alpha=2.1, omega=14, epsilon=1, theta=0.5) + preds = torch.tensor([1.0, 2.0, 3.0]) + targets = torch.tensor([2.0, 2.0, 2.0]) + loss = loss_fn(preds, targets) + assert loss.shape == preds.shape + assert torch.allclose(loss, torch.tensor([9.9030, 0.0000, 9.9030]), atol=1e-4) + + +def test_WeightedAWingLoss(): + weighted_loss_fn = WeightedAWingLoss(w=10, alpha=2.1, omega=14, epsilon=1, theta=0.5) + preds = torch.tensor([1.0, 2.0, 3.0]) + targets = torch.tensor([2.0, 2.0, 2.0]) + weight_map = torch.tensor([0, 1, 0], dtype=torch.bool) + loss = weighted_loss_fn(preds, targets, weight_map=weight_map) + assert torch.allclose(loss, torch.tensor(6.6020), atol=1e-4) + + # Test without weight_map + loss = weighted_loss_fn(preds, targets) + assert torch.allclose(loss, torch.tensor(72.6221), atol=1e-4) + + # Test with float weight_map + weight_map = torch.tensor([0.0, 1.0, 0.0]) + loss = weighted_loss_fn(preds, targets, weight_map=weight_map) + assert torch.allclose(loss, torch.tensor(6.6020), atol=1e-4) diff --git a/tests/nn/component/test_norm.py b/tests/nn/component/test_norm.py new file mode 100644 index 0000000..9eb4dc1 --- /dev/null +++ b/tests/nn/component/test_norm.py @@ -0,0 +1,62 @@ +import pytest +import torch +from torch.nn.modules.batchnorm import (BatchNorm1d, BatchNorm2d, BatchNorm3d, + SyncBatchNorm) +from torch.nn.modules.instancenorm import (InstanceNorm1d, InstanceNorm2d, + InstanceNorm3d) +from torch.nn.modules.normalization import (CrossMapLRN2d, GroupNorm, + LayerNorm, LocalResponseNorm) + +from chameleon import LayerNorm2d, build_norm + +NORM_CLASSES = { + 'BatchNorm1d': BatchNorm1d, + 'BatchNorm2d': BatchNorm2d, + 'BatchNorm3d': BatchNorm3d, + 'SyncBatchNorm': SyncBatchNorm, + 'InstanceNorm1d': InstanceNorm1d, + 'InstanceNorm2d': InstanceNorm2d, + 'InstanceNorm3d': InstanceNorm3d, + 'CrossMapLRN2d': CrossMapLRN2d, + 'GroupNorm': GroupNorm, + 'LayerNorm': LayerNorm, + 'LayerNorm2d': LayerNorm2d, + 'LocalResponseNorm': LocalResponseNorm, +} + + +@pytest.mark.parametrize('name', NORM_CLASSES.keys()) +def test_build_norm(name: str) -> None: + options = {} + cls = NORM_CLASSES[name] + if name.startswith('BatchNorm'): + options['num_features'] = 8 + elif name.startswith('InstanceNorm'): + options['num_features'] = 8 + elif name.startswith('SyncBatchNorm'): + options['num_features'] = 8 + elif name.startswith('GroupNorm'): + options['num_groups'] = 4 + options['num_channels'] = 8 + elif name.startswith('LayerNorm'): + if name == 'LayerNorm2d': + options['num_channels'] = 8 + else: + options['normalized_shape'] = [8] + elif name.startswith('LocalResponseNorm'): + options['size'] = 
3 + elif name.startswith('CrossMapLRN2d'): + options['size'] = 3 + norm = build_norm(name, **options) + assert isinstance(norm, cls) + + +def test_layer_norm_2d(): + # Create a tensor of size (N, C, H, W) + x = torch.randn(2, 3, 4, 4) + # Initialize LayerNorm2d + ln = LayerNorm2d(num_channels=3) + # Forward pass + y = ln(x) + # Check output shape + assert y.shape == (2, 3, 4, 4) diff --git a/tests/nn/component/test_pool.py b/tests/nn/component/test_pool.py new file mode 100644 index 0000000..3b01766 --- /dev/null +++ b/tests/nn/component/test_pool.py @@ -0,0 +1,29 @@ +import pytest +import torch + +from chameleon import build_pool + + +@pytest.fixture +def input_tensor(): + return torch.randn(2, 3, 16, 16) + + +pool_layers = [ + ('AdaptiveAvgPool2d', {'output_size': (1, 1)}, (2, 3, 1, 1)), + ('AdaptiveMaxPool2d', {'output_size': (1, 1)}, (2, 3, 1, 1)), + ('AvgPool2d', {'kernel_size': 3, 'stride': 1, 'padding': 1}, (2, 3, 16, 16)), + ('MaxPool2d', {'kernel_size': 3, 'stride': 1, 'padding': 1}, (2, 3, 16, 16)), + ('GAP', {}, (2, 3)), + ('GMP', {}, (2, 3)), +] + + +@pytest.mark.parametrize('name, kwargs, expected_shape', pool_layers) +def test_pool_layer(name, kwargs, expected_shape, input_tensor): + # Build the pool layer + layer = build_pool(name, **kwargs) + + # Check the output shape + output = layer(input_tensor) + assert output.shape == expected_shape diff --git a/tests/nn/test_PowerModule.py b/tests/nn/test_PowerModule.py new file mode 100644 index 0000000..ad45ea5 --- /dev/null +++ b/tests/nn/test_PowerModule.py @@ -0,0 +1,52 @@ +import pytest +import torch + +from chameleon.nn.utils import PowerModule, initialize_weights + + +class SimpleModel(PowerModule): + def __init__(self): + super().__init__() + self.layer1 = torch.nn.Linear(10, 5) + self.layer2 = torch.nn.Linear(5, 2) + + +@pytest.fixture +def model(): + return SimpleModel() + + +def test_initialize_weights(model): + initialize_weights(model) + for param in model.parameters(): + assert not torch.isnan(param).any() + + +def test_freeze(model): + model.freeze(verbose=True) + for param in model.parameters(): + assert not param.requires_grad + + +def test_melt(model): + model.freeze(verbose=True) + model.melt(verbose=True) + for param in model.parameters(): + assert param.requires_grad + + +def test_initialize_weights_(model): + model.initialize_weights_() + for param in model.parameters(): + assert not torch.isnan(param).any() + + +def test_freeze_layer(model): + model.freeze('layer1', verbose=True) + assert not model.layer1.weight.requires_grad + + +def test_melt_layer(model): + model.freeze('layer1', verbose=True) + model.melt('layer1', verbose=True) + assert model.layer1.weight.requires_grad diff --git a/tests/nn/test_aspp.py b/tests/nn/test_aspp.py new file mode 100644 index 0000000..af22843 --- /dev/null +++ b/tests/nn/test_aspp.py @@ -0,0 +1,33 @@ +import pytest +import torch + +from chameleon.nn import ASPPLayer, Hswish + + +@pytest.fixture +def input_tensor(): + return torch.randn(1, 64, 32, 32) + + +def test_aspp_layer(input_tensor): + in_channels = input_tensor.size(1) + out_channels = 128 + + # Test default activation function (ReLU) + aspp_layer = ASPPLayer(in_channels, out_channels) + output = aspp_layer(input_tensor) + assert output.size() == (1, out_channels, 32, 32) + + # Test with Hswish activation function + aspp_layer = ASPPLayer(in_channels, out_channels, output_activate=Hswish()) + output = aspp_layer(input_tensor) + assert output.size() == (1, out_channels, 32, 32) + + # Test with different dilation rates + 
aspp_layer = ASPPLayer(in_channels, out_channels) + aspp_layer.layers['DILATE1'].dilation = (2, 2) + aspp_layer.layers['DILATE2'].dilation = (4, 4) + aspp_layer.layers['DILATE3'].dilation = (8, 8) + aspp_layer.layers['DILATE4'].dilation = (16, 16) + output = aspp_layer(input_tensor) + assert output.size() == (1, out_channels, 32, 32) diff --git a/tests/nn/test_block.py b/tests/nn/test_block.py new file mode 100644 index 0000000..1ed6d5b --- /dev/null +++ b/tests/nn/test_block.py @@ -0,0 +1,82 @@ +import pytest +import torch +import torch.nn as nn + +from chameleon.nn import SeparableConvBlock + + +@pytest.fixture +def cnn_arch(): + return [ + {'in_channels': 3, 'out_channels': 32, 'kernel': 3}, + {'in_channels': 32, 'out_channels': 64, 'kernel': 3}, + {'in_channels': 64, 'out_channels': 128, 'kernel': 3}, + ] + + +@pytest.fixture +def fc_arch(): + return [ + {'in_channels': 3, 'out_channels': 32}, + {'in_channels': 32, 'out_channels': 64}, + {'in_channels': 64, 'out_channels': 128}, + ] + + +def test_SeparableConvBlock_forward(): + # Test input and output shapes + in_channels = 64 + out_channels = 128 + block = SeparableConvBlock(in_channels, out_channels) + x = torch.randn(1, in_channels, 64, 64) + output = block(x) + assert output.shape == (1, out_channels, 64, 64) + + # Test with different kernel size and padding + kernel_size = (5, 3) + padding = (1, 2) + block = SeparableConvBlock(in_channels, out_channels, kernel=kernel_size, padding=padding) + output = block(x) + assert output.shape == (1, out_channels, 62, 66) + + # Test with different stride + stride = 2 + block = SeparableConvBlock(in_channels, out_channels, stride=stride) + output = block(x) + assert output.shape == (1, out_channels, 32, 32) + + # Test with different output channels + out_channels = 32 + block = SeparableConvBlock(in_channels, out_channels) + output = block(x) + assert output.shape == (1, out_channels, 64, 64) + + # Test without normalization and activation + block = SeparableConvBlock(in_channels, out_channels, norm=None, act=None) + output = block(x) + assert output.shape == (1, out_channels, 64, 64) + + +def test_SeparableConvBlock_build_activation(): + # Test build_activation() function with different activation functions + activation_fns = [ + {'name': 'ReLU'}, + {'name': 'Sigmoid'}, + {'name': 'Tanh'}, + {'name': 'LeakyReLU', 'negative_slope': 0.2} + ] + for act in activation_fns: + block = SeparableConvBlock(64, 64, act=act) + assert isinstance(block.act, nn.Module) + + +def test_SeparableConvBlock_build_norm(): + # Test build_norm() function with different normalization layers + norm_layers = [ + {'name': 'BatchNorm2d', 'num_features': 64}, + {'name': 'InstanceNorm2d', 'num_features': 64}, + {'name': 'GroupNorm', 'num_groups': 8, 'num_channels': 64}, + ] + for norm in norm_layers: + block = SeparableConvBlock(64, 64, norm=norm) + assert isinstance(block.norm, nn.Module) diff --git a/tests/nn/test_cnn.py b/tests/nn/test_cnn.py new file mode 100644 index 0000000..67b8686 --- /dev/null +++ b/tests/nn/test_cnn.py @@ -0,0 +1,69 @@ +import pytest +import torch +import torch.nn as nn + +from chameleon.nn import CNN2Dcell + + +@pytest.fixture +def input_tensor(): + return torch.randn((2, 3, 32, 32)) + + +@pytest.fixture +def output_shape(): + return (2, 16, 32, 32) + + +def test_cnn2dcell_forward(input_tensor, output_shape): + model = CNN2Dcell(in_channels=3, out_channels=16) + output = model(input_tensor) + assert output.shape == output_shape + + +def test_cnn2dcell_with_activation(input_tensor, output_shape): 
+ model = CNN2Dcell(in_channels=3, out_channels=16, act={'name': 'ReLU', 'inplace': True}) + output = model(input_tensor) + assert output.shape == output_shape + assert torch.all(output >= 0) + + +def test_cnn2dcell_with_batch_norm(input_tensor, output_shape): + model = CNN2Dcell(in_channels=3, out_channels=16, norm={'name': 'BatchNorm2d', 'num_features': 16}) + output = model(input_tensor) + assert output.shape == output_shape + assert torch.allclose(output.mean(dim=(0, 2, 3)), torch.zeros(16), rtol=1e-3, atol=1e-5) + assert torch.allclose(output.var(dim=(0, 2, 3)), torch.ones(16), rtol=1e-3, atol=1e-5) + + +def test_cnn2dcell_with_dropout(input_tensor, output_shape): + model = CNN2Dcell(in_channels=3, out_channels=16, dropout={'name': 'Dropout2d', 'p': 0.5}) + output = model(input_tensor) + assert output.shape == output_shape + + +def test_cnn2dcell_with_pooling(input_tensor): + model = CNN2Dcell(in_channels=3, out_channels=16, pool=nn.AdaptiveAvgPool2d(1)) + output = model(input_tensor) + assert output.shape == (2, 16, 1, 1) + + +def test_cnn2dcell_init_type(input_tensor): + model = CNN2Dcell(in_channels=3, out_channels=16, init_type='uniform') + output1 = model(input_tensor) + model = CNN2Dcell(in_channels=3, out_channels=16, init_type='normal') + output2 = model(input_tensor) + assert not torch.allclose(output1, output2, rtol=1e-3, atol=1e-5) + + +def test_cnn2dcell_all_together(input_tensor): + model = CNN2Dcell(in_channels=3, out_channels=16, + kernel=5, stride=2, padding=2, dilation=2, groups=1, + bias=True, padding_mode='reflect', + norm={'name': 'BatchNorm2d', 'num_features': 16, 'momentum': 0.5}, + dropout={'name': 'Dropout2d', 'p': 0.5}, + act={'name': 'LeakyReLU', 'negative_slope': 0.1, 'inplace': True}, + pool=nn.AdaptiveAvgPool2d(1), + init_type='uniform') + output = model(input_tensor) + assert output.shape == (2, 16, 1, 1) diff --git a/tests/nn/test_grl.py b/tests/nn/test_grl.py new file mode 100644 index 0000000..cc397e9 --- /dev/null +++ b/tests/nn/test_grl.py @@ -0,0 +1,39 @@ +import torch + +from chameleon.nn import GradientReversalLayer + + +def test_gradient_reversal_layer(): + input_ = torch.rand(2, 3, 4, 5, requires_grad=True) + module = GradientReversalLayer() + + # Test forward pass + output = module(input_) + assert output.shape == input_.shape + assert torch.allclose(output, input_) + + # Test backward pass + loss = output.sum() + loss.backward() + + # Check gradients + assert input_.grad is not None + assert torch.allclose(input_.grad, torch.tensor(-0.00025)) + + +def test_gradient_reversal_layer_warm_up(): + input_ = torch.rand(2, 3, 4, 5, requires_grad=True) + module = GradientReversalLayer(warm_up=100) + + # Test forward pass + output = module(input_) + assert output.shape == input_.shape + assert torch.allclose(output, input_) + + # Test backward pass + loss = output.sum() + loss.backward() + + # Check gradients + assert input_.grad is not None + assert torch.allclose(input_.grad, torch.tensor(-0.01)) diff --git a/tests/nn/test_mbcnn.py b/tests/nn/test_mbcnn.py new file mode 100644 index 0000000..d66772e --- /dev/null +++ b/tests/nn/test_mbcnn.py @@ -0,0 +1,60 @@ +import torch +import torch.nn as nn + +from chameleon.nn import MBCNNcell + + +def test_mbcnncell_identity(): + # Test identity block + x = torch.randn(1, 16, 32, 32) + block = MBCNNcell(16, 16, kernel=3, stride=1) + out = block(x) + assert out.shape == x.shape + + +def test_mbcnncell_expdim(): + # Test expansion block + x = torch.randn(1, 16, 32, 32) + block = MBCNNcell(16, 32, kernel=3, 
stride=1) + out = block(x) + assert out.shape == (1, 32, 32, 32) + + +def test_mbcnncell_norm(): + # Test block with normalization layer + x = torch.randn(1, 16, 32, 32) + block = MBCNNcell(16, 16, kernel=3, stride=1, norm=nn.BatchNorm2d(16)) + out = block(x) + assert out.shape == x.shape + + +def test_mbcnncell_se(): + # Test block with Squeeze-and-Excitation layer + x = torch.randn(1, 16, 32, 32) + block = MBCNNcell(16, 16, kernel=3, stride=1, use_se=True) + out = block(x) + assert out.shape == x.shape + + +def test_mbcnncell_build_mbv1block(): + # Test building of MobileNetV1-style block + x = torch.randn(1, 16, 32, 32) + block = MBCNNcell.build_mbv1block(16, 32) + out = block(x) + assert out.shape == (1, 32, 32, 32) + + +def test_mbcnncell_build_mbv2block(): + # Test building of MobileNetV2-style block + x = torch.randn(1, 16, 32, 32) + block = MBCNNcell.build_mbv2block(16, 32) + out = block(x) + assert out.shape == (1, 32, 32, 32) + + +def test_mbcnncell_build_mbv3block(): + # Test building of MobileNetV3-style block + x = torch.randn(1, 16, 32, 32) + block = MBCNNcell.build_mbv3block(16, 32) + out = block(x) + assert out.shape == (1, 32, 32, 32) diff --git a/tests/nn/test_positional_encoding.py b/tests/nn/test_positional_encoding.py new file mode 100644 index 0000000..c051345 --- /dev/null +++ b/tests/nn/test_positional_encoding.py @@ -0,0 +1,23 @@ +import pytest +import torch + +from chameleon.nn import sinusoidal_positional_encoding_1d + + +@pytest.mark.parametrize("length, dim", [(10, 16), (20, 32), (5, 8)]) +def test_sinusoidal_positional_encoding_1d(length, dim): + # Test that the output has the correct shape + pe = sinusoidal_positional_encoding_1d(length, dim) + assert pe.shape == (length, dim) + + # Test that the output has the correct values + for i in range(length): + for j in range(dim // 2): + sin_val = torch.sin(torch.tensor(i / (10000 ** (2 * j / dim)))) + cos_val = torch.cos(torch.tensor(i / (10000 ** (2 * j / dim)))) + assert torch.isclose(pe[i][2*j], sin_val, atol=1e-6) + assert torch.isclose(pe[i][2*j+1], cos_val, atol=1e-6) + + # Test that odd dimensions raise a ValueError + with pytest.raises(ValueError): + pe = sinusoidal_positional_encoding_1d(length, dim=7) diff --git a/tests/nn/test_selayer.py b/tests/nn/test_selayer.py new file mode 100644 index 0000000..56e6901 --- /dev/null +++ b/tests/nn/test_selayer.py @@ -0,0 +1,52 @@ +import torch + +from chameleon.nn import SELayer + + +def test_selayer_output_shape(): + # Test that the output shape of the SELayer is correct. + in_channels = 16 + reduction = 4 + batch_size = 8 + height = 32 + width = 32 + + x = torch.randn(batch_size, in_channels, height, width) + se_layer = SELayer(in_channels, reduction) + y = se_layer(x) + + assert y.shape == x.shape + + +def test_selayer_activation(): + # Test that the output of the SELayer is activated correctly. + in_channels = 16 + reduction = 4 + batch_size = 8 + height = 32 + width = 32 + + x = torch.randn(batch_size, in_channels, height, width) + se_layer = SELayer(in_channels, reduction) + y = se_layer(x) + + assert torch.all(y / x >= 0) + assert torch.all(y / x <= 1) + + +def test_selayer_reduction(): + # Test that the SELayer reduces the number of channels as expected. 
+ in_channels = 16 + reduction = 4 + batch_size = 8 + height = 32 + width = 32 + + x = torch.randn(batch_size, in_channels, height, width) + se_layer = SELayer(in_channels, reduction) + y = se_layer(x) + + expected_channels = in_channels // reduction + + assert se_layer.fc1.layer.cnn.out_channels == expected_channels + assert se_layer.fc2.layer.cnn.out_channels == in_channels diff --git a/tests/nn/test_vae.py b/tests/nn/test_vae.py new file mode 100644 index 0000000..efd6a4e --- /dev/null +++ b/tests/nn/test_vae.py @@ -0,0 +1,33 @@ +import pytest +import torch + +from chameleon.nn import VAE + + +@pytest.fixture +def input_tensor(): + return torch.randn(2, 3, 32, 32) + + +@pytest.fixture +def vae(): + return VAE(3 * 32 * 32, 100) + + +def test_vae_forward_shape(input_tensor, vae): + feat, kld_loss = vae(input_tensor.view(input_tensor.size(0), -1)) + assert feat.shape == (input_tensor.size(0), 100) + assert kld_loss.shape == () + + +def test_vae_forward_kld_loss(input_tensor, vae): + feat, kld_loss = vae(input_tensor.view(input_tensor.size(0), -1)) + assert kld_loss >= 0 + + +def test_vae_do_pooling(): + model = VAE(in_channels=10, out_channels=5, do_pooling=True) + x = torch.randn(2, 10, 64, 64) + feat, kld_loss = model(x) + assert feat.shape == (2, 5) + assert kld_loss.shape == () diff --git a/tests/nn/test_weightedsum.py b/tests/nn/test_weightedsum.py new file mode 100644 index 0000000..60a49c0 --- /dev/null +++ b/tests/nn/test_weightedsum.py @@ -0,0 +1,51 @@ +import pytest +import torch +import torch.nn as nn + +from chameleon.nn import WeightedSum + + +def test_weighted_sum_init(): + input_size = 3 + ws = WeightedSum(input_size) + assert ws.input_size == input_size + assert isinstance(ws.weights, nn.Parameter) + assert ws.weights.shape == (input_size,) + assert ws.relu.__class__.__name__ == 'Identity' + assert ws.epsilon == 1e-4 + + +def test_weighted_sum_forward(): + input_size = 3 + ws = WeightedSum(input_size) + + # Test valid input + x = [torch.randn(1, 5) for _ in range(input_size)] + y = ws(x) + assert y.shape == (1, 5) + assert torch.allclose(y, torch.mean( + torch.cat(x), dim=0, keepdim=True), atol=1e-3) + + # Test invalid input size + with pytest.raises(ValueError): + ws(x[:-1]) + + # Test activation function + max_v = 10 + min_v = -10 + ws = WeightedSum(input_size, act=nn.ReLU(False)) + x = [(max_v - min_v) * torch.rand(1, 5) + min_v for _ in range(input_size)] + y = ws(x) + assert torch.allclose(y, torch.mean( + torch.cat(x), dim=0, keepdim=True).relu(), atol=1e-3) + + # Test custom activation function + class custom_act(nn.Module): + def forward(self, x): + return x + 1 + + ws = WeightedSum(input_size, act=custom_act()) + x = [torch.randn(1, 5) for _ in range(input_size)] + y = ws(x) + assert torch.allclose(y, custom_act()(torch.mean( + torch.cat(x), dim=0, keepdim=True)), atol=1e-3) diff --git a/tests/transformers/testMetaFormer.py b/tests/transformers/testMetaFormer.py new file mode 100644 index 0000000..a23f5ba --- /dev/null +++ b/tests/transformers/testMetaFormer.py @@ -0,0 +1,51 @@ +import pytest +import torch +from torch import nn + +from chameleon.transformers.metaformer import MetaFormer, MetaFormerBlock + + +def test_init(): + model = MetaFormer() + assert isinstance(model, nn.Module) + assert isinstance(model, MetaFormer) + + +def test_forward(): + model = MetaFormer() + input_tensor = torch.rand(3, 3, 224, 224) + all_hidden_state = model(input_tensor) + assert all_hidden_state[0].shape == torch.Size([3, 64, 56, 56]) + assert all_hidden_state[1].shape == 
torch.Size([3, 128, 28, 28]) + assert all_hidden_state[2].shape == torch.Size([3, 320, 14, 14]) + assert all_hidden_state[3].shape == torch.Size([3, 512, 7, 7]) + + +def test_token_mixer(): + model = MetaFormer(token_mixers=[ + {'name': 'AttentionMixing', 'in_features': 64}, + {'name': 'PoolMixing', 'pool_size': 5}, + {'name': 'RandomMixing', 'num_tokens': 196}, + {'name': 'SepConvMixing', 'in_features': 512, 'expand_ratio': 4} + ]) + input_tensor = torch.rand(3, 3, 224, 224) + all_hidden_state = model(input_tensor) + assert all_hidden_state[0].shape == torch.Size([3, 64, 56, 56]) + assert all_hidden_state[1].shape == torch.Size([3, 128, 28, 28]) + assert all_hidden_state[2].shape == torch.Size([3, 320, 14, 14]) + assert all_hidden_state[3].shape == torch.Size([3, 512, 7, 7]) + + +@pytest.fixture +def input_tensor(): + return torch.randn(2, 3, 16, 16) + + +@pytest.fixture +def metaformer_block(): + return MetaFormerBlock(3) + + +def test_metaformer_block_forward(metaformer_block, input_tensor): + output = metaformer_block(input_tensor) + assert output.shape == input_tensor.shape diff --git a/tests/transformers/testMobileViT.py b/tests/transformers/testMobileViT.py new file mode 100644 index 0000000..afb137c --- /dev/null +++ b/tests/transformers/testMobileViT.py @@ -0,0 +1,40 @@ +import torch +from torch import nn +from transformers import MobileViTConfig, MobileViTModel + +from chameleon import MobileViT + + +def test_init(): + model = MobileViT() + assert isinstance(model, nn.Module) + assert isinstance(model.config, MobileViTConfig) + assert isinstance(model.model, MobileViTModel) + + +def test_forward(): + model = MobileViT() + input_tensor = torch.rand(1, 3, 224, 224) + all_hidden_state = model(input_tensor) + assert all_hidden_state[0].shape == torch.Size([1, 32, 112, 112]) + assert all_hidden_state[1].shape == torch.Size([1, 64, 56, 56]) + assert all_hidden_state[2].shape == torch.Size([1, 96, 28, 28]) + assert all_hidden_state[3].shape == torch.Size([1, 128, 14, 14]) + assert all_hidden_state[4].shape == torch.Size([1, 160, 7, 7]) + + +def test_list_pretrained_models(): + models = MobileViT.list_models() + assert isinstance(models, list) + assert len(models) > 0 + + +def test_from_pretrained(): + model = MobileViT.from_pretrained('apple/mobilevit-small') + input_tensor = torch.rand(1, 3, 224, 224) + all_hidden_state = model(input_tensor) + assert all_hidden_state[0].shape == torch.Size([1, 32, 112, 112]) + assert all_hidden_state[1].shape == torch.Size([1, 64, 56, 56]) + assert all_hidden_state[2].shape == torch.Size([1, 96, 28, 28]) + assert all_hidden_state[3].shape == torch.Size([1, 128, 14, 14]) + assert all_hidden_state[4].shape == torch.Size([1, 160, 7, 7]) diff --git a/tests/transformers/testPoolFormer.py b/tests/transformers/testPoolFormer.py new file mode 100644 index 0000000..fb72169 --- /dev/null +++ b/tests/transformers/testPoolFormer.py @@ -0,0 +1,38 @@ +import torch +from torch import nn +from transformers import PoolFormerConfig, PoolFormerModel + +from chameleon import PoolFormer + + +def test_init(): + model = PoolFormer() + assert isinstance(model, nn.Module) + assert isinstance(model.config, PoolFormerConfig) + assert isinstance(model.model, PoolFormerModel) + + +def test_forward(): + model = PoolFormer() + input_tensor = torch.rand(1, 3, 224, 224) + all_hidden_state = model(input_tensor) + assert all_hidden_state[0].shape == torch.Size([1, 64, 56, 56]) + assert all_hidden_state[1].shape == torch.Size([1, 128, 28, 28]) + assert all_hidden_state[2].shape == 
torch.Size([1, 320, 14, 14]) + assert all_hidden_state[3].shape == torch.Size([1, 512, 7, 7]) + + +def test_list_pretrained_models(): + models = PoolFormer.list_models() + assert isinstance(models, list) + assert len(models) > 0 + + +def test_from_pretrained(): + model = PoolFormer.from_pretrained('sail/poolformer_s12') + input_tensor = torch.rand(1, 3, 224, 224) + all_hidden_state = model(input_tensor) + assert all_hidden_state[0].shape == torch.Size([1, 64, 56, 56]) + assert all_hidden_state[1].shape == torch.Size([1, 128, 28, 28]) + assert all_hidden_state[2].shape == torch.Size([1, 320, 14, 14]) + assert all_hidden_state[3].shape == torch.Size([1, 512, 7, 7]) diff --git a/tests/transformers/testViT.py b/tests/transformers/testViT.py new file mode 100644 index 0000000..4a6fbfd --- /dev/null +++ b/tests/transformers/testViT.py @@ -0,0 +1,34 @@ +import torch +from torch import nn +from transformers import ViTConfig, ViTModel + +from chameleon import ViT + + +def test_init(): + model = ViT() + assert isinstance(model, nn.Module) + assert isinstance(model.config, ViTConfig) + assert isinstance(model.model, ViTModel) + + +def test_forward(): + model = ViT() + input_tensor = torch.rand(1, 3, 224, 224) + cls_token, hidden_state = model(input_tensor) + assert cls_token.shape == torch.Size([1, 768]) + assert hidden_state.shape == torch.Size([1, 196, 768]) + + +def test_list_pretrained_models(): + models = ViT.list_models() + assert isinstance(models, list) + assert len(models) > 0 + + +def test_from_pretrained(): + model = ViT.from_pretrained('google/vit-base-patch16-224') + input_tensor = torch.rand(1, 3, 224, 224) + cls_token, hidden_state = model(input_tensor) + assert cls_token.shape == torch.Size([1, 768]) + assert hidden_state.shape == torch.Size([1, 196, 768]) diff --git a/tests/transformers/test_build_transformers.py b/tests/transformers/test_build_transformers.py new file mode 100644 index 0000000..fe6d7f5 --- /dev/null +++ b/tests/transformers/test_build_transformers.py @@ -0,0 +1,18 @@ +import pytest +import torch + +from chameleon import build_transformer, list_transformer +from chameleon.transformers import (BASE_TRANSFORMER_NAMES, TRANSFORMER, + EfficientFormer, MetaFormer, MobileViT, + PoolFormer, ViT) + + +def test_list_transformer(): + models = list_transformer() + assert len(models) == len(TRANSFORMER) + + +@pytest.mark.parametrize("model_name", BASE_TRANSFORMER_NAMES.keys()) +def test_build_transformer(model_name): + model = build_transformer(model_name) + assert isinstance(model, (ViT, MobileViT, PoolFormer, MetaFormer, EfficientFormer)) diff --git a/tests/transformers/test_utils_in_trans.py b/tests/transformers/test_utils_in_trans.py new file mode 100644 index 0000000..115af28 --- /dev/null +++ b/tests/transformers/test_utils_in_trans.py @@ -0,0 +1,26 @@ +import pytest + +from chameleon import calculate_patch_size + + +def test_calculate_patch_size(): + + # Test case 1 + image_size = (256, 256) + num_patches = (4, 4) + expected_patch_size = (64, 64) + assert calculate_patch_size(image_size, num_patches) == expected_patch_size + + # Test case 2 + image_size = (512, 512) + num_patches = (8, 8) + expected_patch_size = (64, 64) + + assert calculate_patch_size(image_size, num_patches) == expected_patch_size + + # Test case 3 - invalid input + image_size = (512, 512) + num_patches = (7, 7) + + with pytest.raises(ValueError): + calculate_patch_size(image_size, num_patches)