Reimplementing something is one of the best ways to learn. This is a minimalistic implementation of Kedro (inspired by minGPT by Karpathy).
The repository is a clone of the standard spaceflights repository with some additional files:
- Script: `run.py`
- Module: `src/minikedro/v2/__init__.py`
- `run_v1.py`
- `src/minikedro/v1/__init__.py`
In each step, you run a different version of `run_vx.py`, which uses the class defined in the matching `minikedro.vx` module. For example, when you are running `run_v1.py`, you should read `src/minikedro/v1/__init__.py`.
In each step I refactor part of the script and introduce a new feature from Kedro. By the end you will have something that resembles Kedro with a basic set of features.
There is also a `dataset.ipynb` notebook that demos how datasets work (out of scope).
First, clone this repository and install the package with:
git clone https://github.com/noklam/miniKedro.git
cd miniKedro
pip install -e .
Run `python run.py`, then the subsequent `run_vx.py` scripts, and observe the differences between versions.
The repository is created incrementally, so you can view the changes clearly from the commit history: https://github.com/noklam/miniKedro/commits/main/
The best way to understand the changes is to open this repository in an IDE and compare the diff between versions. For example, compare `run_v1.py` with `run_v2.py` side by side.
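If an IDE is not at hand, the same comparison can be sketched with Python's standard `difflib`. The helper name `diff_versions` is made up for this example, and the two script files are only read if they exist in the current working directory.

```python
import difflib
from pathlib import Path


def diff_versions(old_text: str, new_text: str, old_name: str, new_name: str) -> str:
    """Return a unified diff between two versions of a script (hypothetical helper)."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


# Only attempt the comparison when both scripts are present.
old_path, new_path = Path("run_v1.py"), Path("run_v2.py")
if old_path.exists() and new_path.exists():
    print(
        diff_versions(
            old_path.read_text(), new_path.read_text(), old_path.name, new_path.name
        )
    )
```

Lines starting with `-` come from the older script and lines starting with `+` from the newer one, which is the same information an IDE diff view shows.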
The basic structure of a script without any Kedro
- Extract configuration to a dictionary.
- Replace shared config with the template value `${_base_folder}` using `ConfigLoader`.
- `dataset.ipynb` explains how `kedro-datasets` works and demonstrates the I/O capability (the `save` and `load` interface).
- Add `DataCatalog` with the ability to `load` and `save`.
- Extract configuration to `steps` as a list of dictionaries.
- Save the intermediate data as datasets too.
- Make use of the `steps` variable and loop through the steps.
- Introduce a thin `list` wrapper called `pipeline` and a `step` wrapper called `nodes`.
- Rename `steps` as `nodes`, `step` as `node_`.
- Replace the `pipeline` imported from `kedro` with the one from `minikedro.v7`.
- Replace the configuration dictionary and load it from `base/conf/catalog.yml` instead.
- Move the `pipeline` out of the script and into `pipeline.py`.
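
To make the steps above concrete, here is a minimal, self-contained sketch of where they lead. It is not the code from this repository: `resolve_templates`, `InMemoryDataset`, `MiniCatalog`, `run`, the sample functions, and the sample config are all hypothetical names invented for illustration, and the in-memory dataset stands in for real file-backed datasets from `kedro-datasets`.

```python
import re
from typing import Any, Callable

# Illustrative config: keys starting with "_" are template values, not datasets.
config = {
    "_base_folder": "data",
    "companies": {"filepath": "${_base_folder}/01_raw/companies.csv"},
}


def resolve_templates(config: dict) -> dict:
    """Replace ${_name} placeholders with the matching top-level "_name" value."""
    variables = {k: v for k, v in config.items() if k.startswith("_")}

    def substitute(value: Any) -> Any:
        if isinstance(value, dict):
            return {k: substitute(v) for k, v in value.items()}
        if isinstance(value, str):
            return re.sub(r"\$\{(\w+)\}", lambda m: str(variables[m.group(1)]), value)
        return value

    return {k: substitute(v) for k, v in config.items() if not k.startswith("_")}


class InMemoryDataset:
    """Stand-in dataset exposing only the load/save interface."""

    def __init__(self, data: Any = None):
        self._data = data

    def load(self) -> Any:
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


class MiniCatalog:
    """Maps dataset names to objects that know how to load and save."""

    def __init__(self, datasets: dict):
        self._datasets = dict(datasets)

    def load(self, name: str) -> Any:
        return self._datasets[name].load()

    def save(self, name: str, data: Any) -> None:
        # Intermediate outputs get a dataset created on first save.
        self._datasets.setdefault(name, InMemoryDataset()).save(data)


def node(func: Callable, inputs: list, outputs: str) -> dict:
    """Thin wrapper around a step: a function plus its input/output names."""
    return {"func": func, "inputs": inputs, "outputs": outputs}


def pipeline(nodes: list) -> list:
    """Thin wrapper around a list of nodes."""
    return list(nodes)


def run(nodes: list, catalog: MiniCatalog) -> None:
    """Loop through the nodes: load inputs, call the function, save the output."""
    for node_ in nodes:
        args = [catalog.load(name) for name in node_["inputs"]]
        catalog.save(node_["outputs"], node_["func"](*args))


def double(values: list) -> list:
    return [v * 2 for v in values]


def total(values: list) -> int:
    return sum(values)


resolved = resolve_templates(config)
catalog = MiniCatalog({"raw": InMemoryDataset([1, 2, 3])})
nodes = pipeline([node(double, ["raw"], "doubled"), node(total, ["doubled"], "total")])
run(nodes, catalog)
print(resolved["companies"]["filepath"])  # data/01_raw/companies.csv
print(catalog.load("total"))  # 12
```

Swapping `InMemoryDataset` for a class that reads and writes files would not change `run` at all, which is the reason for hiding I/O behind `load` and `save`.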