This is a tutorial for training a PyTorch transformer from scratch.
There are many tutorials on how to train a transformer, including the official PyTorch tutorials, but even the official tutorial covers only "half" of the job: it trains the encoder part alone.
There are also tutorials that use fake data purely for demonstration, or that don't provide the complete code.
To start training, just run main.py:

```
python main.py
```

For an inference test, run:

```
python inference.py
```
If you are on Windows or macOS, you can open the project in the PyCharm IDE and right-click main.py to run it.
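For context, inference.py for a seq2seq transformer typically runs a greedy decoding loop like the sketch below. The token ids `bos_id`/`eos_id` and the `model(src, tgt)` interface are assumptions for illustration, not the repo's actual code:

```python
import torch

@torch.no_grad()
def greedy_decode(model, src_ids, bos_id, eos_id, max_len=64):
    # src_ids: (1, S) tensor of source token ids; ids are hypothetical
    tgt = torch.tensor([[bos_id]], device=src_ids.device)
    for _ in range(max_len):
        logits = model(src_ids, tgt)            # (1, T, vocab)
        next_id = logits[0, -1].argmax().item()  # pick the most likely next token
        tgt = torch.cat(
            [tgt, torch.tensor([[next_id]], device=src_ids.device)], dim=1)
        if next_id == eos_id:
            break
    return tgt[0, 1:]  # drop the BOS token
```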
This is a small Transformer model with 8 million parameters. When training a transformer model, we typically write a custom wrapper class around the Transformer implemented in the PyTorch library.
It usually consists of (sketched below):
- an embedding and positional-encoding module
- the transformer module implemented by PyTorch
- an output linear layer sized to your tokenizer's vocabulary
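Here is a minimal sketch of such a wrapper, assuming batch-first tensors and a shared source/target vocabulary; the class name and all sizes are illustrative, not the repo's actual code:

```python
import math
import torch
import torch.nn as nn

class NumberTranslator(nn.Module):
    # Hypothetical wrapper; names and hyperparameters are illustrative.
    def __init__(self, vocab_size, d_model=128, nhead=8, num_layers=3,
                 dim_feedforward=512, max_len=64):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)  # shared src/tgt embedding

        # fixed sinusoidal positional encoding, shape (1, max_len, d_model)
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))

        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, batch_first=True)

        # output projection sized to the tokenizer's vocabulary
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt):
        # src, tgt: (batch, seq) token ids
        src = self.embed(src) * math.sqrt(self.d_model) + self.pe[:, :src.size(1)]
        tgt = self.embed(tgt) * math.sqrt(self.d_model) + self.pe[:, :tgt.size(1)]
        # causal mask so the decoder cannot attend to future target tokens
        mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        return self.out(self.transformer(src, tgt, tgt_mask=mask))
```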
The exact shape of the wrapper is decided by what kind of LLM we want to implement. Here we don't use torch.rand data or some toy dataset; we try to solve a real problem with our LLM, though this problem can also be solved by other methods.
The problem is: given a number (more precisely, a rational number), output the corresponding English words for that number.
| Number | English words |
|---|---|
| 5000 | five thousand |
| -92101602.123 | minus ninety two million , one hundred and one thousand , six hundred and two point one two three |
We regard the number as the input string and the English words as the output string, which makes this effectively a translation task.
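One way to generate such pairs (a sketch; the repo may build its data differently) is the num2words package, normalizing its hyphens and comma spacing to the space-separated word style shown above:

```python
from num2words import num2words  # pip install num2words

def make_pair(x):
    # num2words writes "ninety-two" and "million, one"; normalize to the
    # space-separated style used in the target strings above.
    words = num2words(x).replace("-", " ").replace(",", " ,")
    return str(x), words

print(make_pair(5000))   # ('5000', 'five thousand')
print(make_pair(-2000))  # ('-2000', 'minus two thousand')
```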
For the first training run, I used 10k training samples, batch size 16, an initial learning rate of 1e-4, and StepLR with gamma 0.1 and step size 1, trained for 4 epochs (the optimizer/scheduler setup is sketched after the results), and got the results below. Although only one answer is exactly correct, the outputs at least read as well-formed numbers.
1235678 --> five million , six hundred and twenty eight thousand , seven hundred and thirteen
200.3236 --> thirty two thousand , three hundred and sixty point zero
-2000 --> minus two thousand and twenty
10000001 --> ten million , one hundred thousand and
66666666 --> sixty six million , six hundred and sixty six thousand , six hundred and sixty six
-823982502.002 --> minus two hundred and twenty two million , eight hundred and twenty thousand , nine hundred and eighty point zero three five
987654321.12 --> two hundred and twenty one million , nine hundred and sixty five thousand , four hundred and eighty three point one seven
3295799.9873462 --> nine hundred and ninety three million , two hundred and ninety seven thousand , four hundred and sixty five point three eight two seven
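For reference, here is a sketch of the training setup described above. `model` and `train_loader` are assumed (the wrapper sketched earlier and a DataLoader yielding batches of token ids), and `PAD_ID` is a hypothetical padding id:

```python
import torch
import torch.nn as nn

PAD_ID = 0  # hypothetical padding token id; the repo's tokenizer may differ

# model and train_loader are assumed to exist already.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)

for epoch in range(4):
    for src, tgt in train_loader:
        optimizer.zero_grad()
        # teacher forcing: feed tgt minus its last token, predict tgt shifted by one
        logits = model(src, tgt[:, :-1])
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))
        loss.backward()
        optimizer.step()
    scheduler.step()  # multiply lr by 0.1 after every epoch (step_size=1)
```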