This repository contains tools for deploying and measuring Machine Learning (ML) nodes staked in the Pocket Network. The code available here was created under the POKT AI Lab socket.
- Model Deployment
- Local-Net - ML RPCs Testing
- Live Metrics Testing - Pocket Test-Bench
The Pocket Network test bench is an environment used to verify the correctness or soundness of a staked model, live on the network.
It works by streamlining the tracking, sampling, and execution of tasks in a performant and scalable way. Each of these tasks
is an instance of a particular metric, for example, the GLUE dataset. The architecture of the project is designed to be agnostic of the task to perform, and easily extensible.
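As an illustration only, such a task instance could be captured in a small record like the one below. The field names and types are assumptions made for this sketch, not the repository's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TaskInstance:
    """Hypothetical record for one metric instance assigned to a staked node."""

    framework: str     # evaluation suite, e.g. "lmeh" or "glue"
    task: str          # concrete dataset/metric within the suite
    node_address: str  # Pocket Network node being measured
    requested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    done: bool = False  # flipped once the Evaluator has scored the response


# Example: a GLUE-style task tracked for a single (placeholder) node.
example = TaskInstance(framework="glue", task="sst2", node_address="pokt1...")
```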
The test bench follows the structure presented in the following image:
As can be seen, the test bench has four main blocks (each a different App) that work together to track the task scores
of each of the Pocket Nodes. Briefly, the apps do the following:
- Manager : Keeps the records of each node's scores. It checks for new nodes, reviews the age and statistics of the task scores, and requests more tasks to be executed. If there are finalized tasks (produced by the Evaluator), it adds them to the node score tracking database.
- Sampler : Checks for Manager requests and prepares the tasks to be done. To do that, it keeps track of the available datasets (if needed) and samples from them. The result of this App is a generic call request that is correct for the Pocket Service but independent of the task (see the sketch after this list).
- Requester : Controls the relays that are done. Using the provided Pocket Network App Keys, it checks the current sessions and looks for nodes that have pending task requests (generated by the Sampler). When it finds a match, it performs the relays against the nodes and saves the raw answers.
- Evaluator : Retrieves the responses of the Requester and finds the originating task requests, then calculates the appropriate metrics and writes the resulting values.
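To make the hand-off between the Sampler, Requester, and Evaluator more concrete, the sketch below shows what such a task-agnostic call request might look like once stored for the Requester to pick up. The document shape, field names, and placeholder values are assumptions for illustration; the actual collections used by the repository may differ.

```python
# Hypothetical MongoDB document produced by the Sampler. The payload carries
# everything the Requester needs to relay the call through the Pocket Service,
# without the Requester knowing anything about the underlying task or metric.
prompt_request = {
    "task_id": "lmeh-hellaswag-0001",   # links back to the originating task
    "node_address": "pokt1...",          # staked node that must serve the relay
    "service": "0001",                   # placeholder Pocket service identifier
    "payload": {
        "method": "POST",
        "path": "/v1/chat/completions",
        "headers": {"Content-Type": "application/json"},
        "body": '{"model": "pocket_network", "messages": []}',
    },
    "done": False,                       # set by the Requester once relayed
    "evaluated": False,                  # set by the Evaluator once scored
}
```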
The Apps are all coordinated using Temporal IO, with the Manager and Requester being recurrent workflows, and the Sampler and Evaluator being triggered by the Manager and Requester, respectively. The datasets are stored in PostgreSQL, since it is the most effective way to handle datasets from the LMEH test suite, which is the first to be implemented. Data communication between apps is done via MongoDB, which also holds the nodes collection, the one with the resulting scores for each tested node.
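As a rough sketch of how this coordination could look with the Temporal Python SDK (the workflow names, task queues, and scheduling interval below are assumptions, not the repository's actual definitions):

```python
import asyncio
from temporalio import workflow


@workflow.defn
class SamplerWorkflow:
    """Hypothetical Sampler: prepares task requests when the Manager asks."""

    @workflow.run
    async def run(self, task_name: str) -> None:
        # Sample the dataset for `task_name` and store generic call requests
        # (see the document sketch above) for the Requester to pick up.
        ...


@workflow.defn
class ManagerWorkflow:
    """Hypothetical recurrent Manager: reviews scores and requests new tasks."""

    @workflow.run
    async def run(self) -> None:
        # Placeholder decision: in the real app this would come from the
        # node score tracking database.
        stale_tasks = ["hellaswag"]
        for task in stale_tasks:
            await workflow.execute_child_workflow(
                SamplerWorkflow.run,
                task,
                id=f"sampler-{task}",
                task_queue="sampler",
            )
        # Durable timer until the next review cycle, then restart the
        # workflow with a fresh event history.
        await asyncio.sleep(10 * 60)
        workflow.continue_as_new()
```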
For more details on how the Apps interact, please read the Apps Readme.