Classification process #23

Open
harelber opened this issue Nov 26, 2022 · 1 comment

Comments

@harelber

Hi Hugo,
Regarding Adagio, do you remember how to run the classification process on the extracted graphs?
I extracted the graphs using the -f/-p flags. However, there is no documentation on how to actually train and test an ML model on these graphs. I assume that the common directory holds the functions, but I can't work out the sequence in which to call them.
I would appreciate it if you could add some instructions or a script for this.

@hgascon
Owner

hgascon commented Nov 29, 2022

Hi @harelber, you can instantiate an Analysis object:

In [1]: from adagio.core.analysis import Analysis

In [2]: Analysis?
Init signature:
Analysis(
    dirs,
    labels,
    split,
    max_files=0,
    max_node_size=0,
    precomputed_matrix='',
    y='',
    fnames='',
)
Docstring:      A class to run a classification experiment
Init docstring:
The Analysis class allows loading sets of pickled graph objects
from different directories where the objects in each directory
belong to different classes. It also provides the methods to run
different types of classification experiments by training and
testing a linear classifier on the feature vectors generated
from the different graph objects.

:dirs: A list with directories including types of files for
    classification e.g. <[MALWARE_DIR, CLEAN_DIR]> or just
    directories with samples from different malware families
:labels: The labels assigned to samples in each directory.
    For example a number or a string.
:split: The percentage of samples used for training (value
    between 0 and 1)
:precomputed_matrix: name of file if a data or kernel matrix
    has already been computed.
:y: If precomputed_matrix is set, a pickled and gzipped list
    of labels must be provided.
:returns: an Analysis object with the dataset as a set of
    properties and several functions to train, test, evaluate
    or run a learning experiment iteratively.

And then run the different experiments in the Analysis class:

In [3]: a = Analysis(...)
[...]
In [4]: a.run_linear_experiment(...)

There are other helper functions in the Analysis class that you can experiment with. Let me know if that works.
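
As a rough sketch of the steps described above, the required arguments could be prepared and checked before handing them to Analysis. The helper names `build_analysis_kwargs` and `make_analysis`, the directory paths, and the label values are hypothetical. The checks (one label per directory, `split` strictly between 0 and 1) are assumptions read off the docstring, not behaviour confirmed by the library. The arguments of `run_linear_experiment` are elided in the thread and are left elided here.

```python
def build_analysis_kwargs(dirs, labels, split):
    """Check the three required Analysis arguments and return them as kwargs.

    Assumption: one label per directory, and split strictly between 0 and 1.
    """
    dirs = [str(d) for d in dirs]
    labels = list(labels)
    if len(dirs) != len(labels):
        raise ValueError("expected one label per directory")
    if not 0 < split < 1:
        raise ValueError("split must be a fraction between 0 and 1")
    return {"dirs": dirs, "labels": labels, "split": split}


def make_analysis(kwargs):
    """Instantiate Analysis; only works where adagio is importable."""
    # Import path as quoted in the thread.
    from adagio.core.analysis import Analysis
    return Analysis(**kwargs)


# Hypothetical directories holding the pickled graphs from the extraction step.
kwargs = build_analysis_kwargs(
    dirs=["graphs/malware", "graphs/clean"],
    labels=[1, 0],
    split=0.8,  # 80% of the samples used for training
)
print(kwargs)

# With adagio installed and the directories populated:
#   a = make_analysis(kwargs)
#   a.run_linear_experiment(...)  # arguments elided in the thread
```

In an IPython session, `a.run_linear_experiment?` shows that method's signature in the same way `Analysis?` does above.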
