Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added skeleton of package for python support #81

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions data-extraction/pyDataExtraction/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
import pyDataExtraction.commonTypes
16 changes: 16 additions & 0 deletions data-extraction/pyDataExtraction/__main__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
import json
from typing import Union, Dict, Optional
from abc import ABC

from pyDataExtraction.commonTypes.Graph import Graph

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add newline

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can run flake8(for linting) and black(code formatting) in local(console) to do all such changes automatically and consistency will be maintained for all the contributors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently use black for formatting, though I've never used flake8, nor have I used black from the commandline. I know there's a precommit plugin for git that would call both to on attempted commits to make sure that they are used.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just have to install black in local(or in the virtual environment of python) and run black <file_name> This will do the formatting for the file.
And about flake8, It's just that we can have a consistent style guide for python over this project which follows pep-8 guidelines.
As some of the comments by @mvoitko is regarding this only.
Just a suggestion 😄

if __name__ == "__main__":
# NOTE this will likely be what will be called by the PyEvaluation Engine,
# this will be where this library interfaces with the existing codebase

# TODO: implement testing for each dataType, it may be a good idea to compare
# with output of typescript extractors
graph_data1 = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "D"], "D": ["A"]}
graph_data2 = {1: [2, 3], 2: [1, 3], 3: [1, 4], 4: [1]}
graph = Graph(graph_data1)
print(graph)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use logging with respective log level

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean. I was mainly using __main__ to verify that the graph implementation works as I was writing it, though that should be move to a separate directory for testing, as seems to be the standard practice for python libraries.

I'm not sure if we could use the same test directory as node without pytest throwing errors due to the presence of js files. my experience with pytest and how it works is limited.

69 changes: 69 additions & 0 deletions data-extraction/pyDataExtraction/commonTypes/Graph.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
from typing import Union, Dict, Optional
from pyDataExtraction.commonTypes.base import DataType

# from docs
"""
interface NodeGraphData {
id: string;
label?: string;
color?: string;
shape?: "ellipse" | "box";
}
"""
"""
interface EdgeGraphData {
from: string;
to: string;
label?: string;
id?: string;
color?: string;
dashes?: boolean;
}
"""
# NOTE: may not be able to encapsulate edge in separate class due to from being a syntax token
"""
class Edge:

def __init__(self,from: str, to: str,):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add space after comma

self.fromnode
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use self.from_node


def __repr__(self):
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have an empty method?

Copy link
Author

@skewballfox skewballfox Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it needs to be implemented. This was mainly meant to be a skeleton of a library, make it easier for people to contribute to the development of one set of dataTypes. Also we can't use the dataType's __repr__ method for Edge or Node.

for one, json complains Node objects aren't serializable when calling __repr__ on graph.

two, from is a syntax token, and also what is expected to be inside the json representation of a graph. so in Edge's case in particular, it's going to require us to either:

  1. do something hacky to use from as a keyword (not my preferred option),
  2. parse the json representation of it's dict and change every instance of whatever variable we used to represent from
  3. handle all the edge related stuff inside graph(would rather avoid that)

"""
# NOTE: ran into issue Node object is not json serializable when ecapsulating in own class
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a method in Node class to serialize the data into json.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I'm just now starting to actually learn about serialization due to this project and a personal rust project.

could you point me in the direction of some documentation on how to do so?

Copy link

@insolor insolor Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skewballfox you can simply create a dictionary of the desired form and then serialize it into JSON:

def to_json(self):
    return json.dumps({'from': self.from_node, 'to': self.to})

(this is an example for the Edge class, you can do this for Node class too)

This is a crude approach, which is inconvenient for classes with many fields, but in this case, I think it is quite applicable. Also such a way you can use from_node field name for the 'from' JSON key.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_node would also be more readable.

Copy link
Author

@skewballfox skewballfox Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@insolor this is similar to what I was doing in __repr__ but that definitely helps fix that issue in a way that doesn't involve mangling the language. I don't know why overloading from was my first guess.

also I think I figured out why Node wasn't serializable. It was because the __repr__ function doesn't seem to be recursive. as an example When calling print(graph) that doesn't automatically behave as if print(node) were called.

either I could come up with a generic method that does this in DataType to avoid having to define it in every current and future dataType that uses other objects as components(if present), or we have to overload the function in those cases.

I think I might have an idea using __slots__, I'll have to see though

"""
class Node(DataType):
def __init__(self, id: Union[int, str], label: Optional[str] = None):
super().__init__()
self.id = id
if label is None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest simplifying to:
self.lable = lable or id

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is actually a hella useful feature I didn't know about. Thanks for that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, given the implementation, and what we are passing the information to, this likely wouldn't work, as the attribute would still be listed in the json output(at least with the method currently used to produce json.

I'm not sure if this would be the case, but I'm trying to avoid all of the nodes being labeled None in the visualizer. So, I set it up to where attributes were only present in the case where a value was assigned.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example? I'm not able to understand.
As from what I think if we go with
self.label = label or id
If anyone doesn't pass label the default is None and the self.label will be populated by id.
Correct?

self.label = id
else:
self.label = label
"""


class Graph(DataType):
"""An implementation of the Graph data type for the visualizer

Args:
DataType (Union[Dict[str, list], Dict[int, list]]):
either expects a dictionary with a list as values or a 2d array
some representation of a basic graph
"""

def __init__(self, graph_data: Union[Dict[str, list], Dict[int, list]]):
super().__init__()
self.kind["graph"] = True
# TODO get working for both a dictionary and an nxn array
self.nodes = []
self.edges = []
if isinstance(graph_data, dict):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not check the type here. We expect graph data to be a dictionary otherwise we should have exception not silence it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, no, the reason being is not counting classes for graphs(just too much trouble to worry about at this moment), you can have a graph represented as a dict or 2d array.

I intended after getting the dictionary implementation working to check else isinstance(graph_data, list) or something similar for explicitly a list of list.

for node in graph_data:
self.nodes.append({"id": str(node)})
# TODO change prints to log statements
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a too minor change to left a TODO for it and not implement it right away

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's partially because my experience implementing logs across a library is none, I've only implemented them in a single file before, and didn't try to make it where the log statement was different across classes.

# print("node: ", node)
# print("edges: ", graph_data[node])
for edge in graph_data[node]:
# print("edge: ", graph_data[node][edge_i])
self.edges.append({"from": node, "to": edge})
27 changes: 27 additions & 0 deletions data-extraction/pyDataExtraction/commonTypes/Grid.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
from pyDataExtraction.commonTypes.base import DataType

"""
export interface Grid {
kind: { array: true };
columnLabels?: { label?: string }[];
rows: {
label?: string;
columns: {
content?: string;
tag?: string;
color?: string;
}[];
}[];
markers?: {
id: string;

row: number;
column: number;
rows?: number;
columns?: number;

label?: string;
color?: string;
}[];
}
"""
15 changes: 15 additions & 0 deletions data-extraction/pyDataExtraction/commonTypes/Plotly.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
from pyDataExtraction.commonTypes.base import DataType

"""
export interface Plotly {
kind: { plotly: true };
data: Partial<Plotly.Data>[];
}
// See plotly docs for Plotly.Data.
"""


class Plotly(DataType):
def __init__(self, data):
super().__init__()
self.kind["plotly"] = True
64 changes: 64 additions & 0 deletions data-extraction/pyDataExtraction/commonTypes/Text.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
from typing import Optional
from pyDataExtraction.commonTypes.base import DataType

"""
interface Text {
kind: { text: true };
text: string;
mimeType?: string;
fileName?: string;
}
"""


class Text(DataType):
def __init__(
self,
text_data: str,
mimeType: Optional[str] = None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep it to the pep-8 python style guide:

mime_type,
file_name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable names are mainly because of the use case and the method currently for doing so.

right now I'm directly converting the respective dataTypes to json objects when they are printed or called as a string, and variables are the displayed variables inside the json string, so I've been matching the case of the specification listed on the Readme.

Though since we already need to do some manipulations on the json representation before returning in the case of the Edge class, doing so here shouldn't be a major issue.

btw if we got rid of the reliance on __dict__, we could use __slots__ to slightly improve performance and reduce memory usage.

fileName: Optional[str] = None,
):
super().__init__()

self.kind["text"] = True
self.text = text_data
if mimeType is not None:
self.mimeType = mimeType
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you trying to achieve here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, that was probably a missing not

if fileName is None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you wanted the opposite:

if file_name:
     self.file_name = file_name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be when dealing with edge cases far removed from this or a leftover from python 2, but from what I keep reading, using is None seems to be preferred when dealing with values that may be None

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, while editing this, I just remembered why I checked. because this library currently relies on json.dumps(self.__dict__) to print the json representation of an object in a format the plugin expects, I was only conditionally instantiating these variables.

they weren't part of the object unless they were explicitly added to the object.

this may be another reason to either find a better method of creating json objects, or use super to get the inherited json representation and return a version altered in some way.

self.fileName = fileName


"""
interface Svg extends Text {
kind: { text: true; svg: true };
}
"""


class Svg(Text):
def __init__(
self,
text_data: str,
mimeType: Optional[str] = None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please name params in snake case:
mime_type, file_name

fileName: Optional[str] = None,
):
self.kind["svg"] = True
super().__init__(text_data, mimeType, fileName)


"""
interface DotGraph extends Text {
kind: { text: true; dotGraph: true };
}
"""


class DotGraph(Text):
def __init__(
self,
text_data: str,
mimeType: Optional[str] = None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use snake case:
mime_type, file_name

fileName: Optional[str] = None,
):
self.kind["dotGraph"] = True
super().__init__(text_data, mimeType, fileName)
8 changes: 8 additions & 0 deletions data-extraction/pyDataExtraction/commonTypes/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
from typing import Union, Dict, Optional
from abc import ABC, abstractmethod

from pyDataExtraction.commonTypes import base
from pyDataExtraction.commonTypes import Graph
from pyDataExtraction.commonTypes import Grid
from pyDataExtraction.commonTypes import Plotly
from pyDataExtraction.commonTypes import Text
24 changes: 24 additions & 0 deletions data-extraction/pyDataExtraction/commonTypes/base.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
import json
from abc import ABC

# import logging
# TODO fix the import structure
# TODO figure out how to set up logging across a python library


class DataType(ABC):
"""Abstract class for all supported dataTypesdataTypes

Args:
object ([type]): [description]
"""

def __init__(self):
self.kind = {}
super().__init__()

def __repr__(self):
"""returns json object format when printed or using str()
"""
return json.dumps(self.__dict__)

Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
import {
getExpressionForDataExtractorApi,
DataResult,
ApiHasNotBeenInitializedCode,
getExpressionToInitializeDataExtractorApi,
DataExtractionResult,
} from "@hediet/debug-visualizer-data-extraction";
import { EnhancedDebugSession } from "../../debugger/EnhancedDebugSession";
import {
EvaluationEngine,
Evaluator,
EvaluationArgs,
} from "./EvaluationEngine";
import { FormattedMessage } from "../../webviewContract";
import { registerUpdateReconciler, hotClass } from "@hediet/node-reload";

registerUpdateReconciler(module);

@hotClass(module)
export class PyEvaluationEngine implements EvaluationEngine {
createEvaluator(session: EnhancedDebugSession): Evaluator | undefined {
const supportedDebugAdapters = [
"python",

];
if (supportedDebugAdapters.indexOf(session.session.type) !== -1) {
return new PyEvaluator(session);
}
return undefined;
}
}

class PyEvaluator implements Evaluator {
public readonly languageId = "python";

constructor(private readonly session: EnhancedDebugSession) { }

private getContext(): "copy" | "repl" {
if (this.session.session.type.startsWith("pwa-")) {
return "copy";
}
return "repl";
}

public async evaluate({
expression,
preferredExtractorId,
frameId,
}: EvaluationArgs): Promise<
| { kind: "data"; result: DataExtractionResult }
| { kind: "error"; message: FormattedMessage }
> {
while (true) {
try {
const preferredExtractorExpr = preferredExtractorId
? `"${preferredExtractorId}"`
: "undefined";

const body = `${getExpressionForDataExtractorApi()}.getData(
e => (${expression}),
expr => eval(expr),
${preferredExtractorExpr}
)`;

const wrappedExpr = `
(() => {
try {
return ${body};
} catch (e) {
return JSON.stringify({
kind: "Error",
message: e.message,
stack: e.stack
});
}
})()
`;

const reply = await this.session.evaluate({
expression: wrappedExpr,
frameId,
context: this.getContext(),
});
const resultStr = reply.result;
const jsonData =
this.getContext() === "copy"
? resultStr
: resultStr.substr(1, resultStr.length - 2);
const result = JSON.parse(jsonData) as DataResult;

if (result.kind === "NoExtractors") {
throw new Error("No extractors");
} else if (result.kind === "Error") {
throw new Error(result.message);
} else if (result.kind === "Data") {
return {
kind: "data",
result: result.extractionResult,
};
} else {
throw new Error("Invalid Data");
}
} catch (error) {
const msg = error.message as string | undefined;
if (msg && msg.includes(ApiHasNotBeenInitializedCode)) {
if (await this.initializeApi(frameId)) {
continue;
}
}

return {
kind: "error",
message: error.message,
};
}
}
}

private async initializeApi(frameId: number | undefined): Promise<boolean> {
try {
// prefer existing is true, so that manually registered (possibly newer) extractors are not overwritten.
const expression = `${getExpressionToInitializeDataExtractorApi()}.registerDefaultExtractors(true);`;

await this.session.evaluate({
expression,
frameId,
context: this.getContext(),
});

return true;
} catch (error) {
return false;
}
}
}