TypeError: _inplace_paired_L2() missing 2 required positional arguments: 'A' and 'B' #312
Comments
Looks like either Is In any case, we should do better checking to surface a less-opaque error message. |
Yep, I have tried that. I replaced the last 2 lines with this:
grid_lmnn_knn.fit(np.array(X_train), np.array(y_train))
grid_lmnn_knn.score(np.array(X_test), np.array(y_test))
Any other thoughts? |
I reproduced the issue locally, and it turns out that I haven't verified yet, but I suspect the new LMNN implementation coming in gh-309 will solve this for you. We should also make sure we add test coverage for the no-impostors case. |
Sounds good. I will patiently wait for that. If you have any workaround until then, let me know as I have to present on my findings by next Wednesday to my research group lol |
Here's a workaround. It just bails out entirely if no impostors can be found: Not super elegant, but it should work okay. |
Thank you for your work, @perimosocordiae. We are trying to apply metric learning to the materials science space, but we want to apply it to the Friedman dataset first before going all in on the diffusion datasets. Super weird that my pipeline is now not predicting any of the 0s correctly in my test set. I am not sure if this is a correct approach. In case you are interested in our use case: my PI is telling me to frame a classification problem with the Friedman dataset. Categorize the dataset and set samples where x0 < 0.6 to 1 (sample is within domain), else 0. Then apply metric learning to that and see if it performs well. Code so far:
from sklearn.datasets import make_friedman1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def friedman_np_to_df(X, y):
    return pd.DataFrame(X, columns=['x0', 'x1', 'x2', 'x3', 'x4']), pd.Series(y)
# Make training set
X_train, NA = make_friedman1(n_samples=1000, n_features=5, random_state = 1) #dont care about Y so call it NA
X_train, NA = friedman_np_to_df(X_train,NA)
#categorize training set based off of x0
domain_list = []
for i in range(len(X_train)):
    if X_train.iloc[i]['x0'] < 0.6:
        domain_list.append(1)
    else:
        domain_list.append(0)
X_train['domain'] = domain_list
# Set training set to where domain == 1 (x0 < 0.6)
X_train = X_train[X_train['domain']==1]
y_train = X_train.copy()
X_train = X_train.drop(columns = ['domain'])
y_train = y_train['domain']
# Make testing set with a different random_state
X_test, NA2 = make_friedman1(n_samples=1000, n_features=5, random_state = 3)
X_test, NA2 = friedman_np_to_df(X_test,NA2)
#categorize testing set based off of x0
domain_list = []
for i in range(len(X_test)):
    if X_test.iloc[i]['x0'] < 0.6:
        domain_list.append(1)
    else:
        domain_list.append(0)
X_test['domain'] = domain_list
y_test = X_test['domain'].copy()
X_test = X_test.drop(columns = ['domain'])
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from metric_learn import LMNN
lmnn_knn = Pipeline(steps=[('lmnn', LMNN()), ('knn', KNeighborsClassifier())])
parameters = {'lmnn__init': ['pca', 'lda', 'identity', 'random'],
              'lmnn__k': [2, 3],
              'knn__n_neighbors': [2, 3],
              'knn__weights': ['uniform', 'distance'],
              'knn__algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
              'knn__leaf_size': [x for x in np.arange(1, 30, 5)],
              # metric names spelled as scikit-learn expects them
              'knn__metric': ['euclidean', 'manhattan', 'mahalanobis', 'seuclidean', 'minkowski']}
grid_lmnn_knn = GridSearchCV(lmnn_knn, parameters,cv = 3, n_jobs=-1, verbose=True, scoring='f1')
grid_lmnn_knn.fit(np.array(X_train),np.array(y_train))
# grid_lmnn_knn.score(np.array(X_test), np.array(y_test))
predictions = grid_lmnn_knn.predict(X_test)
print(grid_lmnn_knn.best_estimator_)
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))
output is:
So yeah. Not sure why f1-score is 0 for my 0 cases. Maybe I am doing it wrong haha. |
I think my last attempt performed badly on the Friedman dataset because there were no examples of 0-labeled data in the training set. Now I include samples of 0-labeled data in the training set. @perimosocordiae I found another issue with your branch:
from sklearn.datasets import make_friedman1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def friedman_np_to_df(X, y):
    return pd.DataFrame(X, columns=['x0', 'x1', 'x2', 'x3', 'x4']), pd.Series(y)
# Make training set
X_train, NA = make_friedman1(n_samples=1000, n_features=5, random_state = 1) #dont care about Y so call it NA
X_train, NA = friedman_np_to_df(X_train,NA)
#categorize training set based off of x0
domain_list = []
for i in range(len(X_train)):
    if X_train.iloc[i]['x0'] < 0.6:
        domain_list.append(1)
    else:
        domain_list.append(0)
X_train['domain'] = domain_list
# Keep the in-domain samples (x0 < 0.6) plus the first 60 out-of-domain samples
out_of_domain = X_train[X_train['domain'] == 0][:60]
X_train = X_train[X_train['domain']==1]
X_train = pd.concat([out_of_domain, X_train])
y_train = X_train.copy()
X_train = X_train.drop(columns = ['domain'])
y_train = y_train['domain']
# Make testing set with a different random_state
X_test, NA2 = make_friedman1(n_samples=1000, n_features=5, random_state = 3)
X_test, NA2 = friedman_np_to_df(X_test,NA2)
#categorize testing set based off of x0
domain_list = []
for i in range(len(X_test)):
    if X_test.iloc[i]['x0'] < 0.6:
        domain_list.append(1)
    else:
        domain_list.append(0)
X_test['domain'] = domain_list
y_test = X_test['domain'].copy()
X_test = X_test.drop(columns = ['domain'])
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from metric_learn import LMNN
lmnn_knn = Pipeline(steps=[('lmnn', LMNN()), ('knn', KNeighborsClassifier())])
parameters = {'lmnn__init': ['pca', 'lda', 'identity', 'random'],
              'lmnn__k': [2, 3],
              'knn__n_neighbors': [2, 3],
              'knn__weights': ['uniform', 'distance'],
              'knn__algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
              'knn__leaf_size': [x for x in np.arange(1, 30, 5)],
              'knn__metric': ['manhattan', 'mahalanobis', 'minkowski']}
grid_lmnn_knn = GridSearchCV(lmnn_knn, parameters,cv = 5, n_jobs=-1, verbose=True, scoring='f1')
grid_lmnn_knn.fit(np.array(X_train),np.array(y_train))
grid_lmnn_knn.score(np.array(X_test), np.array(y_test))
Output:
|
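The diagnosis in the comment above (a training set with no 0-labeled samples) can be checked before fitting. The following is a minimal sketch, not the poster's code: it uses NumPy only, draws uniform random features as a stand-in for `make_friedman1` (whose inputs are uniform on [0, 1]), and the helper name `label_domain` is hypothetical.

```python
import numpy as np

# Stand-in for the make_friedman1 features (assumption: uniform on [0, 1]).
rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 5))


def label_domain(X, threshold=0.6):
    """Hypothetical helper: 1 where x0 < threshold (in domain), else 0."""
    return (X[:, 0] < threshold).astype(int)


y = label_domain(X)
in_domain = y == 1

# First attempt: keep only in-domain rows, so only one class survives.
y_first = y[in_domain]
classes_first = np.unique(y_first)

# Second attempt: also keep the first 60 out-of-domain rows.
out_idx = np.flatnonzero(~in_domain)[:60]
keep = np.concatenate([out_idx, np.flatnonzero(in_domain)])
y_second = y[keep]
classes_second, counts_second = np.unique(y_second, return_counts=True)

print("classes in first training set:", classes_first)
print("classes in second training set:", classes_second, counts_second)
```

If `np.unique` on the training labels returns a single class, a k-nearest-neighbors classifier trained on them can only ever predict that class, which is consistent with the f1-score of 0 reported for the 0 cases.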
Tried swapping it to the following, but it now goes into an infinite loop 😂
if not impostors.any():
    return None, 0, 0 |
Sorry, try `if len(impostors) == 0:` instead.
|
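Why `not impostors.any()` and `len(impostors) == 0` can behave differently: a sketch, assuming `impostors` is a NumPy array. The workaround code itself is not shown in this thread, so the arrays and helper names below are made up for illustration. `.any()` tests whether any element is truthy, while `len(...) == 0` tests whether the array is empty.

```python
import numpy as np


def is_empty_by_any(impostors):
    # True when no element is truthy, even if the array has elements.
    return not impostors.any()


def is_empty_by_len(impostors):
    # True only when the array has no elements at all.
    return len(impostors) == 0


# Illustrative inputs (assumptions, not taken from metric-learn):
empty = np.array([], dtype=int)           # no impostors at all
only_index_zero = np.array([0])           # one impostor, at index 0
mask_all_false = np.zeros(4, dtype=bool)  # boolean mask with no hits

for name, arr in [("empty", empty),
                  ("only_index_zero", only_index_zero),
                  ("mask_all_false", mask_all_false)]:
    print(name, is_empty_by_any(arr), is_empty_by_len(arr))
```

The two checks agree on an empty array but disagree on the other two inputs, so which one is right depends on whether `impostors` holds indices or a boolean mask.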
Were you able to try the code from gh-309? I'm curious to see how it would handle this case. |
The code from the PR doesn't work either 🤣 I will post my results on that thread. |
Description
I get this error: TypeError: _inplace_paired_L2() missing 2 required positional arguments: 'A' and 'B'
Steps/Code to Reproduce
Example:
Expected Results
Example: No error is thrown. Score is calculated
Actual Results
Versions
Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
Python 3.7.10 (default, Feb 20 2021, 21:17:23)
[GCC 7.5.0]
NumPy 1.19.5
SciPy 1.4.1
Scikit-Learn 0.22.2.post1
Metric-Learn 0.6.2