take() got an unexpected keyword argument 'axis' #84
Hi @JiaLeXian, thanks for reporting! I will take a look at vaex when I get a moment. For now, you can manually convert your data into a NumPy array in order to use deep forest.
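A minimal sketch of that conversion, assuming the features have already been exported from vaex into a pandas DataFrame (the vaex export step is not shown). `to_numpy_xy` and the column names are hypothetical and not part of deep forest's API. The point is that the estimator then receives a plain 2-D `numpy.ndarray` and a 1-D label array.

```python
import numpy as np
import pandas as pd

def to_numpy_xy(df, feature_cols, label_col):
    """Split a DataFrame into a 2-D float feature array and a 1-D label array.

    Hypothetical helper for illustration only.
    """
    X = df[feature_cols].to_numpy(dtype=np.float64)
    # ravel() mirrors the y_train.values.ravel() call in the report
    y = df[label_col].to_numpy().ravel()
    return X, y

# Toy frame standing in for the real data (hypothetical columns)
df = pd.DataFrame({"f1": [0.1, 0.2, 0.3], "f2": [1, 2, 3], "label": [0, 1, 0]})
X, y = to_numpy_xy(df, ["f1", "f2"], "label")
print(type(X).__name__, X.shape, y.shape)  # ndarray (3, 2) (3,)
```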
It looks like vaex does not support slicing (vaexio/vaex#911), which is an essential operation in deep forest, e.g., for bootstrap sampling when building random forests. At least for now, this problem cannot be solved :-( Thanks for reporting anyway.
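To illustrate why row slicing matters, here is a generic sketch of bootstrap sampling as used when growing a random forest: draw row indices with replacement, then gather those rows along axis 0. This is a NumPy illustration, not deep forest's internal code; `bootstrap_sample` is a hypothetical helper.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw n rows with replacement; also return the out-of-bag mask."""
    n_samples = X.shape[0]
    idx = rng.integers(0, n_samples, size=n_samples)  # indices, with replacement
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[idx] = False  # rows never drawn are out-of-bag
    # Row gather along axis 0: the operation the data container must support
    return X.take(idx, axis=0), y.take(idx, axis=0), oob_mask

rng = np.random.default_rng(0)
X = np.arange(20, dtype=np.float64).reshape(10, 2)
y = np.arange(10)  # labels equal row indices, so the gather is easy to check
X_boot, y_boot, oob = bootstrap_sample(X, y, rng)
```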
Hi @xuyxu, thanks for investigating the problem. Much appreciated! So, for DF, is it best to use a numpy array or the original pandas DataFrame? In our case, we have more than 100 million rows of data. That's why we use vaex to load the data and reduce memory usage. We still want to try DF on our dataset, so we will explore other ways. Thank you!
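A quick back-of-the-envelope estimate shows why 100 million rows are hard to hold in memory as a dense array. The feature count of 20 is a hypothetical placeholder, since the real number is not given in the thread.

```python
import numpy as np

def dense_nbytes(n_rows, n_features, dtype):
    """Bytes needed for a dense (n_rows, n_features) array of the given dtype."""
    return n_rows * n_features * np.dtype(dtype).itemsize

n_rows = 100_000_000  # "more than 100 million rows" from the thread
n_features = 20       # hypothetical; substitute the real feature count
for dtype in ("float64", "float32"):
    gb = dense_nbytes(n_rows, n_features, dtype) / 1e9
    print(f"{dtype}: {gb:.0f} GB")  # float64: 16 GB, float32: 8 GB
```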
Could you take a look at …? Besides, feel free to tell me if you run into any problems when trying out this solution ;-). We are willing to further improve the functionality of DF for such large datasets.
@xuyxu thanks for the quick reply, and for suggesting numpy.memmap. We will try this option in the following days and keep you posted. Thank you!
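A minimal sketch of the `numpy.memmap` route, assuming the data can be exported chunk by chunk. The chunk source below is a stand-in; in practice it would read from vaex or a file. File name, shape and chunk size are hypothetical. The array lives on disk but behaves like an `ndarray`, so calls such as `take(..., axis=0)` work on it. Whether deep forest keeps the data on disk throughout fitting is not verified here.

```python
import os
import tempfile
import numpy as np

n_rows, n_features, chunk = 1_000, 4, 256  # hypothetical sizes
path = os.path.join(tempfile.mkdtemp(), "X_train.dat")

# Create a disk-backed array and fill it chunk by chunk
X = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_rows, n_features))
for start in range(0, n_rows, chunk):
    stop = min(start + chunk, n_rows)
    # Stand-in for reading rows [start, stop) from the real data source
    X[start:stop] = np.arange(start, stop, dtype=np.float32)[:, None]
X.flush()

# Reopen read-only; rows are paged in from disk on access
X_ro = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, n_features))
subset = X_ro.take([5, 500, 999], axis=0)  # row gather, as in the traceback
```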
Got the following error with this code:
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from deepforest import CascadeForestClassifier

# X_train and y_train come from a vaex DataFrame (loading code not shown)
model = CascadeForestClassifier(random_state=1)
model.fit(X_train, y_train.values.ravel())
```
```
TypeError Traceback (most recent call last)
in
6
7 model = CascadeForestClassifier(random_state=1)
----> 8 model.fit(X_train, y_train.values.ravel())
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/cascade.py in fit(self, X, y, sample_weight)
1395 y = self._encode_class_labels(y)
1396
-> 1397 super().fit(X, y, sample_weight)
1398
1399 def predict_proba(self, X):
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/cascade.py in fit(self, X, y, sample_weight)
754
755 # Bin the training data
--> 756 X_train_ = self._bin_data(binner, X, is_training_data=True)
757 X_train_ = self.buffer_.cache_data(0, X_train_, is_training_data=True)
758
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/cascade.py in _bin_data(self, binner, X, is_training_data)
665 tic = time.time()
666 if is_training_data:
--> 667 X_binned = binner.fit_transform(X)
668 else:
669 X_binned = binner.transform(X)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
697 if y is None:
698 # fit method of arity 1 (unsupervised transformation)
--> 699 return self.fit(X, **fit_params).transform(X)
700 else:
701 # fit method of arity 2 (supervised transformation)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/_binner.py in fit(self, X)
128 self.validate_params()
129
--> 130 self.bin_thresholds = _find_binning_thresholds(
131 X,
132 self.n_bins - 1,
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/_binner.py in _find_binning_thresholds(X, n_bins, bin_subsample, bin_type, random_state)
75 if n_samples > bin_subsample:
76 subset = rng.choice(np.arange(n_samples), bin_subsample, replace=False)
---> 77 X = X.take(subset, axis=0)
78
79 binning_thresholds = []
TypeError: take() got an unexpected keyword argument 'axis'
```
The dataset is loaded with vaex. Is this problem specific to vaex?
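The failing frame (`_find_binning_thresholds`, line 77 in the traceback) subsamples rows with `X.take(subset, axis=0)`. NumPy arrays and pandas DataFrames both accept that `axis` keyword, whereas the vaex object passed in evidently does not, which is consistent with the error being vaex-specific. The sketch below mimics that subsampling step outside deep forest; `subsample_rows` is a hypothetical name and the sizes are arbitrary.

```python
import numpy as np
import pandas as pd

def subsample_rows(X, bin_subsample, random_state=0):
    """Mimic the row-subsampling step shown in the traceback (lines 75-77)."""
    rng = np.random.RandomState(random_state)
    n_samples = X.shape[0]
    if n_samples > bin_subsample:
        subset = rng.choice(np.arange(n_samples), bin_subsample, replace=False)
        X = X.take(subset, axis=0)  # requires a take() that accepts axis=
    return X

X_np = np.random.RandomState(1).rand(1_000, 3)
X_pd = pd.DataFrame(X_np, columns=["a", "b", "c"])
print(subsample_rows(X_np, 100).shape, subsample_rows(X_pd, 100).shape)  # (100, 3) (100, 3)
```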