Add B2Share support #88 #89

Merged
merged 28 commits into from
Sep 30, 2024
Commits
820bb3a
Fix #75
micafer May 30, 2024
04d5468
Merge pull request #1 from micafer/fix_75
micafer May 30, 2024
07db0ac
Update README.md
micafer May 31, 2024
3243ea4
Enable to remove the last slash in the osf url
micafer May 31, 2024
06bda68
Add support for providers in OSF: #69
micafer Jun 3, 2024
442bca3
Merge pull request #2 from micafer/osf_providers
micafer Jun 3, 2024
b0d0c28
Fix style
micafer Jun 3, 2024
f842007
Add support to filter files
micafer Jun 4, 2024
dd99815
Add support to filter files
micafer Jun 4, 2024
d9ad19b
Merge pull request #3 from micafer/filter
micafer Jun 4, 2024
6c00ebd
Add support to SeaNoe
micafer Jul 18, 2024
31906b5
Implements data.europa.eu support
micafer Sep 18, 2024
257afa4
merge upstream
micafer Sep 18, 2024
4a8c560
Merge branch 'seanoe'
micafer Sep 18, 2024
16d2518
merge data_europa_eu
micafer Sep 18, 2024
62698ef
Minor changes
micafer Sep 20, 2024
659e2d3
Add B2Share support #88
micafer Sep 24, 2024
21e89a1
Merge branch 'main' into b2share
micafer Sep 24, 2024
9be54fd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 24, 2024
92da69a
Merge branch 'main' into b2share
micafer Sep 30, 2024
67ed8fa
Revert change
micafer Sep 30, 2024
7762af9
Revert "Add support to filter files"
micafer Sep 30, 2024
35c87cd
Revert "Add support to filter files"
micafer Sep 30, 2024
3e47ef2
Order services
micafer Sep 30, 2024
d8a4a49
Apply suggestions from code review
J535D165 Sep 30, 2024
7c418ae
Merge branch 'main' into b2share
J535D165 Sep 30, 2024
123ce7d
Fix file link
J535D165 Sep 30, 2024
f9dda21
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 30, 2024
2 changes: 2 additions & 0 deletions datahugger/config.py
@@ -1,4 +1,5 @@
from datahugger.services import ArXivDataset
from datahugger.services import B2shareDataset
from datahugger.services import DataDryadDataset
from datahugger.services import DataEuropaDataset
from datahugger.services import DataOneDataset
@@ -118,6 +119,7 @@
    "trolling.uit.no": DataverseDataset,
    "www.sodha.be": DataverseDataset,
    "www.uni-hildesheim.de": DataverseDataset,
    "b2share.eudat.eu": B2shareDataset,
    "data.europa.eu": DataEuropaDataset,
}
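
The mapping above ties a hostname to a downloader class. A minimal sketch of how such a lookup could work is shown here; the name `HOST_TO_SERVICE`, the helper `resolve_service`, and the empty stand-in classes are assumptions made for illustration and are not taken from datahugger itself.

```python
from urllib.parse import urlparse


# Stand-in classes so the sketch is self-contained; the real classes live in
# datahugger/services.py.
class B2shareDataset:
    pass


class DataEuropaDataset:
    pass


# Hypothetical name for the hostname-to-class mapping shown in the diff.
HOST_TO_SERVICE = {
    "b2share.eudat.eu": B2shareDataset,
    "data.europa.eu": DataEuropaDataset,
}


def resolve_service(url):
    """Return the downloader class registered for the URL's hostname, or None."""
    return HOST_TO_SERVICE.get(urlparse(url).netloc)
```

With this sketch, a B2Share record URL resolves to `B2shareDataset`, and a URL on an unregistered host resolves to `None`.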

22 changes: 21 additions & 1 deletion datahugger/services.py
@@ -320,7 +320,7 @@ class MendeleyDataset(DatasetDownloader):
class OSFDataset(DatasetDownloader):
    """Downloader for OSF repository."""

-    REGEXP_ID = r"osf\.io\/(?P<record_id>.*)/"
+    REGEXP_ID = r"osf\.io\/(?P<record_id>[^\/]*)\/{0,1}"
Owner commented:

Interesting catch. I'm fine with keeping this in this PR.
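
The effect of the `REGEXP_ID` change can be checked in isolation. The sketch below copies both patterns from the diff and applies them with `re.search`, which is an assumption about how the pattern is used; the OSF URLs and the helper `extract_record_id` are hypothetical.

```python
import re

# Patterns copied from the diff: before and after the change.
OLD_REGEXP_ID = r"osf\.io\/(?P<record_id>.*)/"
NEW_REGEXP_ID = r"osf\.io\/(?P<record_id>[^\/]*)\/{0,1}"


def extract_record_id(pattern, url):
    """Return the captured record_id, or None when the pattern does not match."""
    match = re.search(pattern, url)
    return match.group("record_id") if match else None


# Hypothetical OSF URLs used only for illustration.
with_slash = "https://osf.io/abcde/"
without_slash = "https://osf.io/abcde"
with_subpath = "https://osf.io/abcde/files/"

# The old pattern requires a trailing slash and captures greedily.
old_results = [extract_record_id(OLD_REGEXP_ID, u)
               for u in (with_slash, without_slash, with_subpath)]

# The new pattern stops at the first slash and treats it as optional.
new_results = [extract_record_id(NEW_REGEXP_ID, u)
               for u in (with_slash, without_slash, with_subpath)]
```

Under these assumptions the old pattern fails on the URL without a trailing slash and swallows the sub-path, while the new pattern yields the same identifier for all three URLs.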


    # the base entry point of the REST API
    API_URL = "https://api.osf.io/v2/nodes/"
@@ -425,3 +425,23 @@ class SeaNoeDataset(DatasetDownloader):
    ATTR_SIZE_JSONPATH = "size"
    ATTR_HASH_JSONPATH = "checksum"
    ATTR_HASH_TYPE_VALUE = "sha256"


class B2shareDataset(DatasetDownloader):
    """Downloader for B2Share repository."""

    REGEXP_ID = r"b2share\.eudat\.eu\/records\/(?P<record_id>[0-9a-z]+)"

    # the base entry point of the REST API
    API_URL = "https://b2share.eudat.eu/api/"

    # the files and metadata about the dataset
    API_URL_META = "{api_url}records/{record_id}"
    META_FILES_JSONPATH = "files[*]"

    # paths to file attributes
    ATTR_NAME_JSONPATH = "key"
    ATTR_FILE_LINK_JSONPATH = "ePIC_PID"
    ATTR_SIZE_JSONPATH = "size"
    ATTR_HASH_JSONPATH = "checksum"
    ATTR_HASH_TYPE_VALUE = "md5"
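
To illustrate how the class attributes above fit together, here is a rough sketch that builds the metadata URL from a record URL and reads the file attributes from a response. It assumes the template is filled with `str.format` and that the JSONPath expressions reduce to plain key lookups; `build_meta_url`, `list_files`, and `sample_record` (including every value in it) are hypothetical and do not come from datahugger or from a real B2Share response.

```python
import re

# Values copied from the B2shareDataset class in the diff.
REGEXP_ID = r"b2share\.eudat\.eu\/records\/(?P<record_id>[0-9a-z]+)"
API_URL = "https://b2share.eudat.eu/api/"
API_URL_META = "{api_url}records/{record_id}"


def build_meta_url(url):
    """Build the metadata endpoint for a B2Share record URL (hypothetical helper)."""
    match = re.search(REGEXP_ID, url)
    if match is None:
        raise ValueError(f"not a B2Share record URL: {url}")
    return API_URL_META.format(api_url=API_URL, record_id=match.group("record_id"))


# Invented, trimmed stand-in for a metadata response; all values are placeholders.
sample_record = {
    "files": [
        {
            "key": "2024-02-13.sav",
            "ePIC_PID": "http://hdl.handle.net/placeholder",
            "size": 1024,
            "checksum": "placeholder-checksum",
        }
    ]
}


def list_files(record):
    """Map each entry under "files" to the attributes named in the class."""
    return [
        {
            "name": entry["key"],        # ATTR_NAME_JSONPATH
            "link": entry["ePIC_PID"],   # ATTR_FILE_LINK_JSONPATH
            "size": entry["size"],       # ATTR_SIZE_JSONPATH
            "hash": entry["checksum"],   # ATTR_HASH_JSONPATH
        }
        for entry in record["files"]
    ]


meta_url = build_meta_url(
    "https://b2share.eudat.eu/records/db2ef5890fa44c7a85af366a50de73b9"
)
files = list_files(sample_record)
```

For the record URL used in the test configuration, this sketch produces `https://b2share.eudat.eu/api/records/db2ef5890fa44c7a85af366a50de73b9` as the metadata endpoint.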
4 changes: 4 additions & 0 deletions tests/test_repositories.toml
@@ -114,6 +114,10 @@ files = "AA_age.tab"
location = "https://github.com/j535d165/cbsodata"
files = "cbsodata-main/README.md"

[[b2share]]
location = "https://b2share.eudat.eu/records/db2ef5890fa44c7a85af366a50de73b9"
files = "2024-02-13.sav"

[[dataeuropa]]
location = "https://data.europa.eu/data/datasets/65e092e4009f18f050b14216"
files = "consolidation-wattzhub-schema-irve-statique-20240220-152202.csv"