Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'add-jina' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-jina
Showing 26 changed files with 119 additions and 28 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"Angry Tweets","task_description":"A sentiment dataset with 3 classes (positiv, negativ, neutral) for Danish tweets","task_version":"1.1.1","time_of_run":"2024-11-13T21:33:16.042746","scores":{"da":{"accuracy":0.5680993314231136,"f1":0.5594053621774726,"accuracy_stderr":0.024346122584687404,"f1_stderr":0.022681854105695665,"main_score":0.5680993314231136}},"main_score":"accuracy"} | ||
{"task_name":"Angry Tweets","task_description":"A sentiment dataset with 3 classes (positiv, negativ, neutral) for Danish tweets","task_version":"1.1.1","time_of_run":"2024-12-11T22:07:20.690617","scores":{"da":{"accuracy":0.6091690544412608,"f1":0.6052343490049035,"accuracy_stderr":0.01654188475325484,"f1_stderr":0.015929005182812636,"main_score":0.6091690544412608}},"main_score":"accuracy"} |
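A minimal sketch of how the change in a cached result could be quantified, assuming each side of the diff is a single-line JSON record as shown above. The helper name `main_score_delta` is hypothetical and not part of the repository; the two records are shortened copies of the Angry Tweets lines and keep only the fields the helper reads.

```python
import json


def main_score_delta(old_record: str, new_record: str) -> dict[str, float]:
    """Return new-minus-old main score per language for two cached result records."""
    old, new = json.loads(old_record), json.loads(new_record)
    # The top-level "main_score" field names the metric (e.g. "accuracy", "f1").
    metric = new["main_score"]
    return {
        lang: new["scores"][lang][metric] - old["scores"][lang][metric]
        for lang in new["scores"]
        if lang in old["scores"]
    }


# Shortened copies of the removed and added Angry Tweets records.
old_line = '{"task_name":"Angry Tweets","scores":{"da":{"accuracy":0.5680993314231136}},"main_score":"accuracy"}'
new_line = '{"task_name":"Angry Tweets","scores":{"da":{"accuracy":0.6091690544412608}},"main_score":"accuracy"}'

deltas = main_score_delta(old_line, new_line)
print(deltas)
```

Looking the metric up through the top-level `main_score` name, rather than a per-language `main_score` field, also covers records such as DanFEVER and SNL Clustering, whose per-language scores carry no `main_score` entry.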
2 changes: 1 addition & 1 deletion
src/seb/cache/jinaai__jina-embeddings-v3/Bornholm_Parallel.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"Bornholm Parallel","task_description":"Danish Bornholmsk Parallel Corpus. Bornholmsk is a Danish dialect spoken on the island of Bornholm, Denmark. Historically it is a part of east Danish which was also spoken in Scania and Halland, Sweden.","task_version":"1.1.1","time_of_run":"2024-11-13T21:33:55.587815","scores":{"da":{"precision":0.3206724089635854,"recall":0.436,"f1":0.35174285714285713,"accuracy":0.436,"main_score":0.35174285714285713},"da-bornholm":{"precision":0.3206724089635854,"recall":0.436,"f1":0.35174285714285713,"accuracy":0.436,"main_score":0.35174285714285713}},"main_score":"f1"} | ||
{"task_name":"Bornholm Parallel","task_description":"Danish Bornholmsk Parallel Corpus. Bornholmsk is a Danish dialect spoken on the island of Bornholm, Denmark. Historically it is a part of east Danish which was also spoken in Scania and Halland, Sweden.","task_version":"1.1.1","time_of_run":"2024-12-11T22:08:05.312335","scores":{"da":{"precision":0.33535238095238096,"recall":0.45,"f1":0.3656333333333333,"accuracy":0.45,"main_score":0.3656333333333333},"da-bornholm":{"precision":0.33535238095238096,"recall":0.45,"f1":0.3656333333333333,"accuracy":0.45,"main_score":0.3656333333333333}},"main_score":"f1"} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"DKHate","task_description":"Danish Tweets annotated for Hate Speech either being Offensive or not","task_version":"1.1.1","time_of_run":"2024-11-13T22:19:50.826703","scores":{"da":{"accuracy":0.6477203647416412,"f1":0.5366895373219578,"ap":0.1795654122069819,"accuracy_stderr":0.06684788272628261,"f1_stderr":0.04571944298710895,"ap_stderr":0.0282077813963096,"main_score":0.6477203647416412}},"main_score":"accuracy"} | ||
{"task_name":"DKHate","task_description":"Danish Tweets annotated for Hate Speech either being Offensive or not","task_version":"1.1.1","time_of_run":"2024-12-11T22:09:29.031462","scores":{"da":{"accuracy":0.6787234042553191,"f1":0.5549690045301742,"ap":0.18587207926094124,"accuracy_stderr":0.07542711044895069,"f1_stderr":0.054513556188797,"ap_stderr":0.031926033464990976,"main_score":0.6787234042553191}},"main_score":"accuracy"} |
This file was deleted.
2 changes: 1 addition & 1 deletion
src/seb/cache/jinaai__jina-embeddings-v3/Da_Political_Comments.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"Da Political Comments","task_description":"A dataset of Danish political comments rated for sentiment","task_version":"1.1.1","time_of_run":"2024-11-13T21:34:35.344272","scores":{"da":{"accuracy":0.4206437291897891,"f1":0.38642142217868036,"accuracy_stderr":0.027264394356405246,"f1_stderr":0.017817279657788544,"main_score":0.4206437291897891}},"main_score":"accuracy"} | ||
{"task_name":"Da Political Comments","task_description":"A dataset of Danish political comments rated for sentiment","task_version":"1.1.1","time_of_run":"2024-12-11T22:11:05.867373","scores":{"da":{"accuracy":0.4342397336293008,"f1":0.3998793870382494,"accuracy_stderr":0.025613002998655047,"f1_stderr":0.018941333655636754,"main_score":0.4342397336293008}},"main_score":"accuracy"} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"DanFEVER","task_description":"A Danish dataset intended for misinformation research. It follows the same format as the English FEVER dataset.","task_version":"1.1.1","time_of_run":"2024-11-13T21:36:36.608562","scores":{"da":{"ndcg_at_1":0.25859,"ndcg_at_3":0.34764,"ndcg_at_5":0.35958,"ndcg_at_10":0.36608,"ndcg_at_100":0.37129,"ndcg_at_1000":0.37169,"map_at_1":0.25851,"map_at_3":0.32666,"map_at_5":0.33333,"map_at_10":0.33608,"map_at_100":0.33722,"map_at_1000":0.33724,"recall_at_1":0.25851,"recall_at_3":0.40797,"recall_at_5":0.43676,"recall_at_10":0.45646,"recall_at_100":0.48015,"recall_at_1000":0.48313,"precision_at_1":0.25859,"precision_at_3":0.13604,"precision_at_5":0.0874,"precision_at_10":0.04568,"precision_at_100":0.00481,"precision_at_1000":0.00048,"mrr_at_1":0.25875,"mrr_at_3":0.32674,"mrr_at_5":0.33343,"mrr_at_10":0.33619,"mrr_at_100":0.3373,"mrr_at_1000":0.33732}},"main_score":"ndcg_at_10"} | ||
{"task_name":"DanFEVER","task_description":"A Danish dataset intended for misinformation research. It follows the same format as the English FEVER dataset.","task_version":"1.1.1","time_of_run":"2024-12-11T22:32:46.470067","scores":{"da":{"ndcg_at_1":0.31492,"ndcg_at_3":0.3914,"ndcg_at_5":0.39856,"ndcg_at_10":0.40282,"ndcg_at_100":0.40509,"ndcg_at_1000":0.40526,"map_at_1":0.31484,"map_at_3":0.37371,"map_at_5":0.37773,"map_at_10":0.37955,"map_at_100":0.38008,"map_at_1000":0.38009,"recall_at_1":0.31484,"recall_at_3":0.4421,"recall_at_5":0.45928,"recall_at_10":0.47215,"recall_at_100":0.48203,"recall_at_1000":0.48329,"precision_at_1":0.31492,"precision_at_3":0.14744,"precision_at_5":0.09192,"precision_at_10":0.04726,"precision_at_100":0.00483,"precision_at_1000":0.00048,"mrr_at_1":0.31492,"mrr_at_3":0.37374,"mrr_at_5":0.37777,"mrr_at_10":0.37956,"mrr_at_100":0.38009,"mrr_at_1000":0.3801}},"main_score":"ndcg_at_10"} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"LCC","task_description":"The leipzig corpora collection, annotated for sentiment","task_version":"1.1.1","time_of_run":"2024-11-14T13:57:48.545664","scores":{"da":{"accuracy":0.5999999999999999,"f1":0.5885173430161176,"accuracy_stderr":0.03055050463303893,"f1_stderr":0.02601687642478716,"main_score":0.5999999999999999}},"main_score":"accuracy"} | ||
{"task_name":"LCC","task_description":"The leipzig corpora collection, annotated for sentiment","task_version":"1.1.1","time_of_run":"2024-12-11T22:03:21.013601","scores":{"da":{"accuracy":0.616,"f1":0.6156757118925441,"accuracy_stderr":0.033359989341858146,"f1_stderr":0.029531773504062154,"main_score":0.616}},"main_score":"accuracy"} |
2 changes: 1 addition & 1 deletion
src/seb/cache/jinaai__jina-embeddings-v3/Language_Identification.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"Language Identification","task_description":"A dataset for Nordic language identification.","task_version":"1.1.1","time_of_run":"2024-11-13T21:50:33.466847","scores":{"da":{"accuracy":0.48129999999999995,"f1":0.4716944543998669,"accuracy_stderr":0.012093570376214148,"f1_stderr":0.013172338407872259,"main_score":0.48129999999999995},"sv":{"accuracy":0.48129999999999995,"f1":0.4716944543998669,"accuracy_stderr":0.012093570376214148,"f1_stderr":0.013172338407872259,"main_score":0.48129999999999995},"nb":{"accuracy":0.48129999999999995,"f1":0.4716944543998669,"accuracy_stderr":0.012093570376214148,"f1_stderr":0.013172338407872259,"main_score":0.48129999999999995},"nn":{"accuracy":0.48129999999999995,"f1":0.4716944543998669,"accuracy_stderr":0.012093570376214148,"f1_stderr":0.013172338407872259,"main_score":0.48129999999999995},"is":{"accuracy":0.48129999999999995,"f1":0.4716944543998669,"accuracy_stderr":0.012093570376214148,"f1_stderr":0.013172338407872259,"main_score":0.48129999999999995},"fo":{"accuracy":0.48129999999999995,"f1":0.4716944543998669,"accuracy_stderr":0.012093570376214148,"f1_stderr":0.013172338407872259,"main_score":0.48129999999999995}},"main_score":"accuracy"} | ||
{"task_name":"Language Identification","task_description":"A dataset for Nordic language identification.","task_version":"1.1.1","time_of_run":"2024-12-13T16:27:56.414291","scores":{"da":{"accuracy":0.4083333333333333,"f1":0.3925482204639472,"accuracy_stderr":0.008633268983029165,"f1_stderr":0.007916299274783293,"main_score":0.4083333333333333},"sv":{"accuracy":0.4083333333333333,"f1":0.3925482204639472,"accuracy_stderr":0.008633268983029165,"f1_stderr":0.007916299274783293,"main_score":0.4083333333333333},"nb":{"accuracy":0.4083333333333333,"f1":0.3925482204639472,"accuracy_stderr":0.008633268983029165,"f1_stderr":0.007916299274783293,"main_score":0.4083333333333333},"nn":{"accuracy":0.4083333333333333,"f1":0.3925482204639472,"accuracy_stderr":0.008633268983029165,"f1_stderr":0.007916299274783293,"main_score":0.4083333333333333},"is":{"accuracy":0.4083333333333333,"f1":0.3925482204639472,"accuracy_stderr":0.008633268983029165,"f1_stderr":0.007916299274783293,"main_score":0.4083333333333333},"fo":{"accuracy":0.4083333333333333,"f1":0.3925482204639472,"accuracy_stderr":0.008633268983029165,"f1_stderr":0.007916299274783293,"main_score":0.4083333333333333}},"main_score":"accuracy"} |
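As a rough illustration only, the drop in Language Identification accuracy can be expressed in units of the reported standard errors. This sketch assumes the two runs can be treated as independent so that their `accuracy_stderr` values combine in quadrature; that is an assumption made here, not something the cache files state. The function name `delta_in_stderr_units` is hypothetical, and the numbers are copied from the `da` entries above.

```python
import math


def delta_in_stderr_units(
    old_score: float, old_stderr: float, new_score: float, new_stderr: float
) -> float:
    """Score change divided by the combined standard error of the two runs."""
    # Assumption: independent runs, so the standard errors add in quadrature.
    combined = math.sqrt(old_stderr**2 + new_stderr**2)
    return (new_score - old_score) / combined


ratio = delta_in_stderr_units(
    old_score=0.48129999999999995,
    old_stderr=0.012093570376214148,
    new_score=0.4083333333333333,
    new_stderr=0.008633268983029165,
)
print(round(ratio, 2))
```

A magnitude well above 2 would suggest that the change is larger than run-to-run noise under this assumption; it says nothing about why the score moved.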
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"Massive Intent","task_description":"MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages","task_version":"1.1.1","time_of_run":"2024-11-13T21:43:34.445063","scores":{"da":{"accuracy":0.6379959650302622,"f1":0.6031210839216415,"accuracy_stderr":0.018687738234453372,"f1_stderr":0.01601335989279753,"main_score":0.6379959650302622},"nb":{"accuracy":0.6341627437794217,"f1":0.6000839733610837,"accuracy_stderr":0.016888194867408664,"f1_stderr":0.017814875436374125,"main_score":0.6341627437794217},"sv":{"accuracy":0.6594821788836583,"f1":0.6323874307279661,"accuracy_stderr":0.022018459492548024,"f1_stderr":0.017138250345814364,"main_score":0.6594821788836583}},"main_score":"accuracy"} | ||
{"task_name":"Massive Intent","task_description":"MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages","task_version":"1.1.1","time_of_run":"2024-12-13T12:30:27.747355","scores":{"da":{"accuracy":0.7229993275050437,"f1":0.6777312864344521,"accuracy_stderr":0.021605725550200672,"f1_stderr":0.019667189317759903,"main_score":0.7229993275050437},"nb":{"accuracy":0.7127437794216543,"f1":0.6693877898238115,"accuracy_stderr":0.014615011239315112,"f1_stderr":0.012458761208978947,"main_score":0.7127437794216543},"sv":{"accuracy":0.731102891728312,"f1":0.6918924054876893,"accuracy_stderr":0.021828159697349376,"f1_stderr":0.02278756069148256,"main_score":0.731102891728312}},"main_score":"accuracy"} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"Massive Scenario","task_description":"MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages","task_version":"1.1.1","time_of_run":"2024-11-13T21:46:44.729375","scores":{"da":{"accuracy":0.7350706119704102,"f1":0.7264571146442774,"accuracy_stderr":0.007798404730966382,"f1_stderr":0.009457832203659417,"main_score":0.7350706119704102},"nb":{"accuracy":0.7190988567585743,"f1":0.7119164347268657,"accuracy_stderr":0.01014353370687806,"f1_stderr":0.010327527565859801,"main_score":0.7190988567585743},"sv":{"accuracy":0.7415265635507734,"f1":0.7321277292640845,"accuracy_stderr":0.009021276077818262,"f1_stderr":0.009842616330101132,"main_score":0.7415265635507734}},"main_score":"accuracy"} | ||
{"task_name":"Massive Scenario","task_description":"MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages","task_version":"1.1.1","time_of_run":"2024-12-13T16:04:24.007081","scores":{"da":{"accuracy":0.8303967720242097,"f1":0.8144907891114421,"accuracy_stderr":0.01345679043361314,"f1_stderr":0.012214051749594796,"main_score":0.8303967720242097},"nb":{"accuracy":0.8227303295225286,"f1":0.809055319974268,"accuracy_stderr":0.01164970138337166,"f1_stderr":0.010758393028445808,"main_score":0.8227303295225286},"sv":{"accuracy":0.8377269670477471,"f1":0.8214039898309877,"accuracy_stderr":0.011274273146386207,"f1_stderr":0.010311356519461268,"main_score":0.8377269670477471}},"main_score":"accuracy"} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"NoReC","task_description":"A Norwegian dataset for sentiment classification on review","task_version":"1.1.1","time_of_run":"2024-11-13T21:51:14.423683","scores":{"nb":{"accuracy":0.5984375,"f1":0.5772198167649724,"accuracy_stderr":0.018554944488351894,"f1_stderr":0.01755204684778349,"main_score":0.5984375}},"main_score":"accuracy"} | ||
{"task_name":"NoReC","task_description":"A Norwegian dataset for sentiment classification on review","task_version":"1.1.1","time_of_run":"2024-12-13T16:29:32.719809","scores":{"nb":{"accuracy":0.61494140625,"f1":0.5977866105449843,"accuracy_stderr":0.030842373184347828,"f1_stderr":0.02791508266619411,"main_score":0.61494140625}},"main_score":"accuracy"} |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"Norwegian courts","task_description":"Nynorsk and Bokmål parallel corpus from Norwegian courts. Norway has two standardised written languages. Bokmål is a variant closer to Danish, while Nynorsk was created to resemble regional dialects of Norwegian.","task_version":"1.1.1","time_of_run":"2024-11-13T21:52:11.201687","scores":{"nb":{"precision":0.9049707602339182,"recall":0.9298245614035088,"f1":0.9130116959064327,"accuracy":0.9298245614035088,"main_score":0.9130116959064327},"nn":{"precision":0.9049707602339182,"recall":0.9298245614035088,"f1":0.9130116959064327,"accuracy":0.9298245614035088,"main_score":0.9130116959064327}},"main_score":"f1"} | ||
{"task_name":"Norwegian courts","task_description":"Nynorsk and Bokmål parallel corpus from Norwegian courts. Norway has two standardised written languages. Bokmål is a variant closer to Danish, while Nynorsk was created to resemble regional dialects of Norwegian.","task_version":"1.1.1","time_of_run":"2024-12-13T16:36:58.457846","scores":{"nb":{"precision":0.9203216374269007,"recall":0.9385964912280702,"f1":0.9261695906432749,"accuracy":0.9385964912280702,"main_score":0.9261695906432749},"nn":{"precision":0.9203216374269007,"recall":0.9385964912280702,"f1":0.9261695906432749,"accuracy":0.9385964912280702,"main_score":0.9261695906432749}},"main_score":"f1"} |
2 changes: 1 addition & 1 deletion
src/seb/cache/jinaai__jina-embeddings-v3/Norwegian_parliament.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"Norwegian parliament","task_description":"Norwegian parliament speeches annotated with the party of the speaker (`Sosialistisk Venstreparti` vs `Fremskrittspartiet`)","task_version":"1.1.1","time_of_run":"2024-11-13T21:52:00.593689","scores":{"nb":{"accuracy":0.6006666666666666,"f1":0.598941034068567,"ap":0.5607351566727996,"accuracy_stderr":0.018195695461656135,"f1_stderr":0.019486204000061156,"ap_stderr":0.013099108893732402,"main_score":0.6006666666666666}},"main_score":"accuracy"} | ||
{"task_name":"Norwegian parliament","task_description":"Norwegian parliament speeches annotated with the party of the speaker (`Sosialistisk Venstreparti` vs `Fremskrittspartiet`)","task_version":"1.1.1","time_of_run":"2024-12-14T15:36:28.812275","scores":{"nb":{"accuracy":0.5688333333333333,"f1":0.5652299805603136,"ap":0.5401466909214766,"accuracy_stderr":0.026199554703595005,"f1_stderr":0.02768317419649942,"ap_stderr":0.01711798949729587,"main_score":0.5688333333333333}},"main_score":"accuracy"} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"task_name":"SNL Clustering","task_description":"Webscrabed articles from the Norwegian lexicon 'Det Store Norske Leksikon'. Uses articles categories as clusters.","task_version":"0.0.1","time_of_run":"2024-11-13T21:54:10.329920","scores":{"nb":{"v_measure":0.5957282278588,"v_measure_std":0.014688046062978624}},"main_score":"v_measure"} | ||
{"task_name":"SNL Clustering","task_description":"Webscrabed articles from the Norwegian lexicon 'Det Store Norske Leksikon'. Uses articles categories as clusters.","task_version":"0.0.1","time_of_run":"2024-12-14T19:34:35.710300","scores":{"nb":{"v_measure":0.6868723955917873,"v_measure_std":0.009034855336828941}},"main_score":"v_measure"} |
This file was deleted.