-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathchanges.txt
154 lines (109 loc) · 4.3 KB
/
changes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
Jan 14, 2025
dao: refactored sql queries for better readability
Jan 02, 2025
gene nomenclature events: fixed NullPointerException
Nov 11, 2024
if a given species (and its assembly) is not available in the current Biomart release,
the species is skipped from processing, and the pipeline could continue processing other species.
Previously, that issue was breaking the pipeline.
Aug 06, 2024
renamed 'master' branch to 'main'
Feb 22, 2024
gene loader: fixed handling of nomenclature events
Nov 10, 2023
gene loader: alternate assembly names are provided in config
Oct 31, 2023
gene loader: nomenclature events are no longer generated for genes having NOMEN_SOURCE='MGI'
Note: The problem was that mgi-nomenclature-pipeline was renaming Ensembl genes, setting NOMEN_SOURCE='MGI',
and then ensembl-gene-pipeline was renaming them to Ensembl nomenclature, setting NOMEN_SOURCE='Ensembl';
that resulted in endless circle of gene renaming for some mouse genes.
Since MGI is the primary nomenclature source, it must trump any other gene name sources.
May 09, 2023
gene loader: code cleanup
Apr 03, 2023
removed Harika from email list -- Harika is no longer with RGD
Feb 27, 2023
renamed distro to 'ensembl-data-pipeline'
Feb 21, 2023
fixed names of downloaded transcript files (no more spaces in file name)
Feb 20, 2023
better logging
Jan 26, 2023
updated build.gradle to be gradle 6.x + compatible
Jan 10, 2023
removed Harika from email recipients
Oct 07, 2022
loader: added species SpeTri2.0 (squirrel) to the list of supported species
loader: added matching of incoming Ensembl genes to existing NCBI genes by position
Jun 01, 2022
genes: added logging and summary reporting of inserted genes
May 25, 2022
aliases: prefix 'LOW QUALITY PROTEIN:' is removed if present
May 23, 2022
improved logging of conflicts
May 20, 2022
implemented updates of Ensembl gene type
added loading for naked mole-rat assembly HetGla_female_1.0
May 13, 2022
improved QC reporting
May 11, 2022
better code to match by NCBI gene ids
May 10, 2022
tuned up logging
2022-03-07
updated dependencies
2022-01-04
updated log4j to avoid zero day exploit
updated dog and rat: new assemblies ROS_Cfam_1.0 and mRatBN7.2
2021-08-31
gff3 transcript loader: improved parsing nad loading code
transcript loader: implemented full qc of transcript positions
2021-08-05
gff3 transcript loader: implemented
2021-07-30
gff3 gene loader: implemented loading of basic gene info from gff3 file
2021-06-15
added skeleton code to load data from gff3 files
refactored species to process
2021-05-25
implemented deletion of obsolete gene positions (to prevent duplicate positions)
2021-02-23
updated mouse assembly to GRCm39
2021-02-03
transcript loader: added code to detect and remove obsolete utr regions
2021-02-02
transcript loader: added code to detect and remove obsolete exons
2020-12-03
logging fix: conflicts.txt file is no longer appended to
2020-09-24
for scaffold assemblies: when loading positions, Genebank chromosome accessions
are replaced with RefSeq accessions
2020-09-22
transcript loader: added loading of UTR regions
2020-09-16
dao tuneup: replaced custom sql code with a call to rgdcore
2020-09-08
tuned up handling of transcript versions
improved reporting: separated conflict report from summary report
2020-08-26
downloads the files in compressed format
added handling of transcript versions (STABLE_TRANSCRIPTS table)
2020-08-06
transcripts loader summary: shows count of inserted transcripts
2020-07-31
added assembly map verification for all species
added chinchilla to list of processed species
2020-05-28
replaced some custom sql queries with rgdcore equivalents for better code maintainability
2020-05-27
fixed Ensembl query for dog
2020-04-16
added logging of xdb ids
added script run_ensembl.sh that was used on PROD but never had been committed to git
2020-03-16
tuned up logging
2020-03-10
new genes: nomenclature event (status='PROVISIONAL') with ref_key=20683 is automatically generated only for RAT genes
2020-03-05
new genes: when added, a nomenclature event (status='PROVISIONAL') with ref_key=20683 is automatically generated
code refactored to handle rgdcore API changes for NCBI and Ensembl assembly maps