<!DOCTYPE html>
<html>
<head>
<title>Caltech Library's Digital Library Development Sandbox</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="./">README</a></li>
<li><a href="LICENSE">LICENSE</a></li>
<li><a href="INSTALL.html">INSTALL</a></li>
<li><a href="user-manual.html">User Manual</a></li>
<li><a href="how-to/">Tutorials</a></li>
<li><a href="search.html">Search Docs</a></li>
<li><a href="about.html">About</a></li>
<li><a href="https://github.com/caltechlibrary/datatools">GitHub</a></li>
</ul>
</nav>
<section>
<h2 id="action-items">Action Items</h2>
<h2 id="bug">Bug</h2>
<ul class="task-list">
<li><label><input type="checkbox" />HTML entities are getting translated
to Unicode code points; this is a result of using the default Marshal()
in the json package. The solution is to replace it with custom
encoders/decoders with sane (and configurable) defaults; see the updates
to crossrefapi for example code.</label></li>
<li><label><input type="checkbox" />One of the tools does a JSON
decode/encode to do indentation; this isn’t necessary, just use
json.Indent instead (or json.Compact to go the other way)</label></li>
<li><label><input type="checkbox" />Some man pages use the old “USAGE”
format and need to be updated to use the Pandoc structure</label></li>
<li><label><input type="checkbox" />Need to finish deprecating the cli
package in favor of the standard flag package</label></li>
<li><label><input type="checkbox" />findfile v0.0.23-pre option -f,
-full-path doesn’t return full paths</label></li>
<li><label><input type="checkbox" /><a
href="https://github.com/caltechlibrary/datatools/issues/12">Issue
#12</a></label>
<ul>
<li>do we support non-string representation of context?</li>
<li>do we support string representation in person/organisation?</li>
<li>is this added complexity worth it?</li>
</ul></li>
</ul>
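<p>A minimal Go sketch of the first two bug items, assuming the escaping
comes from the default HTML-safe behavior of json.Marshal (the crossrefapi
code mentioned above is not reproduced here). The helper names
encodeNoEscape and indentJSON are hypothetical; the calls they wrap,
SetEscapeHTML on json.Encoder and json.Indent, are part of the standard
encoding/json package.</p>

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encodeNoEscape (hypothetical helper) encodes v without the HTML-safe
// escaping that json.Marshal applies by default.
func encodeNoEscape(v interface{}) ([]byte, error) {
	buf := new(bytes.Buffer)
	enc := json.NewEncoder(buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a trailing newline; trim it to match json.Marshal.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

// indentJSON (hypothetical helper) re-indents already encoded JSON
// without a decode/encode round trip.
func indentJSON(src []byte) ([]byte, error) {
	buf := new(bytes.Buffer)
	if err := json.Indent(buf, src, "", "    "); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	rec := map[string]string{"title": "Pride & Prejudice"}

	escaped, _ := json.Marshal(rec) // default: & becomes \u0026
	plain, err := encodeNoEscape(rec)
	if err != nil {
		panic(err)
	}
	pretty, err := indentJSON(plain)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n%s\n%s\n", escaped, plain, pretty)
}
```

<p>json.Compact takes the same destination buffer and source bytes and
strips the insignificant whitespace instead.</p>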
<h2 id="next">Next</h2>
<ul class="task-list">
<li><label><input type="checkbox" />Reorganize documentation into man
pages and how tos</label></li>
<li><label><input type="checkbox" />documentation, simplify site
navigation by flattening to two levels with second level linked by user
manual page and “how to” page.</label></li>
<li><label><input type="checkbox" />Update the build process to match my
current practices</label></li>
<li><label><input type="checkbox" />ioutil is deprecated, need to
update the code that uses it.</label></li>
<li><label><input type="checkbox" />Update how docs are generated, see
the item about dropping the cli package; there are better, simpler ways
to move forward</label></li>
<li><label><input type="checkbox" />Review <a
href="https://go-app.dev">Go-app.dev</a> and see if I can make a useful
format converter GUI based on the code for all the cli in the
project.</label></li>
<li><label><input type="checkbox" />Create man pages for all cli, adopt
man page structure for usage.</label></li>
<li><label><input type="checkbox" />Possible needed tooling</label>
<ul class="task-list">
<li><label><input type="checkbox" checked="" />A codemeta generator
(e.g. read a CITATION.cff and write a codemeta file)</label></li>
</ul></li>
<li><label><input type="checkbox" />Drop cli package, update Makefile,
switch from mk_website.py to website.mak, add Man pages</label></li>
<li><label><input type="checkbox" />Review
https://csvkit.readthedocs.io/en/latest/ and implement the features in
datatools that make sense, e.g. csvsql, a csv2sql, sql2csv,
etc.</label></li>
<li><label><input type="checkbox" />upgrade to use the new cli
v0.0.5-dev</label></li>
<li><label><input type="checkbox" />csvrows would output a range of rows
(e.g. [2:] would be all rows but the first row)</label></li>
<li><label><input type="checkbox" />csv utilities should support integer
range notation for column and row references, e.g. “1,3:4,7,10:” or
“all”</label></li>
</ul>
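<p>A minimal Go sketch of how the proposed range notation could be
expanded, assuming 1-based indexes, an open-ended range such as “10:”
running to the last row or column, and “all” as shorthand for “1:”. The
function name parseRange and its signature are hypothetical and not part
of datatools.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange (hypothetical) expands an expression such as "1,3:4,7,10:"
// into 1-based indexes, where last is the highest valid index.
func parseRange(expr string, last int) ([]int, error) {
	expr = strings.TrimSpace(expr)
	if expr == "all" {
		expr = "1:"
	}
	var out []int
	for _, part := range strings.Split(expr, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			return nil, fmt.Errorf("empty range in %q", expr)
		}
		// A single number is treated as a range of one element.
		start, end := part, part
		if i := strings.Index(part, ":"); i != -1 {
			start, end = part[:i], part[i+1:]
		}
		// An omitted start or end defaults to the first or last index.
		lo, hi := 1, last
		var err error
		if start != "" {
			if lo, err = strconv.Atoi(start); err != nil {
				return nil, fmt.Errorf("bad range start %q", start)
			}
		}
		if end != "" {
			if hi, err = strconv.Atoi(end); err != nil {
				return nil, fmt.Errorf("bad range end %q", end)
			}
		}
		if lo < 1 || hi > last || lo > hi {
			return nil, fmt.Errorf("range %q is outside 1:%d", part, last)
		}
		for i := lo; i <= hi; i++ {
			out = append(out, i)
		}
	}
	return out, nil
}

func main() {
	cols, err := parseRange("1,3:4,7,10:", 12)
	if err != nil {
		panic(err)
	}
	fmt.Println(cols) // [1 3 4 7 10 11 12]
}
```

<p>With this reading, the csvrows example “[2:]” corresponds to
parseRange("2:", rowCount), which yields every row but the first.</p>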
<h2 id="someday-maybe">Someday, Maybe</h2>
<ul class="task-list">
<li><label><input type="checkbox" />finddir should have an option to
exclude directories (e.g. exclude .git directories from a
listing)</label></li>
<li><label><input type="checkbox" />textscraper - a tool for selecting
text and storing it as a JSON field value, sort of grep plus sed cleanup
for semi-structured text (e.g. a webpage)</label>
<ul>
<li>look at how cut, sed, grep are commonly used in my scripts and merge
that functionality into a single tool</li>
</ul></li>
<li><label><input type="checkbox" />csvcols, csvrows should have a
length option to give you a number of columns or rows
respectively</label></li>
<li><label><input type="checkbox" />csvcols, csvrows should have a
filter option to support filtering output conditionally</label></li>
<li><label><input type="checkbox" />csvsort should allow a multi-column
sort respecting column headings</label>
<ul>
<li>plus column number would be ascending by that column</li>
<li>minus column number would be descending by that column</li>
<li>sort would be read from left to right</li>
<li>it would be good to include support for column names and not just
column numbers to describe the sort</li>
</ul></li>
<li><label><input type="checkbox" />jsonmodify takes a JSON document, a
dotpath and value then creates/updates the dotpath in the JSON document
with the new value</label>
<ul>
<li>“(delete DOTPATH)” would remove the property described by the
dotpath</li>
<li>“(update DOTPATH NEW_VALUE)” would replace the property described by
the dotpath with a new value (value can be a string, number, or
JSON)</li>
<li>“(create DOTPATH NEW_VALUE)” would add a new property at the
described dotpath with a new value (value can be a string, number, or
JSON)</li>
<li>“(join DOTPATH SEP)” combines JSON array elements into a string
version using separator</li>
<li>“(concat DOTPATH1 DOTPATH2… SEP)” combines values into a
concatenated string. It takes one or more dotpath values (each must be a
string or number) and returns them as a concatenated value, e.g. (concat
.last_name .first_name “,”) would return a last name comma first name
string.</li>
<li>“(split DOTPATH SEP)” turns a string into an array of strings
using separator</li>
</ul></li>
<li><label><input type="checkbox" />csvcols, csvrows should have a
filter mechanism to filter by column or row</label>
<ul>
<li>using a prefix notation (e.g. ‘(and (eq (join (cols (colNo “Last
Name”) (colNo “First Name”)) “,”) “Doiel, R. S.”) (gt (cols 4)
“2017-06-12”))’)</li>
</ul></li>
<li><label><input type="checkbox" />csvfind, csvjoin should have an
inverted match operation</label></li>
<li><label><input type="checkbox" />a range should accept the word “all”
as well as a comma-delimited list of rows and ranges</label></li>
<li><label><input type="checkbox" />Add -uuid and -skip-header-row
options consistently to all csv tools</label>
<ul class="task-list">
<li><label><input type="checkbox" />csvcols</label></li>
</ul></li>
<li><label><input type="checkbox" />unify the options vocabulary to work
the same between each cli</label>
<ul>
<li>Need a common approach to column ranges in csvcols, csvfind,
csvjoin</li>
<li>csv2json, csv2mdtable, csv2xlsx should accept a column and row range
option for output</li>
</ul></li>
<li><label><input type="checkbox" />csvfind add filter by row number
(helpful when combined with csvcols for snapshotting the middle of a
table)</label></li>
<li><label><input type="checkbox" />csv2json should have an option that
will include a row number in JSON blob output</label></li>
<li><label><input type="checkbox" />csv2json should have the options to
normalize property names in JSON objects</label>
<ul>
<li>camel case</li>
<li>snake case</li>
<li>lower case/upper case</li>
<li>space to underscores</li>
<li>strip punctuation</li>
<li>rename keys</li>
</ul></li>
<li><label><input type="checkbox" />csvrotate would take a CSV file as
input and output columns as rows</label></li>
<li><label><input type="checkbox" />smartcat would function like cat but
with support for ranges of lines (e.g. show me last 20 lines: smartcat
-start=0 -end=“-20” file.txt; cat starting with 10th line: smartcat
-start=10 file.txt)</label>
<ul class="task-list">
<li><label><input type="checkbox" />allow prefix line number with a
specific delimiter (E.g. comma would let you cat a CSV file adding row
numbers as first column)</label></li>
<li><label><input type="checkbox" />show lines with prefix, suffix,
containing or regexp</label></li>
<li><label><input type="checkbox" />show lines without prefix, suffix,
containing or regexp</label></li>
</ul></li>
</ul>
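<p>A minimal Go sketch of the multi-column sort proposed for csvsort
above, assuming keys are signed 1-based column numbers read left to
right (positive for ascending, negative for descending) and that cells
are compared as plain strings. The function name sortRows is
hypothetical, and support for column names is left out.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// sortRows (hypothetical) orders rows by the given keys, read left to
// right. A positive key sorts ascending by that 1-based column; a
// negative key sorts descending. Every row must have enough columns.
func sortRows(rows [][]string, keys []int) {
	sort.SliceStable(rows, func(i, j int) bool {
		for _, k := range keys {
			col, desc := k-1, false
			if k < 0 {
				col, desc = -k-1, true
			}
			a, b := rows[i][col], rows[j][col]
			if a == b {
				continue // tie on this key, fall through to the next one
			}
			if desc {
				return a > b
			}
			return a < b
		}
		return false
	})
}

func main() {
	table := [][]string{
		{"last", "first", "year"},
		{"Doe", "Jane", "2017"},
		{"Doe", "John", "2019"},
		{"Abel", "Ann", "2018"},
	}
	// Skip the heading row, then sort by last name ascending and year descending.
	sortRows(table[1:], []int{1, -3})
	for _, row := range table {
		fmt.Println(row)
	}
}
```

<p>Passing table[1:] keeps the heading row in place, which is one way to
respect column headings without teaching the sort about them.</p>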
<h2 id="completed">Completed</h2>
<ul class="task-list">
<li><label><input type="checkbox" checked="" />consolidate string
utilities (e.g. toupper, tolower, totitle) into string cli</label></li>
<li><label><input type="checkbox" checked="" />csvcols -col option
should not be a boolean, it should take a range like other csv
cli</label></li>
<li><label><input type="checkbox" checked="" />utilities should use
starting index of 1 instead of zero as humans refer to column 1 when
intending to work on the first column</label></li>
<li><label><input type="checkbox" checked="" />for all cli the
-delimiter option should support special characters like , </label></li>
<li><label><input type="checkbox" checked="" />csvfind would accept CSV
input from stdin and output rows with matching column values</label>
<ul>
<li>E.g.
<code>cat file1.csv | csvfind -levenshtein -stop-words="the:a:of" -col=1 "This Red Book of West March"</code></li>
<li>E.g.
<code>cat file1.csv | csvfind -inverted -levenshtein -stop-words="the:a:of" -col=1 "This Red Book of West March"</code></li>
<li>E.g.
<code>cat file1.csv | csvfind -contains -col=1 "Red Book"</code></li>
</ul></li>
<li><label><input type="checkbox" checked="" />csvjoin should have
option for fuzzy match on columns (e.g. comparing titles)</label></li>
</ul>
</section>
<footer>
<span><h1><a href="http://caltech.edu">Caltech</a></h1></span>
<span>© 2023 <a href="https://www.library.caltech.edu/copyright">Caltech library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
<span><a href="mailto:[email protected]">Email Us</a></span>
<a class="cl-hide" href="sitemap.xml">Site Map</a>
</footer>
</body>
</html>