<!DOCTYPE html>
<html>
<head>
<title>Caltech Library's Digital Library Development Sandbox</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="./">README</a></li>
<li><a href="LICENSE">LICENSE</a></li>
<li><a href="INSTALL.html">INSTALL</a></li>
<li><a href="user-manual.html">User Manual</a></li>
<li><a href="how-to/">Tutorials</a></li>
<li><a href="search.html">Search Docs</a></li>
<li><a href="about.html">About</a></li>
<li><a href="https://github.com/caltechlibrary/datatools">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="datatools">datatools</h1>
<p><em>datatools</em> is a rich collection of command line programs
targeting data conversion, cleanup and analysis directly from your
favorite POSIX shell. It has proven useful for data collaborations where
individual members of a project may prefer different toolsets in their
analysis (e.g. Julia, R, Python) but want to work from a common
baseline. It has also been used intensively for internal reporting from
various Caltech Library metadata sources.</p>
<p>The tools fall into three broad categories:</p>
<ul>
<li>data transformation and conversion</li>
<li>shell scripting helpers</li>
<li>“string”, a tool providing the common string operations missing from
shell</li>
</ul>
<p>See the <a href="user-manual.html">user manual</a> for a complete list of
the command line programs. The data transformation tools include support
for formats such as Excel XML, CSV, tab delimited files, JSON, YAML and
TOML.</p>
<p>Compiled versions of the datatools collection are provided for Linux
(amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). See
<a href="https://github.com/caltechlibrary/datatools/releases">https://github.com/caltechlibrary/datatools/releases</a>.</p>
<p>Use the “-help” option for a full list of options for each utility
(e.g. <code>csv2json -help</code>).</p>
<h2 id="data-transformation">Data transformation</h2>
<p>The transformation tooling includes data conversion: tools that work
with CSV, tab delimited, JSON, TOML, YAML and Excel XML.</p>
<p>There is also tooling to change data shapes using JSON as the
intermediate data format.</p>
<h2 id="for-the-shell">For the shell</h2>
<p>Various utilities for simplifying work on the command line.</p>
<ul>
<li><a href="docs/findfile/">findfile</a> - find files based on prefix,
suffix or contained string</li>
<li><a href="docs/finddir/">finddir</a> - find directories based on
prefix, suffix or contained string</li>
<li><a href="docs/mergepath/">mergepath</a> - prefix, append, clip path
variables</li>
<li><a href="docs/range/">range</a> - emit a range of integers (useful
for numbered loops in Bash)</li>
<li><a href="docs/reldate/">reldate</a> - display a relative date in
YYYY-MM-DD format</li>
<li><a href="docs/reltime/">reltime</a> - display a relative time in 24
hour notation, HH:MM:SS format</li>
<li><a href="docs/timefmt/">timefmt</a> - format a time value based on
Golang’s time format language</li>
<li><a href="docs/urlparse/">urlparse</a> - split a URL into parts</li>
</ul>
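<p>As a rough illustration of the numbered-loop use case mentioned for
<em>range</em>, the sketch below drives a loop from its output. The
argument form (a start and an end integer) is an assumption made here
and should be confirmed with <code>range -help</code>. The hard-coded
list is only a stand-in so the snippet still runs where datatools is not
installed.</p>
<pre class="shell"><code>    # Assumption: range accepts START and END integers (see "range -help").
    if [ -x "$(command -v range)" ]; then
        numbers="$(range 1 3)"
    else
        # Stand-in list used only when the datatools range command is missing.
        numbers="1 2 3"
    fi
    # Word splitting on $numbers is intentional: one loop pass per integer.
    for i in $numbers; do
        echo "pass $i"
    done</code></pre>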
<h2 id="for-strings">For strings</h2>
<p><em>datatools</em> provides the <a href="docs/string/">string</a>
command for working with text strings (limited to memory available).
This is commonly needed when cleaning up data for analysis. The
<em>string</em> command was created for when the old Unix standbys
(grep, awk, sed, tr) are unwieldy or inconvenient. <em>string</em> provides
operations common in most languages, like trimming, splitting, and
transforming letter case. The <em>string</em> command also makes it easy
to join JSON string arrays into a single string using a delimiter or
split a string into a JSON array based on a delimiter. The form of the
command is
<code>string [OPTIONS] [ACTION] [ACTION_PARAMETERS...]</code></p>
<pre class="shell"><code> string toupper "one two three"</code></pre>
<p>This would yield “ONE TWO THREE”.</p>
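<p>In a script, the result is typically captured in a variable. The
following is a minimal sketch that uses only the
<code>string toupper</code> action shown above. The variable names and
the <code>tr</code> fallback are illustrative assumptions, included so
the snippet still runs on a machine without datatools.</p>
<pre class="shell"><code>    text="one two three"
    if [ -x "$(command -v string)" ]; then
        # Use the datatools string command when it is on PATH.
        upper="$(string toupper "$text")"
    else
        # Illustrative fallback using the traditional tr utility.
        upper="$(printf '%s' "$text" | tr '[:lower:]' '[:upper:]')"
    fi
    echo "$upper"</code></pre>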
<p>Some of the features include:</p>
<ul>
<li>change case (upper, lower, title, English title)</li>
<li>length, position and count of substrings</li>
<li>has prefix, suffix or contains</li>
<li>trim prefix, suffix and cutsets</li>
<li>split and join to/from JSON string arrays</li>
</ul>
<p>See <a href="docs/string/">string</a> for full details.</p>
<h2 id="installation">Installation</h2>
<p>See <a href="install.html">INSTALL.md</a> for details on installing
pre-compiled versions of the programs.</p>
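<p>After installing, one way to confirm which programs ended up on your
PATH is a small check like the sketch below. The function name
<code>check_datatools</code> is hypothetical, and the list covers only
the programs named on this page, not the full collection.</p>
<pre class="shell"><code>    # Hypothetical helper: reports which of the programs named on this page
    # can be found as executables on PATH.
    check_datatools() {
        for tool in csv2json findfile finddir mergepath range \
                    reldate reltime timefmt urlparse string; do
            if [ -x "$(command -v "$tool")" ]; then
                echo "$tool: installed (run '$tool -help' for options)"
            else
                echo "$tool: not found on PATH"
            fi
        done
    }
    check_datatools</code></pre>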
</section>
<footer>
<span><h1><a href="http://caltech.edu">Caltech</a></h1></span>
<span>© 2023 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
<span><a href="mailto:[email protected]">Email Us</a></span>
<a class="cl-hide" href="sitemap.xml">Site Map</a>
</footer>
</body>
</html>