<!DOCTYPE html>
<html>
<head>
<title>Caltech Library's Digital Library Development Sandbox</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="./">README</a></li>
<li><a href="LICENSE">LICENSE</a></li>
<li><a href="INSTALL.html">INSTALL</a></li>
<li><a href="user-manual.html">User Manual</a></li>
<li><a href="how-to/">Tutorials</a></li>
<li><a href="search.html">Search Docs</a></li>
<li><a href="about.html">About</a></li>
<li><a href="https://github.com/caltechlibrary/datatools">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="name">NAME</h1>
<p>csvfind</p>
<h1 id="synopsis">SYNOPSIS</h1>
<p>csvfind <a href="#options">OPTIONS</a> TEXT_TO_MATCH</p>
<h1 id="description">DESCRIPTION</h1>
<p>csvfind reads a CSV file and returns the rows whose selected column
matches the given text. Columns are counted from one instead of zero.
It supports exact matching, phrase (contains) matching, and Levenshtein
(edit distance) matching.</p>
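<p>To make the one-based column numbering and the default exact match concrete, here is a minimal Python sketch. It is not csvfind's own code: the function name <code>find_exact</code> and the sample data are hypothetical, and it assumes matching is case-insensitive by default, as the <code>-case-sensitive</code> option below suggests.</p>

```python
import csv
import io


def find_exact(csv_text, col, text, case_sensitive=False):
    """Return the rows whose column `col` (counted from one) equals `text`.

    Hypothetical sketch of csvfind-style exact matching, not csvfind itself.
    """
    if col < 1:
        raise ValueError("columns are counted from one")
    idx = col - 1  # convert the one-based column number to a list index
    matches = []
    for row in csv.reader(io.StringIO(csv_text)):
        if idx >= len(row):
            continue  # row too short to have that column
        cell, wanted = row[idx], text
        if not case_sensitive:
            # assumed default: compare without regard to case
            cell, wanted = cell.lower(), wanted.lower()
        if cell == wanted:
            matches.append(row)
    return matches


# Hypothetical sample standing in for books.csv
books = (
    "id,author,title\n"
    "1,Bilbo,The Red Book of Westmarch\n"
    "2,Elrond,Lore of Rivendell\n"
)
print(find_exact(books, 3, "the red book of westmarch"))
# [['1', 'Bilbo', 'The Red Book of Westmarch']]
```

<p>Here column 3 is the <code>title</code> column because counting starts at one; the header row is compared like any other row in this sketch.</p>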
<h1 id="options">OPTIONS</h1>
<dl>
<dt>-help</dt>
<dd>
display help
</dd>
<dt>-license</dt>
<dd>
display license
</dd>
<dt>-version</dt>
<dd>
display version
</dd>
<dt>-allow-duplicates</dt>
<dd>
allow duplicates when searching for matches
</dd>
<dt>-append-edit-distance</dt>
<dd>
append a column with the edit distance found (useful for tuning Levenshtein matching)
</dd>
<dt>-case-sensitive</dt>
<dd>
perform a case-sensitive match (default is false)
</dd>
<dt>-col, -cols</dt>
<dd>
column to search for a match in the CSV file (counted from one)
</dd>
<dt>-contains</dt>
<dd>
match rows where the column contains the phrase, rather than equals it
</dd>
<dt>-d, -delimiter</dt>
<dd>
set delimiter character
</dd>
<dt>-delete-cost</dt>
<dd>
set the delete cost to use for Levenshtein matching
</dd>
<dt>-i, -input</dt>
<dd>
input filename
</dd>
<dt>-insert-cost</dt>
<dd>
set the insert cost to use for Levenshtein matching
</dd>
<dt>-levenshtein</dt>
<dd>
use Levenshtein matching
</dd>
<dt>-max-edit-distance</dt>
<dd>
set the edit distance threshold for a match (default 0)
</dd>
<dt>-nl, -newline</dt>
<dd>
include a trailing newline in the output
</dd>
<dt>-o, -output</dt>
<dd>
output filename
</dd>
<dt>-quiet</dt>
<dd>
suppress error messages
</dd>
<dt>-skip-header-row</dt>
<dd>
skip the header row
</dd>
<dt>-stop-words</dt>
<dd>
use the colon-delimited list of stop words
</dd>
<dt>-substitute-cost</dt>
<dd>
set the substitution cost to use for Levenshtein matching
</dd>
<dt>-trim-leading-space</dt>
<dd>
trim leading space in field(s) for CSV input
</dd>
<dt>-trimspace, -trimspaces</dt>
<dd>
trim spaces around cell values before comparing
</dd>
<dt>-use-lazy-quotes</dt>
<dd>
use lazy quotes on CSV input
</dd>
</dl>
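<p>Several options above change what is compared rather than how. The Python sketch below shows one plausible way to combine <code>-case-sensitive</code>, <code>-trimspace</code> and <code>-stop-words</code> before comparing. The function names are hypothetical, and the assumption that stop words are dropped from both the cell and the search text is ours; this page does not say how csvfind applies them.</p>

```python
def parse_stop_words(spec):
    """Split a colon-delimited stop-word list such as "the:of" (empty parts ignored)."""
    return [w for w in spec.split(":") if w]


def normalize(value, case_sensitive=False, trimspace=False, stop_words=()):
    """Prepare a cell or search phrase for comparison.

    Hypothetical: assumes stop words are removed as whole,
    whitespace-separated words, which also collapses runs of spaces.
    """
    if trimspace:
        value = value.strip()  # like -trimspace: drop surrounding spaces
    if not case_sensitive:
        value = value.lower()  # default: case-insensitive comparison
    if stop_words:
        if case_sensitive:
            words = set(stop_words)
        else:
            words = {w.lower() for w in stop_words}
        value = " ".join(w for w in value.split() if w not in words)
    return value


stops = parse_stop_words("the:of")
print(normalize("  The Red Book of Westmarch ", trimspace=True, stop_words=stops))
# red book westmarch
```

<p>Applying the same normalization to both sides keeps the comparison symmetric, so a search for "Red Book Westmarch" would match a cell containing "The Red Book of Westmarch" under these assumptions.</p>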
<h1 id="examples">EXAMPLES</h1>
<p>Find the rows where the second column matches “The Red Book of
Westmarch” exactly.</p>
<pre><code> csvfind -i books.csv -col=2 "The Red Book of Westmarch"</code></pre>
<p>Find the rows where the second column (columns are numbered from one)
approximately matches “The Red Book of Westmarch”.</p>
<pre><code> csvfind -i books.csv -col=2 -levenshtein \
-insert-cost=1 -delete-cost=1 -substitute-cost=3 \
-max-edit-distance=50 -append-edit-distance \
"The Red Book of Westmarch"</code></pre>
<p>In this example, <code>-append-edit-distance</code> adds a column
holding each row’s edit distance, which shows how close the matches are.</p>
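<p>The edit distance reported above is a weighted Levenshtein distance. A minimal dynamic-programming sketch in Python, assuming the three costs act as per-character weights for insertions, deletions and substitutions, is shown below together with a filter that appends the distance as a final column. The names <code>edit_distance</code> and <code>fuzzy_find</code> and the sample rows are hypothetical, and lowercasing before comparing is an assumption based on the case-insensitive default.</p>

```python
def edit_distance(a, b, insert_cost=1, delete_cost=1, substitute_cost=1):
    """Weighted Levenshtein distance: cheapest cost of turning a into b."""
    # prev[j] is the cost of turning the previous prefix of a into b[:j]
    prev = [j * insert_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * delete_cost] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else substitute_cost
            cur[j] = min(
                prev[j] + delete_cost,     # delete a[i-1]
                cur[j - 1] + insert_cost,  # insert b[j-1]
                prev[j - 1] + sub,         # keep or substitute
            )
        prev = cur
    return prev[len(b)]


def fuzzy_find(rows, col, text, max_edit_distance=0, **costs):
    """Keep rows whose one-based column `col` is within max_edit_distance
    of `text`, appending the distance as a new last column (hypothetical)."""
    idx = col - 1
    found = []
    for row in rows:
        if idx >= len(row):
            continue
        d = edit_distance(row[idx].lower(), text.lower(), **costs)
        if d <= max_edit_distance:
            found.append(row + [str(d)])
    return found


rows = [
    ["1", "Bilbo", "The Red Book of Westmarch"],
    ["2", "Frodo", "The Red Book of Westmarsh"],
    ["3", "Elrond", "Lore of Rivendell"],
]
print(fuzzy_find(rows, 3, "The Red Book of Westmarch", max_edit_distance=3,
                 insert_cost=1, delete_cost=1, substitute_cost=3))
# [['1', 'Bilbo', 'The Red Book of Westmarch', '0'],
#  ['2', 'Frodo', 'The Red Book of Westmarsh', '2']]
```

<p>With the costs from the example (1, 1, 3), a substitution costs more than a deletion plus an insertion, so the sketch always prefers delete-then-insert: the one-letter difference in row 2 scores 2, not 3. With the default threshold of 0 only exact matches survive, which is why the example raises <code>-max-edit-distance</code>.</p>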
<p>You can also find rows where a column contains a phrase, rather than
matching it exactly.</p>
<pre><code> csvfind -i books.csv -col=2 -contains "Red Book"</code></pre>
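<p>The difference between <code>-contains</code> and the default exact match is a substring test instead of an equality test. The short Python sketch below illustrates that idea; it is not csvfind's implementation, the function name <code>find_contains</code> and the sample rows are hypothetical, and case-insensitivity by default is assumed from the <code>-case-sensitive</code> option.</p>

```python
def find_contains(rows, col, phrase, case_sensitive=False):
    """Return rows whose one-based column `col` contains `phrase` (hypothetical)."""
    idx = col - 1
    found = []
    for row in rows:
        if idx >= len(row):
            continue  # row too short to have that column
        cell, needle = row[idx], phrase
        if not case_sensitive:
            cell, needle = cell.lower(), needle.lower()
        if needle in cell:  # substring test instead of equality
            found.append(row)
    return found


rows = [
    ["1", "The Red Book of Westmarch"],
    ["2", "The Red Book of Hergest"],
    ["3", "The Silmarillion"],
]
print(find_contains(rows, 2, "red book"))
# [['1', 'The Red Book of Westmarch'], ['2', 'The Red Book of Hergest']]
```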
<p>csvfind 1.2.12</p>
</section>
<footer>
<span><h1><a href="http://caltech.edu">Caltech</a></h1></span>
<span>© 2023 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
<span><a href="mailto:[email protected]">Email Us</a></span>
<a class="cl-hide" href="sitemap.xml">Site Map</a>
</footer>
</body>
</html>