Running the Joshua Decoder:
---------------------------
If you wish to run the complete machine translation pipeline, Joshua includes a
black-box implementation that enables the entire pipeline to be run by typing
a single restartable command. See the documentation for a walkthrough and more
information about the many options available to the pipeline.
- web: http://joshua-decoder.org/5.0/pipeline.html
- local mirror: ./joshua-decoder.org/5.0/pipeline.html
Manually Running the Joshua Decoder:
------------------------------------
To run the decoder, first set these environment variables:
    export JAVA_HOME=/path/to/java # maybe /usr/java/home
    export JOSHUA=/path/to/joshua
You might also find it helpful to set these:
    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
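Before going further, it can help to confirm that the two required variables
are set and point at real directories. The following is a minimal sketch;
check_joshua_env is a hypothetical helper name and is not part of Joshua.

```shell
# check_joshua_env: hypothetical helper, not shipped with Joshua.
# Returns 0 if JAVA_HOME and JOSHUA are both set to existing directories,
# and prints a message for each problem it finds otherwise.
check_joshua_env() {
    rc=0
    for var in JAVA_HOME JOSHUA; do
        # Look up the variable named by $var (empty string if unset).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$var does not point to a directory: $value" >&2
            rc=1
        fi
    done
    return $rc
}
```

Calling check_joshua_env after the export lines above should print nothing
and return success if both paths exist.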
Then, compile Joshua by typing:
    cd $JOSHUA
    ant
The basic method for invoking the decoder looks like this:
    cat SOURCE | $JOSHUA/joshua-decoder -c CONFIG > OUTPUT
You can test this using the sample configuration file and input found in the
examples/example/ directory. For example, type:
    cat examples/example/test.in | $JOSHUA/joshua-decoder -c examples/example/joshua.config
The decoder will load the language model and translation models defined
in the configuration file, and will then decode the five sentences in the
example file.
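If you run the decoder often, the SOURCE / CONFIG / OUTPUT pattern can be
wrapped in a small shell function. This is only a sketch: run_decoder and the
JOSHUA_DECODER override are hypothetical names introduced here for
illustration, and Joshua itself does not read JOSHUA_DECODER.

```shell
# run_decoder SOURCE CONFIG OUTPUT
# Hypothetical wrapper around the basic invocation pattern.
# JOSHUA_DECODER is an assumed override for the decoder path; by default
# the function uses $JOSHUA/joshua-decoder as in the examples.
run_decoder() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_decoder SOURCE CONFIG OUTPUT" >&2
        return 2
    fi
    if [ ! -r "$1" ]; then
        echo "cannot read source file: $1" >&2
        return 1
    fi
    decoder=${JOSHUA_DECODER:-$JOSHUA/joshua-decoder}
    # Equivalent to: cat SOURCE | decoder -c CONFIG > OUTPUT
    "$decoder" -c "$2" < "$1" > "$3"
}
```

For the sample data, this could be called as follows (test.out is an
arbitrary output file name):
    run_decoder examples/example/test.in examples/example/joshua.config test.out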
There are a variety of command-line options that you can pass to Joshua.
For example, you can enable multithreaded decoding with the -threads N flag:
    cat examples/example/test.in | $JOSHUA/joshua-decoder -c examples/example/joshua.config -threads 5
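Rather than hard-coding N, you may want to derive it from the machine. The
sketch below assumes that one decoding thread per online CPU is a reasonable
starting point, which is a guess and not a Joshua recommendation. pick_threads
is a hypothetical helper name.

```shell
# pick_threads: hypothetical helper, not part of Joshua.
# Prints the number of online CPUs reported by getconf, or 1 if that
# value is unavailable or is not a positive integer.
pick_threads() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case $n in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer
    esac
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}
```

The result can then be passed to the flag shown above:
    cat examples/example/test.in | $JOSHUA/joshua-decoder -c examples/example/joshua.config -threads "$(pick_threads)"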
The configuration file defines many additional parameters, all of which can be
overridden on the command line by using the format -PARAMETER value. For
example, to output the top 10 hypotheses instead of just the top 1 specified in
the configuration file, use -top-n N:
    cat examples/example/test.in | $JOSHUA/joshua-decoder -c examples/example/joshua.config -top-n 10
Parameters, whether in the configuration file or on the command line, are
converted to a canonical internal representation that ignores hyphens,
underscores, and case. So, for example, the following parameters are all
equivalent:
    {top-n, topN, top_n, TOP_N, t-o-p-N}
    {poplimit, pop-limit, pop_limit, popLimit}
and so on. For an example of parameters, see the Joshua configuration file
template in $JOSHUA/scripts/training/templates/tune/joshua.config or the online
documentation at joshua-decoder.org/4.0/decoder.html. There is a wealth of
information in the online documentation.
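To check whether two spellings of a parameter name are equivalent under the
rule described above (ignore hyphens, underscores, and case), the rule can be
approximated in shell. This is an illustration of the described behavior, not
Joshua's own code; canon is a hypothetical function name.

```shell
# canon NAME
# Hypothetical helper that mimics the canonicalization described in this
# README: drop hyphens and underscores, then lowercase the result.
canon() {
    printf '%s\n' "$1" | tr -d '_-' | tr '[:upper:]' '[:lower:]'
}

# same_param A B: succeeds if both names canonicalize to the same string.
same_param() {
    [ "$(canon "$1")" = "$(canon "$2")" ]
}
```

For instance, same_param top-n TOP_N succeeds, while same_param top-n
pop-limit fails.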
After you have successfully run the decoding example above, we recommend that
you take a look at the Joshua pipeline script, which allows you to do full
end-to-end training of a translation model. It is stored in
$JOSHUA/examples