Weslang is a standalone WEb Service that detects the LANGuage of a given piece of text.
It works by executing both CLD2 and Language-Detection.
The exposed API is very simple:
host:8080/detect?q=<TEXT>
The endpoint also supports POST requests, in case longer payloads are required.
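As a minimal sketch of how a client might build the GET request, the text has to be URL-encoded before it goes into the q parameter. Only the port, the /detect path and the q parameter come from the description above; the http scheme, the localhost default and the helper name build_detect_url are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode


def build_detect_url(text, host="localhost", port=8080):
    """Return the GET URL for /detect (scheme and default host are assumed)."""
    # Percent-encode the text so spaces and non-ASCII characters survive the trip.
    query = urlencode({"q": text}, quote_via=quote)
    return f"http://{host}:{port}/detect?{query}"


if __name__ == "__main__":
    print(build_detect_url("hello world"))
```

The POST variant is not sketched here because the expected body format is not stated above.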
The response will be a JSON document like the following:
{
"language": "en",
"confidence": 0.99
}
Here, language is the ISO 639-1 code of the detected language. Chinese is the exception: the locale is also returned, so the result is either zh-cn or zh-tw.
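A client that wants to treat the Chinese locale separately could split the returned code, as in the following sketch. The function name parse_detection and the tuple it returns are hypothetical; only the language and confidence fields come from the response shown above.

```python
import json


def parse_detection(body):
    """Split a response body into (language, region, confidence)."""
    doc = json.loads(body)
    # Chinese results carry a locale ("zh-cn", "zh-tw"); other codes have no region part.
    language, _, region = doc["language"].partition("-")
    return language, region or None, doc["confidence"]


if __name__ == "__main__":
    print(parse_detection('{"language": "zh-cn", "confidence": 0.99}'))
    print(parse_detection('{"language": "en", "confidence": 0.99}'))
```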
Additional endpoints for checking the health of the web service are exposed at localhost:9001.
Among the endpoints one can find:
- http://localhost:9001/health
- http://localhost:9001/metrics
- http://localhost:9001/env
- http://localhost:9001/info
- http://localhost:9001/mappings
- http://localhost:9001/trace
These endpoints are provided automatically by the Spring-Boot framework, via the Actuator plugin.
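The endpoints listed above can be polled from a monitoring script. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that /health answers with a JSON object whose status field is "UP" when the service is healthy, which is the usual Actuator behaviour but is not confirmed here for this project.

```python
import json
from urllib.request import urlopen

# Paths taken from the list of management endpoints above.
ACTUATOR_PATHS = ["health", "metrics", "env", "info", "mappings", "trace"]


def actuator_urls(base="http://localhost:9001"):
    """Return the full URL of every management endpoint."""
    return [f"{base}/{path}" for path in ACTUATOR_PATHS]


def is_healthy(base="http://localhost:9001", timeout=2.0):
    """Return True if /health reports status "UP" (assumed response shape)."""
    try:
        with urlopen(f"{base}/health", timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error or a body that is not JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


if __name__ == "__main__":
    for url in actuator_urls():
        print(url)
```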
This project includes two components that could each easily be a project of their own.
Using JNA, a Java interface is exposed for getting the language via the CLD2 library.
The code lives in //java/com/deezer/research/cld2
In //third-party/java/language-detection-v2 we have a fork of language-detection.
The main changes we made were removing the randomization and adding some performance improvements. See the file THIRD_PARTY.yaml in that folder for a comprehensive list of changes.
To build and test this project, BUCK and Java 7 are required. That means it cannot be built under Windows.
$ buck test --all
$ buck build //java/com/deezer/research/language:detection_app
The build command generates a file called buck-out/gen/java/com/deezer/research/language/detection_app.jar, which is a self-contained binary.
To run it, just execute:
$ java -jar detection_app.jar
If for some reason you don't want to, or can't, execute both detectors, you can choose which ones to run by selecting a Spring profile:
$ java -jar detection_app.jar --spring.profiles.active=java_only
$ java -jar detection_app.jar --spring.profiles.active=cld2
$ java -jar detection_app.jar --spring.profiles.active=both
Currently the CLD2 bindings are only generated for linux-x86-64, so if your machine is different it probably won't work. In such a case, just execute it with the java_only profile.
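A launcher script could pick the profile from the platform it runs on. The following sketch assumes that linux-x86-64 corresponds to a platform.system() value of "Linux" and a platform.machine() value of "x86_64" or "amd64"; the function names are hypothetical, while the profile names and the command-line flag are the ones shown above.

```python
import platform


def choose_profile(system=None, machine=None):
    """Return "both" on linux-x86-64 and "java_only" everywhere else."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    # The CLD2 bindings are only generated for linux-x86-64 (assumed mapping).
    if system == "linux" and machine in ("x86_64", "amd64"):
        return "both"
    return "java_only"


def launch_command(jar="detection_app.jar", profile=None):
    """Build the argument list for starting the service with a profile."""
    profile = profile or choose_profile()
    return ["java", "-jar", jar, f"--spring.profiles.active={profile}"]


if __name__ == "__main__":
    print(" ".join(launch_command()))
```

The returned list can be passed directly to subprocess.run to start the service.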
This project is possible thanks to several open source projects:
- Spring-Boot: Java Framework.
- CLD2: The language detector built into Chrome.
- Language-Detection: Language detector provided by Cybozu Labs.
- BUCK: build system released by Facebook.
- Guava: Additional Java libraries provided by Google.
- JNA: Library to easily integrate C libraries with Java.
This project is released under the Apache 2.0 License.