diff --git a/README.md b/README.md
index fe9715885..f5915f5bc 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,7 @@
+##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.
+
+##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
+
 # OAP MLlib
 
 ## Overview
@@ -45,6 +49,8 @@ Intel® oneAPI Toolkits components used by the project are already included into
 ### Spark Configuration
 
+#### General Configuration
+
 Users usually run Spark application on __YARN__ with __client__ mode. In that case, you only need to add the following configurations in `spark-defaults.conf` or in `spark-submit` command line before running.
 
 ```
@@ -56,6 +62,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
 spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
 ```
 
+#### OAP MLlib Specific Configuration
+
+OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires enough native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
+
 ### Sanity Check
 
 #### Setup `env.sh`
diff --git a/docs/OAP-Installation-Guide.md b/docs/OAP-Installation-Guide.md
index c269b978e..ca1a6f558 100644
--- a/docs/OAP-Installation-Guide.md
+++ b/docs/OAP-Installation-Guide.md
@@ -36,7 +36,7 @@ Once finished steps above, you have completed OAP dependencies installation and
 Dependencies below are required by OAP and all of them are included in OAP Conda package, they will be automatically installed in your cluster when you Conda install OAP. Ensure you have activated environment which you created in the previous steps.
 
-- [Arrow](https://github.com/Intel-bigdata/arrow)
+- [Arrow](https://github.com/oap-project/arrow/tree/arrow-3.0.0-oap-1.1)
 - [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/)
 - [Memkind](https://anaconda.org/intel/memkind)
 - [Vmemcache](https://anaconda.org/intel/vmemcache)
diff --git a/docs/User-Guide.md b/docs/User-Guide.md
index 34331ccce..3425d57e3 100644
--- a/docs/User-Guide.md
+++ b/docs/User-Guide.md
@@ -1,7 +1,3 @@
-##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.
-
-##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
-
 # OAP MLlib
 
 ## Overview
@@ -13,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in
 OAP MLlib tried to maintain the same API interfaces and produce same results that are identical with Spark MLlib.
 However due to the nature of float point operations, there may be some small deviation from the original result, we will try our best to make sure the error is within acceptable range.
 For those algorithms that are not accelerated by OAP MLlib, the original Spark MLlib one will be used.
 
-## Online Documentation
-
-You can find the all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).
 
 ## Getting Started
@@ -49,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into
 ### Spark Configuration
 
+#### General Configuration
+
 Users usually run Spark application on __YARN__ with __client__ mode. In that case, you only need to add the following configurations in `spark-defaults.conf` or in `spark-submit` command line before running.
 
 ```
@@ -60,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
 spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
 ```
 
+#### OAP MLlib Specific Configuration
+
+OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires enough native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
+
 ### Sanity Check
 
 #### Setup `env.sh`
diff --git a/docs/index.md b/docs/index.md
index 9fb2e0396..3425d57e3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -9,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in
 OAP MLlib tried to maintain the same API interfaces and produce same results that are identical with Spark MLlib.
 However due to the nature of float point operations, there may be some small deviation from the original result, we will try our best to make sure the error is within acceptable range.
 For those algorithms that are not accelerated by OAP MLlib, the original Spark MLlib one will be used.
 
-## Online Documentation
-
-You can find the all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).
 
 ## Getting Started
@@ -45,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into
 ### Spark Configuration
 
+#### General Configuration
+
 Users usually run Spark application on __YARN__ with __client__ mode. In that case, you only need to add the following configurations in `spark-defaults.conf` or in `spark-submit` command line before running.
 
 ```
@@ -56,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
 spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
 ```
 
+#### OAP MLlib Specific Configuration
+
+OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires enough native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
+
 ### Sanity Check
 
 #### Setup `env.sh`
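The `spark.executor.memoryOverhead` rule of thumb added by this patch can be sketched with concrete numbers. A minimal shell sketch, assuming a hypothetical 40 GB dataset spread over 8 executors (both figures are illustrative, not taken from the patch):

```shell
# Rule of thumb from the added docs: set spark.executor.memoryOverhead
# larger than dataset size / executor number.
DATASET_SIZE_GB=40   # hypothetical dataset size
NUM_EXECUTORS=8      # hypothetical executor count

# Integer division, then +1 to land strictly above the quotient.
OVERHEAD_GB=$(( DATASET_SIZE_GB / NUM_EXECUTORS + 1 ))

# Emit the line you would append to spark-defaults.conf.
echo "spark.executor.memoryOverhead ${OVERHEAD_GB}g"
```

The printed line can go into `spark-defaults.conf`, or equivalently be passed on the `spark-submit` command line as `--conf spark.executor.memoryOverhead=6g`.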