From 900b48fdf54ae495b3009cfac0cab267feeb27dd Mon Sep 17 00:00:00 2001
From: sadikovi
Date: Sun, 14 Feb 2016 14:50:17 +1300
Subject: [PATCH] updated to 0.1.0

---
 README.md   | 16 +++++++++++-----
 Untitled    |  1 -
 version.sbt |  2 +-
 3 files changed, 12 insertions(+), 7 deletions(-)
 delete mode 100644 Untitled

diff --git a/README.md b/README.md
index 03fbbe5..ca2d330 100644
--- a/README.md
+++ b/README.md
@@ -7,25 +7,28 @@ A library for reading NetFlow files from [Spark SQL](http://spark.apache.org/doc
 ## Requirements
 | Spark version | spark-netflow version |
 |---------------|-----------------------|
-| 1.4+ | [0.0.2](http://spark-packages.org/package/sadikovi/spark-netflow) |
+| 1.4+ | [0.1.0](http://spark-packages.org/package/sadikovi/spark-netflow) |
 
 ## Linking
 The spark-netflow library can be added to Spark by using the `--packages` command line option. For example, run this to include it when starting the spark shell:
 
 ```shell
- $SPARK_HOME/bin/spark-shell --packages sadikovi:spark-netflow:0.0.2-s_2.10
+ $SPARK_HOME/bin/spark-shell --packages sadikovi:spark-netflow:0.1.0-s_2.10
 ```
 
 ## Features
 - Column pruning
+- Predicate pushdown on capture time (`unix_secs` field)
+- Fields conversion (IP addresses, protocol, etc.)
 - NetFlow version 5 support
+- NetFlow version 7 support
 
 ### Options
 Currently supported options:
 
 | Name | Example | Description |
 |------|:-------:|-------------|
-| `version` | _5_ | version to use when parsing NetFlow files
+| `version` | _5, 7_ | version to use when parsing NetFlow files
 | `buffer` | _1024, 32Kb, 3Mb, etc_ | buffer size for NetFlow compressed stream (default: 3Mb)
 | `stringify` | _true, false_ | convert certain fields (e.g. IP) into human-readable format (default: false)
 
@@ -44,12 +47,15 @@ val df = sqlContext.read.format("com.github.sadikovi.spark.netflow").
   option("version", "5").option("buffer", "50Mb").load("file:/...")
 ```
 
-Alternatively you can use shortcut for NetFlow v5 files
+Alternatively you can use shortcuts for NetFlow files
 ```scala
 import com.github.sadikovi.spark.netflow._
 
 // this will read version 5 with default buffer size
-val df = sqlContext.read.netflow("file:/...")
+val df = sqlContext.read.netflow5("file:/...")
+
+// this will read version 7 with fields conversion
+val df = sqlContext.read.option("stringify", "true").netflow7("file:/...")
 ```
 
 ### Python API
diff --git a/Untitled b/Untitled
deleted file mode 100644
index 500948b..0000000
--- a/Untitled
+++ /dev/null
@@ -1 +0,0 @@
-import com.github.sadikovi.spark.netflow.sources.NetFlowRegistry
diff --git a/version.sbt b/version.sbt
index 57b0bcb..e765444 100644
--- a/version.sbt
+++ b/version.sbt
@@ -1 +1 @@
-version in ThisBuild := "0.1.0-SNAPSHOT"
+version in ThisBuild := "0.1.0"
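The "Fields conversion (IP addresses, protocol, etc.)" feature and the `stringify` option added in this patch turn raw numeric NetFlow fields into human-readable values. As an illustration only, here is a minimal Scala sketch of that kind of conversion. It assumes the raw 32-bit IPv4 address has already been read into a `Long` and the one-byte protocol field into an `Int`. The object and method names are hypothetical and are not part of spark-netflow's API; the library's own conversion code may work differently.

```scala
// Hypothetical helpers, NOT part of spark-netflow: a sketch of the kind of
// conversion that `stringify` performs on raw NetFlow v5/v7 record fields.
object FieldConversionSketch {
  // NetFlow v5/v7 records store IPv4 addresses as 4-byte unsigned integers.
  // Assumption: the value has already been widened to a non-negative Long.
  def ipv4ToString(addr: Long): String = {
    require(addr >= 0L && addr <= 0xFFFFFFFFL, s"not a 32-bit address: $addr")
    // Take the four octets from most to least significant byte.
    Seq(24, 16, 8, 0).map(shift => (addr >> shift) & 0xFF).mkString(".")
  }

  // A few IANA-assigned IP protocol numbers; anything else falls back to the number.
  private val protocolNames: Map[Int, String] = Map(
    1 -> "ICMP",
    6 -> "TCP",
    17 -> "UDP"
  )

  def protocolToString(proto: Int): String =
    protocolNames.getOrElse(proto, proto.toString)

  def main(args: Array[String]): Unit = {
    println(ipv4ToString(3232235777L)) // 192.168.1.1
    println(protocolToString(6))       // TCP
    println(protocolToString(250))     // 250
  }
}
```

Falling back to the numeric value for unknown protocols keeps the conversion total, so an unrecognised protocol number never turns into a null or an error.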