The spear-framework lets you write simple ETL connectors/pipelines for moving data from different sources to different destinations, greatly reducing the effort of writing complex code for data ingestion. Connectors that extract and load (ETL or ELT) any kind of data from a source, with custom transformations applied, can be written and executed seamlessly using spear connectors.
- Introduction
- Pre-Requisites
- Getting started with Spear
- Build your first connector
- Example Connectors
- Contributions and License
- Explore more
Spear Framework is basically used to write connectors (ETL jobs) from a source to a target, applying business logic/transformations over the source data and ingesting it into the corresponding destination with minimal code.
Following are the pre-requisites you need to play around with spear:
- A Linux machine with at least 16 GB RAM and 4 CPUs for better performance
- Install docker and docker-compose using the steps below
#To install docker on centos:
============================
sudo yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
sudo yum install -y yum-utils
sudo yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io
sudo systemctl start docker
#To install docker on ubuntu:
=============================
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release
sudo apt-get install docker.io
sudo systemctl start docker
#install docker-compose:
=======================
sudo curl -L "https://github.com/docker/compose/releases/download/1.28.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
Below are the steps to set up spear on any machine having docker and docker-compose installed:
- Clone the repository from git and navigate to project directory
git clone https://github.com/AnudeepKonaboina/spear-framework.git && cd spear-framework
- Run the setup.sh script using the command:
sh setup.sh
- Once the setup is completed, run the below commands to start spear-framework on spark:
->Enter into the spark container using the command:
user@node~$ docker exec -it spark bash
->Run `spear-shell` to start the shell:
root@hadoop # spear-shell
NOTE: This spark shell is encapsulated with a default hadoop/hive environment readily available to read data from any source and write it to HDFS, giving you a complete environment to play with spear-framework.
- Start writing your own single-line connectors and explore. To understand how to write a connector, click here
Below are the steps to write any connector:
- Get a suitable connector object from SpearConnector by providing the source and destination details as shown below:
import com.github.edge.roman.spear.SpearConnector
val connector = SpearConnector
.createConnector(name = "defaultconnector") //give your connector a name (any name)
.source(sourceType = "relational", sourceFormat = "jdbc") //source type and format for loading data
.target(targetType = "FS", targetFormat = "parquet") //target type and format for writing data to the destination
.getConnector
The table below shows all the source and destination type combinations that spear-framework supports:
|source type | dest. type | description |
|------------ |:-------------:|:-----------------------------------------------------------:
| file | relational |connector object with file source and database as dest. |
| relational | relational |connector object with database source and database as dest. |
| stream | relational |connector object with stream source and database as dest. |
| file | FS |connector object with file source and FileSystem as dest. |
| relational | FS |connector object with database source and FileSystem as dest|
| stream | FS |connector object with stream source and FileSystem as dest. |
| FS | FS |connector object with FileSystem source and FileSystem as dest. |
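For instance, based on the table above, a connector object for a stream source with a filesystem destination is built with the same builder (a sketch; the connector name is arbitrary):

import com.github.edge.roman.spear.SpearConnector

//stream source (e.g. kafka) to filesystem destination (e.g. parquet)
val streamToFS = SpearConnector
.createConnector(name = "StreamToFSConnector")
.source(sourceType = "stream", sourceFormat = "kafka")
.target(targetType = "FS", targetFormat = "parquet")
.getConnector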
- Write the connector logic using the connector object.
connector
//source object and connection profile need to be specified
.source(sourceObject="can be <filename/tablename/topic/api>", <connection profile Map((key->value))>)
//creates a temporary table on the source data with the given alias name, which can be used for further transformations
.saveAs("<alias temporary table name>")
//apply custom transformations on the loaded source data (optional; apply only if necessary)
.transformSql("<transformations to be applied on source data>")
//target details where you want to load the data
.targetFS(destinationFilePath = "<hdfs/s3/gcs file path>", saveAsTable = "<tablename>", <Savemode can be overwrite/append/ignore>)
- On completion, stop the connector.
//stops the connector object
connector.stop()
- Enable verbose logging. To get the output dataframe at each stage of your connector, explicitly enable verbose logging on the connector object as below:
connector.setVeboseLogging(true) //default value is false
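Putting these steps together, below is a minimal end-to-end sketch; the input path, alias names, and target path/table are placeholders you would substitute with your own:

import com.github.edge.roman.spear.SpearConnector
import org.apache.spark.sql.SaveMode

//hypothetical file-to-filesystem connector; pick types/formats per the table above
val connector = SpearConnector
.createConnector(name = "myFirstConnector")
.source(sourceType = "file", sourceFormat = "csv")
.target(targetType = "FS", targetFormat = "parquet")
.getConnector

connector.setVeboseLogging(true) //optional; default is false

connector
.source(sourceObject = "file:///path/to/input.csv", Map("header" -> "true", "inferSchema" -> "true"))
.saveAs("__src__")
.transformSql("select * from __src__")
.targetFS(destinationFilePath = "/tmp/output.db", saveAsTable = "mydb.mytable", SaveMode.Overwrite)

connector.stop()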
A connector is basically the logic/code that creates a pipeline from a source to a target using the spear framework, letting you ingest data from any source to any destination.
Spear framework supports writing data to any RDBMS via JDBC as the destination (postgres/oracle/mysql etc.) from various sources such as files (csv/json/filesystem etc.), databases (RDBMS/cloud db etc.) or streams (kafka/dir path etc.). Given below are examples of a few connectors with JDBC as the target. The examples are written for postgresql as the JDBC target, but they can be extended to any JDBC target.
An example connector for reading a csv file, applying transformations and storing it into a postgres table using spear:
The input data is available at data/us-election-2012-results-by-county.csv. Simply copy the below connector, paste it in your interactive shell, and see your data being moved to a table in postgres with minimal code!
import com.github.edge.roman.spear.SpearConnector
import org.apache.log4j.{Level, Logger}
import java.util.Properties
import org.apache.spark.sql.{Column, DataFrame, SaveMode}
val properties = new Properties()
properties.put("driver", "org.postgresql.Driver");
properties.put("user", "postgres_user")
properties.put("password", "mysecretpassword")
properties.put("url", "jdbc:postgresql://postgres:5432/pgdb")
//create a connector object
val csvJdbcConnector = SpearConnector
.createConnector(name="CSVtoPostgresConnector")
.source(sourceType = "file", sourceFormat = "csv")
.target(targetType = "relational", targetFormat = "jdbc")
.getConnector
//enable verbose logging, i.e., the output after every stage is displayed in the console
csvJdbcConnector.setVeboseLogging(true)
//connector logic
csvJdbcConnector
.source(sourceObject="file:///opt/spear-framework/data/us-election-2012-results-by-county.csv", Map("header" -> "true", "inferSchema" -> "true"))
.saveAs("__tmp__")
.transformSql(
"""select state_code,party,
|sum(votes) as total_votes
|from __tmp__
|group by state_code,party""".stripMargin)
.targetJDBC(tableName="mytable", properties, SaveMode.Overwrite)
csvJdbcConnector.stop()
21/05/20 16:17:37 INFO targetjdbc.FiletoJDBC: Reading source file: file:///opt/spear-framework/data/us-election-2012-results-by-county.csv with format: csv status:success
+----------+----------+------------+-------------------+-----+----------+---------+-----+
|country_id|state_code|country_name|country_total_votes|party|first_name|last_name|votes|
+----------+----------+------------+-------------------+-----+----------+---------+-----+
|1 |AK |Alasaba |220596 |Dem |Barack |Obama |91696|
|2 |AK |Akaskak |220596 |Dem |Barack |Obama |91696|
|3 |AL |Autauga |23909 |Dem |Barack |Obama |6354 |
|4 |AK |Akaska |220596 |Dem |Barack |Obama |91696|
|5 |AL |Baldwin |84988 |Dem |Barack |Obama |18329|
|6 |AL |Barbour |11459 |Dem |Barack |Obama |5873 |
|7 |AL |Bibb |8391 |Dem |Barack |Obama |2200 |
|8 |AL |Blount |23980 |Dem |Barack |Obama |2961 |
|9 |AL |Bullock |5318 |Dem |Barack |Obama |4058 |
|10 |AL |Butler |9483 |Dem |Barack |Obama |4367 |
+----------+----------+------------+-------------------+-----+----------+---------+-----+
only showing top 10 rows
21/05/20 16:17:39 INFO targetjdbc.FiletoJDBC: Saving data as temporary table:__tmp__ success
21/05/20 16:17:39 INFO targetjdbc.FiletoJDBC: Executing tranformation sql: select state_code,party,
sum(votes) as total_votes
from __tmp__
group by state_code,party status :success
+----------+-----+-----------+
|state_code|party|total_votes|
+----------+-----+-----------+
|AL |Dem |793620 |
|NY |GOP |2226637 |
|MI |CST |16792 |
|ID |GOP |420750 |
|ID |Ind |2495 |
|WA |CST |7772 |
|HI |Grn |3121 |
|MS |RP |969 |
|MN |Grn |13045 |
|ID |Dem |212560 |
+----------+-----+-----------+
only showing top 10 rows
21/05/20 16:17:56 INFO targetjdbc.FiletoJDBC: Write data to table/object mytable completed with status:success
+----------+-----+-----------+
|state_code|party|total_votes|
+----------+-----+-----------+
|AL |Dem |793620 |
|NY |GOP |2226637 |
|MI |CST |16792 |
|ID |GOP |420750 |
|ID |Ind |2495 |
|WA |CST |7772 |
|HI |Grn |3121 |
|MS |RP |969 |
|MN |Grn |13045 |
|ID |Dem |212560 |
+----------+-----+-----------+
only showing top 10 rows
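As noted above, the same connector can write to any JDBC-compliant target by swapping only the connection properties. A sketch for a hypothetical MySQL target (the host, database and credentials are placeholders, and the MySQL driver jar is assumed to be on the classpath):

import java.util.Properties

//hypothetical MySQL connection profile; substitute your own host/db/credentials
val mysqlProps = new Properties()
mysqlProps.put("driver", "com.mysql.cj.jdbc.Driver")
mysqlProps.put("user", "mysql_user")
mysqlProps.put("password", "mysecretpassword")
mysqlProps.put("url", "jdbc:mysql://mysql-host:3306/mydb")

//the connector logic stays identical to the postgres example above; only the properties change:
//csvJdbcConnector.targetJDBC(tableName = "mytable", mysqlProps, SaveMode.Overwrite)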
Connector for reading an avro file, applying transformations and storing it into a postgres table using spear:
The input data is available at data/sample_data.avro
import com.github.edge.roman.spear.SpearConnector
import org.apache.spark.sql.SaveMode
import scala.collection.JavaConverters._
import java.util.Properties
val properties = new Properties()
properties.put("driver", "org.postgresql.Driver");
properties.put("user", "postgres")
properties.put("password", "pass")
properties.put("url", "jdbc:postgresql://localhost:5432/pgdb")
val avroJdbcConnector = SpearConnector
.createConnector(name = "AvrotoPostgresConnector")
.source(sourceType = "file", sourceFormat = "avro")
.target(targetType = "relational", targetFormat = "jdbc")
.getConnector
avroJdbcConnector.setVeboseLogging(true)
avroJdbcConnector
.source(sourceObject = "file:///opt/spear-framework/data/sample_data.avro")
.saveAs("__tmp__")
.transformSql(
"""select id,
|cast(concat(first_name ,' ', last_name) as VARCHAR(255)) as name,
|coalesce(gender,'NA') as gender,
|cast(country as VARCHAR(20)) as country,
|cast(salary as DOUBLE) as salary,email
|from __tmp__""".stripMargin)
.saveAs("__transformed_table__")
.transformSql(
"""
|select id,name,
|country,email,
|salary
|from __transformed_table__""".stripMargin)
.targetJDBC(tableName = "avro_data", properties, SaveMode.Overwrite)
avroJdbcConnector.stop()
21/05/20 16:55:42 INFO targetjdbc.FiletoJDBC: Reading source file: file:///opt/spear-framework/data/sample_data.avro with format: avro status:success
+--------------------+---+----------+---------+------------------------+------+--------------+----------------+----------------------+----------+---------+------------------------+--------+
|registration_dttm |id |first_name|last_name|email |gender|ip_address |cc |country |birthdate |salary |title |comments|
+--------------------+---+----------+---------+------------------------+------+--------------+----------------+----------------------+----------+---------+------------------------+--------+
|2016-02-03T07:55:29Z|1 |Amanda |Jordan |[email protected] |Female|1.197.201.2 |6759521864920116|Indonesia |3/8/1971 |49756.53 |Internal Auditor |1E+02 |
|2016-02-03T17:04:03Z|2 |Albert |Freeman |[email protected] |Male |218.111.175.34|null |Canada |1/16/1968 |150280.17|Accountant IV | |
|2016-02-03T01:09:31Z|3 |Evelyn |Morgan |[email protected] |Female|7.161.136.94 |6767119071901597|Russia |2/1/1960 |144972.51|Structural Engineer | |
|2016-02-03T12:36:21Z|4 |Denise |Riley |[email protected] |Female|140.35.109.83 |3576031598965625|China |4/8/1997 |90263.05 |Senior Cost Accountant | |
|2016-02-03T05:05:31Z|5 |Carlos |Burns |[email protected]| |169.113.235.40|5602256255204850|South Africa | |null | | |
|2016-02-03T07:22:34Z|6 |Kathryn |White |[email protected] |Female|195.131.81.179|3583136326049310|Indonesia |2/25/1983 |69227.11 |Account Executive | |
|2016-02-03T08:33:08Z|7 |Samuel |Holmes |[email protected] |Male |232.234.81.197|3582641366974690|Portugal |12/18/1987|14247.62 |Senior Financial Analyst| |
|2016-02-03T06:47:06Z|8 |Harry |Howell |[email protected] |Male |91.235.51.73 |null |Bosnia and Herzegovina|3/1/1962 |186469.43|Web Developer IV | |
|2016-02-03T03:52:53Z|9 |Jose |Foster |[email protected] |Male |132.31.53.61 |null |South Korea |3/27/1992 |231067.84|Software Test Engineer I|1E+02 |
|2016-02-03T18:29:47Z|10 |Emily |Stewart |[email protected]|Female|143.28.251.245|3574254110301671|Nigeria |1/28/1997 |27234.28 |Health Coach IV | |
+--------------------+---+----------+---------+------------------------+------+--------------+----------------+----------------------+----------+---------+------------------------+--------+
only showing top 10 rows
21/05/20 16:55:43 INFO targetjdbc.FiletoJDBC: Saving data as temporary table:__tmp__ success
21/05/20 16:55:43 INFO targetjdbc.FiletoJDBC: Executing tranformation sql: select id,
cast(concat(first_name ,' ', last_name) as VARCHAR(255)) as name,
coalesce(gender,'NA') as gender,
cast(country as VARCHAR(20)) as country,
cast(salary as DOUBLE) as salary,email
from __tmp__ status :success
+---+--------------+------+----------------------+---------+------------------------+
|id |name |gender|country |salary |email |
+---+--------------+------+----------------------+---------+------------------------+
|1 |Amanda Jordan |Female|Indonesia |49756.53 |[email protected] |
|2 |Albert Freeman|Male |Canada |150280.17|[email protected] |
|3 |Evelyn Morgan |Female|Russia |144972.51|[email protected] |
|4 |Denise Riley |Female|China |90263.05 |[email protected] |
|5 |Carlos Burns | |South Africa |null |[email protected]|
|6 |Kathryn White |Female|Indonesia |69227.11 |[email protected] |
|7 |Samuel Holmes |Male |Portugal |14247.62 |[email protected] |
|8 |Harry Howell |Male |Bosnia and Herzegovina|186469.43|[email protected] |
|9 |Jose Foster |Male |South Korea |231067.84|[email protected] |
|10 |Emily Stewart |Female|Nigeria |27234.28 |[email protected]|
+---+--------------+------+----------------------+---------+------------------------+
only showing top 10 rows
21/05/20 16:55:45 INFO targetjdbc.FiletoJDBC: Saving data as temporary table:__transformed_table__ success
21/05/20 16:55:45 INFO targetjdbc.FiletoJDBC: Executing tranformation sql:
select id,name,
country,email,
salary
from __transformed_table__ status :success
+---+--------------+----------------------+------------------------+---------+
|id |name |country |email |salary |
+---+--------------+----------------------+------------------------+---------+
|1 |Amanda Jordan |Indonesia |[email protected] |49756.53 |
|2 |Albert Freeman|Canada |[email protected] |150280.17|
|3 |Evelyn Morgan |Russia |[email protected] |144972.51|
|4 |Denise Riley |China |[email protected] |90263.05 |
|5 |Carlos Burns |South Africa |[email protected]|null |
|6 |Kathryn White |Indonesia |[email protected] |69227.11 |
|7 |Samuel Holmes |Portugal |[email protected] |14247.62 |
|8 |Harry Howell |Bosnia and Herzegovina|[email protected] |186469.43|
|9 |Jose Foster |South Korea |[email protected] |231067.84|
|10 |Emily Stewart |Nigeria |[email protected]|27234.28 |
+---+--------------+----------------------+------------------------+---------+
only showing top 10 rows
21/05/20 16:55:48 INFO targetjdbc.FiletoJDBC: Write data to table/object avro_data completed with status:success
+---+--------------+----------------------+------------------------+---------+
|id |name |country |email |salary |
+---+--------------+----------------------+------------------------+---------+
|1 |Amanda Jordan |Indonesia |[email protected] |49756.53 |
|2 |Albert Freeman|Canada |[email protected] |150280.17|
|3 |Evelyn Morgan |Russia |[email protected] |144972.51|
|4 |Denise Riley |China |[email protected] |90263.05 |
|5 |Carlos Burns |South Africa |[email protected]|null |
|6 |Kathryn White |Indonesia |[email protected] |69227.11 |
|7 |Samuel Holmes |Portugal |[email protected] |14247.62 |
|8 |Harry Howell |Bosnia and Herzegovina|[email protected] |186469.43|
|9 |Jose Foster |South Korea |[email protected] |231067.84|
|10 |Emily Stewart |Nigeria |[email protected]|27234.28 |
+---+--------------+----------------------+------------------------+---------+
only showing top 10 rows
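Other file formats work the same way; only the sourceFormat and read options change. A sketch for a hypothetical JSON source (the file path, read option and table name are placeholders):

import com.github.edge.roman.spear.SpearConnector
import org.apache.spark.sql.SaveMode
import java.util.Properties

val props = new Properties()
props.put("driver", "org.postgresql.Driver")
props.put("user", "postgres")
props.put("password", "pass")
props.put("url", "jdbc:postgresql://localhost:5432/pgdb")

//hypothetical JSON-to-postgres connector; multiline JSON files may need the "multiline" option
val jsonJdbcConnector = SpearConnector
.createConnector(name = "JSONtoPostgresConnector")
.source(sourceType = "file", sourceFormat = "json")
.target(targetType = "relational", targetFormat = "jdbc")
.getConnector

jsonJdbcConnector
.source(sourceObject = "file:///path/to/data.json", Map("multiline" -> "true"))
.saveAs("__json_tmp__")
.transformSql("select * from __json_tmp__")
.targetJDBC(tableName = "json_data", props, SaveMode.Overwrite)

jsonJdbcConnector.stop()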
Connector for reading data from an oracle table, applying transformations and writing it to a postgres table:
import com.github.edge.roman.spear.SpearConnector
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SaveMode
import java.util.Properties
val properties = new Properties()
properties.put("driver", "org.postgresql.Driver");
properties.put("user", "postgres")
properties.put("password", "pass")
properties.put("url", "jdbc:postgresql://localhost:5432/pgdb")
Logger.getLogger("com.github").setLevel(Level.INFO)
val oracleTOPostgresConnector = SpearConnector
.createConnector(name="OracletoPostgresConnector")
.source(sourceType = "relational", sourceFormat = "jdbc")
.target(targetType = "relational", targetFormat = "jdbc")
.getConnector
oracleTOPostgresConnector.setVeboseLogging(true)
oracleTOPostgresConnector
.sourceSql(Map("driver" -> "oracle.jdbc.driver.OracleDriver", "user" -> "user", "password" -> "pass", "url" -> "jdbc:oracle:thin:@ora-host:1521:orcl"),
"""
|SELECT
| to_char(sys_extract_utc(systimestamp), 'YYYY-MM-DD HH24:MI:SS.FF') as ingest_ts_utc,
| to_char(TIMESTAMP_0, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_0,
| to_char(TIMESTAMP_5, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_5,
| to_char(TIMESTAMP_7, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_7,
| to_char(TIMESTAMP_9, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_9,
| to_char(TIMESTAMP0_WITH_TZ) as timestamp0_with_tz , to_char(sys_extract_utc(TIMESTAMP0_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS') as timestamp0_with_tz_utc,
| to_char(TIMESTAMP5_WITH_TZ) as timestamp5_with_tz , to_char(sys_extract_utc(TIMESTAMP5_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp5_with_tz_utc,
| to_char(TIMESTAMP8_WITH_TZ) as timestamp8_with_tz , to_char(sys_extract_utc(TIMESTAMP8_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp8_with_tz_utc,
| to_char(TIMESTAMP0_WITH_LTZ) as timestamp0_with_ltz , to_char(sys_extract_utc(TIMESTAMP0_WITH_LTZ), 'YYYY-MM-DD HH24:MI:SS') as timestamp0_with_ltz_utc,
| to_char(TIMESTAMP5_WITH_LTZ) as timestamp5_with_ltz , to_char(sys_extract_utc(TIMESTAMP5_WITH_LTZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp5_with_ltz_utc,
| to_char(TIMESTAMP8_WITH_LTZ) as timestamp8_with_ltz , to_char(sys_extract_utc(TIMESTAMP8_WITH_LTZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp8_with_ltz_utc
| from DBSRV.ORACLE_TIMESTAMPS
|""".stripMargin)
.saveAs("__source__")
.transformSql(
"""
|SELECT
| TO_TIMESTAMP(ingest_ts_utc) as ingest_ts_utc,
| TIMESTAMP_0 as timestamp_0,
| TIMESTAMP_5 as timestamp_5,
| TIMESTAMP_7 as timestamp_7,
| TIMESTAMP_9 as timestamp_9,
| TIMESTAMP0_WITH_TZ as timestamp0_with_tz,TIMESTAMP0_WITH_TZ_utc as timestamp0_with_tz_utc,
| TIMESTAMP5_WITH_TZ as timestamp5_with_tz,TIMESTAMP5_WITH_TZ_utc as timestamp5_with_tz_utc,
| TIMESTAMP8_WITH_TZ as timestamp8_with_tz,TIMESTAMP8_WITH_TZ_utc as timestamp8_with_tz_utc,
| TIMESTAMP0_WITH_LTZ as timestamp0_with_ltz,TIMESTAMP0_WITH_LTZ_utc as timestamp0_with_ltz_utc,
| TIMESTAMP5_WITH_LTZ as timestamp5_with_ltz,TIMESTAMP5_WITH_LTZ_utc as timestamp5_with_ltz_utc,
| TIMESTAMP8_WITH_LTZ as timestamp8_with_ltz,TIMESTAMP8_WITH_LTZ_utc as timestamp8_with_ltz_utc
| from __source__
|""".stripMargin)
.targetJDBC(tableName = "pgdb.ora_to_postgres", properties, SaveMode.Overwrite)
oracleTOPostgresConnector.stop()
21/05/04 17:35:50 INFO targetjdbc.JDBCtoJDBC: Executing source sql query:
SELECT
to_char(sys_extract_utc(systimestamp), 'YYYY-MM-DD HH24:MI:SS.FF') as ingest_ts_utc,
to_char(TIMESTAMP_0, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_0,
to_char(TIMESTAMP_5, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_5,
to_char(TIMESTAMP_7, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_7,
to_char(TIMESTAMP_9, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_9,
to_char(TIMESTAMP0_WITH_TZ) as timestamp0_with_tz , to_char(sys_extract_utc(TIMESTAMP0_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS') as timestamp0_with_tz_utc,
to_char(TIMESTAMP5_WITH_TZ) as timestamp5_with_tz , to_char(sys_extract_utc(TIMESTAMP5_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp5_with_tz_utc,
to_char(TIMESTAMP8_WITH_TZ) as timestamp8_with_tz , to_char(sys_extract_utc(TIMESTAMP8_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp8_with_tz_utc
from DBSRV.ORACLE_TIMESTAMPS
21/05/04 17:35:50 INFO targetjdbc.JDBCtoJDBC: Data is saved as a temporary table by name: __source__
21/05/04 17:35:50 INFO targetjdbc.JDBCtoJDBC: showing saved data from temporary table with name: __source__
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|INGEST_TS_UTC |TIMESTAMP_0 |TIMESTAMP_5 |TIMESTAMP_7 |TIMESTAMP_9 |TIMESTAMP0_WITH_TZ |TIMESTAMP0_WITH_TZ_UTC|TIMESTAMP5_WITH_TZ |TIMESTAMP5_WITH_TZ_UTC |TIMESTAMP8_WITH_TZ |TIMESTAMP8_WITH_TZ_UTC |
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|2021-05-04 17:35:50.620944|2021-04-07 15:15:16.0|2021-04-07 15:15:16.03356|2021-04-07 15:15:16.0335610|2021-04-07 15:15:16.033561000|07-APR-21 03.15.16 PM ASIA/CALCUTTA|2021-04-07 09:45:16 |07-APR-21 03.15.16.03356 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356|07-APR-21 03.15.16.03356100 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356100|
|2021-05-04 17:35:50.620944|2021-04-07 15:16:51.6|2021-04-07 15:16:51.60911|2021-04-07 15:16:51.6091090|2021-04-07 15:16:51.609109000|07-APR-21 03.16.52 PM ASIA/CALCUTTA|2021-04-07 09:46:52 |07-APR-21 03.16.51.60911 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60911|07-APR-21 03.16.51.60910900 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60910900|
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
21/05/04 17:35:50 INFO targetjdbc.JDBCtoJDBC: Data after transformation using the SQL :
SELECT
TO_TIMESTAMP(ingest_ts_utc) as ingest_ts_utc,
TIMESTAMP_0 as timestamp_0,
TIMESTAMP_5 as timestamp_5,
TIMESTAMP_7 as timestamp_7,
TIMESTAMP_9 as timestamp_9,
TIMESTAMP0_WITH_TZ as timestamp0_with_tz,TIMESTAMP0_WITH_TZ_utc as timestamp0_with_tz_utc,
TIMESTAMP5_WITH_TZ as timestamp5_with_tz,TIMESTAMP5_WITH_TZ_utc as timestamp5_with_tz_utc,
TIMESTAMP8_WITH_TZ as timestamp8_with_tz,TIMESTAMP8_WITH_TZ_utc as timestamp8_with_tz_utc
from __source__
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|ingest_ts_utc |timestamp_0 |timestamp_5 |timestamp_7 |timestamp_9 |timestamp0_with_tz |timestamp0_with_tz_utc|timestamp5_with_tz |timestamp5_with_tz_utc |timestamp8_with_tz |timestamp8_with_tz_utc |
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|2021-05-04 17:35:50.818643|2021-04-07 15:15:16.0|2021-04-07 15:15:16.03356|2021-04-07 15:15:16.0335610|2021-04-07 15:15:16.033561000|07-APR-21 03.15.16 PM ASIA/CALCUTTA|2021-04-07 09:45:16 |07-APR-21 03.15.16.03356 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356|07-APR-21 03.15.16.03356100 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356100|
|2021-05-04 17:35:50.818643|2021-04-07 15:16:51.6|2021-04-07 15:16:51.60911|2021-04-07 15:16:51.6091090|2021-04-07 15:16:51.609109000|07-APR-21 03.16.52 PM ASIA/CALCUTTA|2021-04-07 09:46:52 |07-APR-21 03.16.51.60911 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60911|07-APR-21 03.16.51.60910900 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60910900|
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
21/05/04 17:35:50 INFO targetjdbc.JDBCtoJDBC: Writing data to target table: pgdb.ora_to_postgres
21/05/04 17:35:56 INFO targetjdbc.JDBCtoJDBC: Showing data in target table : pgdb.ora_to_postgres
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|ingest_ts_utc |timestamp_0 |timestamp_5 |timestamp_7 |timestamp_9 |timestamp0_with_tz |timestamp0_with_tz_utc|timestamp5_with_tz |timestamp5_with_tz_utc |timestamp8_with_tz |timestamp8_with_tz_utc |
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|2021-05-04 17:35:52.709503|2021-04-07 15:15:16.0|2021-04-07 15:15:16.03356|2021-04-07 15:15:16.0335610|2021-04-07 15:15:16.033561000|07-APR-21 03.15.16 PM ASIA/CALCUTTA|2021-04-07 09:45:16 |07-APR-21 03.15.16.03356 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356|07-APR-21 03.15.16.03356100 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356100|
|2021-05-04 17:35:52.709503|2021-04-07 15:16:51.6|2021-04-07 15:16:51.60911|2021-04-07 15:16:51.6091090|2021-04-07 15:16:51.609109000|07-APR-21 03.16.52 PM ASIA/CALCUTTA|2021-04-07 09:46:52 |07-APR-21 03.16.51.60911 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60911|07-APR-21 03.16.51.60910900 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60910900|
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
While writing to Salesforce, below are the pre-requisites one must take care of:
- A salesforce object/dataset must already exist in Salesforce, as spark will not create the object automatically.
- The columns/fields in the source data or dataframe must match the fields in the destination salesforce object.
import com.github.edge.roman.spear.SpearConnector
import org.apache.spark.sql.{Column, DataFrame, SaveMode}
import java.util.Properties
val targetproperties = new Properties()
targetproperties.put("username", "user")
targetproperties.put("password", "pass<TOKEN>")
val postgresSalesForce = SpearConnector
.createConnector("POSTGRES-SALESFORCE")
.source(sourceType = "relational", sourceFormat = "jdbc")
.target(targetType = "relational", targetFormat = "soql")
.getConnector
postgresSalesForce
.source(sourceObject = "salesforce_source", Map("driver" -> "org.postgresql.Driver", "user" -> "postgres_user", "password" -> "mysecretpassword", "url" -> "jdbc:postgresql://postgres:5432/pgdb"))
.saveAs("__tmp__")
.transformSql(
"""
|select person_id as person_id__c,
|name as person_name__c
|from __tmp__""".stripMargin)
.targetJDBC(tableName = "Sample__c", targetproperties, SaveMode.Overwrite)
postgresSalesForce.stop()
Connector for streaming data from a kafka topic into a postgres table:
import com.github.edge.roman.spear.SpearConnector
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import java.util.Properties
val properties = new Properties()
properties.put("driver", "org.postgresql.Driver");
properties.put("user", "postgres_user")
properties.put("password", "mysecretpassword")
properties.put("url", "jdbc:postgresql://postgres:5432/pgdb")
val streamTOPostgres = SpearConnector
.createConnector(name = "StreamKafkaToPostgresconnector")
.source(sourceType = "stream", sourceFormat = "kafka")
.target(targetType = "relational", targetFormat = "jdbc")
.getConnector
val schema = StructType(
Array(StructField("id", StringType),
StructField("name", StringType)
))
streamTOPostgres
.source(sourceObject = "stream_topic", Map("kafka.bootstrap.servers" -> "kafka:9092", "failOnDataLoss" -> "true", "startingOffsets" -> "earliest"), schema)
.saveAs("__tmp2__")
.transformSql("select cast(id as INT) as id, name from __tmp2__")
.targetJDBC(tableName = "person", properties, SaveMode.Append)
streamTOPostgres.stop()
Connector for reading data from a postgres table and writing it to hive (parquet on HDFS):
import com.github.edge.roman.spear.SpearConnector
import org.apache.spark.sql.SaveMode
import org.apache.log4j.{Level, Logger}
import java.util.Properties
Logger.getLogger("com.github").setLevel(Level.INFO)
val postgresToHiveConnector = SpearConnector
.createConnector(name="PostgrestoHiveConnector")
.source(sourceType = "relational", sourceFormat = "jdbc")
.target(targetType = "FS", targetFormat = "parquet")
.getConnector
postgresToHiveConnector.setVeboseLogging(true)
postgresToHiveConnector
.source("source_db.instance", Map("driver" -> "org.postgresql.Driver", "user" -> "postgres", "password" -> "test", "url" -> "jdbc:postgresql://postgres-host:5433/source_db"))
.saveAs("__tmp__")
.transformSql(
"""
|select cast( uuid as string) as uuid,
|cast( type_id as bigint ) as type_id,
|cast( factory_message_process_id as bigint) as factory_message_process_id,
|cast( factory_uuid as string ) as factory_uuid,
|cast( factory_id as bigint ) as factory_id,
|cast( engine_id as bigint ) as engine_id,
|cast( topic as string ) as topic,
|cast( status_code_id as int) as status_code_id,
|cast( cru_by as string ) as cru_by,cast( cru_ts as timestamp) as cru_ts
|from __tmp__""".stripMargin)
.targetFS(destinationFilePath = "/tmp/ingest_test.db", saveAsTable = "ingest_test.postgres_data", SaveMode.Overwrite)
postgresToHiveConnector.stop()
21/05/01 10:39:20 INFO targetFS.JDBCtoFS: Reading source data from table: source_db.instance
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
|uuid |type_id |factory_message_process_id |factory_uuid |factory_id | engine_id |topic |status_code_id|cru_by|cru_ts |
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
|null |1 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |factory_2_2 |5 |ABCDE |2021-04-27 10:17:37.529195|
|null |1 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |factory_1_1 |5 |ABCDE |2021-04-27 10:17:37.533318|
|null |1 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |factory_3_3 |5 |ABCDE |2021-04-27 10:17:37.535323|
|59d9b23e-ff93-4351-af7e-0a95ec4fde65|10 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_authtoken |5 |ABCDE |2021-04-27 10:17:50.441147|
|111eeff6-c61d-402e-9e70-615cf80d3016|10 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |bale_2_authtoken |5 |ABCDE |2021-04-27 10:18:02.439379|
|2870ff43-73c9-424e-9f3c-c89ac4dda278|10 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |bale_3_authtoken |5 |ABCDE |2021-04-27 10:18:14.5242 |
|58fe7575-9c4f-471e-8893-9bc39b4f1be4|18 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |bale_3_errorbot |5 |ABCDE |2021-04-27 10:21:17.098984|
|534a2af0-af74-4633-8603-926070afd76f|16 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |bale_2_filter_resolver_jdbc|5 |ABCDE |2021-04-27 10:21:17.223042|
|9971130b-9ae1-4a53-89ce-aa1932534956|18 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_errorbot |5 |ABCDE |2021-04-27 10:21:17.437489|
|6db9c72f-85b0-4254-bc2f-09dc1e63e6f3|9 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_flowcontroller |5 |ABCDE |2021-04-27 10:21:17.780313|
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
only showing top 10 rows
21/05/01 10:39:31 INFO targetFS.JDBCtoFS: Data is saved as a temporary table by name: __tmp__
21/05/01 10:39:31 INFO targetFS.JDBCtoFS: showing saved data from temporary table with name: __tmp__
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
|uuid |type_id |factory_message_process_id |factory_uuid |factory_id | engine_id |topic |status_code_id|cru_by|cru_ts |
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
|null |1 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |factory_2_2 |5 |ABCDE |2021-04-27 10:17:37.529195|
|null |1 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |factory_1_1 |5 |ABCDE |2021-04-27 10:17:37.533318|
|null |1 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |factory_3_3 |5 |ABCDE |2021-04-27 10:17:37.535323|
|59d9b23e-ff93-4351-af7e-0a95ec4fde65|10 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_authtoken |5 |ABCDE |2021-04-27 10:17:50.441147|
|111eeff6-c61d-402e-9e70-615cf80d3016|10 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |bale_2_authtoken |5 |ABCDE |2021-04-27 10:18:02.439379|
|2870ff43-73c9-424e-9f3c-c89ac4dda278|10 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |bale_3_authtoken |5 |ABCDE |2021-04-27 10:18:14.5242 |
|58fe7575-9c4f-471e-8893-9bc39b4f1be4|18 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |bale_3_errorbot |5 |ABCDE |2021-04-27 10:21:17.098984|
|534a2af0-af74-4633-8603-926070afd76f|16 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |bale_2_filter_resolver_jdbc|5 |ABCDE |2021-04-27 10:21:17.223042|
|9971130b-9ae1-4a53-89ce-aa1932534956|18 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_errorbot |5 |ABCDE |2021-04-27 10:21:17.437489|
|6db9c72f-85b0-4254-bc2f-09dc1e63e6f3|9 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_flowcontroller |5 |ABCDE |2021-04-27 10:21:17.780313|
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
only showing top 10 rows
21/05/01 10:39:33 INFO targetFS.JDBCtoFS: Data after transformation using the SQL :
select cast( uuid as string) as uuid,
cast( type_id as bigint ) as type_id,
cast( factory_message_process_id as bigint) as factory_message_process_id,
cast( factory_uuid as string ) as factory_uuid,
cast( factory_id as bigint ) as factory_id,
cast( workflow_engine_id as bigint ) as workflow_engine_id,
cast( topic as string ) as topic,
cast( status_code_id as int) as status_code_id,
cast( cru_by as string ) as cru_by,cast( cru_ts as timestamp) as cru_ts
from __tmp__
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
|uuid |type_id|factory_message_process_id |factory_uuid |factory_id |engine_id |topic |status_code_id|cru_by|cru_ts |
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
|null |1 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |factory_2_2 |5 |ABCDE |2021-04-27 10:17:37.529195|
|null |1 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |factory_1_1 |5 |ABCDE |2021-04-27 10:17:37.533318|
|null |1 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |factory_3_3 |5 |ABCDE |2021-04-27 10:17:37.535323|
|59d9b23e-ff93-4351-af7e-0a95ec4fde65|10 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_authtoken |5 |ABCDE |2021-04-27 10:17:50.441147|
|111eeff6-c61d-402e-9e70-615cf80d3016|10 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |bale_2_authtoken |5 |ABCDE |2021-04-27 10:18:02.439379|
|2870ff43-73c9-424e-9f3c-c89ac4dda278|10 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |bale_3_authtoken |5 |ABCDE |2021-04-27 10:18:14.5242 |
|58fe7575-9c4f-471e-8893-9bc39b4f1be4|18 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |bale_3_errorbot |5 |ABCDE |2021-04-27 10:21:17.098984|
|534a2af0-af74-4633-8603-926070afd76f|16 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |bale_2_filter_resolver_jdbc|5 |ABCDE |2021-04-27 10:21:17.223042|
|9971130b-9ae1-4a53-89ce-aa1932534956|18 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_errorbot |5 |ABCDE |2021-04-27 10:21:17.437489|
|6db9c72f-85b0-4254-bc2f-09dc1e63e6f3|9 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_flowcontroller |5 |ABCDE |2021-04-27 10:21:17.780313|
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
only showing top 10 rows
21/05/01 10:39:35 INFO targetFS.JDBCtoFS: Writing data to target file: /tmp/ingest_test.db
21/05/01 10:39:35 INFO targetFS.JDBCtoFS: Saving data to table:ingest_test.postgres_data
21/05/01 10:39:35 INFO targetFS.JDBCtoFS: Target Data in table:ingest_test.postgres_data
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
|uuid |type_id |factory_message_process_id |factory_uuid |factory_id | engine_id |topic |status_code_id|cru_by|cru_ts |
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
|null |1 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |factory_2_2 |5 |ABCDE |2021-04-27 10:17:37.529195|
|null |1 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |factory_1_1 |5 |ABCDE |2021-04-27 10:17:37.533318|
|null |1 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |factory_3_3 |5 |ABCDE |2021-04-27 10:17:37.535323|
|59d9b23e-ff93-4351-af7e-0a95ec4fde65|10 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_authtoken |5 |ABCDE |2021-04-27 10:17:50.441147|
|111eeff6-c61d-402e-9e70-615cf80d3016|10 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |bale_2_authtoken |5 |ABCDE |2021-04-27 10:18:02.439379|
|2870ff43-73c9-424e-9f3c-c89ac4dda278|10 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |bale_3_authtoken |5 |ABCDE |2021-04-27 10:18:14.5242 |
|58fe7575-9c4f-471e-8893-9bc39b4f1be4|18 |1619518658043 |5ef4bcb3-f064-4532-ad4f-5e8b68c33f70|3 |3 |bale_3_errorbot |5 |ABCDE |2021-04-27 10:21:17.098984|
|534a2af0-af74-4633-8603-926070afd76f|16 |1619518657679 |b218b4a2-2723-4a51-a83b-1d9e5e1c79ff|2 |2 |bale_2_filter_resolver_jdbc|5 |ABCDE |2021-04-27 10:21:17.223042|
|9971130b-9ae1-4a53-89ce-aa1932534956|18 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_errorbot |5 |ABCDE |2021-04-27 10:21:17.437489|
|6db9c72f-85b0-4254-bc2f-09dc1e63e6f3|9 |1619518657481 |ec65395c-fdbc-4697-ac91-bc72447ae7cf|1 |1 |bale_1_flowcontroller |5 |ABCDE |2021-04-27 10:21:17.780313|
+------------------------------------+-----------+------------------------------+------------------------------------+--------------+------------------+---------------------------+--------------+------+--------------------------+
only showing top 10 rows
Connector for streaming data from a kafka topic to HDFS (parquet):
import com.github.edge.roman.spear.SpearConnector
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
val streamTOHdfs = SpearConnector
.createConnector(name = "StreamKafkaToHdfsConnector")
.source(sourceType = "stream", sourceFormat = "kafka")
.target(targetType = "FS", targetFormat = "parquet")
.getConnector
val schema = StructType(
Array(StructField("id", StringType),
StructField("name", StringType)
))
streamTOHdfs
.source(sourceObject = "stream_topic", Map("kafka.bootstrap.servers" -> "kafka:9092", "failOnDataLoss" -> "true", "startingOffsets" -> "earliest"), schema)
.saveAs("__tmp2__")
.transformSql("select cast(id as INT) as id, name from __tmp2__")
.targetFS(destinationFilePath = "/tmp/ingest_test.db", saveAsTable = "ingest_test.ora_data", SaveMode.Append)
streamTOHdfs.stop()
Connector for reading data from an oracle table and writing it to S3:
import com.github.edge.roman.spear.SpearConnector
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SaveMode
Logger.getLogger("com.github").setLevel(Level.INFO)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "*****")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "*****")
val oracleTOS3Connector = SpearConnector.init
.source(sourceType = "relational", sourceFormat = "jdbc")
.target(targetType = "FS", targetFormat = "parquet")
.withName(connectorName = "OracleToS3Connector")
.getConnector
oracleTOS3Connector.setVeboseLogging(true)
oracleTOS3Connector
.sourceSql(Map("driver" -> "oracle.jdbc.driver.OracleDriver", "user" -> "user", "password" -> "pass", "url" -> "jdbc:oracle:thin:@ora-host:1521:orcl"),
"""
|SELECT
| to_char(sys_extract_utc(systimestamp), 'YYYY-MM-DD HH24:MI:SS.FF') as ingest_ts_utc,
| to_char(TIMESTAMP_0, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_0,
| to_char(TIMESTAMP_5, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_5,
| to_char(TIMESTAMP_7, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_7,
| to_char(TIMESTAMP_9, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_9,
| to_char(TIMESTAMP0_WITH_TZ) as timestamp0_with_tz , to_char(sys_extract_utc(TIMESTAMP0_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS') as timestamp0_with_tz_utc,
| to_char(TIMESTAMP5_WITH_TZ) as timestamp5_with_tz , to_char(sys_extract_utc(TIMESTAMP5_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp5_with_tz_utc,
| to_char(TIMESTAMP8_WITH_TZ) as timestamp8_with_tz , to_char(sys_extract_utc(TIMESTAMP8_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp8_with_tz_utc
| from DBSRV.ORACLE_TIMESTAMPS
|""".stripMargin)
.saveAs("__source__")
.transformSql(
"""
|SELECT
| TO_TIMESTAMP(ingest_ts_utc) as ingest_ts_utc,
| TIMESTAMP_0 as timestamp_0,
| TIMESTAMP_5 as timestamp_5,
| TIMESTAMP_7 as timestamp_7,
| TIMESTAMP_9 as timestamp_9,
| TIMESTAMP0_WITH_TZ as timestamp0_with_tz,TIMESTAMP0_WITH_TZ_utc as timestamp0_with_tz_utc,
| TIMESTAMP5_WITH_TZ as timestamp5_with_tz,TIMESTAMP5_WITH_TZ_utc as timestamp5_with_tz_utc,
| TIMESTAMP8_WITH_TZ as timestamp8_with_tz,TIMESTAMP8_WITH_TZ_utc as timestamp8_with_tz_utc
| from __source__
|""".stripMargin)
.targetFS(destinationFilePath = "s3a://destination/data", SaveMode.Overwrite)
oracleTOS3Connector.stop()
21/05/08 08:46:11 INFO targetFS.JDBCtoFS: Executing source sql query:
SELECT
to_char(sys_extract_utc(systimestamp), 'YYYY-MM-DD HH24:MI:SS.FF') as ingest_ts_utc,
to_char(TIMESTAMP_0, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_0,
to_char(TIMESTAMP_5, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_5,
to_char(TIMESTAMP_7, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_7,
to_char(TIMESTAMP_9, 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp_9,
to_char(TIMESTAMP0_WITH_TZ) as timestamp0_with_tz , to_char(sys_extract_utc(TIMESTAMP0_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS') as timestamp0_with_tz_utc,
to_char(TIMESTAMP5_WITH_TZ) as timestamp5_with_tz , to_char(sys_extract_utc(TIMESTAMP5_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp5_with_tz_utc,
to_char(TIMESTAMP8_WITH_TZ) as timestamp8_with_tz , to_char(sys_extract_utc(TIMESTAMP8_WITH_TZ), 'YYYY-MM-DD HH24:MI:SS.FF') as timestamp8_with_tz_utc
from DBSRV.ORACLE_TIMESTAMPS
21/05/08 08:46:11 INFO targetFS.JDBCtoFS: Data is saved as a temporary table by name: __source__
21/05/08 08:46:11 INFO targetFS.JDBCtoFS: Showing saved data from temporary table with name: __source__
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|INGEST_TS_UTC |TIMESTAMP_0 |TIMESTAMP_5 |TIMESTAMP_7 |TIMESTAMP_9 |TIMESTAMP0_WITH_TZ |TIMESTAMP0_WITH_TZ_UTC|TIMESTAMP5_WITH_TZ |TIMESTAMP5_WITH_TZ_UTC |TIMESTAMP8_WITH_TZ |TIMESTAMP8_WITH_TZ_UTC |
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|2021-05-08 08:46:12.178719|2021-04-07 15:15:16.0|2021-04-07 15:15:16.03356|2021-04-07 15:15:16.0335610|2021-04-07 15:15:16.033561000|07-APR-21 03.15.16 PM ASIA/CALCUTTA|2021-04-07 09:45:16 |07-APR-21 03.15.16.03356 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356|07-APR-21 03.15.16.03356100 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356100|
|2021-05-08 08:46:12.178719|2021-04-07 15:16:51.6|2021-04-07 15:16:51.60911|2021-04-07 15:16:51.6091090|2021-04-07 15:16:51.609109000|07-APR-21 03.16.52 PM ASIA/CALCUTTA|2021-04-07 09:46:52 |07-APR-21 03.16.51.60911 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60911|07-APR-21 03.16.51.60910900 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60910900|
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
21/05/08 08:46:12 INFO targetFS.JDBCtoFS: Data after transformation using the SQL :
SELECT
TO_TIMESTAMP(ingest_ts_utc) as ingest_ts_utc,
TIMESTAMP_0 as timestamp_0,
TIMESTAMP_5 as timestamp_5,
TIMESTAMP_7 as timestamp_7,
TIMESTAMP_9 as timestamp_9,
TIMESTAMP0_WITH_TZ as timestamp0_with_tz,TIMESTAMP0_WITH_TZ_utc as timestamp0_with_tz_utc,
TIMESTAMP5_WITH_TZ as timestamp5_with_tz,TIMESTAMP5_WITH_TZ_utc as timestamp5_with_tz_utc,
TIMESTAMP8_WITH_TZ as timestamp8_with_tz,TIMESTAMP8_WITH_TZ_utc as timestamp8_with_tz_utc
from __source__
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|ingest_ts_utc |timestamp_0 |timestamp_5 |timestamp_7 |timestamp_9 |timestamp0_with_tz |timestamp0_with_tz_utc|timestamp5_with_tz |timestamp5_with_tz_utc |timestamp8_with_tz |timestamp8_with_tz_utc |
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
|2021-05-08 08:46:12.438578|2021-04-07 15:15:16.0|2021-04-07 15:15:16.03356|2021-04-07 15:15:16.0335610|2021-04-07 15:15:16.033561000|07-APR-21 03.15.16 PM ASIA/CALCUTTA|2021-04-07 09:45:16 |07-APR-21 03.15.16.03356 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356|07-APR-21 03.15.16.03356100 PM ASIA/CALCUTTA|2021-04-07 09:45:16.03356100|
|2021-05-08 08:46:12.438578|2021-04-07 15:16:51.6|2021-04-07 15:16:51.60911|2021-04-07 15:16:51.6091090|2021-04-07 15:16:51.609109000|07-APR-21 03.16.52 PM ASIA/CALCUTTA|2021-04-07 09:46:52 |07-APR-21 03.16.51.60911 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60911|07-APR-21 03.16.51.60910900 PM ASIA/CALCUTTA|2021-04-07 09:46:51.60910900|
+--------------------------+---------------------+-------------------------+---------------------------+-----------------------------+-----------------------------------+----------------------+-----------------------------------------+-------------------------+--------------------------------------------+----------------------------+
21/05/08 08:46:12 INFO targetFS.JDBCtoFS: Writing data to target path: s3a://destination/data
21/05/08 08:47:06 INFO targetFS.JDBCtoFS: Saving data to path:s3a://destination/data
Data at S3:
===========
user@node:~$ aws s3 ls s3://destination/data
2021-05-08 12:10:09 0 _SUCCESS
2021-05-08 12:09:59 4224 part-00000-71fad52e-404d-422c-a6af-7889691bc506-c000.snappy.parquet
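To sanity-check the write from the same shell, one could read the parquet back (a sketch, assuming the spark session still has the s3a credentials configured as above):

//hypothetical verification read; the path matches the targetFS destination above
val written = spark.read.parquet("s3a://destination/data")
written.show(10, false)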
Software Licensed under the Apache License 2.0
Anudeep Konaboina [email protected]
Kayan Deshi [email protected]
To explore various connectors written for data movement from different sources to different targets, visit the github page here