
Plan on adaptation to Spark 2.x #82

Closed
dongx-psu opened this issue Feb 8, 2017 · 7 comments

Comments

@dongx-psu
Member

dongx-psu commented Feb 8, 2017

I suggest a complete rewrite for this particular task: open a new, empty branch and add things back into the structure piece by piece.

  • Find a way to work around user-defined types. In Spark 2.1 they are in a private scope because of this ticket:
    https://issues.apache.org/jira/browse/SPARK-14155.
    Maybe Encoder is the right direction to go?
  • Dive deep into the Dataset abstraction and tailor our current design to it.
  • Restructure the whole project for better alignment with the current Spark SQL architecture.
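The Encoder direction in the first bullet could look roughly like this (a minimal sketch; the `Point` case class is a hypothetical stand-in for Simba's spatial types, not code from the project):

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Hypothetical spatial type for illustration
case class Point(x: Double, y: Double)

// Encoders.product derives an Encoder for any case class,
// sidestepping the now-private UserDefinedType API (SPARK-14155)
implicit val pointEncoder: Encoder[Point] = Encoders.product[Point]

// With the implicit in scope, Dataset[Point] operations work directly, e.g.:
//   spark.createDataset(Seq(Point(1.0, 2.0), Point(3.0, 4.0)))
```

The trade-off is that `Encoders.product` only covers flat case-class shapes; anything needing custom Catalyst serialization would still have to reach into private APIs.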

Will keep this ticket updated.

@merlintang

Any updates on this? Regarding the scope issue, there has been some discussion: [Spark Namespace]: Expanding Spark ML under Different Namespace?

Many people run into this issue when extending Spark with their own functions.

One participant said: "What I tend to do is keep my own code in its own package and try to build as thin a bridge over to it from the [private] scope as possible. It's also important to name things obviously, say, org.apache.spark.microsoft, so stack traces in bug reports can be dealt with more easily."

Thus, we could use org.apache.spark.sql.simba as well.

@dongx-psu
Member Author

Can you forward me the link to this discussion?

@merlintang

merlintang commented Feb 27, 2017 via email

@dongx-psu
Member Author

Finished the first migration to Spark 2.1 on the standalone-2.1 branch. It would be really nice if you could help us test it.

@geoHeil

geoHeil commented Mar 15, 2017

@Skyprophet I would like to test it, but I am having some trouble getting started with Simba, as outlined in #84.

@MixalisV

MixalisV commented May 29, 2017

Is there any guide that describes how to install Simba on Spark 2.1?
The only development guide I found is here
(https://gitlab.com/InitialDLab/simba/blob/master/INSTALL.md), but it refers to Spark 1.6.

@dongx-psu
Member Author

dongx-psu commented May 30, 2017

On the standalone branch, simply running sbt package will work.

Then you can import the package into Spark through spark-submit options.
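As a sketch, the two steps might look like this (the jar name, Scala version, and paths below are assumptions and will depend on the actual build output):

```shell
# Build the Simba jar on the standalone-2.1 branch
sbt package

# Attach the resulting jar when launching an application
# (the exact jar path is an assumption; check target/ after building)
spark-submit --jars target/scala-2.11/simba_2.11-1.0.jar my-app.jar

# Or for interactive use in the shell:
spark-shell --jars target/scala-2.11/simba_2.11-1.0.jar
```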
