@lmergen great. Having others step in to fill the gap would be fantastic. Plus my hours answering questions are limited to work hours now.
hello! trying to start working with onyx-sql, and as an exercise, copy data from one table to another
(def id-column :id)
(def table :generated_data)
(def copy-table :generated_data_onyx)
;; schema of both tables is equal
(def catalog
[{:onyx/name :partition-keys
:onyx/plugin :onyx.plugin.sql/partition-keys
:onyx/type :input
:onyx/medium :sql
:sql/classname (:classname config)
:sql/subprotocol (:subprotocol config)
:sql/subname (:host config)
:sql/db-name (:database config)
:sql/user (:user config)
:sql/password (:password config)
:sql/table table
:sql/id id-column
:sql/rows-per-segment 1000
:sql/columns [:*]
:onyx/batch-size batch-size
:onyx/max-peers 1
:onyx/doc "Partitions a range of primary keys into subranges"
:sql/lower-bound 0
:sql/upper-bound 100000}
{:onyx/name :read-rows
;; :onyx/tenancy-ident :onyx.plugin.sql/read-rows
:onyx/fn :onyx.plugin.sql/read-rows
:onyx/type :function
:sql/classname (:classname config)
:sql/subprotocol (:subprotocol config)
:sql/subname (:host config)
:sql/db-name (:database config)
:sql/user (:user config)
:sql/password (:password config)
:sql/table table
:sql/id id-column
:onyx/batch-size batch-size
:onyx/doc "Reads rows of a SQL table bounded by a key range"}
{:onyx/name :identity
:onyx/fn :sql-data.core/rows
:onyx/type :function
:onyx/batch-size batch-size
:onyx/batch-timeout batch-timeout
:onyx/doc "identity"}
{:onyx/name :write-rows
:onyx/plugin :onyx.plugin.sql/write-rows
:onyx/type :output
:onyx/medium :sql
:sql/classname (:classname config)
:sql/subprotocol (:subprotocol config)
:sql/subname (:host config)
:sql/db-name (:database config)
:sql/user (:user config)
:sql/password (:password config)
:sql/table copy-table
:sql/copy? false
;; :sql/copy-fields [:first :second :third]
:onyx/batch-size batch-size
:onyx/doc "Writes segments from the :rows keys to the SQL database"}
])
I have this exception in logs:
clojure.lang.ExceptionInfo: Wrong number of args (1) passed to: sql/read-rows
offending-segment: {:id 1, :name "name", :price 100, :created_date #inst "2016-04-22T21:00:00.000000000-00:00", :description "description", :in_stock true}
offending-task: :read-rows
original-exception: :clojure.lang.ArityException
clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task lifecycle :lifecycle/apply-fn. Killing the job. -> Exception type: clojure.lang.ExceptionInfo. Exception message: Wrong number of args (1) passed to: sql/read-rows
job-id: #uuid "5fa02f30-675f-8c9c-8e5d-fd27609f2207"
metadata: {:job-id #uuid "5fa02f30-675f-8c9c-8e5d-fd27609f2207", :job-hash "2e8adc49564869d2ca4536a0b155de9411e5c55d78b576a4cd13411e444aaa"}
offending-segment: {:id 1, :name "name", :price 100, :created_date #inst "2016-04-22T21:00:00.000000000-00:00", :description "description", :in_stock true}
offending-task: :read-rows
original-exception: :clojure.lang.ArityException
peer-id: #uuid "cb52b552-5c86-04ed-a94f-f874dfd46aca"
task-name: :read-rows
Which args should I provide?
@rustam.gilaztdinov that's a very weird error, which version of onyx are you using?
are you sure your version of onyx is compatible with the version of the plugin ?
[org.onyxplatform/onyx "0.13.3-alpha4"]
[org.onyxplatform/onyx-sql "0.13.3.0-alpha4"]
hmm
wait a minute, this doesn't make sense, the docs are not good
It's probably a missing lifecycle causing an argument not to be injected into an onyx/fn?
but I don't have any args, just identity
no, the docs are not correct
you're not supposed to call read-rows anymore since I did that refactoring for 0.10
https://github.com/onyx-platform/onyx-sql/blob/master/src/onyx/plugin/sql.clj#L116
SqlPartitioner now calls read-rows itself
@rustam.gilaztdinov as a matter of debugging, could you do something for me? instead of this catalog entry:
{:onyx/name :read-rows
;; :onyx/tenancy-ident :onyx.plugin.sql/read-rows
:onyx/fn :onyx.plugin.sql/read-rows
:onyx/type :function
replace the :onyx/fn
with another function, and (println ...)
what's the output? It should be {:id 1, :name "name", :price 100, :created_date #inst "2016-04-22T21:00:00.000000000-00:00", :description "description", :in_stock true}
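for example, a rough sketch of such a pass-through debug function (the sql-data.debug namespace and the spy-segment name are placeholders I'm making up here, not anything from the plugin):
(ns sql-data.debug)

;; one-argument onyx/fn: prints the incoming segment and passes it through unchanged
(defn spy-segment
  [segment]
  (println "segment:" segment)
  segment)
;; then point the catalog entry at it: :onyx/fn :sql-data.debug/spy-segment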
docs need to be updated I think 🙂
here you can see the tests also do not use read-rows anymore: https://github.com/onyx-platform/onyx-sql/blob/master/test/onyx/plugin/input_test.clj#L43
yes, I removed :read-rows
from catalog and workflow and this works!
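for reference, a rough sketch of what the trimmed workflow presumably looks like now, using the task names from the catalog above (your actual edges may differ):
;; :read-rows removed: the partitioning input task feeds :identity directly,
;; which feeds the SQL output task
(def workflow
  [[:partition-keys :identity]
   [:identity :write-rows]])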
🙂
actually, another question 🙂 Batch size is equal to 10. After I submit the job, only one batch is processed, then in the log I have this exception:
java.lang.NullPointerException:
clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task lifecycle :lifecycle/read-batch. Killing the job. -> Exception type: java.lang.NullPointerException. Exception message: null
job-id: #uuid "7bacb4ad-e7c8-1292-8814-b9e6a9183a7f"
metadata: {:job-id #uuid "7bacb4ad-e7c8-1292-8814-b9e6a9183a7f", :job-hash "b167932da4fa7ff4b4cff26b921fcf0576cc010fb9fbf5c607022507b3b9f6d"}
peer-id: #uuid "76fd87b1-64af-254e-1568-96a986a649b7"
task-name: :partition-keys
If I change the batch size, I still have this exception and only one batch is processed.
Does that mean I should add a :lifecycle/after-batch function or handle the exception?
Sorry, I'm pretty new to Onyx, any tips will be super helpful
perhaps there is an SQL error somewhere? can you look at your database logs to see which queries are being sent?
NPE doesn't look too good
to be perfectly honest, I don't think the sql plugin is used that much -- I mostly use it as an output plugin, and I think I'm one of the few people actually using it 🙂
so the chances are that you might run into some corner cases
If I change the batch size to 20, all works well. So there's no NPE from the data; the NPE comes with the next batch.
ok, I wouldn't be able to tell right now. It's most likely a bug in the sql plugin
Oh :(
That's unfortunate, I'm a data analyst, and I'm thinking about writing complex transformations on SQL data :( We have a huge Postgres database, and this is why I picked Onyx. Clojure is so great to work with data, and in combination with Onyx it promises so much. Which kind of workflow do you suggest -- produce the data to Kafka, and work with the Kafka plugin and the SQL plugin for output?
it completely depends upon what you want to do with the data. I would say that Onyx shines more in production workloads; for exploratory data analysis you want to keep things in a relational database or data warehouse.
in production, you could use Onyx to implement your actual algorithms, for example indeed sourcing from Kafka and writing into PostgreSQL
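as a very rough sketch of that shape (task names are placeholders: :read-kafka would be an onyx-kafka input entry, :transform whatever algorithm you implement, and :write-rows an onyx-sql output entry like the one above):
(def workflow
  [[:read-kafka :transform]
   [:transform :write-rows]])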
but it all completely depends on what you plan on doing
and it needs to be properly tailored towards your use case
There is no one size fits all solution here :)
Agree :) but when the data is large, yet not big enough for Hadoop and Spark -- ETL on data of this size on a single machine is slow. I think this is totally an Onyx case.