Loading Data Connection Failed

  • 1.  Loading Data Connection Failed

    Posted 11-20-2018 19:28

    Hi, I’m using Databricks/PySpark to load a DataFrame onto OmniSci. Here is my code and the error:

    df.write.format("jdbc").option("url", "jdbc:mapd:EC2ADDRESS:9091:mapd").option("driver", "com.mapd.jdbc.MapDDriver").option("dbtable", "subs_dim").option("user", "mapd").option("password", "INSTANCEID").save()

    Here is the error:
    java.sql.SQLException: Connection failed - org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out (Connection timed out)

    Py4JJavaError: An error occurred while calling o477.save.
    : java.sql.SQLException: Connection failed - org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out (Connection timed out)
    at com.mapd.jdbc.MapDConnection.<init>(MapDConnection.java:113)
    at com.mapd.jdbc.MapDDriver.connect(MapDDriver.java:55)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:63)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:150)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:138)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:190)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:187)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:108)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:108)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:683)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:683)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:89)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:175)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:84)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:126)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:683)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:287)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:281)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)



  • 2.  RE: Loading Data Connection Failed

    Posted 11-20-2018 20:24

    Hi @mayanklive1,

    An obvious question, since I see a connection timeout error: have you opened access to port 9091 in the security group? You can confirm that OmniSci is listening on port 9091 for client connections using curl.

    curl -v telnet://EC2ADDRESS:9091

    Regards,
    Veda



  • 3.  RE: Loading Data Connection Failed

    Posted 11-20-2018 23:37

    Obvious is good. In this case, I believe the port is open. I did the curl request and got this response:



  • 4.  RE: Loading Data Connection Failed

    Posted 11-21-2018 00:32

    Hi @mayanklive1,

    Thanks for checking. To further isolate the problem, have you tried testing the OmniSci JDBC interface using the sample code provided in the doc?

    Regards,
    Veda



  • 5.  RE: Loading Data Connection Failed

    Posted 11-21-2018 12:02

    I don’t know Java unfortunately.

    I ran into a similar error a year ago, but then I had the wrong port/didn’t have the right jar loaded. This time I made sure to download the latest. Does the extended error tell you anything?



  • 6.  RE: Loading Data Connection Failed

    Posted 11-21-2018 15:01

    I changed the EC2 address to a bogus one and got the same error. This leads me to believe my Databricks/Spark code isn’t even reaching the instance. Is there a way to run a similar curl-type command from PySpark? Perhaps using the requests module? (Just to confirm Databricks can reach the OmniSci DB.)

    I tried requests.get("telnet://EC2Address.compute-1.amazonaws.com:9091") but it didn’t understand that, so I tried requests.get('http://EC2ADDRESS.compute-1.amazonaws.com:9091') and nothing happened. Advice?
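
    For anyone landing here with the same question: the requests module only speaks HTTP, so it is not a good fit for a raw port check. A rough equivalent of the curl/telnet test can be sketched with Python's standard socket module and run from a notebook cell. The function name probe_tcp and the 5-second default timeout are my own choices, and EC2ADDRESS is the same placeholder used earlier in the thread. Treat this as a sketch rather than an official OmniSci or Databricks recipe.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection and report what happened.

    Returns "open", "timeout", "refused", or "error: <details>".
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are usually being dropped by a firewall
        # or security-group rule somewhere along the path.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. the host name could not be resolved.
        return f"error: {exc}"


# Hypothetical usage from a Databricks notebook cell (placeholder host):
# print(probe_tcp("EC2ADDRESS", 9091))
```

    A "timeout" result from the Spark side, combined with "open" from a machine that is allowed into the VPC, would point at network rules rather than at the JDBC options. Note that a notebook cell runs on the driver node only, so this checks the driver's network path.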



  • 7.  RE: Loading Data Connection Failed

    Posted 11-21-2018 18:02

    Hi @mayanklive1,

    I do not have experience testing OmniSci with Databricks/Spark, so I will research this and get back to you.

    Happy Thanksgiving!
    Regards,
    Veda



  • 8.  RE: Loading Data Connection Failed

    Posted 11-21-2018 23:48

    Hi @mayanklive1,

    I see that @dwayneberry has answered this question and laid out a step-by-step procedure for loading a dataset from Spark into an OmniSci table using a DataFrame write procedure. Could you please try the steps listed here?

    Regards,
    Veda



  • 9.  RE: Loading Data Connection Failed

    Posted 11-23-2018 03:01

    That’s my post from a year ago. But I know the next couple of steps to take to move the ball forward:

    1. Confirm OmniSci is listening at port 9091 with the curl command from a different IP (not whitelisted to access the VPC). If the curl command fails, then I likely need to whitelist my Spark-running EC2s to access OmniSci, as they are probably just not permitted access.

    2. Follow Dwayne Berry’s linked steps from my previous post. SSH onto OmniSci, etc.

    Will report back.

    Happy Thanksgiving!



  • 10.  RE: Loading Data Connection Failed

    Posted 11-26-2018 13:10

    @veda.shankar @dwayneberry

    Still working on this.

    I’m following Dwayne Berry’s instructions from a year ago and I’m confused by this part of his instructions:

    When I ssh onto the OmniSci instance and type “build/bin/mapdql -p HyperInteractive” I get an error. “No such file or directory”

    What am I missing? Do I need to SSH onto the OmniSci instance and change something in order to write to it? Thanks

    EDIT: To confirm, here is my write command with the ec2 address/password slightly changed. Is there something wrong with my formatting?

    subs.write.format("jdbc").option("url", "jdbc:mapd:ec2-52-292-231-169.compute-1.amazonaws.com:9091:mapd").option("driver", "com.mapd.jdbc.MapDDriver").option("dbtable", "subs_dim").option("user", "mapd").option("password", "i-09f1082e9f0320222").save()
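
    One way to make a long write command like this easier to check by eye is to build the JDBC options in a small helper and pass them in one go. The sketch below only mirrors the URL shape (jdbc:mapd:host:port:database) and the option names already used in this thread. The helper name mapd_jdbc_options is made up for illustration, and the host and password are the placeholders from the first post.

```python
def mapd_jdbc_options(host, table, user, password, port=9091, database="mapd"):
    """Assemble the Spark JDBC options used in this thread into one dict."""
    return {
        # URL shape as posted above: jdbc:mapd:<host>:<port>:<database>
        "url": f"jdbc:mapd:{host}:{port}:{database}",
        "driver": "com.mapd.jdbc.MapDDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Placeholder values; substitute the real host and password.
opts = mapd_jdbc_options("EC2ADDRESS", "subs_dim", "mapd", "INSTANCEID")

# Usage with a Spark DataFrame named subs (needs a running Spark session
# and the JDBC driver jar attached to the cluster):
# subs.write.format("jdbc").options(**opts).save()
```

    Printing opts["url"] before calling save() gives a quick check that the host, port and database ended up where expected.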



  • 11.  RE: Loading Data Connection Failed

    Posted 11-26-2018 15:58

    It’s unlikely you will find mapdql on this path.

    It’s located in the bin subdirectory of the mapd installation.



  • 12.  RE: Loading Data Connection Failed

    Posted 11-26-2018 16:02

    I’m using the AWS AMI of OmniSci (community). How would I go about accessing it there?



  • 13.  RE: Loading Data Connection Failed

    Posted 11-26-2018 17:13

    Hi @mayanklive1,

    The AWS image has the build installed under /opt/mapd/, so the mapdql command is in the /opt/mapd/bin directory.

    That said, I think you are hitting a network problem between your Databricks/PySpark (DB/PS) and OmniSci instances. It could be caused by any of the following, from most to least likely:

    1. The OmniSci instance’s public IP is not configured to accept incoming connections from the DB/PS public IP address.
    2. The DB/PS instance is configured to block outgoing connections to the OmniSci instance.
    3. The OmniSci instance has no connections available to serve new incoming ones, so it is queueing your connection.
    4. There is an asymmetric routing problem between the source and destination hosts.