planck

Planck ClojureScript REPL
johanatan 2016-06-30T03:47:45.000594Z

@mfikes: so I got it to link against a debug JavaScriptCore (I think) with the help of one of the WebKit devs, whose bug was actually responsible for the WTFCrash I pasted above and was fixed in the last two days.

johanatan 2016-06-30T03:48:22.000595Z

[so my hacked-together, libtool-joined mega-.o file approach was giving the same result as dynamic linking; I was just hitting a bug in WebKit]

johanatan 2016-06-30T03:48:44.000596Z

anyway, now that their bug is fixed, this is happening:

Jonathans-MacBook-Pro:planck-c jonathan$ ./planck 
Planck 2.0
ClojureScript 1.9.89
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
    Exit: Control+D or :cljs/quit or exit or quit
 Results: Stored in vars *1, *2, *3, an exception in *e

cljs.user=> (require '[planck.shell])
Could not require cljs.source-map.base64-vlq
Maximum call stack size exceeded.
nil

johanatan 2016-06-30T03:48:53.000597Z

any idea on that?

johanatan 2016-06-30T03:51:06.000598Z

[oh, there was one other tip the WebKit dev gave me that is crucial to making this work: before starting the executable, you need to set the env var DYLD_FRAMEWORK_PATH to the directory where your debug JavaScriptCore.framework is located]

johanatan 2016-06-30T03:52:46.000599Z

[essentially, since this involves dynamic linking, you can do the compile/link-time stuff against either copy of the framework, and the env var above determines which one is actually loaded at runtime]

johanatan 2016-06-30T04:02:20.000600Z

[my best guess on the "Could not require cljs.source-map.base64-vlq. Maximum call stack size exceeded." error is that the behavior of JavaScriptCore has changed and we (Planck or CLJS) were depending on specific, now-obsolete behavior. I have no idea whether that was a bug fix, an interface/spec/behavior change, or a newly introduced bug, but it seems good that we are hitting this now rather than later, when the next JSC is released].

johanatan 2016-06-30T04:03:36.000602Z

And, btw, that error happens regardless of which command I enter first into the REPL (or even whether I enter a command at all).

johanatan 2016-06-30T04:04:23.000603Z

Oooh, but you can continue from it; it's just a slightly annoying artifact that causes your REPL to hang while it is thinking about it.

johanatan 2016-06-30T04:04:47.000604Z

And here's the stack trace for my original bug, with symbols!!!

cljs.user=> (planck.shell/sh-async "ls" :dir "/Users/jonathan/Documents" :env {"blah" "blaz"} (fn [_] 9))
ASAN:DEADLYSIGNAL
=================================================================
==63655==ERROR: AddressSanitizer: SEGV on unknown address 0x003100000011 (pc 0x00010e879a9c bp 0x700000093ac0 sp 0x700000093ac0 T20)
    #0 0x10e879a9b in WTF::StringImpl::bufferOwnership() const StringImpl.h:854
    #1 0x10ecd56c8 in WTF::StringImpl::requiresCopy() const StringImpl.h:797
    #2 0x10ecd5226 in WTF::StringImpl::isolatedCopy() const StringImpl.h:1117
    #3 0x10f7bd746 in WTF::String::isolatedCopy() const & WTFString.cpp:684
    #4 0x10f3f934f in OpaqueJSString::string() const OpaqueJSString.cpp:61
    #5 0x10f1c0647 in JSEvaluateScript JSBase.cpp:65
    #6 0x10e27881b in wait_for_child shell.c:135
    #7 0x10e278904 in thread_proc shell.c:144
    #8 0x7fff9dbc499c in _pthread_body (libsystem_pthread.dylib+0x399c)
    #9 0x7fff9dbc4919 in _pthread_start (libsystem_pthread.dylib+0x3919)
    #10 0x7fff9dbc2350 in thread_start (libsystem_pthread.dylib+0x1350)

johanatan 2016-06-30T05:58:50.000607Z

The problem was that I was [in essence] casting a JSStringRef to a JSValueRef and back again (because c_string_to_value returns a JSValueRef while JSEvaluateScript accepts a JSStringRef). The fix was to call JSStringCreateWithUTF8CString directly instead of c_string_to_value.

johanatan 2016-06-30T06:09:57.000610Z

@mfikes: So, I'm afraid your JSGlobalContextCreateInGroup fix isn't doing what we want here: it seems to create a new context alongside the original one in the same group (whatever that means). Unfortunately, the vars I'm storing the callbacks in on the CLJS side aren't copied into that new context; they only exist in the original one.

johanatan 2016-06-30T06:13:50.000613Z

Hmm, I may be able to manually copy over the vars in question from the original context into the new one. I will try that next.

mfikes 2016-06-30T14:00:18.000614Z

@johanatan: Wow. Interesting stuff. I haven’t had time to look into any of it recently 😞

johanatan 2016-06-30T20:31:43.000615Z

@mfikes: you wouldn't happen to know where we obtain our original JSContext from, would you?

johanatan 2016-06-30T20:32:11.000616Z

the WebKit devs said that as long as we created it ourselves and didn't get it from a WebView (WebKit context), it should be fine to access it from a bkg thread

johanatan 2016-06-30T20:32:51.000617Z

[however, it's clearly (and repeatably) crashing for us from the bkg thread, so I'm starting to suspect that we might have a WebKit/WebView ctx]

johanatan 2016-06-30T20:33:07.000618Z

but I'd imagine that a REPL doesn't need the DOM/view stuff etc., so that sounds strange

johanatan 2016-06-30T20:41:52.000622Z

mm, that definitely doesn't look like a WebView-provided one

johanatan 2016-06-30T20:43:11.000623Z

so, I did notice a bit of heisenbugginess about this issue (i.e., using the original ctx from the bkg thread): sometimes the crash happens during the first call on the context (JSMakeNumber) and sometimes during the second call on the ctx (c_string_to_value) [both inside result_to_object_ref]

johanatan 2016-06-30T20:43:50.000624Z

and given the other heisen-stuff I was seeing before, it leads me to think there's something unstable in the underlying global setup of Planck itself (i.e., not in my shell.c code, which doesn't run until you actually issue a sh or sh-async command).

mfikes 2016-06-30T20:45:22.000626Z

@johanatan: I wonder if we can repro what you are seeing with Planck master...

mfikes 2016-06-30T20:46:56.000627Z

@johanatan: You are saying that you see it by simply requiring the planck.shell namespace?

johanatan 2016-06-30T20:47:14.000628Z

well, there are two heisen-issues

johanatan 2016-06-30T20:48:14.000629Z

1) the sh-async crash with the main thread's ctx happens on different operations on the ctx [not always the first, but it always seems to be one of the first two]

johanatan 2016-06-30T20:49:07.000631Z

2) [I would have to scroll back through Slack's history to find the other one I mentioned, but it happened either on Planck startup itself or on requiring planck.shell; I don't remember which]

johanatan 2016-06-30T20:49:17.000632Z

[I hope we have that much Slack history lol]

johanatan 2016-06-30T20:51:51.000633Z

here's the original heisenbug

johanatan 2016-06-30T20:52:54.000637Z

so, yes, looks like that one happened on requiring planck.shell

mfikes 2016-06-30T20:53:38.000638Z

Yeah. Planck master doesn’t do that. Perhaps there is something interesting in the ClojureScript code in your branch.

mfikes 2016-06-30T20:54:05.000639Z

If you can, you could also run script/build and then build/Release/planck to see whether 1.x crashes in the same way.

johanatan 2016-06-30T21:11:36.000640Z

Well, the thing is I haven't seen that particular heisenbug in a few days. It started out at an extremely low frequency, then started appearing somewhat regularly, then went away completely. I suspect it's connected to some other factor.

johanatan 2016-06-30T21:12:31.000641Z

But the existence of both of these sources of nondeterminism is a bit troubling (one of them persists to this day).

johanatan 2016-06-30T23:05:30.000642Z

@mfikes: here's an interesting finding: the ctx in process_line is different from the one passed to my function_shellexec.

johanatan 2016-06-30T23:06:00.000643Z

[`function_shellexec` is called as a callback, and there are a lot of JavaScriptCore stack frames between those two frames on the stack]

johanatan 2016-06-30T23:07:22.000644Z

[also, with a debugger attached to the main thread, the bkg thread can continue successfully with the context provided; i.e., the issue here is a race (which also explains the nondeterminism I observed in exactly which operation on the context would fail)].

johanatan 2016-06-30T23:07:54.000645Z

so it looks like there is some sort of 'local' context that is provided to each of the hooked functions, and it doesn't survive as long as we thought it did

johanatan 2016-06-30T23:10:08.000653Z

[as you can see, this stuff is proving to be vastly easier to figure out with symbols 🙂]

johanatan 2016-06-30T23:52:03.000655Z

WOO! it works!

johanatan 2016-06-30T23:52:12.000656Z

I violated encapsulation a bit to pull it off

mfikes 2016-06-30T23:52:32.000657Z

:)

johanatan 2016-06-30T23:52:41.000658Z

but basically, referring to global_ctx (defined in repl.c) directly from shell.c, instead of using the ctx passed in, works.

mfikes 2016-06-30T23:52:56.000659Z

Wow

johanatan 2016-06-30T23:52:59.000660Z

the problem is that JSObjectCallAsFunction creates a local context

johanatan 2016-06-30T23:53:09.000661Z

and copies into it all of the things from the outer context

mfikes 2016-06-30T23:53:16.000662Z

Good sleuthing!

johanatan 2016-06-30T23:53:18.000663Z

and when our main thread exits, that context goes away

johanatan 2016-06-30T23:53:20.000664Z

thx!

johanatan 2016-06-30T23:53:21.000665Z

🙂
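A minimal C model of the fix discussed above, assuming (as concluded in the thread) that the context handed to a callback such as function_shellexec is only valid while that call is in progress. Everything here is hypothetical: `model_ctx`, `shellexec_model`, `thread_proc_model`, `call_as_function_model`, and this `global_ctx` are illustrative stand-ins, not JavaScriptCore or Planck APIs. The background thread reaches for the long-lived context (the role global_ctx in repl.c plays) rather than the callback-scoped one, so it no longer matters how quickly the callback's context is torn down.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical model of the bug and fix; NOT the JavaScriptCore API.
 * model_ctx stands in for a JSContextRef; `alive` tracks whether it is
 * still safe to use.
 */
typedef struct {
    int alive; /* nonzero while the context may be used */
    int uses;  /* operations performed on this context */
} model_ctx;

/* Long-lived context, playing the role of global_ctx in repl.c. */
static model_ctx global_ctx = {1, 0};

/* Every operation checks that the context is still alive. */
static void ctx_use(model_ctx *ctx) {
    assert(ctx->alive && "context used after it went away");
    ctx->uses++;
}

/* Hypothetical stand-in for shell.c's background thread procedure. */
static void *thread_proc_model(void *arg) {
    (void)arg;
    /* The fix: use the long-lived context, not the callback's one. */
    ctx_use(&global_ctx);
    return NULL;
}

/*
 * Hypothetical stand-in for function_shellexec: `local` is only valid
 * for the duration of this call, so it must not escape to the thread.
 */
static int shellexec_model(model_ctx *local, pthread_t *out) {
    ctx_use(local); /* fine: we are still inside the call */
    return pthread_create(out, NULL, thread_proc_model, NULL);
}

/* Simulates the caller: the callback's local context ends on return. */
static int call_as_function_model(pthread_t *out) {
    model_ctx local = {1, 0};
    int rc = shellexec_model(&local, out);
    local.alive = 0; /* the callback's context goes away */
    return rc;
}
```

Passing `&local` through to the thread instead would have it dereference a stack object after `call_as_function_model` returns, the same kind of race that made the real crash land on a different ctx operation from run to run. The model says nothing about locking when two threads share one context; that part is left to JavaScriptCore.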