data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
val_waeselynck 2019-11-21T15:44:48.161400Z

Does anyone here know of a tool for plotting 'confidence regions' of 2D probability distributions?

val_waeselynck 2019-11-21T15:44:53.161700Z

More precisely, I'd like to draw (posterior) probability densities as 2D heat maps, with 'contour lines' delineating regions of probability mass 95%, 99% etc.

val_waeselynck 2019-11-21T15:44:54.161800Z

Does that make sense, and does it have a name?

ben 2019-11-21T15:48:27.162300Z

I think you can achieve something similar with ggplot2: https://ggplot2.tidyverse.org/reference/geom_contour.html

ben 2019-11-21T15:49:17.163500Z

Might need to do something with stat_contour to get the specific regions you’re interested in. No idea about clj, I’m afraid

val_waeselynck 2019-11-21T16:01:57.165200Z

Thanks. Thinking out loud, I guess I could also find the appropriate density levels, either by numerical integration plus a bisection (dichotomic) search, or by filling a 2D array with densities, sorting the values and searching for quantiles. Then draw the contours at those density levels.
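
The second idea (fill a grid with densities, sort, and accumulate) can be sketched roughly as follows. This is a minimal illustration in plain Python rather than Clojure, not a tested plotting recipe: `density`, `linspace` and `hdr_levels` are hypothetical names, the standard bivariate normal is only a stand-in for a density that can be evaluated at any point, and the grid bounds and resolution are arbitrary assumptions.

```python
import math

def density(x, y):
    """Hypothetical example density: standard bivariate normal."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

def linspace(lo, hi, n):
    """n evenly spaced points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def hdr_levels(pdf, xs, ys, masses):
    """Map each requested mass to a density level such that the grid
    cells with density >= level hold roughly that share of the mass."""
    cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
    # Densities of all grid points, highest first.
    dens = sorted((pdf(x, y) for x in xs for y in ys), reverse=True)
    # Normalise by the mass actually captured on the grid.
    total = sum(dens) * cell_area
    levels = {}
    for mass in masses:
        acc = 0.0
        for d in dens:
            acc += d * cell_area  # cell mass ~ density * cell area
            if acc >= mass * total:
                levels[mass] = d
                break
    return levels

xs = linspace(-5.0, 5.0, 201)
ys = linspace(-5.0, 5.0, 201)
levels = hdr_levels(density, xs, ys, [0.95, 0.99])
print(levels)
```

The returned values are density levels, so they can be passed directly as contour levels to whatever plotting tool draws the heat map. The accuracy depends on the grid covering nearly all of the mass and being fine enough near the contour.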

val_waeselynck 2019-11-21T16:03:45.166600Z

I'm also wondering about the relevance of this approach for data analysis - are there alternative approaches to choosing / viewing 2D confidence regions that make this one uninteresting?

genmeblog 2019-11-24T19:14:41.167900Z

This is not trivial. Contours are usually built from a kernel density estimate, which is typically just a Gaussian blur (in 2D) or a specific kernel function (in 1D). I don't see an easy way to estimate the inverse CDF with such an approach.

val_waeselynck 2019-11-25T17:33:13.168400Z

@tsulej In this case, I can evaluate the density at any point, so it seems doable: https://clojurians.slack.com/archives/C0BQDEJ8M/p1574352117165200

genmeblog 2019-11-25T17:52:04.168700Z

Still, integrating over an area is much trickier than integrating over a 1D range for a symmetric distribution.

genmeblog 2019-11-26T10:47:36.169800Z

@val_waeselynck
> by filling a 2D array with densities, sorting the values and searching for quantiles

genmeblog 2019-11-26T10:49:58.170Z

To find quantiles you want to use the icdf (inverse cumulative distribution function), not the pdf (density). For 2D you want to find the area which covers, say, 95% of the total volume under the density.
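
One way to read "find the area which covers 95% of the volume" is as a search over density levels: the mass above a level shrinks as the level rises, so a bisection can home in on the level whose region holds the target mass. The sketch below assumes the density can be evaluated pointwise; `mixture_pdf`, `grid_cells`, `mass_and_area_above` and `level_for_mass` are hypothetical names, and the bimodal mixture, grid bounds and iteration count are illustrative choices only.

```python
import math

def mixture_pdf(x, y):
    """Hypothetical bimodal density: equal mix of two unit Gaussians."""
    def g(cx, cy):
        return math.exp(-0.5 * ((x - cx) ** 2 + (y - cy) ** 2)) / (2.0 * math.pi)
    return 0.5 * g(-2.0, 0.0) + 0.5 * g(2.0, 0.0)

def grid_cells(pdf, lo, hi, n):
    """Densities at the centres of an n x n grid over [lo, hi]^2, plus the cell area."""
    step = (hi - lo) / n
    centers = [lo + (i + 0.5) * step for i in range(n)]
    return [pdf(x, y) for x in centers for y in centers], step * step

def mass_and_area_above(dens, cell_area, level):
    """Approximate mass and area of the region where density >= level."""
    inside = [d for d in dens if d >= level]
    return sum(inside) * cell_area, len(inside) * cell_area

def level_for_mass(dens, cell_area, target, iterations=40):
    """Bisect on the density level until the region above it holds `target` mass."""
    total = sum(dens) * cell_area
    lo, hi = 0.0, max(dens)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        mass, _area = mass_and_area_above(dens, cell_area, mid)
        if mass / total >= target:
            lo = mid  # region still holds enough mass, so the level can rise
        else:
            hi = mid
    return lo

dens, cell_area = grid_cells(mixture_pdf, -8.0, 8.0, 200)
level = level_for_mass(dens, cell_area, 0.95)
mass, region_area = mass_and_area_above(dens, cell_area, level)
print(level, mass, region_area)
```

Nothing here depends on the region being connected: for a multimodal density the region above a level may be several separate blobs, and the thresholding handles that without special treatment.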

genmeblog 2019-11-26T10:53:05.170200Z

For distributions like the multivariate normal some numerical algorithms exist, but I suppose they can't be applied to the general case and to any distribution (especially a multidimensional empirical one).

val_waeselynck 2019-11-26T16:20:01.171900Z

> to find quantiles you want to use the icdf (inverse cumulative distribution function), not the pdf (density).

Yes of course, just forgot to mention it :)

val_waeselynck 2019-11-26T16:27:29.172100Z

> For distributions like the multivariate normal some numerical algorithms exist, but I suppose they can't be applied to the general case and to any distribution (especially a multidimensional empirical one).

Yes, for 2D Gaussians this can be solved analytically: once you have an eigen-decomposition of the covariance matrix you're good, and even that may not be mandatory.
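
For the 2D Gaussian case the analytic solution can be sketched as below. It relies on the standard fact that the squared Mahalanobis distance of a bivariate normal follows a chi-square distribution with 2 degrees of freedom, whose quantile is -2 ln(1 - p). The function name `gaussian_confidence_ellipse` and its return layout are hypothetical, and the closed-form eigenvalues apply only to a symmetric positive-definite 2x2 covariance matrix.

```python
import math

def gaussian_confidence_ellipse(cov, mass):
    """Ellipse holding `mass` probability for a 2D Gaussian with covariance
    [[a, b], [b, c]] (assumed symmetric positive definite).
    Returns (semi_major, semi_minor, angle_radians, density_level)."""
    (a, b), (_, c) = cov
    # Chi-square quantile with 2 degrees of freedom: squared Mahalanobis radius.
    k = -2.0 * math.log(1.0 - mass)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean_diag = 0.5 * (a + c)
    half_gap = math.hypot(0.5 * (a - c), b)
    lam_major = mean_diag + half_gap
    lam_minor = mean_diag - half_gap
    # Orientation of the major axis relative to the x axis.
    angle = 0.5 * math.atan2(2.0 * b, a - c)
    # Density on the boundary: peak density times (1 - mass).
    det = a * c - b * b
    level = (1.0 - mass) / (2.0 * math.pi * math.sqrt(det))
    return math.sqrt(k * lam_major), math.sqrt(k * lam_minor), angle, level

major, minor, angle, level = gaussian_confidence_ellipse([[4.0, 0.0], [0.0, 1.0]], 0.95)
print(major, minor, angle, level)
```

The density level on the boundary needs only the determinant of the covariance matrix, which fits the remark that the eigen-decomposition may not be mandatory if all that is wanted is a contour level; the eigenvalues and angle matter only when the ellipse itself is drawn.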