Tuesday, April 1, 2014

Potpourri 2: more misc emails regarding doing Bayesian data analysis

More miscellaneous communications with readers. (I have omitted all the salutations and pleasantries to save space here, but most correspondents do begin and end their messages with introductions and salutations that I truly appreciate.) Again, apologies to those I don't reply to -- too many!

Topics:
  • Adapt ANOVA-style models for mixed between- within- designs? 
  • glm() in R, with a poem 
  • Metropolis algorithm
  • I have some data, what should I do?
  • Could you look over my conference paper?
  • Time series models?
  • How to write up a Bayesian analysis
  • Trying to adapt programs, getting errors
  • Modifying FilconJags.R and FilconJagsPower.R for two groups instead of four groups
  • Sequential updating (including more data) when using MCMC
  • How to take the difference between two parameter distributions
  • What book should be studied after Doing Bayesian Data Analysis?
  • Exercise 13.4

March 9: Adapt ANOVA-style models for mixed between- within- designs?
My data set comprises two populations: patients and control subjects.
All subjects were repeatedly assessed - albeit in different conditions, because the treatment manipulation of the patients is not applicable for controls.
While our main interest pertains the patients' treatment conditions (4 levels), I'd like to compare them with the "natural variation" observed in the control group, who just repeated our experiments twice.
Thus abstractly, the patient group is split by a within-factor of 4 levels P1-P4, and the controls are split by a 2-level within factor, C1 and C2. What's the best method to compare these, assuming normal distributed response variables?

Currently I use one-way BANOVAs, extended by a subject factor to account for repeated measures. But with these models (4 and 2 levels, respectively), I think I can only model the two population groups separately in different models, which hinders direct comparisons between parameter estimates.

My questions in particular:
1) Can/should I extend the BANOVA model to include both populations?
2) Or is it possible to compare the parameters of different BANOVA models? If so, how?
3) Does this kind of design have a specific name? This might be handy to know to search for modeling examples.
Thank you very much in advance!
I talked to a colleague, who was of the opinion that I can directly compare posteriors, provided that they are in the same units and play the same "role" in the model.
In my example, it would be admissible to compare the b0 parameters, i.e. the BANOVA baselines from different models. Using that comparison, I would at least be able to state whether the groups are different from each other on average.

Specifically, the comparison has to be directly computed from the MCMC chains. For each pair of values, randomly sampled from the chains, I would compute the difference, and thus build up a distribution of differences. This then may be inspected with the usual methods, e.g. using its 95% HDI.

While this sounds correct to me, I'd like to know some more details. How many samples should/can I draw from the chains? Should I inspect the autocorrelation? Any help and/or pointers to literature are greatly appreciated!

Regarding the model structure, yes, you should be able modify the ANOVA-style models to suit the structure of your data. This is not to say it is necessarily trivial or obvious how to do that, especially when trying to impose the sum-to-zero constraints (if you choose to do so). For example, see this blog post about a split-plot design, which is related to the sort of design you have:
http://doingbayesiandataanalysis.blogspot.com/2012/05/split-plot-design-in-jags-revised.html

Regarding computing differences: At every step in the MCMC chain, you can compute differences or ratios or other transformations of parameters in whatever way is meaningful. At each step in the chain, the parameters are *jointly* representative of the posterior distribution.

If you have split your data into two parts that are modeled separately, then the parameter values are, by definition, independent. You can then set the two chains side-by-side and take differences between them as if they were in a joint space.


March 11: glm() in R, with a poem
n R question that I cannot get to the bottom of. I was hoping that you might be able to shed some light on it, as everyone I ask is drawing a blank. 

It is rather simple actually, I have two factors, and I want to do a logistic regression against a (0,1) target using glm in R. 

I have a 0,1 target and then use some explanatory variables which are factors with levels e.g. ... When I try and perform the logistic regression it refuses to use the first level of each factor, as you can see from the screen cap jk1.png attached to this e-mail. ...  Is this something you have experience of? Am I overlooking something trivial?! Apologies if so! 
As your question is about glm() in R, and not about Bayesian analysis per se, I will defer to the R experts. You might try posting your question on http://stats.stackexchange.com/ or related groups.
Best wishes.
In my desperation
To elucidate the facts
I disturbed you unnecessarily
When I should have consulted stacks.

You're welcome, it's a superb book, and has helped me a great deal. Kind regards

March 1: Metropolis algorithm
I am currently doing my statistics thesis on ... which requires quite a great knowledge of Bayesian Theory especially MCMC methods. However I have some minor problems regarding the Metropolis algorithm:

Any distribution can be attributed to the proposal density function?

Also, the initial values given to the parameters (i.e. the first step of the Metropolis algorithm) are random or there must be some thinking behind the choice of values?

Thank you very much
If you are programming your own Metropolis sampler, you should study the general Metropolis-Hastings algorithm and learn about methods for "tuning" the proposal distribution during an adaptation phase. A great thing about JAGS/BUGS is that they do it for you.

The initial values can be anywhere that has non-zero prior and non-zero likelihood (actually, even that is okay - the real constraint is that the prior times likelihood needs to be defined at the initial values), but initial values near the mode of the posterior make burn-in a lot faster. In either case, you need to monitor convergence to be sure that the chains have lost any bias from their starting positions.

Good luck with your work.



March 14: I have some data, what should I do?
I'm contacting you to ask for your advise on the following problem.


I have 500 samples, each sample being ... [extended description omitted].
Using Bayesian analysis I want to determine the error rate of ... .
Below is an exemplary of the data set. [data omitted] I have never used Bayesian Analysis before so I dont really know how to proceed. I'm thinking that Bayesian analysis is suitable to answer the research question.
-The idea is to use a non informative prior, so the sample can be positive or negative.
-Not sure what distribution to use (binomial, bernoulli, multinomial,). not sure.
-The rest, I do not know.


I look forward to your reply.
Thanks very much.
Thanks for you interest in Bayesian analysis. I hope you do choose to pursue it.

The first thing you need to analyze your data is a descriptive model. The goal of the analysis is to estimate parameter values that are meaningful descriptions of the data. Once you have a model, Bayesian analysis provides a distribution of credible values of the parameters, given the data.

As your field of research is not my specialization, I'm not in a position to prescribe the model you should use. But, whatever it is, I am confident that Bayesian analysis is a richly informative method for analyzing your data.

Best wishes,



 March 16: Could you look over my conference paper?
... I will be presenting a paper at the annual meeting of the ... Association. My paper is based on your BEST software, and I was wondering if you would be willing to take a look at it to point out any major errors I may have committed and save me from potential embarrassment.
Thanks for thinking of me and showing me a draft of your paper. [And I meant that sincerely!] Unfortunately I don't have time to study its 20 pages. If you have a specific question about some topic, however, I might be able to provide a quick answer. Best of luck with your presentation!



March 16: Time series models?
Do you think it is possible to apply Bayesian data methods to time series? What I would hope to avoid is trying to splice GARCH methodology with the Bayesian approach. GARCH is lumbering and clumsy enough in itself. For high frequency financial data, it must surely be possible to assume distribution stability, until evidence begins to emerge that the distribution has broken down, and needs to be reformulated? I would greatly value your advice.
I have not ventured deeply into time-series models, so I have no personal endorsements of particular resources. But Bayesian methods can (in principle) be applied to any model that has parameters, even so-called "non-parametric" models (which have up to an infinite number of parameters). For example, there is a 2011 book titled "Bayesian time series models" that collects recent advanced applications. Although it's not a beginner's tutorial, it might provide useful pointers.


Feb. 25: How to write up a Bayesian analysis
Sir, thanks for Bayesian Estimation Supersedes the t-Test, I can make the plots, sorry but how to analyse the plots and write a paper. Will you please put up a model paper with such analyses. [Here] we have been brought up on a strict diet of ANOVA-RBD/CRD, and t tests with CV values (Analysis of Variance - Randomized Block Design / Completely Randomized design). How can I do these analyses / similar in Bayesian methods,
Reply from Mike Meredith:

I can't point you to papers describing a Bayesian analysis of ANOVA-type models, but I will indicate sources with advice and pass this on to John Kruschke who may have more references to hand.

John has a blog post on the topic at http://doingbayesiandataanalysis.blogspot.com/2012/05/how-to-report-bayesian-analysis.html. This has a reproduction of the first part of section (23.1) on "Reporting a Bayesian analysis" from Kruschke, J.K. (2011) Doing Bayesian data analysis: a tutorial with R and BUGS Elsevier, Amsterdam etc.

The main points are summarised in Kruschke, J.K., Aguinis, H., & Joo, H. (2012) The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15, 722-752 in the section "Recommendations and Illustration of How to Report Bayesian Analysis Results", where they also give an example of writing up a regression analysis.

Also Kruschke, J.K. (2013) Bayesian estimation supersedes the t test. *Journal of Experimental Psychology: General*, 142, 573-603 has a section on "Reporting the results of a Bayesian analysis". You would need to cite that paper as the description of the methods for robust Bayesian analysis.

There are links to the two Kruschke papers at http://www.indiana.edu/~kruschke/publications.html

In my own field (wildlife ecology) we can rarely do experiments, so hypothesis testing is scarcely relevant anyway. The focus is on estimation (of density, abundance, occupancy, survival, etc) or on modelling. In the latter context, information theoretic approaches (using Akaike's Information Criterion, AIC) have been the norm, and moving from there to a Bayesian approach is relatively painless. So I can't point you to papers reporting a Bayesian analysis of ANOVA-type data; Marc Kery's book, An Introduction to WinBUGS for Ecologists (Academic Press, 2010), shows how to do this, but not how to write it up.



April 1: Trying to adapt programs, getting errors
I have been looking through your textbook Doing Bayesian Data Analysis and specifically have been looking at how to do the Bayesian ANOVA in Chapter 18. I have been running into many problems, and I was wondering if you would have time to take a quick look over my code to see what's going wrong. I have no familiarity with R prior to this book and no experience programming, so I am not sure where my errors are. Generally speaking, the problem is that objects and functions are not being found. I have attached a the .csv of my imaginary data set and a .txt of the code I am running in RStudio in case you are able to take a look (I don't mean to presume you will, it's just quicker to already have the materials attached). I do load the .csv into RStudio before running the code. I suppose my core problem here is that I don't know how to take the code from the book using the examples there and apply the code to my own data sets for personal use. Thank you for your time,
Thanks very much for your interest in the book. To have any chance to help you, I need to know what error messages are being shown. Could you please run your script and then copy and paste the R console, with error messages, into an email? Especially important is to identify the very first error that occurs.
Then I run the code, which I have attempted (unsuccessfully) to adapt from what is in the textbook for a Bayesian ANOVA.

After running the code, this is what I get. It seems like it's failing to create several objects that it needs to perform the analysis, or is not finding the functions it needs to do it. I'm not really sure where the problem begins. I know in programming code a single error can lead to other errors, but this is my first exposure to R programming and the code in the book is for specific practice examples and I'm just not sure if I am adapting it to my pretend data set properly (which I'm most likely not given the errors).

 I also get several windows where graphs are supposed to be, but nothing is in them. I colored the errors red to make them easier to find.

...

> modelCheck( "model.txt" )
Error: could not find function "modelCheck"
+     fnroot = paste( fnroot , dataSource , sep="" )
+     datarecord = read.table( "model.txt", header=T ,
+                              colClasses=c("factor","numeric") )
... 
Error: unexpected symbol in:
"    NxLvl = length(unique(datarecord$Group))
    contrastList = list(1v2"
>    
A few quick thoughts:

* If you're just interested in comparing two groups, definitely look here:
http://www.indiana.edu/~kruschke/BEST/
It provides you with complete programs for that case.

* I now strongly recommend NOT using BUGS, and instead using JAGS. JAGS is much like BUGS but works better. See these posts:
http://doingbayesiandataanalysis.blogspot.com/2012/01/now-in-jags-now-in-jags.html
http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-steps-for-installing-software.html

* Your first error,
> modelCheck( "model.txt" )
Error: could not find function "modelCheck"
is that R cannot find BRugs. I think you need to first say
library(BRugs)
or whatever (I've forgotten since it's been so long that I've used BRugs).

* Your second error is that datarecord() is reading in model.txt, not data.
Hope that helps.



March 25: Modifying FilconJags.R and FilconJagsPower.R for two groups instead of four groups
The R code assumes the existence of four conditions (consistent with the ongoing example), but the data file has data for only two, so the code fails when it attempts to access the non-existent data. I can remove the code for handling the two conditions not in the data, but I assume that the intent was to have all four in the Rdata file. Is there a version of FilconJagsMuKappa.Rdata that has the expected four conditions?
If I understand your question correctly, what you are trying to do is apply the FilconJags.R and FilconJagsPower.R programs to data with only 2 groups instead of 4 groups. I think the programs should run without complaint for just 2 groups, although there won't be much benefit (i.e., shrinkage) from the hierarchical structure when there are only 2 groups.

Try this:
In FilconJags.R, include only the first 2 groups. Find the data section and insert the lines shown here:

# THE DATA.
# For each subject, specify the condition s/he was in,
# the number of trials s/he experienced, and the number correct.
# (These lines intentionally exceed the margins so that they don't take up
# excessive space on the printed page.)
cond = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4)
N = c(64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64)
z = c(45,63,58,64,58,63,51,60,59,47,63,61,60,51,59,45,61,59,60,58,63,56,63,64,64,60,64,62,49,64,64,58,64,52,64,64,64,62,64,61,59,59,55,62,51,58,55,54,59,57,58,60,54,42,59,57,59,53,53,42,59,57,29,36,51,64,60,54,54,38,61,60,61,60,62,55,38,43,58,60,44,44,32,56,43,36,38,48,32,40,40,34,45,42,41,32,48,36,29,37,53,55,50,47,46,44,50,56,58,42,58,54,57,54,51,49,52,51,49,51,46,46,42,49,46,56,42,53,55,51,55,49,53,55,40,46,56,47,54,54,42,34,35,41,48,46,39,55,30,49,27,51,41,36,45,41,53,32,43,33)

# Include only the first two groups:
includeIndex = ( cond <= 2 )
cond = cond[includeIndex]
N = N[includeIndex]
z = z[includeIndex]

nSubj = length(cond)
nCond = length(unique(cond))
Then source FilconJags.R. This will save the MCMC chain for the 2-group model. It will also try to produce graphs, but because the graphs assume four groups, the program will balk. But the posterior MCMC is saved before that happens.

FilconJagsPower.R will now run as is, except for two things. First, its graphics assume 4 groups. Therefore you can make this change:

    # Make plots for first 10 simulated experiments:
    #if ( expIdx <= 10 ) { plotRes = T } else { plotRes = F }
    plotRes=FALSE
And, its "goals" are defined in terms of 4 groups, so you can make this change:

# Goal is filtration vs condensation 95% HDI exceeds zero:
#goal1Ach = ( HDIofMCMC( (mu[1,]+mu[2,])/2 - (mu[3,]+mu[4,])/2 )[1] > 0.0 )
# Goal is filtration1 vs filtration2 95% HDI exceeds zero:
goal2Ach = ( HDIofMCMC( mu[1,]-mu[2,] )[1] > 0.0 )
# Goal is condensation1 vs condensation2 95% HDI has width less than 0.2:
#HDIlim = HDIofMCMC( mu[3,]-mu[4,] )
#HDIwidth = HDIlim[2] - HDIlim[1]
#goal3Ach = ( HDIwidth < 0.2 )
#goalAchieved = c( goal1Ach , goal2Ach , goal3Ach )
goalAchieved = c( goal2Ach )

I hope that steers you toward achieving your goals!


April 1: Sequential updating (including more data) when using MCMC
I'm sorry to bother you, but I have a question for which I did a lot of researches, but I couldn't get any result. Basically, I implemented a Hierarchal Bayesian model in R and it works fine. However, what I will like to do now is to apply the model on a trial by trial basis (instead of applying it on the whole set of trials) and get the parameters estimations after each trial to use them both as "prior" parameters on the next trial and to track the evolution of these parameters. I do not want to use sequential methods, I just want to apply the model on a trial-by-trial methods. Do you think you can give some idea?
If you are using MCMC methods, and you really want to use the posterior from one run as the prior for the next, it is not directly possible. This is because the posterior distribution from the first run will usually be complex (e.g., curved and correlated in parameter space, with possibly skewed tails and so on) and not possible to be exactly expressed mathematically in the prior specification. (The beauty of using conjugate priors is that the posterior has the same mathematical form as the prior, so incremental updating is elegantly easy. For complex models, there are no [known] conjugate priors.)
Some people attempt to get around that problem by approximating the posterior of the first run with some simpler mathematical form, and using that mathematical re-description as the prior for the next run. But this method is only as good (or bad) as the approximation, and I wouldn't recommend it unless you have really cogent reasons to think that the approximation is excellent.
If you are merely illustrating how the posterior changes with incremental updating, then you can simply do the data analysis using incremental data sets. Do an analysis with {x1,...xN}, then an analysis using the original prior with {x1,...,xN,xN+1}, then an analysis using the original prior with {x1,...xN,xN+1,xN+2}, and so on.



March 28: How to take the difference between two parameter distributions
one thing that has me perplexed right now is how you take the difference between two posterior distributions to see if there is a change. One very common application to us is to have similar data on a specific metric before and after a ... change. Naturally, the intent of our analysts is to understand if there has been a difference after that ... change. Our first step is to calculate the posteriors of the two datasets (before and after). Then I assumed that the posteriors are a random variable so they are compared by forming the convolution of the two distributions. However, if you just visualize two normal distributions (the posteriors) where one has a mean near zero and the other is shifted (say, to 5), the convolution will still cover zero. So although these two datasets are clearly different and the software change has had an impact, the HDI will still straddle zero and the conclusion will be that there was no change. Is this method of using convolution how you did your poster comparisons in the textbook? If not, I would very much like to know the methods for taking this difference like you did with respect to recall and classical music in your text (chapter 12). Thanks very much for you time,
At every step in the MCMC chain, the parameter values at that step are jointly representative of the multidimensional posterior distribution. Therefore the difference of relevant parameter values at each step is a representative difference.

I think the first example of this in the book occurs at the top of p. 176.

Because posterior distributions can have strange shapes, it is important to compute the difference at each step, maintaining the joint parameter values. See Fig. 12.3, p. 300.



March 27: What book should be studied after Doing Bayesian Data Analysis?
What do you recommend as a good second book [after Doing Bayesian Data Analysis]? I’m a long time R user and my problem space is in engineering.
For a follow-up book, I'd recommend
Ntzoufras (2009) Bayesian modeling using WinBUGS
and
Gelman et al. (2014) Bayesian data analysis 3rd ed.
You might also try
Lee & Wagenmakers (2014) Bayesian cognitive modeling.

None of those is specifically engineering, but Ntzoufras and Lee & Wagenmakers give lots of BUGS/JAGS examples and Gelman et al. is a great authoritative resource.



March 27: Exercise 13.4
Exercise 13.4 refers to Kruscke1996CSbugs.R; I don’t see it (or the Jags version) in the collection of R programs and data. The associated data file is there (Kruschke1996CSdatsum.Rdata), however. Is it not there on purpose (i.e., the student is expected to write the code from scratch)?
The program for that exercise was deliberately excluded from the posted programs because it was supposed to be a programming exercise, not merely "click the 'source' button and watch what happens." I should not have mentioned the program name in the exercise description, but I did because that was how I got my word processing software (LaTeX) to include code snippets. The program I have for that exercise is old and not prepared for release, so I would prefer not to circulate it. I apologize that this is not the answer you were hoping for!


Saturday, March 22, 2014

Potpourri of recent inquires about doing Bayesian data analysis


I get a lot of email asking about the book and the blog. Too often, I say to myself, "That's interesting. I'll have to think about it and reply later." And then later never comes. My apologies to everyone to whom I have not yet replied. Below is a sample of a few of the messages I've received in the last few days, actually with replies.

Topics:
  • Initializing chains for multi-factor ANOVA-style models.
  • Errors in BUGS (another reason to use JAGS).
  • Getting the decimal places of the HDI limits that don't appear in the graphs.
  • Effect size measure in ANOVA-style log-linear models.
  • Variable selection in multiple regression.



March 22, 2014:
Dear Prof. Kruschke,

by now I've opened and closed your contact form many a times, and have been successfully able to tackle each problem I had at the time using your book. Now, though, I am stumped. I am calculating a Poisson Exponential "ANOVA" for 4 nominal predictors and the code for initializing the chains troubles me. For that matter, my question is regarding any and all situations in which there are more than two nominal predictors present. The code to initialize the linear predictor sum is this:

linpred = as.vector( outer( a1 , a2 , "+" ) + a0 )

I understand the concept of the outer() function for two variables, but could you please explain how you would deal with adding a3 and a4 into this mix, assuming the rest of the code has been updated accordingly?

Kind regards,

Reply:

Thanks very much for your interest in the book.

One approach is to try not to initialize manually, letting JAGS initialize for you. You have to check that the chains have burned in well and converged. But if it works, problem solved.

But if you really want to manually initialize, you have a few choices. One option is not to bother with the interaction terms. That is, initialize the main effects with lines like
a0 = mean( theData$y )
a1 = aggregate( theData$y , list( theData$x1 ) , mean )[,2] - a0
a2 = aggregate( theData$y , list( theData$x2 ) , mean )[,2] - a0

but leave the interaction terms all initialized at zero.

If you really, really want to initialize the interactions, you can use the outer() function to build higher dimensional arrays. To get help, type
?outer
at R's command line. As an example, consider this:

> a=1:2
> b=3:5
> c=6:9
> outer(a,b,"+")
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    5    6    7
> outer(c,outer(a,b,"+"),"+")
, , 1

     [,1] [,2]
[1,]   10   11
[2,]   11   12
[3,]   12   13
[4,]   13   14

, , 2

     [,1] [,2]
[1,]   11   12
[2,]   12   13
[3,]   13   14
[4,]   14   15

, , 3

     [,1] [,2]
[1,]   12   13
[2,]   13   14
[3,]   14   15
[4,]   15   16


Hope that helps!
--John


March 20, 2014:
Hi John,

First of all I would like to mention that I am really enjoying reading the book.

I have studied a bit of Bayesian statistics in a university course and I find really interesting to see real world applications of MCMC.

I have been doing the exercises okay but I have come across a stumbling block in chapter Nine.

On exercise 9.2 when I changed the data parameters as per exercise solutions I do get the following error message. These errors appear when I run the FilconBrugs.R script in its original form as well

'Initializing chain 1:
initial values loaded and chain initialized but another chain contain uninitialized variables
Initializing chain 2:
initial values loaded and chain initialized but another chain contain uninitialized variables
Initializing chain 3:
model is initialized,

After this error then I encounter the following one:

"update error for node <theta[4]> algorithm conjugate beta updater error argument two  has too small a value"

Basically the model does not produce any posterior. If you could let me know what the error could be due to it would be much appreciated. I'm using R studio and I haven't come across any errors in the rest of exercises and code that I have ran so far.

Kind Regards,

Reply:

Thanks very much for your interest in the book.

First off, I strongly recommend using JAGS instead of BUGS. I have repeatedly found that BUGS yields troubles when JAGS works fine. Here is the link to a blog post about using JAGS instead of BUGS:
http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-steps-for-installing-software.html

Both of the errors you encounter are from BUGS. The statements about intializing the chains are not errors, but merely status reports. It's simply going through the chains and initializing them, but letting you know that after chain 1 there are still 2 more chains to go, and after chain 1 there is still another chain to go.

The bit about "update error for node..." is a genuine error. As I recall, I do not get that sort of error in JAGS. There are work-arounds in BUGS, by truncating the distribution or the values of higher-level parameters. But give JAGS a try and see if that solves the problem.

Thanks again.
--John



March 20, 2014:
Good afternoon Mr. Kruschke,

I have a  questions about two-way ANOVA from your book. On page 522, you run an example using salary data and you mention that the means and HDI limits are displayed with only three significant digits, but more precise values can be obtained directly from the program.  How do I get those more precise values?

Thank you for help,

Reply:
Thanks very much for your interest in the book.

The 22-Jun-2012 version of plotPost.R (see programs at http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/ ) produces a summary of the posterior as its returned value. For example, if you use the R statement

postInfo = plotPost( ... )

then postInfo has the mean, median, mode, HDI limits, etc.

Don't forget that even though multiple decimal points are there, they are not necessarily meaningful. The values are estimates from a random MCMC sample from the posterior, so they are noisy!

Thanks again.
--John



March 19, 2014:
Dear Prof. Krushke, I'm working with ch 22 of DBDA to analyse contingency tables. I'm curious to know where to look for a measure of effect size, both for the whole table and for individual interactions. Any tips you have would be much appreciated!
Thanks,

Reply:

Thanks for your interest in the book.

My short answer is that I don’t have an adequate answer. There are many relevant measures of effect size in the frequentist literature, of course, but we seek something for hierarchical Bayesian log-linear models. A measure of effect size is supposed to be an aggregated difference (i.e., the “effect”) standardized relative to some indicator of variance or noise in the data. Moreover, in a Bayesian setting, the effect size has a posterior distribution; it’s not just a point estimate. An example of posterior effect size for a difference between groups is given here: http://www.indiana.edu/~kruschke/BEST/
One possibility for describing effect size in hierarchical Bayesian ANOVA-style models (including log-linear models) is a ratio of estimated variance parameters. For example, the main effect of factor A might have an “effect size” given by sigma_A divided by sigma_y, where sigma_A is the scale parameter of the distribution of the factor-A deflections, and sigma_y is the scale parameter of the within-cell noise distribution. But that’s just off the top of my head.
Thanks again,
--John




March 18, 2014:
John, I really enjoyed reading your post [http://doingbayesiandataanalysis.blogspot.com/2014/01/bayesian-variable-selection-in-multiple.html] as, similar to you, I feel that when it comes to real data there is a lot of fine tuning that is usually inadequately addressed in papers/books.

I have a problem where the posterior probability of delta=1 is always above 0.5 (even when the respective variable should clearly not be selected). I was wondering whether you made progress with your issues and perhaps can share some light on mine.

Thanks,

Reply:

Thanks for the comment.

I’m not sure how you know that the “respective variable should clearly not be selected.” If you already know that, does that mean that you have prior knowledge that is not being expressed in the prior distribution of the model?

My impression of variable selection is that it’s a bit of an art, in the sense that the prior matters! But that’s true of Bayesian model selection in general. Real prior knowledge should be implemented if possible, and, in the case of variable selection, shrinkage from hierarchical structure should be implemented if appropriate.

Friday, March 14, 2014

Bayesian estimation and precision as the goal for data collection (expanded)

Precision as the goal for data collection, talk at U.C. Irvine, March 14, 2014.

Part 1: Rejecting null is not enough, also need estimate and precision. Bayesian estimation supersedes confidence intervals and "the new statistics".


Part 2: Two Bayesian ways to assess a null value. Highest density interval with region of practical equivalence. Bayesian model comparison and Bayes factor.


Part 3: Biased estimation in sequential testing and optional stopping.


Part 4: Monte Carlo study of biased estimation in sequential testing and optional stopping.


Parts 3 and 4 are elaborations of this previous blog post.

Saturday, March 8, 2014

Recent Advances in Bayesian Inference at U.C. Irvine, March 14



Recent Advances in Bayesian Inference

Venue:
University of California at Irvine,
1517 Social and Behavioral Sciences Gateway building

Date:
Friday, March 14, 2014

Speakers:
  • 9:30 John Kruschke (Indiana University). Precision as the goal for data collection.
  • 10:30 Hal Stern (UC Irvine). A statistical approach to detecting patterns in behavioral event sequences.
  • 11:30 Jeff Rouder (University of Missouri). The p < .05 rule and the hidden costs of the free lunch in inference.
  • 2:00 Wes Johnson (UC Irvine). Introduction to Bayesian Nonparametrics.
  • 3:00 Mario Peruggia (Ohio State University). Introduction to Bayesian model averaging.
Registration:
Please contact Dr. Joachim Vandekerckhove, joachim@uci.edu


Saturday, February 22, 2014

Bayesian Tests to Quantify the Result of a Replication Attempt

This manuscript looks like a nice use of Bayes factors to assess replication results. I have not read it yet in detail, but the idea sounds right on target. From the abstract:
To quantify replication outcomes we propose a novel Bayesian replication test that compares the adequacy of two competing hypotheses. The first hypothesis is that of the skeptic and holds that the effect is spurious; this is the null hypothesis that postulates a zero effect size, H0: δ = 0. The second hypothesis is that of the proponent and holds that the effect is consistent with the one found in the original study, an effect that can be quantified by a posterior distribution. Hence, the second hypothesis --the replication hypothesis-- is given by Hr : δ ~ "posterior distribution from original study".
Here is the link: Bayesian Tests to Quantify the Result of a Replication Attempt; Josine Verhagen and Eric-Jan Wagenmakers, University of Amsterdam.

Friday, February 14, 2014

Improved icons for Bayesian and frequentist analysis


This post presents icons that attempt to capture the essence of Bayesian and frequentist analysis. There are four icons: Bayesian and frequentist approaches to decisions about null values, and Bayesian and frequentist approaches to parameter estimation. This post is an update of a previous post, motivated by many helpful comments from readers. For an explanation of what I mean by the "essence" of the approaches, and what I hope to achieve from this exercise, please see the previous post. Without further ado, the icons are presented below, first in a 2x2 grid, then one at a time with explanations in the captions.

Bayesian Frequentist
Null value
assessment
Estimation

Bayesian null value assessment: The light-blue lines indicate the posterior distribution of credible lines. The dark-pink line marks the null value (zero slope). The null value falls far outside any credible value. [Added Feb 16, 2014: Of course, the full decision rule involves a ROPE around the null value. The ROPE is not displayed here, just to keep the icon uncluttered.]

Frequentist null value assessment: The dark-blue line marks the best fit. The dark-pink line marks the null hypothesis.The light-pink lines show the sampling distribution from the null hypothesis. The best fit falls far outside any null-sampled line.

Bayesian estimation: The light-blue lines indicate the posterior distribution of credible lines. There is an explicit distribution of credibilities (i.e., posterior probabilities) across possibilities (possible slopes etc.).

Frequentist estimation: The dark-blue line marks the best fit. The two dark-pink lines mark the limits of the confidence interval. The light-pink lines show the sampling distributions around each of the confidence-interval limits; notice that the best-fit line falls at the extreme of each sampling distribution. There is no distribution of probabilities across possibilities; there are only three point values: the best fit and the two CI limits.
Creative Commons license appended Feb 18, 2014, as suggested by reader comment:
Creative Commons License The four iconic images for Bayesian and frequentist data analysis by John K. Kruschke are licensed under a Creative Commons Attribution 4.0 International License. Based on a work at http://doingbayesiandataanalysis.blogspot.com/2014/02/improved-icons-for-bayesian-and.html.