Monday, August 3, 2015

Bayesian meta-analysis provides high-precision estimate of cosmic impact event at ~12,800 years before 1950

A recently published article in PNAS reports a Bayesian meta-analysis of data from 354 samples from 23 geological sites on 4 continents, yielding a 95% credible interval from 12,835 to 12,735 years before 1950 for the cosmic impact event that formed the Younger Dryas boundary layer. The article, authored by J. P. Kennett with many co-authors, can be found at the PNAS web site.

Saturday, July 25, 2015

Laplace is visited by Doing Bayesian Data Analysis

After a post from almost two years ago inviting folks to pose the book with famous Bayesians or non-Bayesians (deceased or not), the book has finally visited a monument to Laplace! Shown below (scroll down) are photos kindly taken by Carlos Ungil. Thank you Carlos for basking the book in the glow of Laplace! 

DBDA2E acknowledges the impact of Laplace on p. 100: "many historians argue that it is Bayes’ successor, Pierre-Simon Laplace (1749-1827), whose name should really label this type of analysis, because it was Laplace who independently rediscovered and extensively developed the methods (e.g., Dale, 1999; McGrayne, 2011)." (I have been told later that Laplace might have been aware of the article by Bayes and Price, so maybe Laplace "merely" extensively developed the methods without independently rediscovering Bayes' rule. I'll let the historical scholars figure it out.) 

[From first comment to the post, inserted here for extra visibility: "xi'an said... Dear John, would it be possible to mention that this statue stands in Beaumont-en-Auge, Normandy, close to Deauville and Caen? Some visitors may be interested, even though it is not as convenient as Bayes' tomb in Bunhill Fields... A local"]

The book visited Jacob Bernoulli twice: here (2nd edition) and here (1st edition). The book visited R. A. Fisher here (1st edition). The book visited Bayes here (1st edition).




Wednesday, May 13, 2015

The Bayesian New Statistics: Two Historical Trends Converge

If not null hypothesis significance testing, then what? If not p values, then confidence intervals? If not NHST, then Bayes factors? Both? Neither? These issues are addressed in a new manuscript titled The Bayesian New Statistics: Two Historical Trends Converge.

Two trends in data analysis converge on Bayesian estimation. (NHST = null hypothesis significance testing. MLE = maximum likelihood estimate.)

Abstract: There have been two historical shifts in the practice of data analysis. One shift is from hypothesis testing to estimation with uncertainty and meta-analysis, which among frequentists in psychology has recently been dubbed “the New Statistics” (Cumming, 2014). A second shift is from frequentist methods to Bayesian methods. We explain and applaud both of these shifts. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The two historical trends converge in Bayesian methods for estimation with uncertainty and meta-analysis.

Excerpt: Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. We will recapitulate the goals of the New Statistics and the frequentist methods for addressing them, and we will describe Bayesian methods for achieving those goals. We will cover hypothesis testing, estimation of magnitude (e.g., of effect size), assessment of uncertainty (with confidence intervals or posterior distributions), meta-analysis, and power analysis. We hope to convince you that Bayesian approaches to all these goals are more direct, more intuitive, and more informative than frequentist approaches. We believe that the goals of the New Statistics, including meta-analytic thinking engendered by an estimation approach, are better realized by Bayesian methods.

The manuscript is available at this link (via SSRN).

Thursday, May 7, 2015

Updated DBDA2E programs for number of MCMC chains and parallel chains in runjags

The DBDA2E programs have been updated so they deal better with parallel chains in runjags and the number of cores available on your computer. The new programs are available as this zip folder also linked at the book's software page. There are 29 modified programs. They check for the number of cores on the computer and set the number of chains and the runjags method appropriately (I hope!).

Thanks go to reader Christoph Schmid, affiliated with Universität Regensburg, for notifying me of an error that instigated me to modify the programs. 

AND there is a new version of runjags available! It has many new features.

Saturday, May 2, 2015

Graphics Window for MacOS and RStudio Server

It's been a recurring headache to get graphics windows to open seamlessly in MacOS and Windows, running RStudio locally and RStudio Server. A solution has been proposed by Professor David Zeitler. He has modified the openGraph function in the DBDA2E-utilities.R script. If you use a Mac, or you use RStudio Server, please check it out and let us know whether or not it works for you by entering a comment at the bottom of this post. To make the change, just open DBDA2E-utilities.R in an editing window. Comment out the current definition of openGraph (don't delete it, in case you need to recover the original version). Insert the new function definition listed below, and be sure to save the edited version. Then source the script and try using openGraph or any existing script that calls it.

Update: See the comments for updated versions!

openGraph = function( width=7 , height=7 , mag=1.0 , ... ) {
  if( class( try(RStudioGD(),silent=TRUE) ) == "try-error" ) {
    # Not in RStudio, use graphic windows
    if ( .Platform$OS.type != "windows" ) { # Mac OS, Linux
      if ( .Platform$GUI != "AQUA" ) { # Linux
        tryInfo = try( X11( width=width*mag , height=height*mag , 
                            type="cairo" , ... ) )
        if ( class(tryInfo)=="try-error" ) {
          lineInput = readline("WARNING: Previous graphics windows will be closed because of too many open windows.\nTO CONTINUE, PRESS <ENTER> IN R CONSOLE.\n")
          graphics.off() 
          X11( width=width*mag , height=height*mag , type="cairo" , ... )
        }
      } else {
        # Mac OS - use quartz device
        tryInfo = try( quartz(width=width*mag , height=height*mag ,
                              ... ) )
        if ( class(tryInfo)=="try-error") {
          if ( class(tryInfo)=="try-error" ) {
            lineInput = readline("WARNING: Previous graphics windows will be closed because of too many open windows.\nTO CONTINUE, PRESS <ENTER> IN R CONSOLE.\n")
            graphics.off() 
            quartz( width=width*mag , height=height*mag , ... )
          } 
        }
      }
    } else { # Windows OS
      tryInfo = try( windows( width=width*mag , height=height*mag , ... ) )
      if ( class(tryInfo)=="try-error" ) {
        lineInput = readline("WARNING: Previous graphics windows will be closed because of too many open windows.\nTO CONTINUE, PRESS <ENTER> IN R CONSOLE.\n")
        graphics.off() 
        windows( width=width*mag , height=height*mag , ... )    
      }
    }
  }
}


Thank you, David!

Tuesday, April 28, 2015

Informed priors for Bayesian comparison of two groups

The BEST programs, for Bayesian estimation of two groups, were written with generic vague priors only minimally informed by the scale of the data. Here are new versions of the programs that are better suited for specifying informed priors.

A little background: Suppose we have two groups of metric data. In each group the data seem to be unimodal and symmetrically distributed, but perhaps with outliers relative to a normal distribution. So we chose to describe the groups with possibly heavy-tailed t distributions, but using the same normality (df) parameter for both groups because outliers are rare (and it's the outliers that most influence the normality parameter). Thus there are five parameters to be estimated: μ1, σ1, μ2, σ2, and the normality parameter ν.

The original BEST program put simplistic broad priors on the five parameters. For example, σ1 was given a broad uniform prior extending across a range that far exceeded any plausible estimate for the scale of the data. It was explained how to make basic modifications of the prior in Appendix B of the article describing the programs. But the programs really were not intended for easy expression of informed priors. In the post I present a version of the BEST programs that are much easier for expression of informed priors.

In the new version, each parameter is given its own line of code for expressing its prior, so that each can be individually set.

  • Each μ is given a normal prior with its own (constant) mean and standard deviation. 
  • Each σ is given a gamma prior with its own (constant) shape and rate. The shape and rate are derived from the desired mode and standard deviation of the gamma distribution. 
  • The normality parameter ν is given a gamma prior with its own (constant) shape and rate. The shape and rate are derived from the desired mean and standard deviation of the gamma distribution. 

Pages 236-239 of DBDA2E explain the gamma distribution and formulas for converting the desired mode or mean and standard deviation to corresponding shape and rate values. In particular, it defines the R functions gammaShRaFromModeSD and gammaShRaFromMeanSD that convert desired mean or mode and SD to corresponding shape and rate parameters.

The JAGS model specification is shown below. To understand it, note that the metric data values are y[i] and the corresponding group membership is specified by x[i].

  model {
    for ( i in 1:Ntotal ) {
      y[i] ~ dt( mu[x[i]] , 1/sigma[x[i]]^2 , nu )
    }
    mu[1] ~ dnorm( muM1 , 1/muSD1^2 )  # prior for mu[1]
    sigma[1] ~ dgamma( Sh1 , Ra1 )     # prior for sigma[1]
    mu[2] ~ dnorm( muM2 , 1/muSD2^2 )  # prior for mu[2]
    sigma[2] ~ dgamma( Sh2 , Ra2 )     # prior for sigma[2]
    nu <- nuMinusOne+1
    nuMinusOne ~ dgamma( ShNu , RaNu ) # prior for nu
  }


To set the prior constants so that they are broad on the scale of the data, I use the following (somewhat arbitrary) settings:

    muM1 = mean(y)  # centered on the data
    muSD1 = sd(y)*5 # broad relative to the scale of the data

 
    Sh1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape
    Ra1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate
 

    muM2 = mean(y) 
    muSD2 = sd(y)*5 

    Sh2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape
    Ra2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate
 

    ShNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$shape
    RaNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$rate


Notice that the gamma prior for nuMinusOne is exactly the exponential prior used in the original program when its mean=29 and sd=29.

When the new program is run with the broad priors specified above, it gives output virtually identical to the original program for any modest amount of data.

Suppose now that you want to specify a strong prior on the first group, such that you want μ1 to be very nearly 100 and σ1 to be very nearly 15. You could specify that as

    muM1 = 15
    muSD1 = 1 # very small uncertainty

 
    Sh1 = gammaShRaFromModeSD( mode=15 , sd=1 )$shape
    Ra1 = gammaShRaFromModeSD( mode=15 , sd=1 )$rate


But it doesn't often make sense to just arbitrary put strong constraints in the prior. Instead, and in general, we would like the prior to resemble a posterior from previous data, starting with a vague proto-prior. Instead of relying on intuition to express an informed prior directly on the parameters, we start with a vague proto-prior and enter some reasonable data that instantiate our prior knowledge. Then we run the Bayesian analysis on the data and observe the resulting posterior distribution. It is the posterior distribution that is now the informed prior for subsequent data analysis. From the posterior distribution of the prior data, just "read off" the mode and sd of the marginal posterior on σ and other parameters. In the program, the MCMC chain is in the matrix mcmcChain, so for example, we can get at the mean and sd of the nu parameter using mean( mcmcChain[,"nu"] ) and sd( mcmcChain[,"nu"] ).

The full scripts are listed below:


BESTgamma.R:



# MODIFIED 4/28/15 WITH GAMMA PRIOR ON SIGMA
# John K. Kruschke 
# johnkruschke@gmail.com
# http://www.indiana.edu/~kruschke/BEST/
#
# This program is believed to be free of errors, but it comes with no guarantee!
# The user bears all responsibility for interpreting the results.
# Please check the webpage above for updates or corrections.
#
### ***************************************************************
### ******** SEE FILE BESTgammaExample.R FOR INSTRUCTIONS **************
### ***************************************************************

source("openGraphSaveGraph.R") # graphics functions for Windows, Mac OS, Linux

# Function for shape and rate parameters of gamma. From DBDA2E-utilities.R; see
# p. 238 of DBDA2E. DBDA2E = Doing Bayesian Data Analysis Second Edition,
# https://sites.google.com/site/doingbayesiandataanalysis/
gammaShRaFromModeSD = function( mode , sd ) {
  if ( mode <=0 ) stop("mode must be > 0")
  if ( sd <=0 ) stop("sd must be > 0")
  rate = ( mode + sqrt( mode^2 + 4 * sd^2 ) ) / ( 2 * sd^2 )
  shape = 1 + mode * rate
  return( list( shape=shape , rate=rate ) )
}

gammaShRaFromMeanSD = function( mean , sd ) {
  if ( mean <=0 ) stop("mean must be > 0")
  if ( sd <=0 ) stop("sd must be > 0")
  shape = mean^2/sd^2
  rate = mean/sd^2
  return( list( shape=shape , rate=rate ) )
}

BESTmcmc = function( y1, y2, numSavedSteps=100000, thinSteps=1, showMCMC=FALSE) {
  # This function generates an MCMC sample from the posterior distribution.
  # Description of arguments:
  # showMCMC is a flag for displaying diagnostic graphs of the chains.
  #    If F (the default), no chain graphs are displayed. If T, they are.
  require(rjags)
 
  #----------------------------------------------------------------------------
  # THE MODEL.
  modelString = "
  model {
    for ( i in 1:Ntotal ) {
      y[i] ~ dt( mu[x[i]] , 1/sigma[x[i]]^2 , nu )
    }
    mu[1] ~ dnorm( muM1 , 1/muSD1^2 )  # prior for mu[1]
    sigma[1] ~ dgamma( Sh1 , Ra1 )     # prior for sigma[1]
    mu[2] ~ dnorm( muM2 , 1/muSD2^2 )  # prior for mu[2]
    sigma[2] ~ dgamma( Sh2 , Ra2 )     # prior for sigma[2]
    nu <- nuMinusOne+1
    nuMinusOne ~ dgamma( ShNu , RaNu ) # prior for nu
  }
  " # close quote for modelString
  # Write out modelString to a text file
  writeLines( modelString , con="BESTmodel.txt" )
 
  #------------------------------------------------------------------------------
  # THE DATA.
  # Load the data:
  y = c( y1 , y2 ) # combine data into one vector
  x = c( rep(1,length(y1)) , rep(2,length(y2)) ) # create group membership code
  Ntotal = length(y)
  # Specify the data and prior constants in a list, for later shipment to JAGS:
  dataList = list(
    y = y ,
    x = x ,
    Ntotal = Ntotal ,
    # Below are the specifications for the prior constants. These are generic
    # broad priors, but you can replace with informed values if appropriate.
    muM1 = mean(y) ,
    muSD1 = sd(y)*5 ,
    muM2 = mean(y) ,
    muSD2 = sd(y)*5 ,
    Sh1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape ,
    Ra1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate ,
    Sh2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape ,
    Ra2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate ,
    ShNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$shape ,
    RaNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$rate
  )
 
  #------------------------------------------------------------------------------
  # INTIALIZE THE CHAINS.
  # Initial values of MCMC chains based on data:
  mu = c( mean(y1) , mean(y2) )
  sigma = c( sd(y1) , sd(y2) )
  # Regarding initial values in next line: (1) sigma will tend to be too big if
  # the data have outliers, and (2) nu starts at 5 as a moderate value. These
  # initial values keep the burn-in period moderate.
  initsList = list( mu = mu , sigma = sigma , nuMinusOne = 4 )
 
  #------------------------------------------------------------------------------
  # RUN THE CHAINS

  parameters = c( "mu" , "sigma" , "nu" )     # The parameters to be monitored
  adaptSteps = 500               # Number of steps to "tune" the samplers
  burnInSteps = 1000
  nChains = 3
  nIter = ceiling( ( numSavedSteps * thinSteps ) / nChains )
  # Create, initialize, and adapt the model:
  jagsModel = jags.model( "BESTmodel.txt" , data=dataList , inits=initsList ,
                          n.chains=nChains , n.adapt=adaptSteps )
  # Burn-in:
  cat( "Burning in the MCMC chain...\n" )
  update( jagsModel , n.iter=burnInSteps )
  # The saved MCMC chain:
  cat( "Sampling final MCMC chain...\n" )
  codaSamples = coda.samples( jagsModel , variable.names=parameters ,
                              n.iter=nIter , thin=thinSteps )
  # resulting codaSamples object has these indices:
  #   codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]
 
  #------------------------------------------------------------------------------
  # EXAMINE THE RESULTS
  if ( showMCMC ) {
    openGraph(width=7,height=7)
    autocorr.plot( codaSamples[[1]] , ask=FALSE )
    show( gelman.diag( codaSamples ) )
    effectiveChainLength = effectiveSize( codaSamples )
    show( effectiveChainLength )
  }

  # Convert coda-object codaSamples to matrix object for easier handling.
  # But note that this concatenates the different chains into one long chain.
  # Result is mcmcChain[ stepIdx , paramIdx ]
  mcmcChain = as.matrix( codaSamples )
  return( mcmcChain )

} # end function BESTmcmc

#==============================================================================

BESTsummary = function( y1 , y2 , mcmcChain ) {
  source("HDIofMCMC.R")
  mcmcSummary = function( paramSampleVec , compVal=NULL ) {
    meanParam = mean( paramSampleVec )
    medianParam = median( paramSampleVec )
    dres = density( paramSampleVec )
    modeParam = dres$x[which.max(dres$y)]
    hdiLim = HDIofMCMC( paramSampleVec )
    if ( !is.null(compVal) ) {
      pcgtCompVal = ( 100 * sum( paramSampleVec > compVal )
                      / length( paramSampleVec ) )
    } else {
      pcgtCompVal=NA
    }
    return( c( meanParam , medianParam , modeParam , hdiLim , pcgtCompVal ) )
  }
  # Define matrix for storing summary info:
  summaryInfo = matrix( 0 , nrow=9 , ncol=6 , dimnames=list(
    PARAMETER=c( "mu1" , "mu2" , "muDiff" , "sigma1" , "sigma2" , "sigmaDiff" ,
             "nu" , "nuLog10" , "effSz" ),
    SUMMARY.INFO=c( "mean" , "median" , "mode" , "HDIlow" , "HDIhigh" ,
                    "pcgtZero" )
    ) )
  summaryInfo[ "mu1" , ] = mcmcSummary( mcmcChain[,"mu[1]"] )
  summaryInfo[ "mu2" , ] = mcmcSummary( mcmcChain[,"mu[2]"] )
  summaryInfo[ "muDiff" , ] = mcmcSummary( mcmcChain[,"mu[1]"]
                                           - mcmcChain[,"mu[2]"] ,
                                           compVal=0 )
  summaryInfo[ "sigma1" , ] = mcmcSummary( mcmcChain[,"sigma[1]"] )
  summaryInfo[ "sigma2" , ] = mcmcSummary( mcmcChain[,"sigma[2]"] )
  summaryInfo[ "sigmaDiff" , ] = mcmcSummary( mcmcChain[,"sigma[1]"]
                                              - mcmcChain[,"sigma[2]"] ,
                                              compVal=0 )
  summaryInfo[ "nu" , ] = mcmcSummary( mcmcChain[,"nu"] )
  summaryInfo[ "nuLog10" , ] = mcmcSummary( log10(mcmcChain[,"nu"]) )
 
  N1 = length(y1)
  N2 = length(y2)
  effSzChain = ( ( mcmcChain[,"mu[1]"] - mcmcChain[,"mu[2]"] )
            / sqrt( ( mcmcChain[,"sigma[1]"]^2 + mcmcChain[,"sigma[2]"]^2 ) / 2 ) )
  summaryInfo[ "effSz" , ] = mcmcSummary( effSzChain , compVal=0 )
  # Or, use sample-size weighted version:
  # effSz = ( mu1 - mu2 ) / sqrt( ( sigma1^2 *(N1-1) + sigma2^2 *(N2-1) )
  #                               / (N1+N2-2) )
  # Be sure also to change plot label in BESTplot function, below.
  return( summaryInfo )
}

#==============================================================================

BESTplot = function( y1 , y2 , mcmcChain , ROPEm=NULL , ROPEsd=NULL ,
                    ROPEeff=NULL , showCurve=FALSE , pairsPlot=FALSE ) {
  # This function plots the posterior distribution (and data).
  # Description of arguments:
  # y1 and y2 are the data vectors.
  # mcmcChain is a list of the type returned by function BTT.
  # ROPEm is a two element vector, such as c(-1,1), specifying the limit
  #   of the ROPE on the difference of means.
  # ROPEsd is a two element vector, such as c(-1,1), specifying the limit
  #   of the ROPE on the difference of standard deviations.
  # ROPEeff is a two element vector, such as c(-1,1), specifying the limit
  #   of the ROPE on the effect size.
  # showCurve is TRUE or FALSE and indicates whether the posterior should
  #   be displayed as a histogram (by default) or by an approximate curve.
  # pairsPlot is TRUE or FALSE and indicates whether scatterplots of pairs
  #   of parameters should be displayed.
  mu1 = mcmcChain[,"mu[1]"]
  mu2 = mcmcChain[,"mu[2]"]
  sigma1 = mcmcChain[,"sigma[1]"]
  sigma2 = mcmcChain[,"sigma[2]"]
  nu = mcmcChain[,"nu"]
  if ( pairsPlot ) {
    # Plot the parameters pairwise, to see correlations:
    openGraph(width=7,height=7)
    nPtToPlot = 1000
    plotIdx = floor(seq(1,length(mu1),by=length(mu1)/nPtToPlot))
    panel.cor = function(x, y, digits=2, prefix="", cex.cor, ...) {
      usr = par("usr"); on.exit(par(usr))
      par(usr = c(0, 1, 0, 1))
      r = (cor(x, y))
      txt = format(c(r, 0.123456789), digits=digits)[1]
      txt = paste(prefix, txt, sep="")
      if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
      text(0.5, 0.5, txt, cex=1.25 ) # was cex=cex.cor*r
    }
    pairs( cbind( mu1 , mu2 , sigma1 , sigma2 , log10(nu) )[plotIdx,] ,
           labels=c( expression(mu[1]) , expression(mu[2]) ,
                     expression(sigma[1]) , expression(sigma[2]) ,
                     expression(log10(nu)) ) ,
           lower.panel=panel.cor , col="skyblue" )
  }
  source("plotPost.R")
  # Set up window and layout:
  openGraph(width=6.0,height=8.0)
  layout( matrix( c(4,5,7,8,3,1,2,6,9,10) , nrow=5, byrow=FALSE ) )
  par( mar=c(3.5,3.5,2.5,0.5) , mgp=c(2.25,0.7,0) )
 
  # Select thinned steps in chain for plotting of posterior predictive curves:
  chainLength = NROW( mcmcChain )
  nCurvesToPlot = 30
  stepIdxVec = seq( 1 , chainLength , floor(chainLength/nCurvesToPlot) )
  xRange = range( c(y1,y2) )
  xLim = c( xRange[1]-0.1*(xRange[2]-xRange[1]) ,
            xRange[2]+0.1*(xRange[2]-xRange[1]) )
  xVec = seq( xLim[1] , xLim[2] , length=200 )
  maxY = max( dt( 0 , df=max(nu[stepIdxVec]) ) /
    min(c(sigma1[stepIdxVec],sigma2[stepIdxVec])) )
  # Plot data y1 and smattering of posterior predictive curves:
  stepIdx = 1
  plot( xVec , dt( (xVec-mu1[stepIdxVec[stepIdx]])/sigma1[stepIdxVec[stepIdx]] ,
                   df=nu[stepIdxVec[stepIdx]] )/sigma1[stepIdxVec[stepIdx]] ,
        ylim=c(0,maxY) , cex.lab=1.75 ,
        type="l" , col="skyblue" , lwd=1 , xlab="y" , ylab="p(y)" ,
        main="Data Group 1 w. Post. Pred." )
  for ( stepIdx in 2:length(stepIdxVec) ) {
    lines(xVec, dt( (xVec-mu1[stepIdxVec[stepIdx]])/sigma1[stepIdxVec[stepIdx]] ,
                      df=nu[stepIdxVec[stepIdx]] )/sigma1[stepIdxVec[stepIdx]] ,
           type="l" , col="skyblue" , lwd=1 )
  }
  histBinWd = median(sigma1)/2
  histCenter = mean(mu1)
  histBreaks = sort( c( seq( histCenter-histBinWd/2 , min(xVec)-histBinWd/2 ,
                             -histBinWd ),
                        seq( histCenter+histBinWd/2 , max(xVec)+histBinWd/2 ,
                             histBinWd ) , xLim ) )
  histInfo = hist( y1 , plot=FALSE , breaks=histBreaks )
  yPlotVec = histInfo$density
  yPlotVec[ yPlotVec==0.0 ] = NA
  xPlotVec = histInfo$mids
  xPlotVec[ yPlotVec==0.0 ] = NA
  points( xPlotVec , yPlotVec , type="h" , lwd=3 , col="red" )
  text( max(xVec) , maxY , bquote(N[1]==.(length(y1))) , adj=c(1.1,1.1) )
  # Plot data y2 and smattering of posterior predictive curves:
  stepIdx = 1
  plot( xVec , dt( (xVec-mu2[stepIdxVec[stepIdx]])/sigma2[stepIdxVec[stepIdx]] ,
                   df=nu[stepIdxVec[stepIdx]] )/sigma2[stepIdxVec[stepIdx]] ,
        ylim=c(0,maxY) , cex.lab=1.75 ,
        type="l" , col="skyblue" , lwd=1 , xlab="y" , ylab="p(y)" ,
        main="Data Group 2 w. Post. Pred." )
  for ( stepIdx in 2:length(stepIdxVec) ) {
    lines(xVec, dt( (xVec-mu2[stepIdxVec[stepIdx]])/sigma2[stepIdxVec[stepIdx]] ,
                      df=nu[stepIdxVec[stepIdx]] )/sigma2[stepIdxVec[stepIdx]] ,
           type="l" , col="skyblue" , lwd=1 )
  }
  histBinWd = median(sigma2)/2
  histCenter = mean(mu2)
  histBreaks = sort( c( seq( histCenter-histBinWd/2 , min(xVec)-histBinWd/2 ,
                             -histBinWd ),
                        seq( histCenter+histBinWd/2 , max(xVec)+histBinWd/2 ,
                             histBinWd ) , xLim ) )
  histInfo = hist( y2 , plot=FALSE , breaks=histBreaks )
  yPlotVec = histInfo$density
  yPlotVec[ yPlotVec==0.0 ] = NA
  xPlotVec = histInfo$mids
  xPlotVec[ yPlotVec==0.0 ] = NA
  points( xPlotVec , yPlotVec , type="h" , lwd=3 , col="red" )
  text( max(xVec) , maxY , bquote(N[2]==.(length(y2))) , adj=c(1.1,1.1) )

  # Plot posterior distribution of parameter nu:
  histInfo = plotPost( log10(nu) , col="skyblue" , # breaks=30 ,
                       showCurve=showCurve ,
                  xlab=bquote("log10("*nu*")") , cex.lab = 1.75 , showMode=TRUE ,
                  main="Normality" ) #  (<0.7 suggests kurtosis)

  # Plot posterior distribution of parameters mu1, mu2, and their difference:
  xlim = range( c( mu1 , mu2 ) )
  histInfo = plotPost( mu1 ,  xlim=xlim , cex.lab = 1.75 ,
                       showCurve=showCurve ,
                  xlab=bquote(mu[1]) , main=paste("Group",1,"Mean") ,
                  col="skyblue" )
  histInfo = plotPost( mu2 ,  xlim=xlim , cex.lab = 1.75 ,
                       showCurve=showCurve ,
                  xlab=bquote(mu[2]) , main=paste("Group",2,"Mean") ,
                  col="skyblue" )
  histInfo = plotPost( mu1-mu2 , compVal=0 ,  showCurve=showCurve ,
                  xlab=bquote(mu[1] - mu[2]) , cex.lab = 1.75 , ROPE=ROPEm ,
                  main="Difference of Means" , col="skyblue" )

  # Plot posterior distribution of param's sigma1, sigma2, and their difference:
  xlim=range( c( sigma1 , sigma2 ) )
  histInfo = plotPost( sigma1 ,  xlim=xlim , cex.lab = 1.75 ,
                       showCurve=showCurve ,
                  xlab=bquote(sigma[1]) , main=paste("Group",1,"Std. Dev.") ,
                  col="skyblue" , showMode=TRUE )
  histInfo = plotPost( sigma2 ,  xlim=xlim , cex.lab = 1.75 ,
                       showCurve=showCurve ,
                  xlab=bquote(sigma[2]) , main=paste("Group",2,"Std. Dev.") ,
                  col="skyblue" , showMode=TRUE )
  histInfo = plotPost( sigma1-sigma2 ,
                       compVal=0 ,  showCurve=showCurve ,
                       xlab=bquote(sigma[1] - sigma[2]) , cex.lab = 1.75 ,
                       ROPE=ROPEsd ,
               main="Difference of Std. Dev.s" , col="skyblue" , showMode=TRUE )

  # Plot of estimated effect size. Effect size is d-sub-a from
  # Macmillan & Creelman, 1991; Simpson & Fitter, 1973; Swets, 1986a, 1986b.
  effectSize = ( mu1 - mu2 ) / sqrt( ( sigma1^2 + sigma2^2 ) / 2 )
  histInfo = plotPost( effectSize , compVal=0 ,  ROPE=ROPEeff ,
                        showCurve=showCurve ,
                  xlab=bquote( (mu[1]-mu[2])
                    /sqrt((sigma[1]^2 +sigma[2]^2 )/2 ) ),
              showMode=TRUE , cex.lab=1.0 , main="Effect Size" , col="skyblue" )
  return( BESTsummary( y1 , y2 , mcmcChain ) )} # end of function BESTplot

#==============================================================================


Script to run after above file is sourced:

###  Make sure that the following programs are all
###    in the same folder as this file:
###      BESTgamma.R
###      plotPost.R
###      HDIofMCMC.R
###      HDIofICDF.R
###      openGraphSaveGraph.R
### Make sure that R's working directory is the folder in which those
###    files reside. In RStudio, use menu tabs Tools -> Set Working Directory.
###    If working in R, use menu tabs File -> Change Dir.

# Get the functions loaded into R's working memory:
source("BESTgamma.R")

# Specify data as vectors:
# (Replace with your own data as needed. R can read many formats of data files,
# see the commands "scan" or "read.table" and its variants.)
y1 = c(101,100,102,104,102,97,105,105,98,101,100,123,105,103,100,95,102,106,
       109,102,82,102,100,102,102,101,102,102,103,103,97,97,103,101,97,104,
       96,103,124,101,101,100,101,101,104,100,101)
y2 = c(99,101,100,101,102,100,97,101,104,101,102,102,100,105,88,101,100,
       104,100,100,100,101,102,103,97,101,101,100,101,99,101,100,100,
       101,100,99,101,100,102,99,100,99)

# Run the Bayesian analysis:
mcmcChain = BESTmcmc( y1 , y2 , numSavedSteps=10000 )

# Plot the results of the Bayesian analysis:
postInfo = BESTplot( y1 , y2 , mcmcChain , ROPEeff=c(-0.1,0.1) )
# Show detailed summary info on console:
show( postInfo )

#-------------------------------------------------------------------------------





Monday, April 27, 2015

Final days for early registration prices at Stats Camp (including Bayesian course)


Early registration prices end this week for Doing Bayesian Data Analysis, June 1 - 5, 2015, a five-day course offered through Stats Camp convened in Dallas, Texas.

The web page for the June 1-5 workshop is here.

Other workshops are listed here.