Source: Rob J Hyndman

Link: Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization

Large collections of time series often have aggregation constraints due to product or geographical groupings. The forecasts for the most disaggregated series are usually required to add up exactly to the forecasts of the aggregated series, a constraint we refer to as “coherence”. Forecast reconciliation is the process of adjusting forecasts to make them coherent. The reconciliation algorithm proposed by Hyndman et al. (2011) is based on a generalized least squares estimator that requires an estimate of the covariance matrix of the coherency errors (i. ...
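
To make the idea of coherence concrete, here is a minimal Python sketch for a hypothetical two-series hierarchy (Total = A + B). It uses an ordinary least squares projection, which is a simplification of the generalized least squares estimator mentioned above (it amounts to assuming an identity covariance matrix). The function name and the forecast values are illustrative assumptions, not taken from the paper.

```python
def reconcile_ols(total_hat, a_hat, b_hat):
    """Project incoherent base forecasts onto the coherent subspace (OLS).

    Hypothetical hierarchy: Total = A + B, with summing matrix
    S = [[1, 1], [1, 0], [0, 1]]. The reconciled bottom-level forecasts
    are (S'S)^{-1} S' y_hat, where S'S = [[2, 1], [1, 2]].
    """
    # S' y_hat
    u = total_hat + a_hat
    v = total_hat + b_hat
    # (S'S)^{-1} = (1/3) * [[2, -1], [-1, 2]]
    a_tilde = (2 * u - v) / 3
    b_tilde = (2 * v - u) / 3
    # The reconciled total is the sum of the reconciled bottom series.
    return a_tilde + b_tilde, a_tilde, b_tilde


# Illustrative base forecasts that do not add up: 40 + 50 != 100.
total, a, b = reconcile_ols(100.0, 40.0, 50.0)
print(round(total, 3), round(a, 3), round(b, 3))
```

After reconciliation the total equals the sum of the two bottom-level forecasts by construction; a weighted (GLS) version would replace the identity weighting with an estimated covariance matrix.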

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 14 of our Applied Regression final exam (and solution to question 13)

Here’s question 14 of our exam: 14. You are predicting whether a student passes a class given pre-test score. The fitted model is, Pr(Pass) = logit^−1(a_j + 0.1x), for a student in classroom j whose pre-test score is x. The pre-test scores range from 0 to 50. The a_j’s are estimated to have a normal ...

Source: Blog on rOpenSci - open tools for open science

Link: Community Call - Involving Multilingual Communities

rOpenSci’s community is increasingly international and multilingual. While we have operated primarily in English, we now receive submissions of packages from authors whose primary language is not English. As we expand our community in this way, we want to learn from the experience of other organizations. How can we manage our peer-review process and open-source projects to be welcoming to non-native English speakers? Our guest speakers will include: Rayna Harris, who has co-led work with The Carpentries in internationalization of ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Naomi Wolf and David Brooks

Palko makes a good point: Parul Sehgal has a devastating review of the latest from Naomi Wolf, but while Sehgal is being justly praised for her sharp and relentless treatment of her subject, she stops short before she gets to the most disturbing and important implication of the story. There’s an excellent case made here ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 13 of our Applied Regression final exam (and solution to question 12)

Here’s question 13 of our exam: 13. You fit a model of the form: y ∼ x + u full + (1 | group). The estimated coefficients are 2.5, 0.7, and 0.5 respectively for the intercept, x, and u full, with group and individual residual standard deviations estimated as 2.0 and 3.0 respectively. Write the ...

Source: Econometrics and Free Software

Link: Intermittent demand, Croston and Die Hard

I have recently been confronted with a kind of data set and problem that I was not even aware existed: intermittent demand data. Intermittent demand arises when the demand for a certain good arrives sporadically. Let’s take a look at an example, by analyzing the number of downloads for the {RDieHarder} package: library(tidyverse) library(tsi ...

Source: Blog on rOpenSci - open tools for open science

Link: Taking over maintenance of a software package

Software is maintained by people. While software can in theory live on indefinitely, to do so requires people. People change jobs, move locations, retire, and unfortunately die sometimes. When a software maintainer can no longer maintain a package, what happens to the software? Because of the fragility of people in software, in an ideal world a piece of software should have as many maintainers as possible. Increasing maintainers increases the so-called bus ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 12 of our Applied Regression final exam (and solution to question 11)

Here’s question 12 of our exam: 12. In the regression above, suppose you replaced height in inches by height in centimeters. What would then be the intercept and slope of the regression? (One inch is 2.54 centimeters.) And the solution to question 11: 11. We defined a new variable based on weight (in pounds): heavy ...
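
As a quick check of how a change of units affects a fitted line, the following Python sketch fits a simple least-squares regression twice, once with the predictor in inches and once in centimeters. The data are made up for illustration and the helper `fit_line` is a hypothetical name; this is not the regression from the exam, only a demonstration of the general rescaling behaviour of a linear predictor.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up heights (inches) and outcomes, for illustration only.
height_in = [60.0, 64.0, 66.0, 70.0, 74.0]
outcome = [110.0, 135.0, 150.0, 172.0, 201.0]
height_cm = [2.54 * h for h in height_in]

a_in, b_in = fit_line(height_in, outcome)
a_cm, b_cm = fit_line(height_cm, outcome)
print(a_in, b_in)
print(a_cm, b_cm)  # same intercept; slope divided by 2.54
```

Multiplying a predictor by a constant leaves the intercept unchanged and divides the slope by that constant, because the product of slope and predictor must stay the same.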

Source: Statistical Modeling, Causal Inference, and Social Science

Link: How statistics is used to crush (scientific) dissent.

Lakeland writes: When we interpret powerful as political power, I think it’s clear that Classical Statistics has the most political power, that is, the power to get people to believe things and change policy or alter funding decisions etc… Today Bayes is questioned at every turn, and ridiculed for being “subjective” with a focus on ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 11 of our Applied Regression final exam (and solution to question 10)

Here’s question 11 of our exam: 11. We defined a new variable based on weight (in pounds): heavy <- weight > 200 and then ran a logistic regression, predicting “heavy” from height (in inches):

    glm(formula = heavy ~ height, family = binomial(link = "logit"))
                coef.est coef.se
    (Intercept)   -21.51    1.60
    height          0.28    0.02
    ---
    n = 1984, k = ...
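
To get a feel for what the coefficients -21.51 and 0.28 imply, this Python sketch converts the linear predictor into a probability with the inverse logit. The heights evaluated are arbitrary illustrative choices, and the function names are assumptions for this sketch.

```python
import math


def inv_logit(z):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Point estimates quoted in the regression output above.
INTERCEPT = -21.51
SLOPE = 0.28


def pr_heavy(height_in):
    """Estimated Pr(heavy) for a given height in inches."""
    return inv_logit(INTERCEPT + SLOPE * height_in)


for h in (60, 66, 72, 78):  # hypothetical heights, for illustration
    print(h, round(pr_heavy(h), 3))
```

The estimated probability reaches 0.5 where the linear predictor is zero, that is, at a height of 21.51 / 0.28 inches.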

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 10 of our Applied Regression final exam (and solution to question 9)

Here’s question 10 of our exam: 10. For the above example, we then created indicator variables, age18_29, age30_44, age45_64, and age65up, for four age categories. We then fit a new regression:

    lm(formula = weight ~ age30_44 + age45_64 + age65up)
                 coef.est coef.se
    (Intercept)     157.2     5.4
    age30_44TRUE     19.1     7.0
    age45_64TRUE     27.2     7.6
    age65upTRUE       8.5     8.7
    n ...
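
With indicator coding, the intercept is the estimate for the omitted category (here age18_29) and each coefficient is a difference from that baseline. The short Python sketch below simply does that arithmetic with the point estimates quoted above; it is a reading aid under the standard interpretation of indicator variables, not part of the exam solution.

```python
# Point estimates from the regression output above.
intercept = 157.2  # baseline group: ages 18-29 (the omitted indicator)
offsets = {"age30_44": 19.1, "age45_64": 27.2, "age65up": 8.5}

# Estimated average weight (pounds) in each age category.
predicted = {"age18_29": intercept}
for group, coef in offsets.items():
    predicted[group] = round(intercept + coef, 1)

print(predicted)
```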

Source: DNA confesses Data speak on DNA confesses Data speak

Link: Run Rstudio server with singularity on HPC

Background. Please read the following before going ahead: what is docker (https://www.docker.com/)? what is Rocker (https://www.rocker-project.org/)? what is singularity (https://www.sylabs.io/docs/)? From the Harvard Research Computing website (https://www.rc.fas.harvard.edu/resources/documentation/software/singularity-on-odyssey/): Odyssey has singularity installed. Why ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 9 of our Applied Regression final exam (and solution to question 8)

Here’s question 9 of our exam: 9. We downloaded data with weight (in pounds) and age (in years) from a random sample of American adults. We created a new variable, age10 = age/10. We then fit a regression:

    lm(formula = weight ~ age10)
                coef.est coef.se
    (Intercept)    161.0     7.3
    age10            2.6     1.6
    n = 2009, k ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 8 of our Applied Regression final exam (and solution to question 7)

Here’s question 8 of our exam: 8. Out of a random sample of 50 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. And the solution to question 7: 7. You conduct an experiment in which some ...
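
With zero successes, the usual estimate plus or minus two standard errors collapses to the interval [0, 0], which is why this question needs some care. The Python sketch below shows two common alternatives: an exact upper bound based on the binomial probability of seeing no successes, and an interval that adds two successes and two failures before applying the usual formula. Both are offered as illustrations under stated assumptions, not as the exam's official solution.

```python
import math

n = 50        # sample size from the question
y = 0         # number reporting having held political office
alpha = 0.05  # for a 95% interval

# Exact approach: the upper bound is the p at which observing zero
# successes in n trials has probability alpha / 2, i.e. (1 - p)^n = alpha / 2.
upper_exact = 1.0 - (alpha / 2) ** (1.0 / n)

# Adjusted approach: add 2 successes and 2 failures, then use the
# usual normal-approximation interval, truncated at zero.
p_adj = (y + 2) / (n + 4)
se_adj = math.sqrt(p_adj * (1 - p_adj) / (n + 4))
lower_adj = max(0.0, p_adj - 1.96 * se_adj)
upper_adj = p_adj + 1.96 * se_adj

print(round(upper_exact, 3))
print(round(lower_adj, 3), round(upper_adj, 3))
```

The two approaches give somewhat different upper bounds, but both yield an interval that starts at zero and has a strictly positive upper limit.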

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 7 of our Applied Regression final exam (and solution to question 6)

Here’s question 7 of our exam: 7. You conduct an experiment in which some people get a special get-out-the-vote message and others do not. Then you follow up with a sample, after the election, to see if they voted. If you follow up with 500 people, how large an effect would you be able to ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 6 of our Applied Regression final exam (and solution to question 5)

Here’s question 6 of our exam: 6. You are applying hierarchical logistic regression on a survey of 1500 people to estimate support for a federal jobs program. The model is fit using, as a state-level predictor, the Republican presidential vote in the state. Which of the following two statements is basically true? (a) Adding a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Tony nominations mean nothing

Someone writes: I searched up *Tony nominations mean nothing* and I found nothing. So I had to write this. There are currently 41 theaters that the Tony awards accept when nominating their choices. If we are being as generous as possible, we could say that every one of those theaters will be hosting a performance ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 5 of our Applied Regression final exam (and solution to question 4)

Here’s question 5 of our exam: 5. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (a) If a question is answered correctly by students with low ability, but is missed by ...

Source: Econometrics and Free Software

Lately I’ve been interested in trying to cluster documents, and to find similar documents based on their contents. In this blog post, I will use Seneca’s Moral letters to Lucilius (https://en.wikisource.org/wiki/Moral_letters_to_Lucilius) and compute the pairwise cosine similarity (https://en.wikipedia.org/wiki/Cosine_similarity) of his 124 letters. Computing the ...
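
For readers unfamiliar with cosine similarity, here is a minimal Python sketch that computes it between bag-of-words count vectors and builds a pairwise matrix. The original post works in R; this sketch uses Python with only the standard library, a naive whitespace tokenizer, and three made-up placeholder sentences rather than Seneca's letters, so treat it as an illustration of the measure and not as the author's method.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two word-count vectors."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Placeholder texts (hypothetical, not taken from the letters).
letters = [
    "virtue is the only good",
    "the only good is virtue and reason",
    "time is the one thing we truly own",
]

# Pairwise similarity matrix: entry [i][j] compares letter i with letter j.
pairwise = [[cosine_similarity(a, b) for b in letters] for a in letters]
for row in pairwise:
    print([round(value, 3) for value in row])
```

The matrix is symmetric with ones on the diagonal; in practice one would usually weight the counts (for example with TF-IDF) and remove stop words before comparing documents.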

Source: Blog on rOpenSci - open tools for open science

Link: Access Publisher Copyright & Self-Archiving Policies via the 'SHERPA/RoMEO' API

We’ve been following rOpenSci’s work for a long time, and we use several packages on a daily basis for our scientific projects, especially taxize to clean species names, rredlist to extract species IUCN statuses or treeio to work with phylogenetic trees. rOpensci is a perfect incarnation of a vibrant and diverse community where people learn and develop new ideas, especially regarding scientific packages. We’ve also noticed how much the thorough review process improves the quality of the packages that join the rOpenSci ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 4 of our Applied Regression final exam (and solution to question 3)

Here’s question 4 of our exam: 4. A researcher is imputing missing responses for income in a social survey of American households, using for the imputation a regression model given demographic variables. Which of the following two statements is basically true? (a) If you impute income deterministically using a fitted regression model (that is, imputing ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 3 of our Applied Regression final exam (and solution to question 2)

Here’s question 3 of our exam: Here is a fitted model from the Bangladesh analysis predicting whether a person with high-arsenic drinking water will switch wells, given the arsenic level in their existing well and the distance to the nearest safe well.

    glm(formula = switch ~ dist100 + arsenic, family=binomial(link="logit"))
                coef.est coef.se
    (Intercept)     0.00    0.08
    ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 2 of our Applied Regression final exam (and solution to question 1)

Here’s question 2 of our exam: 2. A multiple-choice test item has four options. Assume that a student taking this question either knows the answer or does a pure guess. A random sample of 100 students take the item. 60% get it correct. Give an estimate and 95% confidence interval for the percentage in the ...
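
Under the stated know-or-guess model, the probability of a correct answer is k + (1 - k)/4, where k is the share of students who know the answer. The Python sketch below inverts that relationship and carries the sampling uncertainty through the same linear transformation. It assumes the truncated question is asking about the share who know the answer, and it uses a normal approximation; both are assumptions of this sketch rather than details confirmed by the excerpt.

```python
import math

n = 100           # students sampled
p_correct = 0.60  # observed proportion answering correctly
p_guess = 0.25    # chance of a correct pure guess with four options

# Invert p_correct = k + (1 - k) * p_guess to estimate k.
k_hat = (p_correct - p_guess) / (1 - p_guess)

# Standard error of the observed proportion, then of the linear transform.
se_p = math.sqrt(p_correct * (1 - p_correct) / n)
se_k = se_p / (1 - p_guess)

lower = k_hat - 1.96 * se_k
upper = k_hat + 1.96 * se_k
print(round(k_hat, 3), round(lower, 3), round(upper, 3))
```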

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Still at work on the piranha theorems

We’re still at work on the piranha theorems. But, in the meantime, I happened to show somebody this: There can be some large and predictable effects on behavior, but not a lot, because, if there were, then these different effects would interfere with each other, and as a result it would be hard to see ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Question 1 of our Applied Regression final exam

As promised, it’s time to go over the final exam of our applied regression class. It was an in-class exam, 3 hours for 15 questions. Here’s the first question on the test: 1. A randomized experiment is performed within a survey. 1000 people are contacted. Half the people contacted are promised a $5 incentive to ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Data is Personal” and the maturing of the literature on statistical graphics

Traditionally there have been five ways to write about statistical graphics: 1. Exhortations to look at your data, make graphs, do visualizations and not just blindly follow statistical procedures. 2. Criticisms and suggested improvements for graphs, both general (pie-charts! double y-axes! colors! labels!) and specific. 3. Instruction and examples of how to make effective graphs ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: New! from Bales/Pourzanjani/Vehtari/Petzold: Selecting the Metric in Hamiltonian Monte Carlo

Ben Bales, Arya Pourzanjani, Aki Vehtari, and Linda Petzold write: We present a selection criterion for the Euclidean metric adapted during warmup in a Hamiltonian Monte Carlo sampler that makes it possible for a sampler to automatically pick the metric based on the model and the availability of warmup draws. Additionally, we present a new ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: My (remote) talk this Friday 3pm at the Department of Cognitive Science at UCSD

It was too much to do one more flight so I’ll do this one in (nearly) carbon-free style using hangout or skype. It’s 3pm Pacific time in CSB (Cognitive Science Building) 003 at the University of California, San Diego. This is what they asked for in the invite: Our Friday afternoon COGS200 series has been ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Why edit a journal? More generally, how to contribute to scientific discussion?

The other day I wrote: Journal editing is a volunteer job, and people sign up for it because they want to publish exciting new work, or maybe because they enjoy the power trip, or maybe out of a sense of duty—but, in any case, they typically aren’t in it for the controversy. Jon Baron, editor ...

Source: Hypertidy website

Link: mesh3d - recent changes in rgl workhorse format

This post describes the mesh3d format used in the rgl package and particularly how colour properties are stored and used. There are recent changes to this behaviour (see ‘meshColor’), and previously the situation was not clearly documented. rgl The rgl package has long provided interactive 3D graphics for R. The neat thing for me about 3D graphics is the requirement for mesh forms of data, and the fact that meshes are extremely useful for very many ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: Calculate scATACseq TSS enrichment score

TSS enrichment score (https://www.encodeproject.org/data-standards/terms/#enrichment) serves as an important quality control metric for ATACseq data. I want to write a script for single cell ATACseq data. From the ENCODE page: Transcription Start Site (TSS) Enrichment Score - The TSS enrichment calculation is a signal to noise calculation. The reads around a reference set of TSSs are collected to form an aggregate distribution of reads centered on the TSSs and extending to 1000 bp in either direction (for a total of 2000bp). This ...

Source: Simply Statistics

Link: Research quality data and research quality databases

When you are doing data science, you are doing research. You want to use data to answer a question, identify a new pattern, improve a current product, or come up with a new product. The common factor underlying each of these tasks is that you want to use the data to answer a question that you haven’t answered before. The most effective process we have come up with for getting those answers is the scientific research process. That is why the key word in data science is not data, it is science. No matter where you are doing data science - in academia, in a non-profit, or in a company - you ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Concurve plots consonance curves, p-value functions, and S-value functions

Andrew Vigotsky writes: Now that abandoning significance and embracing uncertainty is in the air, we think this package, which runs in R or Stata, may be of interest to both you and your readers. Concurve plots consonance curves, p-value functions, and S-value functions to allow readers and researchers to get a better feel of the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Let’s publish everything.

The other day someone pointed me to this article by James Kaufman and Vlad Glǎveanu in a psychology journal which begins: How does the current replication crisis, along with other recent psychological trends, affect scientific creativity? To answer this question, we consider current debates regarding replication through the lenses of creativity research and theory. Both ...

Source: Blog on rOpenSci - open tools for open science

Link: ramlegacy: a package for RAM Legacy Database

Introduction ramlegacy is a new R package to download, cache and read in all the different versions of the RAM Legacy Stock Assessment Database, a public database containing stock assessment results of commercially exploited marine populations from around the world. The package accomplishes all this by: Providing a function download_ramlegacy(), to download all the available versions of the RAM Database and cache them on the user’s computer in a location provided by the rappdirs ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Peter Ellis on Forecasting Antipodal Elections with Stan

I liked this intro to Peter Ellis from Rob J. Hyndman’s talk announcement: He [Peter Ellis] started forecasting elections in New Zealand as a way to learn how to use Stan, and the hobby has stuck with him since he moved back to Australia in late 2018. You may remember Peter from my previous post ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: What pieces do chess grandmasters move, and when?

The above image, from T. J. Mahr, is a cleaned-up version of this graph: which in turn is a slight improvement on a graph posted by Dan Goldstein (with R code!) which came from Ashton Anderson. The original looks like this: This is just fine, but I had a few changes to make. I thought ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: They’re working for the clampdown

This is just disgraceful: powerful academics using their influence to suppress (“clamp down on”) dissent. They call us terrorists, they lie about us in their journals, and they plot to clamp down on us. I can’t say at this point that I’m surprised to see this latest, but it saddens and angers me nonetheless to ...

Source: Homepage on Liechi | 張列弛

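Link: Who Has a Green Beard? (谁有绿胡子)

This article could also carry a subtitle: “Self-labeling”. Since it was first proposed, the theory of natural selection has faced many challenges, one of which can be stated as follows: traits that look as admirable as solidarity, friendship, and mutual help ...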

Source: Rob J Hyndman

Link: Poll position: statistics and the Australian federal election

One of the few people in Australia who did not write off a possible Coalition win at the recent federal election was Peter Ellis. We’ve invited him to come and give a talk about making sense of opinion polls and the Australian federal election on Friday this week at Monash University. Visitors are welcome. Here are the details. 11am, 31 May 2019. Room G03, Learning and Teaching Building, 19 Ancora Imparo Way, Clayton Campus, Monash University ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Crystallography Corner: The result is difficult to reproduce, but the result is still valid.

Joel Bernstein writes: I just finished reading your oped article on reproducibility in science. As an experimental scientist – more precisely a chemical crystallographer – I have had to deal with this kind of situation a number of times, and at least two examples may serve as the possible exceptions to your rules. One of ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Donald J. Trump and Robert E. Lee

The other day the president made some news by praising Civil War general Robert E. Lee, and it struck me that Trump and Lee had a certain amount in common. Not in their personalities, but in their situations. Lee could’ve fought on the Union side in the Civil War. Or he could’ve saved a couple ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Against Arianism 3: Consider the cognitive models of the field

“You took my sadness out of context at the Mariners Apartment Complex” – Lana Del Rey It’s sunny, I’m in England, and I’m having a very tasty beer, and Lauren, Andrew, and I just finished a paper called The experiment is just as important as the likelihood in understanding the prior: A cautionary note on robust ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: John Le Carre is good at integrating thought and action

I was reading a couple old Le Carre spy novels. They have their strong points and their weak points; I’m not gonna claim that Le Carre is a great writer. He’s no George Orwell or Graham Greene. (This review by the great Clive James nails Le Carre perfectly.) But I did notice one thing Le ...

Source: Rob J Hyndman

Link: Forecasting in social settings: the state of the art

This paper provides a non-systematic review of the progress of forecasting in social settings. It is aimed at someone outside the field of forecasting, wanting to appreciate the results of the M4 Competition by reading a survey paper to get informed about the state of the art of this discipline. It discusses the recorded improvements over time in forecast accuracy, the need to capture forecast uncertainty, and what can go wrong with predictions. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Pushing the guy in front of the trolley

So. I was reading the London Review of Books the other day and came across this passage by the philosopher Kieran Setiya: Some of the most striking discoveries of experimental philosophers concern the extent of our own personal inconsistencies . . . how we respond to the trolley problem is affected by the details of ...

Source: Nan-Hung Hsieh on Nan-Hung Hsieh

Link: GNU MCSim tutorial 3 - Markov Chain Monte Carlo Calibration

Slides: /slide/190523_tutorial.html#1 ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: (from Yair): What Happened in the 2018 Election

Yair writes: Immediately following the 2018 election, we published an analysis of demographic voting patterns, showing our best estimates of what happened in the election and putting it into context compared to 2016 and 2014. . . . Since then, we’ve collected much more data — precinct results from more states and, importantly, individual-level vote history records ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Going beyond the rainbow color scheme for statistical graphics

Yesterday in our discussion of easy ways to improve your graphs, a commenter wrote: I recently read and enjoyed several articles about alternatives to the rainbow color palette. I particularly like the sections where they show how each color scheme looks under different forms of color-blindness and/or in black and white. Here’s a couple of ...

Source: Homepage on Yihui Xie | 谢益辉

Link: From RTFM to ITFM (Improve The Fine Manual)

One TinyTeX user complained that it was not obvious to him from the manual that he needed tinytex::install_tinytex() to install TinyTeX. Then my biggest helper (https://yihui.name/en/2018/07/help-answer-questions/), Christophe Dervieux (@cderv), replied to him in the GitHub issue (https://github.com/yihui/tinytex/issues/103#issuecomment-493793875) and told him this instruction was in the first code block in the first section of the TinyTeX documentation. This user felt the message in the reply was like: It’s there, dummy, so why ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Neural nets vs. regression models

Eliot Johnson writes: I have a question concerning papers comparing two broad domains of modeling: neural nets and statistical models. Both terms are catch-alls, within each of which there are, quite obviously, multiple subdomains. For instance, NNs could include ML, DL, AI, and so on. While statistical models should include panel data, time series, hierarchical ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: What are some common but easily avoidable graphical mistakes?

John Kastellec writes: I was thinking about writing a short paper aimed at getting political scientists to not make some common but easily avoidable graphical mistakes. I’ve come up with the following list of such mistakes. I was just wondering if any others immediately came to mind? – Label lines directly – Make labels big ...

Source: Homepage on Liechi | 張列弛

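Link: In Conduct, Have Only a Genuine Temperament (为人但有真性情)

When Mr. Wang Yao passed away, the Chinese Department of Peking University composed a memorial couplet, which reads: “With the bearing of the Wei-Jin era, in conduct he had above all a genuine temperament; with the spirit of May Fourth, how could what he hands down lack fine writing?” The first line speaks of Mr. Wang’s character, the second of his writing. Mr. Wang ...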

Source: Simply Statistics

Link: I co-founded a company! Meet Problem Forward Data Science

I have some exciting news about something I’ve been working on for the last year or so. I started a company! It’s called Problem Forward (https://www.problemforward.com/) data science. I’m pumped about this new startup for a lot of reasons. My co-founder is one of my family’s closest friends, Jamie McGovern, who has more than 2 decades of experience in the consulting world and who I’ve known for 15 years. We are creating a cool new model of “data scientist as a service” (more on that below). We have a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Abortion attitudes: The polarization is among richer, more educated whites

Abortion has been in the news lately. A journalist asked me something about abortion attitudes and I pointed to a post from a few years ago about partisan polarization on abortion. Also this with John Sides on why abortion consensus is unlikely. That was back in 2009, and consensus doesn’t seem any more likely today. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Alternatives and reality

I saw this cartoon from Randall Munroe, and it reminded me of something I wrote a while ago. The quick story is that I don’t think the alternative histories within alternative histories are completely arbitrary. It seems to me that there’s a common theme in the best alternative history stories, a recognition that our world is ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: My talks at the University of Chicago this Thursday and Friday

Political Economy Workshop (12:30pm, Thurs 23 May 2019, Room 1022 of Harris Public Policy (Keller Center) 1307 E 60th Street): Political Science and the Replication Crisis We’ve heard a lot about the replication crisis in science (silly studies about ESP, evolutionary psychology, miraculous life hacks, etc.), how it happened (p-values, forking paths), and proposed remedies ...

Source: Econometrics and Free Software

Link: The never-ending editor war (?)

The creation of this blog post was prompted by this tweet, asking an age-old question: @spacemacs (Bruno Rodrigues, @brodriguesco, May 16, 2019; https://twitter.com/brodriguesco/status/1128981852558123008) ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Vigorous data-handling tied to publication in top journals among public health researchers

Gur Huberman points us to this news article by Nicholas Bakalar, “Vigorous Exercise Tied to Macular Degeneration in Men,” which begins: A new study suggests that vigorous physical activity may increase the risk for vision loss, a finding that has surprised and puzzled researchers. Using questionnaires, Korean researchers evaluated physical activity among 211,960 men and ...

Source: Econometrics and Free Software

Link: For posterity: install {xml2} on GNU/Linux distros

Today I’ve removed my system’s R package and installed MRO instead. While re-installing all packages, I’ve encountered one of the most frustrating error messages for someone installing packages from source: Error : /tmp/Rtmpw60aCp/R.INSTALL7819efef27e/xml2/man/read_xml.Rd:47: unable to load shared object '/usr/lib64/R/library/xml2/libs/xml2.so': libicui18n.so.58: ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Data quality is a thing.

I just happened to come across this story, where a journalist took some garbled data and spun a false tale which then got spread without question. It’s a problem. First, it’s a problem that people will repeat unjustified claims, also a problem that when data are attached, you can get complete credulity, even for claims ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Hey, people are doing the multiverse!

Elio Campitelli writes: I just saw this image in a paper discussing the weight of evidence for a “hiatus” in the global warming signal and immediately thought of the garden of forking paths. From the paper: Tree representation of choices to represent and test pause-periods. The ‘pause’ is defined as either no-trend or a slow-trend. ...

Source: Rob J Hyndman

Link: You are what you vote

I’ve tried my hand at writing for the wider public with an article for The Conversation based on my paper with Di Cook and Jeremy Forbes on “Spatial modelling of the two-party preferred vote in Australian federal elections: 2001-2016”. With the next Australian election taking place tomorrow, we thought it was timely to put out a publicly accessible version of our analysis. ...

Source: ewen

Link: Presenting the Does This Spark Joy mix

Dropping a new ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Did Jon Stewart elect Donald Trump?”

I wrote this post a couple weeks ago and scheduled it for October, but then I learned from a reporter that the research article under discussion was retracted, so it seemed to make sense to post this right away while it was still newsworthy. My original post is below, followed by a post script regarding ...

Source: Blog on rOpenSci - open tools for open science

Link: rOpenSci Dev Guide 0.2.0: Updates Inside and Out

As announced in our recent post about updates to our Software Peer Review system, all our package development, review and maintenance is available as an online book. Our goal is to update it approximately quarterly so it’s already time to present its second official version! You can read the changelog or this blog post to find out what’s new in our dev guide 0.2.0! A more legit and accessible book Let’s start with very exciting news: the dev guide now has a cover, designed by Oz Locke from Locke ...

Source: Statistical Modeling, Causal Inference, and Social Science

Tom Daula writes: I think this story from John Cook is a different perspective on replication and how scientists respond to errors. In particular the final paragraph: There’s a perennial debate over whether it is best to make security and privacy flaws public or to suppress them. The consensus, as much as there is a ...

Source: Nan-Hung Hsieh on Nan-Hung Hsieh

Link: GNU MCSim tutorial 2 - Uncertainty and sensitivity analysis

<iframe src="/slide/190425_tutorial.html#1" width="672" height="400px"></iframe> ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “MRP is the Carmelo Anthony of election forecasting methods”? So we’re doing trash talking now??

What’s the deal with Nate Silver calling MRP “the Carmelo Anthony of forecasting methods”? Someone sent this to me: and I was like, wtf? I don’t say wtf very often—at least, not on the blog—but this just seemed weird. For one thing, Nate and I did a project together once using MRP: this was our ...

Source: Blog on rOpenSci - open tools for open science

Link: POWER to the People

NASA generates and provides heaps of data to the scientific community. Not all of it is looking out at the stars. Some of it is looking back at us here on Earth. NASA’s Earth science program observes, understands and models the Earth system. We can use these data to discover how our Earth is changing, to better predict change, and to understand the consequences for life on Earth. The Earth science program includes the Prediction of Worldwide Energy Resource (POWER) project, which was initiated to improve upon the current renewable energy data set and to create new data sets from new ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Horse-and-buggy era officially ends for survey research

Peter Enns writes: Given the various comments on your blog about evolving survey methods (e.g., Of buggy whips and moral hazards; or, Sympathy for the Aapor), I thought you might be interested that the Roper Center has updated its acquisitions policy and is now accepting non-probability samples and other methods. This is an exciting move ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Scandal! Mister P appears in British tabloid.

Tim Morris points us to this news article: And here’s the kicker: Mister P. Not quite as cool as the time I was mentioned in Private Eye, but it’s still pretty satisfying. My next goal: Getting a mention in Sports Illustrated. (More on this soon.) In all seriousness, it’s so cool when methods that my ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: When we had fascism in the United States

I was reading this horrifying and hilarious story by Colson Whitehead, along with an excellent article by Adam Gopnik in the New Yorker (I posted a nitpick on it a couple days ago) on the Reconstruction and post-Reconstruction era in the United States, and I was suddenly reminded of something. In one of the political ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Name this fallacy!

It’s the fallacy of thinking that, just cos you’re good at something, that everyone should be good at it, and if they’re not, they’re just being stubborn and doing it badly on purpose. I thought about this when reading this line from Adam Gopnik in the New Yorker: [Henry Louis] Gates is one of the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Did blind orchestra auditions really benefit women?

You’re blind! And you can’t see You need to wear some glasses Like D.M.C. Someone pointed me to this post, “Orchestrating false beliefs about gender discrimination,” by Jonatan Pallesen criticizing a famous paper from 2000, “Orchestrating Impartiality: The Impact of ‘Blind’ Auditions on Female Musicians,” by Claudia Goldin and Cecilia Rouse. We’ve all heard the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Maintenance cost is quadratic in the number of features

Bob Carpenter shares this story illustrating the challenges of software maintenance. Here’s Bob: This started with the maintenance of upgrading to the new Boost version 1.69, which is this pull request: https://github.com/stan-dev/math/pull/1082 for this issue: https://github.com/stan-dev/math/issues/1081 The issue happens first, then the pull request, then the fun of debugging starts. Today’s story starts an issue ...

Source: Blog on rOpenSci - open tools for open science

Link: Open Trade Statistics

Introduction Open Trade Statistics (OTS) was created with the intention to lower the barrier to working with international economic trade data. It includes a public API, a dashboard, and an R package for data retrieval. The project started when I was affected by the fact that many Latin American universities have limited or no access to the United Nations Commodity Trade Statistics Database (UN COMTRADE). There are alternatives to COMTRADE; for example, the Base Pour L’Analyse du Commerce International (BACI) constitutes an improvement over COMTRADE as it is constructed using the raw ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: That illusion where you think the other side is united and your side is diverse

Lots of people have written about this illusion of perspective: The people close to you look to be filled with individuality and diversity, while the people way over there in the other corner of the room all look kind of alike. But widespread knowledge of this illusion does not stop people from succumbing to it. ...

Source: Homepage on Liechi | 張列弛

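Link: On Evolution, Part 1

I previously wrote three blog posts on evolution that briefly introduced the history of evolutionary theory, and now I plan to write another article on the subject. The first thing that stumped me was what to call this article. I roughly know ...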

Source: L. Collado-Torres

Link: recount-brain: a curated repository of human brain RNA-seq datasets metadata


Source: Nan-Hung Hsieh on Nan-Hung Hsieh

<iframe src="/slide/190508_talk.html#1" width="672" height="400px"></iframe> ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Gremlin time: “distant future, faraway lands, and remote probabilities”

Chris Wilson writes: It appears that Richard Tol is still publishing these data, only now fitting a piecewise linear function to the same data-points. https://academic.oup.com/reep/article/12/1/4/4804315#110883819 Also still looks like counting 0 as positive, “Moreover, the 11 estimates for warming of 2.5°C indicate that researchers disagree on the sign of the net impact: 3 estimates are ...

Source: The R-Podcast

Link: Episode 30: The origins and future of RStudio with Tareef Kawaf

<ul> <li>RStudio 1.2 release highlights: <a href="https://blog.rstudio.com/2019/04/30/rstudio-1-2-release/">blog.rstudio.com/2019/04/30/rstudio-1-2-release</a></li> <li>Tareef’s opening keynote at <code>rstudio::conf</code> 2019: <a href="https://resources.rstudio.com/rstudio-conf-2019/opening-keynote-tareef-kawaf">Welcome and RStudio Vision</a></li> <li>Reproducible Environments: <a href="https://environments.rstudio.com/">environments.rstudio.com</a></li> <li>RStudio Community: <a href="https://community.rstudio.com/">community.rstudio.com</a></li> <li>RStudio Cloud (currently in ...

Source: Homepage on Liechi | 張列弛

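Link: Remembering an Old Doctor of Chinese Medicine

On the “old” in “old doctor of Chinese medicine”: people who are keen on Chinese medicine use it to mean that the doctor is richly experienced and highly skilled; people who do not believe in Chinese medicine use it to mean that the “doctor” puts on an act and is highly skilled at deception. I call the Chinese medicine doctor of my memory ...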

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The Arkansas paradox

Palko writes: I had a recent conversation with a friend back in Arkansas who gives me regular updates of the state and local news. A few days ago he told me about a poll that was getting a fair amount of coverage. (See also here, for example.) The poll showed that a number of progressive ...

Source: The R-Podcast

Link: Tareef Kawaf

<p>Tareef Kawaf is a software executive with over twenty-five years of experience in building product teams at early-stage startups. He has led the development of products in the e-commerce, online video, and open source statistical analysis spaces. At RStudio he has helped create an organization that uses the open core model to build a sustainable business that contributes more than 50% of its engineering to free and open source software. Tareef is passionate about RStudio’s vision of the positive impact that free and open source software can have on transforming decision making around ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Difference-in-difference estimators are a special case of lagged regression

Fan Li and Peng Ding write: Difference-in-differences is a widely-used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trend, which is scale dependent and may be questionable in some applications. A common alternative method is a regression model that adjusts for the lagged dependent ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Stan examples in Harezlak, Ruppert and Wand (2018) Semiparametric Regression with R

I saw earlier drafts of this when it was in preparation and they were great. Jarek Harezlak, David Ruppert and Matt P. Wand. 2018. Semiparametric Regression with R. UseR! Series. Springer. I particularly like the careful evaluation of variational approaches. I also very much like that it’s packed with visualizations and largely based on worked ...

Source: Homepage on Liechi | 張列弛

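Link: Watson Says

During the filming of the documentary American Masters: Decoding Watson, Watson was asked whether his view that Black people are inferior to white people had changed. He answered: No, not at all. I would like for them to have changed, that there be new knowledge that says that your nurture ...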

Source: Statistical Modeling, Causal Inference, and Social Science

A linguist pointed me with incredulity to this article by Horst Feldmann, “Do Linguistic Structures Affect Human Capital? The Case of Pronoun Drop,” which begins: This paper empirically studies the human capital effects of grammatical rules that permit speakers to drop a personal pronoun when used as a subject of a sentence. By de‐emphasizing the ...

Source: Econometrics and Free Software

Link: Fast food, causality and R packages, part 2

<div style="text-align:center;"> <p><a href="https://en.wikipedia.org/wiki/Joke"> <img src="./img/distracted_economist.jpg" title = "Soon, humanity will only communicate in memes"></a></p> </div> <p>I am currently working on a package for the R programming language; its initial goal was to simply distribute the data used in the Card and Krueger 1994 paper that you can read <a href="http://davidcard.berkeley.edu/papers/njmin-aer.pdf">here</a> (PDF warning). However, I decided that I would add code to perform diff-in-diff.</p> <p>In my <a href="https://www.brodrigues.co/blog/2019-04-28-diffindif ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: 13 Reasons not to trust that claim that 13 Reasons Why increased youth suicide rates

A journalist writes: My eye was caught by this very popular story that broke yesterday — about a study that purported to find a 30 percent (!) increase in suicides, in kids 10-17, in the MONTH after a controversial show about suicide aired. And that increase apparently persisted for the rest of the year. It’s ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Post-Hoc Power PubPeer Dumpster Fire

We’ve discussed this one before (original, polite response here; later response, after months of frustration, here), but it keeps on coming. Latest version is this disaster of a paper which got shredded by a zillion commenters on PubPeer. There’s lots of incompetent stuff out there in the literature—that’s the way things go; statistics is hard—but, ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: clustering scATACseq data: the TF-IDF way

<p>scATACseq data are very sparse, even sparser than scRNAseq data. Clustering scATACseq data requires some preprocessing steps.</p> <p>I want to reproduce what was done in the methods sections of these two recent scATACseq papers:</p> <ol style="list-style-type: decimal"> <li><a href="https://www.ncbi.nlm.nih.gov/pubmed/30078704">A Single-Cell Atlas of In Vivo Mammalian Chromatin Accessibility</a> Darren et al., Cell 2018</li> </ol> <ul> <li>Latent Semantic Indexing Cluster Analysis</li> </ul> <p>In order to get an initial sense of the relationship between ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Olivia Goldhill and Jesse Singal report on the Implicit Association Test

A psychology researcher whom I don’t know writes: In case you aren’t already aware of it, here is a rather lengthy article pointing out challenges to the Implicit Association Test. What I found disturbing was this paragraph: Greenwald explicitly discouraged me from writing this article. ‘Debates about scientific interpretation belong in scientific journals, not popular ...

Source: Homepage on Liechi | 張列弛

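Link: The Shadow Wandering Between Light and Dark

Lonely is the new literary garden; at peace, the old battlefield. Between heaven and earth one soldier remains, shouldering his halberd, wandering alone. This is a poem that Lu Xun inscribed in the copy of Wandering he gave to Yamagata Hatsuo. From this poem one can dimly glimpse what was in Lu Xun’s heart when he wrote Wandering ...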

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A thought on Bayesian workflow: calculating a likelihood ratio for data compared to peak likelihood.

Daniel Lakeland writes: Ok, so it’s really deep into the comments and I’m guessing there’s a chance you will miss it so I wanted to point at my comments here and here. In particular, the second one, which suggests something that it might be useful to recommend for Bayesian workflows: calculating a likelihood ratio for ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “One should always beat a dead horse because the horse is never really dead”

Paul Alper came up with the above aphorism after reading this news article by Charles Ornstein and Katie Thomas, which goes as follows: What These Medical Journals Don’t Reveal: Top Doctors’ Ties to Industry One is dean of Yale’s medical school. Another is the director of a cancer center in Texas. A third is the ...

Source: Homepage on Joseph Stachelek

Link: What is the scope of NEON lakes data?

<blockquote> <p>Detailed setup files and runnable R scripts for reproducing this blog post are at: <a href="https://github.com/jsta/neon_lakes">https://github.com/jsta/neon_lakes</a></p> </blockquote> <p>The NEON documentation lists lakes among their targeted ecosystems. However, NEON provides so much data it can be difficult to get a grasp of their lake data apart from data collected in other ecosystems. Here, I show the scope of NEON lake data and how to get scripted access to it. In particular, I attempt to answer the question: <em><strong>How much lake data is available through NEON and ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Boosting intelligence analysts’ judgment accuracy: What works, what fails?”

Kevin Lewis points us to this research article by David Mandel, Christopher Karvetski, and Mandeep Dhami, which begins: A routine part of intelligence analysis is judging the probability of alternative hypotheses given available evidence. Intelligence organizations advise analysts to use intelligence-tradecraft methods such as Analysis of Competing Hypotheses (ACH) to improve judgment, but such methods ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Brakes

So. I noticed my rear brake wasn’t really doing anything. If I squeezed really hard, I could slow down, but not enough to stop going down a steep hill. No big deal—it’s the front brake that really matters, right?—but just for safety’s sake I went to the bike store one day and they replaced the ...

Source: Peng Zhao on Peng Zhao

Link: CARIBIC

NA ...

Source: Homepage on Yihui Xie | 谢益辉

Link: August 12, 1991 Review of S-PLUS Statistical Software

<p>Note: This is a guest post and not written by Yihui. Its author desires to remain anonymous. If anyone would like to contact the author, please contact me (Yihui). And if any statistical sleuths out there determine the author’s identity, please refrain from revealing it. Thank you.</p> <hr /> <p><strong>INTEROFFICE MEMORANDUM</strong></p> <p><strong>DATE:</strong> August 12, 1991</p> <p><strong>TO:</strong> □□□□□□□□□□□□□</p> <p><strong>FROM:</strong> □□□□□□□□□</p> <p><strong>SUBJECT:</strong> REVIEW OF S-PLUS STATISTICAL SOFTWARE</p> <p>S-PLUS, from StatSci Corp., is a new type of ...

Source: Homepage on Liechi | 張列弛

Link: R notes

This post is an R learning note for myself. It is not a comprehensive summary of learning R, but is closely tailored to my own needs. Learning by doing is, I think, the best way to pick up new skills. I am not sure how widely this idea applies, but it works very well for learning programming. I started learning R two years ago, but I still feel I am a stranger to R, and R is a stranger to ...

Source: Homepage on Liechi | 張列弛

Link: Statistics notes

This post summarizes my experience learning and using statistics. I will update it occasionally. Data description Statistical inference Modelling ...

Source: Homepage on Liechi | 張列弛

Link: Statistics note

This post summarizes my experience learning and using statistics. I will update it ...

Source: test on Robin Lovelace's website. Energy. Transport. Technology. Change the World.

Link: Geocomputation with R: from inception to physical book

Lecture and book launch, part of Leeds Digital Festival This talk will introduce Geocomputation with R, a new book on R for geographic data that has just been published. Aimed at people new to the subject, I’ll provide some examples to show the power of programming with open source software for high-performance, reproducible mapping and geographic analysis. I will cover a bit of the history but focus on visualisation, which is key to using data to understand the ...

Source: Blog on rOpenSci - open tools for open science

Link: Relaunching the qualtRics package

rOpenSci is one of the first organizations in the R community I ever interacted with, when I participated in the 2016 rOpenSci unconf. I have since reviewed several rOpenSci packages and been so happy to be connected to this community, but I have never submitted or maintained a package myself. All that changed when I heard the call for a new maintainer for the qualtRics package. “IT’S GO TIME,” I ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: We shouldn’t’ve called it “Stan”; I should’ve listened to Bob and Hadley

Hadley told me that one reason he came up with the name ggplot was that it would be uniquely findable on Google. When we were writing Stan and I suggested naming it Stan, Bob pointed out the googling argument but I just loved the name Stan, I loved the Ulam connection and having this friendly ...

Source: Statistical Modeling, Causal Inference, and Social Science

A couple weeks ago, Uri Simonsohn and Joe Simmons sent me and others a note that they were writing a blog post citing some of our work and asking for us to point out anything that we find “inaccurate, unfair, snarky, misleading, or in want of a change for any reason.” I took a quick ...

Source: Homepage on Yihui Xie | 谢益辉

Link: Naming Software Packages with Common Words

<p>Andrew Gelman <a href="https://statmodeling.stat.columbia.edu/2019/04/29/we-shouldntve-called-it-stan-i-shouldve-listened-to-bob-and-hadley/">regretted the name “Stan”</a>, because it is a common word, which makes it hard to find relevant results on Google. Actually I was taught the same lesson by Hadley when I first created the <strong>knitr</strong> package in 2011. Fortunately I listened. In the very beginning, I simply named it “knit”, which is certainly not very “Googleable”. Appending “R” after it made it much easier and ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: plot 10x scATAC coverage by cluster/group

<p>This post was inspired by <a href="https://twitter.com/ahill_tweets">Andrew Hill</a>’s <a href="http://andrewjohnhill.com/blog/2019/04/12/streamlining-scatac-seq-visualization-and-analysis/">recent blog post</a>.</p> <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Inspired by some nice posts by <a href="https://twitter.com/timoast?ref_src=twsrc%5Etfw">@timoast</a> and <a href="https://twitter.com/tangming2005?ref_src=twsrc%5Etfw">@tangming2005</a> and work from <a href="https://twitter.com/10xGenomics?ref_src=twsrc%5Etfw">@10xGenomics</a>. Would still definitely have to split BAM ...

Source: Simply Statistics

Link: Generative and Analytical Models for Data Analysis

<p>Describing how a data analysis is created is a topic of keen interest to me and there are a few different ways to think about it. Two different ways of thinking about data analysis are what I call the “generative” approach and the “analytical” approach. Another, more informal, way that I like to think about these approaches is as the “biological” model and the “physician” model. Reading through the literature on the process of data analysis, I’ve noticed that many seem to focus on the former rather than the latter and I think that presents an opportunity for new and interesting work.</p> ...

Source: Statistical Modeling, Causal Inference, and Social Science

Last year we reported on an article by sociologist Steve Morgan, criticizing a published paper by political scientist Diana Mutz. A couple months later we updated with Mutz’s response to Morgan’s critique. Finally, Morgan has published a reply to Mutz’s response to Morgan’s comments on Mutz’s paper. Here’s a passage that is of methodological interest: ...

Source: Econometrics and Free Software

Link: Fast food, causality and R packages, part 1

<div style="text-align:center;"> <p><a href="https://en.wikipedia.org/wiki/Joke"> <img src="./img/distracted_economist.jpg" title = "Soon, humanity will only communicate in memes"></a></p> </div> <p>I am currently working on a package for the R programming language; its initial goal was to simply distribute the data used in the Card and Krueger 1994 paper that you can read <a href="http://davidcard.berkeley.edu/papers/njmin-aer.pdf">here</a> (PDF warning).</p> <p>The gist of the paper is to try to answer the following question: <em>Do increases in minimum wages reduce employment?</em> Accordin ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “How many years do we lose to the air we breathe?” Or not.

From this Washington Post article: But . . . wait a second. The University of Chicago’s Energy Policy Institute . . . what exactly is that? Let’s do a google, then we get to the relevant page. I’m concerned because this is the group that did this report, which featured this memorable graph: See this ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Automatic voter registration impact on state voter registration

Sean McElwee points us to this study by Kevin Morris and Peter Dunphy, who write: Automatic voter registration or AVR . . . features two seemingly small but transformative changes to how people register to vote: 1. Citizens who interact with government agencies like the Department of Motor Vehicles are registered to vote, unless they ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Conditioning on post-treatment variables when you expect self-selection

Sadish Dhakal writes: I am struggling with the problem of conditioning on post-treatment variables. I was hoping you could provide some guidance. Note that I have repeated cross sections, NOT panel data. Here is the problem simplified: There are two programs. A policy introduced some changes in one of the programs, which I call the ...

Source: blog.sellorm.com

Link: Download, build and install R from source

A new version of the R source was released today and so, as is customary, I download and install it on my personal Linux servers. My main server runs Ubuntu and the others run CentOS. To download, build and install R I use the script below. It relies on you having ‘sudo’ enabled for your account as well as already having the build dependencies installed (see this post from RStudio for more ...

Source: Homepage on Liechi | 張列弛

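Link: Has the NgAgo Verdict Been Overturned?

To keep the title from misleading anyone, I will answer first and explain afterward: no, it has not. Recently I have seen quite a few articles like this, for example “Verdict overturned? Han Chunyu’s NgAgo gene-editing technique, which caused such an uproar back then, may really work,” or ...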

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Incentives to Learn”: How to interpret this estimate of a varying treatment effect?

Germán Jeremias Reyes writes: I am currently taking a course on Applied Econometrics and would like to ask you about how you would interpret a particular piece of evidence. Some background: In 2009, Michael Kremer et al. published an article called “Incentives to learn.” This is from the abstract (emphasis is mine): We study a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: What’s the upshot?

Yair points us to this page, The Upshot, Five Years In, by the New York Times data journalism team, listing their “favorite, most-read or most distinct work since 2014.” And some of these are based on our research: There Are More White Voters Than People Think. That’s Good News for Trump. (Story by Nate Cohn. ...

Source: Homepage on Liechi | 張列弛

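Link: An Interesting Study

I am interested in the evolution of language, and in studying language with the methods used to study biological evolution. In a blog post a couple of days ago, I mentioned that words can be regarded as the “genes” of a language and that one can study how they evolve. ...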

Source: Nan-Hung Hsieh on Nan-Hung Hsieh

Link: GNU MCSim tutorial 1 - Walk-through of working models

<iframe src="/slide/190425_tutorial.html#1" width="672" height="400px"></iframe> ...

Source: Rob J Hyndman

Link: Spatial modelling of the two-party preferred vote in Australian federal elections: 2001-2016

We examine the relationships between electoral socio-demographic characteristics and two-party preference in the six Australian federal elections held between 2001 and 2016. Socio-demographic information is derived from the Australian Census, which occurs every five years. Since a Census is not directly available for each election, spatio-temporal imputation is employed to estimate Census data for the electorates at the time of each election. This accounts for both spatial and temporal changes in electoral characteristics between Censuses. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Postdoctoral position in Vancouver! Using Stan! Working on wine! For reals.

Lizzie Wolkovich writes that she is hiring someone to help build Stan models for winegrapes. Here’s the ad: Postdoctoral Fellow in Winegrape Research—University of British Columbia The Temporal Ecology Lab is looking for a bright, motivated and collaborative researcher to join the lab and develop new winegrape models using Stan (mc-stan.org). The project combines decades ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Ballot order update

Darren Grant writes: Thanks for bringing my work on ballot order effects to the attention of a wider audience via your recent blog post. The final paper, slightly modified from the version you posted, was published last year in Public Choice. Like you, I am not wedded to traditional hypothesis testing, but think it is ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Several post-doc positions in probabilistic programming etc. in Finland

There are several open post-doc positions in Aalto and University of Helsinki in 1. probabilistic programming, 2. simulator-based inference, 3. data-efficient deep learning, 4. privacy preserving and secure methods, 5. interactive AI. All these research programs are connected and collaborating. I (Aki) am the coordinator for project 1 and a contributor to the others. Overall ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Appendix: Why we are publishing this here instead of as a letter to the editor in the journal”

David Allison points us to this letter he wrote with Cynthia Kroeger and Andrew Brown: Unsubstantiated conclusions in randomized controlled trial of binge eating program due to Differences in Nominal Significance (DINS) Error Cachelin et al. tested the effects of a culturally adapted, Cognitive Behavioral Therapy-based, guided self-help (CBTgsh) intervention on binge eating reduction . ...

Source: Homepage on Liechi | 張列弛

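Link: After the Reluctant Farewell

Lu Xun wrote an essay in memory of Fujino Genkurō, his teacher when he studied medicine at Sendai Medical College in Japan. The essay is titled “Mr. Fujino.” ...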

Source: Rob J Hyndman

Link: Translations of "Forecasting: principles and practice"

There are now translations of my forecasting textbook (coauthored with George Athanasopoulos) into Chinese and Korean. The Chinese translation was produced by a team led by Professor Yanfei Kang (Beihang University) and Professor Feng Li (Central University of Finance and Economics). The following students were also involved: Cheng Fan, Liu Yu, Long Xiaoyu, Wang Xiaoqian, Zeng Jiayue, Zhang Bohan, and Zhu Shuaidong. The Korean translation was produced by Dr Daniel Young Ho Kim. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: R-squared for multilevel models

Brandon Sherman writes: I was just having a discussion with someone about multilevel models, and the following topic came up. Imagine we’re building a multilevel model to predict SAT scores using many students. First we fit a model on students only, then students in classrooms, then students in classrooms within district, the previous case ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Why “statistical significance” doesn’t work: An example.

Reading some of the back-and-forth in this thread, it struck me that some of the discussion was about data, some was about models, some was about underlying reality, but none of the discussion was driven by statements that this or that pattern in data was “statistically significant.” Here’s the problem with “statistical significance” as I ...

Source: Rob J Hyndman

Link: Revealing high-frequency trading provision of liquidity with visualization

Liquidity is crucial for successful financial markets. It ensures that all investors are able to buy and sell assets quickly at a fair price. High Frequency Traders (HFTs) utilize sophisticated algorithms operating with extreme speed and are frequently cited as liquidity providers. The objective of this paper is to investigate the liquidity provision of a number of HFTs to determine their effects on aggregate marketplace liquidity. We consider a large data set collected from the Australian Securities Exchange throughout 2013, providing a near complete picture of all trading activity. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Wanted: Statistical success stories

Bill Harris writes: Sometime when you get a free moment, it might be great to publish a post that links to good, current exemplars of analyses. There’s a current discussion about RCTs on a program evaluation mailing list I monitor. I posted links to your power=0.06 post and your Type S and Type M post, ...

Source: Pat's blog (data science)

Link: Using ccache to speed up R package checks on Travis CI

<div id="introduction" class="section level1"> <h1>Introduction</h1> <p>Continuous integration checking for R packages is usually done on <a href="https://travis-ci.org/">Travis CI</a> because the R community has established a community driven <a href="https://github.com/travis-ci/travis-build/blob/master/lib/travi ...

Source: Statistical Modeling, Causal Inference, and Social Science

Hans van Maanen writes: May I put another statistical question to you? If I ask my frequentist statistician for a 95%-confidence interval, I can be 95% sure that the true value will be in the interval she just gave me. My visualisation is that she filled a bowl with 100 intervals, 95 of which do ...

Source: Homepage on Liechi | 張列弛

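Link: Notes on the etymology of “化石” (fossil)

While leafing through Michiko Yajima’s 《化石の記憶―古生物学の歴史をさかのぼる》, I came across a passage on the origin of the word “化石” (fossil). I translate its gist as follows: ...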

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Claims about excess road deaths on “4/20” don’t add up

Sam Harper writes: Since you’ve written about similar papers (that recent NRA study in NEJM, the birthday analysis) before and we linked to a few of your posts, I thought you might be interested in this recent blog post we wrote about a similar kind of study claiming that fatal motor vehicle crashes increase by 12% after 4:20pm ...

Source: Homepage on Joseph Stachelek

Link: Are Google Earth Engine analyses reproducible?

<blockquote> <p>Detailed setup files and runnable python scripts for reproducing this blog post are at: <a href="https://github.com/jsta/earthengine">https://github.com/jsta/earthengine</a></p> </blockquote> <p>More and more research papers are making use of Google Earth Engine (EE) to do geocomputation with gridded data and satellite (remote sensing) output. Are these analyses reproducible? Will they be reproducible in 2-3 years? In the following blog post, I explore these questions and conclude that:</p> <ul> <li><p>If a paper uses EE to simply pull/crop/extract data the answer is likely ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A question about the piranha problem as it applies to A/B testing

Wicaksono Wijono writes: While listening to your seminar about the piranha problem a couple weeks back, I kept thinking about a similar work situation but in the opposite direction. I’d be extremely grateful if you share your thoughts. So the piranha problem is stated as “There can be some large and predictable effects on behavior, ...

Source: Nan-Hung Hsieh on Nan-Hung Hsieh

Link: GNU MCSim tutorial 0 - Introductory

<iframe src="/slide/190418_tutorial.html#1" width="672" height="400px"></iframe> ...

Source: Blog on rOpenSci - open tools for open science

Link: When Standards Go Wild - Software Review for a Manuscript

Stefanie Butland, rOpenSci Community Manager Some things are just irresistible to a community manager – PhD student Hugo Gruson’s recent tweets definitely fall into that category. I was surprised and intrigued to see an example of our software peer review guidelines being used in a manuscript review, independent of our formal collaboration with the journal Methods in Ecology and Evolution (MEE). This is exactly the kind of thing rOpenSci is working to enable by developing a good set of practices that broadly apply to research ...

Source: Statistical Modeling, Causal Inference, and Social Science

4:10pm Monday, April 22 in Social Work Bldg room 903: Data is getting weirder. Statistical models and techniques are more complex than they have ever been. No one understands what code does. But at the same time, statistical tools are being used by a wider range of people than at any time in the past. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Lessons about statistics and research methods from that racial attitudes example

Yesterday we shared some discussions of recent survey results on racial attitudes. For students and teachers of statistics or research methods, I think the key takeaway should be that you don’t want to pull out just one number from a survey; you want to get the big picture by looking at multiple questions, multiple years, ...

Source: Simply Statistics

Link: Tukey, Design Thinking, and Better Questions

<p>Roughly once a year, I read John Tukey’s paper <a href="https://projecteuclid.org/euclid.aoms/1177704711">“The Future of Data Analysis”</a>, originally published in 1962 in the <em>Annals of Mathematical Statistics</em>. I’ve been doing this for the past 17 years, each time hoping to really understand what it was he was talking about. Thankfully, each time I read it I seem to get <em>something</em> new out of it. For example, in 2017 I wrote <a href="https://youtu.be/qFtJaq4TlqE">a whole talk</a> around some of the basic ideas.</p> <p>Well, it’s that time of year again, and I’ve been doing ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Changing racial differences in attitudes on changing racial differences

Elin Waring writes: Have you been following the release of GSS results this year? I had been vaguely aware that there was reporting on a few items but then I happened to run the natrace and natracey variables (I use these in my class to look at question wording), they are from the are we ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Parliamentary Constituency Factsheet for Indicators of Nutrition, Health and Development in India

S. V. Subramanian writes: In India, data on key developmental indicators that formulate policies and interventions are routinely available for the administrative units of districts but not for the political units of Parliamentary Constituencies (PC). Members of Parliament (MPs) in the Lok Sabha, each representing one of the 543 PCs as per the 2014 India map, are the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Abandoning statistical significance is both sensible and practical

Valentin Amrhein, Sander Greenland, Blakeley McShane, and I write: Dr Ioannidis writes against our proposals [here and here] to abandon statistical significance in scientific reasoning and publication, as endorsed in the editorial of a recent special issue of an American Statistical Association journal devoted to moving to a “post ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The network of models and Bayesian workflow, related to generative grammar for statistical models

Ben Holmes writes: I’m a machine learning guy working in fraud prevention, and a member of some biostatistics and clinical statistics research groups at Wright State University in Dayton, Ohio. I just heard your talk “Theoretical Statistics is the Theory of Applied Statistics” on YouTube, and was extremely interested in the idea of a model-space ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: State-space models in Stan

Michael Ziedalski writes: For the past few months I have been delving into Bayesian statistics and have (without hyperbole) finally found statistics intuitive and exciting. Recently I have gone into Bayesian time series methods; however, I have found no libraries to use that can implement those models. Happily, I found Stan because it seemed among ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: All statistical conclusions require assumptions.

Mark Palko points us to this 2009 article by Itzhak Gilboa, Andrew Postlewaite, and David Schmeidler, which begins: This note argues that, under some circumstances, it is more rational not to behave in accordance with a Bayesian prior than to do so. The starting point is that in the absence of information, choosing a prior ...

Source: Homepage on Liechi | 張列弛

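Link: A few unrelated stories

Today I was reading Wild Grass (《野草》) and came to the piece 《颓败线的颤动》 (“Tremors on the Border of Degradation”). It is one I did not understand as a child: I could not see why the “mother” would panic after the “daughter” was woken by the sound of a door opening, nor did I understand why, after growing up ...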

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Works of art that are about themselves

I watched Citizen Kane (for the umpteenth time) the other day and was again struck by how it is a movie about itself. Kane is William Randolph Hearst, but he’s also Orson Welles, boy wonder, and the movie Citizen Kane is self-consciously a masterpiece. Some other examples of movies that are about themselves are La ...

Source: Homepage on Liechi | 張列弛

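Link: Thinking of Hawking

Scientists recently released a photograph of a black hole, the first time humanity has directly observed one. I stared at the photo for a while without making much of it, other than that it looked a bit like a doughnut, and then in my mind ...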

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Active learning and decision making with varying treatment effects!

In a new paper, Iiris Sundin, Peter Schulam, Eero Siivola, Aki Vehtari, Suchi Saria, and Samuel Kaski write: Machine learning can help personalized decision support by learning models to predict individual treatment effects (ITE). This work studies the reliability of prediction-based decision-making in a task of deciding which action a to take for a target ...

Source: Statistical Modeling, Causal Inference, and Social Science

A few months ago I sent the following message to some people: Dear philosophically-inclined colleagues: I’d like to organize an online discussion of Deborah Mayo’s new book. The table of contents and some of the book are here at Google books, also in the attached pdf and in this post by Mayo. I think that ...

Source: Statistical Modeling, Causal Inference, and Social Science

Don MacLeod writes: Perhaps you know this study which is being taken at face value in all the secondary reports: “Air pollution causes ‘huge’ reduction in intelligence, study reveals.” It’s surely alarming, but the reported effect of air pollution seems implausibly large, so it’s hard to be convinced of it by a correlational study alone, ...

Source: L. Collado-Torres on L. Collado-Torres

Link: The evolution of my academic career as seen through posters and talks thanks to hugo academic 4.1

<p>The <a href="https://github.com/gcushen/hugo-academic"><code>hugo-academic</code></a> theme which powers my website is active and frequently updated. I don’t update my website that frequently anymore, but I recently found out about many of their changes when I made the <a href="https://comunidadbioinfo.github.io/">CDSB website</a>.</p> <blockquote class="twitter-tweet"><p lang="es" dir="ltr">We are delighted to share with you our new webpage at <a href="https://t.co/rNuiRlNixV">https://t.co/rNuiRlNixV</a> with both English and Spanish support<br><br>Estamos encantados de compartirles nuestra ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Prestigious journal publishes sexy selfie study

Stephen Oliver writes: Not really worth blogging about and a likely candidate for multiverse analysis, but the beginning of the first sentence in the 2nd paragraph made me laugh: In the study – published in prestigious journal PNAS . . . The researchers get extra points for this quote from the press release: The researchers ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: What is the most important real-world data processing tip you’d like to share with others?

This question was in today’s jitts for our communication class. Here are some responses: Invest the time to learn data manipulation tools well (e.g. tidyverse). Increased familiarity with these tools often leads to greater time savings and less frustration in future. Hmm it’s never one tip.. I never ever found it useful to begin writing ...

Source: Blog on rOpenSci - open tools for open science

Link: Community Call - Security for R

“Security” can be a daunting, scary, and (frankly) quite often a very boring topic. BUT!, we promise that this Community Call on May 7th will be informative, engaging, and enlightening (or, at least not boring)! Applying security best practices is essential not only for developers or sensitive data storage but also for the everyday R user installing R packages, contributing to open source, working with APIs or remote servers. However, keeping up-to-date with security best practices and applying them meticulously requires significant effort and is difficult without expert ...

Source: Statistical Modeling, Causal Inference, and Social Science

Chuck Jackson points to two items of possible interest: Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions, by Richard Harris. Review here by Leonard Freedman. Retractions do not work very well, by Ken Cor and Gaurav Sood. This post by Tyler Cowen brought this paper to my attention. Here’s a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Emile Bravo and agency

I was reading Tome 4 of the adventures of Jules (see the last item here), and it struck me how much agency the characters had. They seemed to be making their own decisions, saying what they wanted to say, etc. Just as a contrast, I’m also reading an old John Le Carre book, and here ...

Source: Econometrics and Free Software

Link: Historical newspaper scraping with {tesseract} and R

<div style="text-align:center;"> <p><a href="https://en.wikipedia.org/wiki/Cliometrics"> <img src="./img/clio.jpg" title = "Historical newspapers as a source to practice cliometrics?"></a></p> </div> <p>I have been playing around with historical newspapers data for some months now. The “obvious” type of analysis to do is NLP, but there is also a lot of numerical data inside historical newspapers. For instance, you can find these tables that show the market prices of the day in the <em>L’Indépendance Luxembourgeoise</em>:</p> <p><img src="./img/market_price_table.png" /><!-- --></p> <p>I ...

Source: Pat's blog (data science)

Link: Emoji support for Notion.so on Linux

<p><a href="https://www.notion.so/">Notion.so</a> is a great tool for various tasks. I use it as a personal wiki but also for work-related notes.</p> <p>Unfortunately there is no native support for Linux and even though this point has been mentioned quite often by the community, the Notion team has not provided a Linux Desktop App yet. Maybe it will never be shipped.</p> <p>The Linux Desktop world can be evil when you have to make money selling applications. A lot of distributions with many different packaging standards and a small user base (compared to MacOS and Windows).</p> <blockquote ...

Source: Statistical Modeling, Causal Inference, and Social Science

1. An estimate of the geography of partisan prejudice My colleagues David Rothschild and Tobi Konitzer recently published this MRP analysis, “The Geography of Partisan Prejudice: A guide to the most—and least—politically open-minded counties in America,” written up by Amanda Ripley, Rekha Tenjarla, and Angela He. Ripley et al. write: In general, the most politically ...

Source: test on Robin Lovelace's website. Energy. Transport. Technology. Change the World.

Link: Keynote talk on geocomputation, SatRdays Newcastle

Keynote talk at SatRdays Newcastle Geocomputation with R: Reproducible Geo* workflows, from getting data to making maps This talk will introduce Geocomputation with R, a new book on R for geographic data. It will demonstrate how far R has evolved as an environment for geographic data analysis and visualisation, and provide a taster of what is in the book and, more importantly, what is possible when ‘data science’ and ‘GIS’ ...

Source: Statistical Modeling, Causal Inference, and Social Science

David Rea and Tony Burton write: The Heckman Curve describes the rate of return to public investments in human capital for the disadvantaged as rapidly diminishing with age. Investments early in the life course are characterised as providing significantly higher rates of return compared to investments targeted at young people and adults. This paper uses ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Treatment interactions can be hard to estimate from data.

Brendan Nyhan writes: Per #3 here, just want to make sure you saw the Coppock Leeper Mullinix paper indicating treatment effect heterogeneity is rare. My reply: I guess it depends on what is being studied. In the world of evolutionary psychology etc., interactions are typically claimed to be larger than main effects (for example, that ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: StanCon 2019: 20–23 August, Cambridge, UK

It’s official. This year’s StanCon is in Cambridge. For details, see StanCon 2019 Home Page What can you expect? There will be two days of tutorials at all levels and two days of invited and submitted talks. The previous three StanCons (NYC 2017, Asilomar 2018, Helsinki 2018) were wonderful experiences for both their content and ...

Source: Jay's Notes

Link: R Markdown in Vim

<div id="two-ways-to-render-r-markdown-documents" class="section level2"> <h2>Two ways to render R Markdown documents</h2> <p>I saw this tweet a couple of days ago and decided to look for ways to use R Markdown more at ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Some Stan and Bayes short courses!

Robert Grant writes: I have a couple of events coming up that people might be interested in. They are all at bayescamp.com/courses Stan Taster Webinar is on 15 May, runs for one hour and is only £15. I’ll demo Stan through R (and maybe PyStan and CmdStan if the interest is there on the day), ...

Source: Statistical Modeling, Causal Inference, and Social Science

Tyler Cowen links to a research article by Brenden Timpe, “The Long-Run Effects of America’s First Paid Maternity Leave Policy,” that begins as follows: This paper provides the first evidence of the effect of a U.S. paid maternity leave policy on the long-run outcomes of children. I exploit variation in access to paid leave that ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: N=1 survey points to Beto O’Rourke as Democratic nominee in 2020

Last year we did an N=1 poll on the Democratic primary election for governor of New York. And the poll worked pretty well. To recap: A survey with N=1! And not even a random sample. How could we possibly learn anything useful from that? We have a few things in our favor: – Auxiliary information ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: What’s a good default prior for regression coefficients? A default Edlin factor of 1/2?

The punch line “Your readers are my target audience. I really want to convince them that it makes sense to divide regression coefficients by 2 and their standard errors by sqrt(2). Of course, additional prior information should be used whenever available.” The background It started with an email from Erik van Zwet, who wrote: In ...
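
One way to see where a factor of 2 and a factor of sqrt(2) can come from is a standard normal-normal update. This is only a sketch under an assumption that the excerpt above does not state: the estimate has standard error s, and the true coefficient gets a zero-centered normal prior whose standard deviation is also s. The function name `shrink_normal` and the numbers are illustrative.

```python
import math


def shrink_normal(estimate, std_error, prior_sd):
    """Posterior mean and sd for a coefficient under a N(0, prior_sd^2) prior.

    Assumes the estimate is normally distributed around the true
    coefficient with standard deviation std_error.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    posterior_variance = 1.0 / (prior_precision + data_precision)
    # The prior mean is zero, so only the data term contributes to the mean.
    posterior_mean = posterior_variance * data_precision * estimate
    return posterior_mean, math.sqrt(posterior_variance)


# Hypothetical estimate 0.8 with standard error 0.2, prior sd set equal to 0.2.
mean, sd = shrink_normal(estimate=0.8, std_error=0.2, prior_sd=0.2)
print(mean, sd)
```

With the prior sd equal to the standard error, the two precisions are equal, so the posterior mean is half the estimate and the posterior sd is the standard error divided by sqrt(2). A wider or narrower prior would shrink less or more.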

Source: L. Collado-Torres on L. Collado-Torres

Link: How to write academic documents with GoogleDocs

<p>These past months I’ve been mostly working on one <em>huge</em> project which might be close to an end, hopefully! This project involves a massive manuscript with many supplementary figures and tables. Today we sent it out to other members in our team, and to celebrate, I’m now writing more 😅: though this is a blog post. I’m allowing myself to do so before I dive into the pile of tasks I haven’t completed<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a>. So I’m going to share with you the tools I’ve been using since 2018 or so for writing academic documents shared via <a ...

Source: Blog on rOpenSci - open tools for open science

Link: Getting your toes wet in R: Hydrology, meteorology, and more

Importance of Hydrology Given that liquid water is essential to life on Earth, water research cuts across numerous disciplines including hydrology, meteorology, geography, climate science, engineering, ecology, and more. Numerous R packages have emerged from this diversity of approaches, and we recently gathered many of them into a new rOpenSci task view which we broadly titled ‘Hydrology’ and published to CRAN. Our intent is to be exhaustive and compile R packages to access, model, and summarise information related to the movement of water across the Earth’s ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: An R package for multiverse analysis and counting researcher degrees of freedom

Joachim Gassen writes: In a recent blog post I introduce an in-development R package that helps researchers to identify, document and exhaust inherent research design choices in work based on observational data. As the analysis that I propose is similar in notion to a multiverse analysis that you suggested, I thought that maybe the package ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Thinking about “Abandon statistical significance,” p-values, etc.

We had some good discussion the other day following up on the article, “Retire Statistical Significance,” by Valentin Amrhein, Sander Greenland, and Blake McShane. I have a lot to say, and it’s hard to put it all together, in part because my collaborators and I have said much of it already, in various forms. For ...

Source: Homepage on Liechi | 張列弛

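Link: Ōtomo no Tabito makes the news

Today Japan announced the era name for the next emperor: “令和” (Reiwa). The characters of Japanese era names have generally been chosen from ancient Chinese classics (from Taika (645) through Heisei (2019), 247 ...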

Source: Simply Statistics

Link: Interview with Abhi Datta

<p><em>Editor’s note: This is the next in our series of interviews with early career statisticians and data scientists. Today we are talking to Abhi Datta about his work in large scale spatial analysis and his interest in soccer! Follow him on Twitter at <a href="https://twitter.com/datta_science">@datta_science</a>. If you have recommendations of an (early career) person in academics or industry you would like to see promoted, reach out to Jeff (@jtleek) on Twitter!</em></p> <p><em>SS: Do you consider yourself a statistician, biostatistician, data scientist, or something else?</em></p> ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Impact of published research on behavior and avoidable fatalities

In a paper entitled, “Impact of published research on behavior and avoidable fatalities,” Addison Kramer, Alexandra Kirk, Faizaan Easton, and Bertram Hester write: There has long been speculation of an “informational backfire effect,” whereby the publication of questionable scientific claims can lead to behavioral changes that are counterproductive in the aggregate. Concerns of informational backfire ...

Source: Econometrics and Free Software

Link: Get text from pdfs or images using OCR: a tutorial with {tesseract} and {magick}

<div style="text-align:center;"> <p><a href="https://en.wikipedia.org/wiki/Michel_Rodange"> <img src="./img/michelrodange.jpg" title = "The high school I attended was named after this gentleman"></a></p> </div> <p>In this blog post I’m going to show you how you can extract text from scanned pdf files, or pdf files where no text recognition was performed. (For pdfs where text recognition was performed, you can read my <a href="https://www.brodrigues.co/blog/2018-06-10-scraping_pdfs/">other blog post</a>).</p> <p>The pdf I’m going to use can be downloaded from <a href="http://www.luxemburgensia. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A comment about p-values from Art Owen, upon reading Deborah Mayo’s new book

The Stanford statistician writes: One of the fun parts of this was reading some of what Meehl wrote. I’d seen him quoted but had not read him before. What he says reminds me a lot of how p values were presented when I was an undergraduate at Waterloo. They emphasized large p values as a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Another bit from Art Owen, this time dunking on ripoff publishers

From Owen’s review of Mayo’s book: Going through this put me in mind of Jim Zidek’s early 1980s work on multi-Bayesian theory. The most cited paper there is his JRSS-A paper with Weerahandi from 1981. From the abstract it looks more like it addresses formation of a consensus posterior or decision choice and is not ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Here’s an idea for not getting tripped up with default priors . . .

I put this in the Prior Choice Recommendations wiki a while ago: “The prior can often only be understood in the context of the likelihood”: http://www.stat.columbia.edu/~gelman/research/published/entropy-19-00555-v2.pdf Here’s an idea for not getting tripped up with default priors: For each parameter (or other qoi), compare the posterior sd to the prior sd. If the posterior sd for ...
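
The comparison described above can be sketched in a few lines. The excerpt cuts off before stating what to do with the comparison, so the cutoff here is a required argument rather than a recommended value, and the function names, parameter names, and numbers are all hypothetical. The only thing relied on is that a posterior sd close to the prior sd means the data narrowed the prior very little.

```python
def sd_ratios(prior_sd, posterior_sd):
    """Return posterior sd divided by prior sd for each named parameter."""
    return {name: posterior_sd[name] / prior_sd[name] for name in prior_sd}


def flag_prior_dominated(prior_sd, posterior_sd, threshold):
    """Names of parameters whose sd ratio exceeds the caller-supplied threshold."""
    ratios = sd_ratios(prior_sd, posterior_sd)
    return sorted(name for name, ratio in ratios.items() if ratio > threshold)


# Made-up standard deviations for two parameters.
prior = {"alpha": 10.0, "beta": 1.0}
posterior = {"alpha": 0.5, "beta": 0.9}

print(sd_ratios(prior, posterior))
# The 0.5 cutoff is arbitrary and chosen only for this illustration.
print(flag_prior_dominated(prior, posterior, threshold=0.5))
```

In this toy input the data shrink the uncertainty about `alpha` a great deal and about `beta` hardly at all, so only `beta` is flagged for a closer look at its prior.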

Source: blog.sellorm.com

Link: Build your own CRAN-like repo

In my last post, “Lifting the lid on CRAN”, we took a look at how R and CRAN interact to enable R users to install packages. In this post we’re going to dig a little deeper by building our own CRAN-like repo that we can install packages from. Enterprise R package management Before we get started, I just want to stress that what we’ll learn about here is no substitute for using a product like RStudio Package ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: David Weakliem on the U.S. electoral college

The sociologist and public opinion researcher has a series of excellent posts here, here, and here on the electoral college. Here’s the start: The Electoral College has been in the news recently. I [Weakliem] am going to write a post about public opinion on the Electoral College vs. popular vote, but I was diverted into ...

Source: blog.sellorm.com

Link: Lifting the lid on CRAN

CRAN is one of the many things that make R such a great language. For those that don’t know, it’s where R users get the vast majority of the add-on packages that they use with the core language. CRAN also hosts downloads of the language itself, source code, tools and so on, but it’s best known among users as the place where all the packages come from. The “CRAN team” are responsible for the ongoing maintenance of CRAN and also handle new and updated package submissions and generally ensure everything is running ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Ben Lambert. 2018. A Student’s Guide to Bayesian Statistics.

Ben Goodrich, in a Stan forums survey of Stan video lectures, points us to the following book, which introduces Bayes, HMC, and Stan: Ben Lambert. 2018. A Student’s Guide to Bayesian Statistics. SAGE Publications. If Ben Goodrich is recommending it, it’s bound to be good. Amazon reviewers seem to really like it, too. You may ...

Source: Statistical Modeling, Causal Inference, and Social Science

tl;dr: Someone asks me a question, I can’t really tell what he’s talking about, so I offer some generic advice. Joe Hoover writes: An issue has come up in my subsequent analyses, which uses my MrsP estimates to explore the relationship between county-level moral values and the county-level distribution of hate groups, as defined by ...

Source: Home on Another Random Blog

Link: tabletest

<pre><code class="language-r">library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = cyl)) +
  geom_point() +
  facet_grid(gear ~ .) +
  # ggtitle("test") +
  labs(title = "test") +
  theme(plot.title = element_text())
</code></pre> <p><img src="/post/2019-03-28-tabletest_files/figure-html/unnamed-chunk-1-1.png" width="672" /></p> ...

Source: Home on Another Random Blog

Link: Tufte Handout

<div id="introduction" class="section level1"> <h1>Introduction</h1> <p>The Tufte handout style is a style that Edward Tufte uses in his books and handouts. Tufte’s style is known for its extensive use of sidenotes, tight integration of graphics with text, and well-set typography. This style has been implemented in LaTeX and HTML/CSS<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a>, respectively. We have ported both implementations into the <a href="https://github.com/rstudio/tufte"><strong>tufte</strong> package</a>. If you want LaTeX/PDF output, you may use the ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: Reproducible research in bioinformatics

<p>I was invited to give a talk on reproducible bioinformatics research to the students at <a href="https://www.bhcc.edu/" target="_blank">Bunker Hill Community College</a> in Boston, MA. I was so glad to introduce bioinformatics to the students and share my own perspectives on reproducible research.</p> <p>The movie <a href="https://en.wikipedia.org/wiki/Good_Will_Hunting" target="_blank">Good Will Hunting</a> was shot there :)</p> <p><img src="https://divingintogeneticsandgenomics.rbind.io/img/bunkerhill-talk.png" alt="" /></p> ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: An interview with Tina Fernandes Botts

Hey—this is cool! What happened was, I was scanning this list of Springbrook High School alumni. And I was like, Tina Fernandes? Class of 1982? I know that person. We didn’t know each other well, but I guess we must have been in the same homeroom a few times? All I can remember from back ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Understanding how Anova relates to regression

Analysis of variance (Anova) models are a special case of multilevel regression models, but Anova, the procedure, has something extra: structure on the regression coefficients. As I put it in the rejoinder for my 2005 discussion paper: ANOVA is more important than ever because we are fitting models with many parameters, and these parameters can ...

Source: Rob J Hyndman

Link: FFORMA: Feature-based Forecast Model Averaging

We propose an automated method for obtaining weighted forecast combinations using time series features. The proposed approach involves two phases. First, we use a collection of time series to train a meta-model to assign weights to various possible forecasting methods with the goal of minimizing the average forecasting loss obtained from a weighted forecast combination. The inputs to the meta-model are features extracted from each series. In the second phase, we forecast new series using a weighted forecast combination where the weights are obtained from our previously trained meta-model.
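
The second phase described above ends in a weighted forecast combination. A minimal sketch of that final step follows. It is not the paper's implementation: the meta-model that produces the weights is not reproduced, the weights are simply taken as given, and the method names and numbers are made up for illustration.

```python
def combine_forecasts(forecasts, weights):
    """Weighted average of several forecast paths.

    forecasts: dict mapping method name -> list of point forecasts (same horizon)
    weights:   dict mapping method name -> non-negative weight
    Weights are normalised to sum to one before combining.
    """
    total = sum(weights[method] for method in forecasts)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    horizon = len(next(iter(forecasts.values())))
    combined = [0.0] * horizon
    for method, path in forecasts.items():
        if len(path) != horizon:
            raise ValueError("all forecast paths must share one horizon")
        share = weights[method] / total
        for step, value in enumerate(path):
            combined[step] += share * value
    return combined


# Hypothetical two-step-ahead forecasts from two methods, with given weights.
forecasts = {"naive": [10.0, 10.0], "ets": [12.0, 14.0]}
weights = {"naive": 0.25, "ets": 0.75}
print(combine_forecasts(forecasts, weights))
```

Normalising inside the function means the weights can be supplied on any positive scale, which is a convenience of this sketch rather than something the abstract specifies.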

Source: Statistical Modeling, Causal Inference, and Social Science

Paul Alper writes: A couple of times, at my suggestion, you’ve blogged about Paolo Macchiarini. Here is an update from Susan Perry in which she interviews the director of the Swedish documentary about Macchiarini: Indeed, Macchiarini made it sound as if his patients had recovered their health when, in fact, the synthetic tracheas he had ...

Source: Homepage on Liechi | 張列弛

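Link: What is “萧萧” (xiaoxiao)?

“萧萧” (xiaoxiao) is a rather busy word that often turns up in poetry. The poets, forms, and themes may all differ, yet “萧萧” remains. When Jing Ke set out to assassinate the King of Qin, he sang by the Yi River, “风萧萧兮易水寒” (“The wind soughs and the Yi River is cold”), 《 ...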

Source: Home on Another Random Blog

Link: Project File management with R

<h1 id="intro">Intro</h1> <p>The main motivation for writing this post came from <a href="https://d.cosx.org/d/420539-r-studio">this post</a>, in which the OP asked how to manage a ~300 lines long R script in a hierarchical way in RStudio, just like we can fold/unfold a section to its headings in a well-structured rmarkdown document.</p> <p>No quick answer to this question came to mind instantly. When thinking about it again, I started to wonder whether structuring a ~300 lines long R script is necessary. Nah, it isn’t. An all-in-one R script will quickly become a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Most Americans like big businesses.

Tyler Cowen asks: Why is there so much suspicion of big business? Perhaps in part because we cannot do without business, so many people hate or resent business, and they love to criticize it, mock it, and lower its status. Business just bugs them. . . . The short answer is, No, I don’t think ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Jonathan (another one) does Veronica Geng does Robert Mueller

Frequent commenter Jonathan (another one) writes: I realize that so many people bitch about the seminar showdown that you might need at least one thank you. This year, I managed to re-read the bulk of Geng, and for that I thank you. I have not yet read any Sattouf, but it clearly has made an impression ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Markov chain Monte Carlo doesn’t “explore the posterior”

First some background, then the bad news, and finally the good news. Spoiler alert: The bad news is that exploring the posterior is intractable; the good news is that we don’t need to explore all of it. Sampling to characterize the posterior There’s a misconception among Markov chain Monte Carlo (MCMC) practitioners that the purpose ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Mister P for surveys in epidemiology — using Stan!

Jon Zelner points us to this new article in the American Journal of Epidemiology, “Multilevel Regression and Poststratification: A Modelling Approach to Estimating Population Quantities From Highly Selected Survey Samples,” by Marnie Downes, Lyle Gurrin, Dallas English, Jane Pirkis, Dianne Currier, Matthew Spittal, and John Carlin, which begins: Large-scale population health studies face increasing difficulties ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Should we talk less about bad social science research and more about bad medical research?

Paul Alper pointed me to this news story, “Harvard Calls for Retraction of Dozens of Studies by Noted Cardiac Researcher: Some 31 studies by Dr. Piero Anversa contain fabricated or falsified data, officials concluded. Dr. Anversa popularized the idea of stem cell treatment for damaged hearts.” I replied: Ahhh, Harvard . . . the reporter ...

Source: Homepage on Liechi | 張列弛

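Link: The Gentleman Holds Firm in Hardship (君子固穷)

The state of Chen was struck by war, and the state of Chu sent troops to its rescue. Amid the fighting, Confucius and his disciples were travelling between Chen and Cai, unable to go forward or back. They had not eaten for many days; the disciples were hungry and sick and felt despair, yet Confucius ...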

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Yes, I really really really like fake-data simulation, and I can’t stop talking about it.

Rajesh Venkatachalapathy writes: Recently, I had a conversation with a colleague of mine about the virtues of synthetic data and their role in data analysis. I think I’ve heard a sermon/talk or two where you mention this and also in your blog entries. But having convinced my colleague of this point, I am struggling to ...

Source: The R-Podcast

Link: Episode 29: Chicago R Unconference Recap

Chicago R Unconference. Unconf website: https://chirunconf.github.io/; Issue board: https://github.com/chirunconf/chirunconf19/issues; #chirunconf (Sharla Gelfand): https://sharla.party/posts/chirunconf/; Discover and share your supeR powers (Mauro Lepore): https://maurolepore.github.io/confs/articles/2019_chirunconf_experience.html ...

Source: Homepage on Liechi | 張列弛

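Link: Confucius Travels East (孔子东游)

Probably in junior high school, I studied a text called “Two Children Debate the Sun” (《两小儿辩日》). Two children argue about when the sun is farther from us and when it is nearer; each puts forward his own claim and evidence, but neither can convince ...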

Source: Antequated - maps for the rest of us

Link: Get antequated with SOmap

Polar maps! Last time we got this far by creating a very simplistic polar map and discussing some of the difficulties in customizing and finishing it. Since then Dale Maschette discussed these problems at useR Brisbane 2018 and stirred up multiple discussions on Twitter about the joys of polar maps. Behold SOmap: SOmap::SOmap(). To install the SOmap package, use remotes::install_github("AustralianAntarcticDivision/SOmap") and see the package readme and documentation for further ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: New golf putting data! And a new golf putting model!

Part 1. Here’s the golf putting data we were using, typed in from Don Berry’s 1996 textbook. The columns are distance in feet from the hole, number of tries, and number of successes:
x n y
2 1443 1346
3 694 577
4 455 337
5 353 208
6 272 149
7 256 136
8 ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Postdoc in Chicago on statistical methods for evidence-based policy

Beth Tipton writes: The Institute for Policy Research and the Department of Statistics are seeking applicants for a Postdoctoral Fellowship with Dr. Larry Hedges and Dr. Elizabeth Tipton. This fellowship will be a part of a new center which focuses on the development of statistical methods for evidence-based policy. This includes research on methods for ...

Source: Rob J Hyndman

Link: Developing good research habits

Presentation for the 2019 honours and masters students. Magic button for library access to papers: drag this Monash proxy link to your bookmarks. Links: Mendeley, Zotero, Paperpile, Google Scholar, Rmarkdown, Happy git with R, Rmarkdown thesis template ...

Source: Econometrics and Free Software

Link: Pivoting data frames just got easier thanks to `pivot_wide()` and `pivot_long()`

There’s a lot going on in the development version of `{tidyr}`. New functions for pivoting data frames, `pivot_wide()` and `pivot_long()`, are coming and will replace the current functions, `spread()` and `gather()`. `spread()` and `gather()` will remain in the package though: You may have heard a ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: KRAS-IRF2 Axis Drives Immune Suppression and Immune Therapy Resistance in Colorectal Cancer

More detail can easily be written here using Markdown and $\rm \LaTeX$ math ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Maybe it’s time to let the old ways die; or We broke R-hat so now we have to fix it.

“Otto eye-balled the diva lying comatose amongst the reeds, and he suddenly felt the fire of inspiration flood his soul. He ran back to his workshop where he futzed and futzed and futzed.” –Bette Midler Andrew was annoyed. Well, annoyed is probably too strong a word. Maybe a better way to start is with The ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: He asks me a question, and I reply with a bunch of links

Ed Bein writes: I’m hoping you can clarify a Bayesian “metaphysics” question for me. Let me note I have limited experience with Bayesian statistics. In frequentist statistics, probability has to do with what happens in the long run. For example, a p value is defined in terms of what happens if, from now till eternity, ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: My two talks in Montreal this Friday, 22 Mar

McGill University Biostatistics seminar, Purvis Hall, 102 Pine Ave. West, Room 25, 1-2pm Fri 22 Mar: Resolving the Replication Crisis Using Multilevel Modeling In recent years we have come to learn that many prominent studies in social science and medicine, conducted at leading research institutions, published in top journals, and publicized in respected news outlets, ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Retire Statistical Significance”: The discussion.

So, the paper by Valentin Amrhein, Sander Greenland, and Blake McShane that we discussed a few weeks ago has just appeared online as a comment piece in Nature, along with a letter with hundreds (or is it thousands?) of supporting signatures. Following the first circulation of that article, the authors of that article and some ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: It’s the end! Riad Sattouf wins.

The Japanese dude who won the sausage-eating competition (well, it sounds better in English), Mr. Kobayashi, was a great “underdog,” the dark horse of this “March madness,” but in fact I have to advance the cartoonist, thanks to Dzhaughn’s poem: Please don’t ignore this dour crie de couer at ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: When and how do politically extreme candidates get punished at the polls?

In 2016, Tausanovitch and Warshaw performed an analysis “using the largest dataset to date of voting behavior in congressional elections” and found: Ideological positions of congressional candidates have only a small association with citizens’ voting behavior. Instead, citizens cast their votes “as if” based on proximity to parties rather than individual candidates. The modest degree ...

Source: Statistical Modeling, Causal Inference, and Social Science

Brad Greenwood, Seth Carnahan, and Laura Huang write: A large body of medical research suggests that women are less likely than men to survive traumatic health episodes like acute myocardial infarctions. In this work, we posit that these difficulties may be partially explained, or exacerbated, by the gender match between the patient and the physician. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: It’s the finals! The Japanese dude who won the hot dog eating contest vs. Riad Sattouf

I chose yesterday’s winner based on this comment from Re’el: Hey, totally not related to this, but could offer any insight into this study: https://www.nytimes.com/2019/03/15/well/eat/eggs-cholesterol-heart-health.html It seems like something we go back and forth on and this study didn’t offer any insight. Thanks. Egg = oeuf, so we should choose the man whose name ends ...

Source: blog.sellorm.com

Link: Sellorm is WFH - Notes on the last 6 months of working from home

For the last 6 months I’ve been working from home. My job, like that of most other knowledge workers, isn’t really location dependent and can essentially be done from anywhere with an internet connection. About 6 months ago my project commitments started to ramp up and I arranged to work from home to better manage my time. I can honestly say it’s been ...

Source: Homepage on Liechi | 張列弛

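Link: Forging Ahead in the Year of the Pig (猪年的猛进)

This year is the Yihai (乙亥) Year of the Pig. At the start of the year I noticed two interesting things about the Year of the Pig: it is the last year in a cycle of the twelve zodiac animals; and I should already have lived through one Year of the Pig, but of the previous one I have not the slightest ...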

Source: Statistical Modeling, Causal Inference, and Social Science

Someone points to my paper with Gary King from 1998, Estimating the probability of events that have never occurred: When is your vote decisive?, and writes: In my area of early childhood intervention, there are certain outcomes which are rare. Things like premature birth, confirmed cases of child-maltreatment, SIDS, etc. They are rare enough that ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Riad Sattouf (1) vs. Pele; the Japanese dude who won the hot dog eating contest advances

Lots of good arguments in favor of Bruce, but then this came from Noah: Hot-dog-garbled speech from Kobayashi recounting disgusting stories about ingesting absurdly large numbers of unchewed sausages and wet buns vs the gravelly, dulcet tones of New Jersey’s answer to John Mellencamp telling touching, timeless tales of musical world tours? The Boss in ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: It’s the semifinals! The Japanese dude who won the hot dog eating contest vs. Bruce Springsteen (1)

For our first semifinal match, we have an unseeded creative eater, up against the top-seeded person from New Jersey. It’s Coney Island vs. Asbury Park: the battle of the low-rent beaches. Again, we’re trying to pick the best seminar speaker. Here are the rules and here’s the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Statistical-significance thinking is not just a bad way to publish, it’s also a bad way to think

Eric Loken writes: The table below was on your blog a few days ago, with the clear point about p-values (and even worse the significance versus non-significance) being a poor summary of data. The thought I’ve had lately, working with various groups of really smart and thoughtful researchers, is that Table 4 is also a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: One more reason I hate letters of recommendation

Recently I reviewed a bunch of good reasons to remove letters of recommendation when evaluating candidates for jobs or scholarships. Today I was at a meeting and thought of one more issue. Letters of recommendation are not merely a noisy communication channel; they’re also a biased channel. The problem is that letter writers are strategic: ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Pele wins. On to the semifinals!

Like others, I’m sad that Veronica Geng is out of the running, so I’ll have to go with Diana: Jonathan’s post-hoc argument for Geng was so good that I now have to vote for Pele, given that his name can be transformed into Geng’s through a simple row matrix operation (a gesture that just might ...

Source: Homepage on Liechi | 張列弛

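Link: The Moss Poem and Plant Evolution (苔诗与植物进化)

Yuan Mei has a short poem titled “Moss” (《苔》): Where the sunlight does not reach, / spring still arrives of its own accord. / Moss flowers, small as grains of rice, / also learn to bloom like peonies. The poem makes one smile, both at the little moss’s effort to show itself and at Yuan Mei’s perceptiveness in noticing that effort ...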

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Pele vs. Meryl Streep; Riad Sattouf advances

Yesterday Dzhaughn gave a complicated argument but ultimately I couldn’t figure out if it was pro- or anti-Geng, so I had to go with Dalton’s straight shot: Geng has been accused of being “subtle to the point of unintelligibility.” So apparently ole V puts the “b” in subtle. So here’s to our man, Riad who ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Something I noticed about this college admissions scandal

Most of the parents are roughly my age! William McGlashan, 55 . . . Agustin Huneeus Jr., 53 . . . Elizabeth, 56, and Manuel Henriquez, 55 . . . Jane Buckingham, 50 . . . Gordon Caplan, 52 . . . Marcia Abbott, 59 . . . Robert Zangrillo, 52 . . . Stephen ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Raghuram Rajan: “The Third Pillar: How Markets and the State Leave the Community Behind”

A few months ago I received a copy of the book, “The Third Pillar: How Markets and the State Leave the Community Behind,” by economist Raghuram Rajan. The topic is important and the book is full of interesting thoughts. It’s hard for me to evaluate Rajan’s economics and policy advice, so I’ll leave that to ...

Source: Simply Statistics

Link: 10 things R can do that might surprise you

Over the last few weeks I’ve had a couple of interactions with folks from the computer science world who were pretty disparaging of the R programming language. A lot of the criticism focused on perceived limitations of R to statistical analysis. It’s true, R does have a hugely comprehensive list of analysis packages on CRAN (https://cran.r-project.org/), Bioconductor (http://bioconductor.org/), Neuroconductor (https://neuroconductor.org/), and ROpenSci (https://ropensci.org/), as well as great package management. As I was ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Riad Sattouf (1) vs. Veronica Geng; Bruce Springsteen advances

Personally, I’d rather hear Dorothy Parker, but I had to go with Dalton’s pitch: Ah, but Dorothy Parker is actually from New Jersey. In fact, both Bruce and Dorothy are members of the official New Jersey hall of fame (https://njhalloffame.org/hall-of-famers/). Both were born in Long Branch, NJ. But Bruce is backed up (literally) by another ...

Source: Statistical Modeling, Causal Inference, and Social Science

Bob came across the above quote in this thread. More generally, though, I want to recommend the Stan Forums. As you can see from the snapshot below, the topics are varied: The discussions are great, and anyone can jump in. Lots of example code and all sorts of things. Also of interest: the Stan case ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: stanc3: rewriting the Stan compiler

I’d like to introduce the stanc3 project, a complete rewrite of the Stan 2 compiler in OCaml. Join us! With this rewrite and migration to OCaml, there’s a great opportunity to join us on the ground floor of a new era. Your enthusiasm for or expertise in programming language theory and compiler development can help ...

Source: Nan-Hung Hsieh on Nan-Hung Hsieh

Link: pksensi: an R package to apply sensitivity analysis in pharmacokinetic modeling


Source: Peng Zhao on Peng Zhao

Given by Meng Li and Peng ...

Source: Homepage on Kang Yu

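Link: This Year Deserves a Mark (今年应该Mark一下)

Today I have, I suppose, finally finished walking the arduous track I had to walk; someday I should be able to tell my child(ren) about it. What matters in life is experience, and so it is! I’ll come back later and fill in why this deserves a mark. ...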

Source: Statistical Modeling, Causal Inference, and Social Science

Dalton made an impressive argument, too complicated to summarize, in favor of Jim Thorpe, “the destroyer of hot dog vendors,” but this was countered by Thomas’s logic: Since Jim Thorpe is top dog in whatever he tries his hand at, his demise is now inevitable. And ultimately I had to go with Albert, who made ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: R package for Type M and Type S errors

Andy Garland Timm writes: My package for working with Type S/M errors in hypothesis testing, ‘retrodesign’, is now up on CRAN. It builds on the code provided by Gelman and Carlin (2014) with functions for calculating type S/M errors across a variety of effect sizes as suggested for design analysis in the paper, a function ...
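
The quantities mentioned above (power, Type S error rate, and the Type M "exaggeration ratio") can be sketched directly for a normally distributed estimate. This is a minimal Python sketch of the kind of calculation described in Gelman and Carlin (2014), not the API of the retrodesign package; the function name `retrodesign_normal` and the input values are hypothetical, and a positive true effect is assumed.

```python
import random
from statistics import NormalDist


def retrodesign_normal(true_effect, se, alpha=0.05, n_sims=10000, seed=1):
    """Power, Type S rate, and exaggeration ratio for a normal estimate.

    Assumes true_effect > 0 and an estimate distributed as
    Normal(true_effect, se). The function name is illustrative only.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)      # two-sided critical value
    ratio = true_effect / se
    p_hi = 1 - std_normal.cdf(z - ratio)       # significant, correct sign
    p_lo = std_normal.cdf(-z - ratio)          # significant, wrong sign
    power = p_hi + p_lo
    type_s = p_lo / power                      # wrong sign given significance

    # Type M: average magnitude of significant estimates relative to the truth,
    # approximated by simulation.
    rng = random.Random(seed)
    draws = [rng.gauss(true_effect, se) for _ in range(n_sims)]
    significant = [abs(d) for d in draws if abs(d) > z * se]
    exaggeration = sum(significant) / len(significant) / true_effect
    return power, type_s, exaggeration


# Illustrative inputs: a small true effect measured with a large standard error.
power, type_s, exaggeration = retrodesign_normal(true_effect=2.0, se=8.1)
```

With these inputs the design has power of roughly 6% and a Type S rate of roughly 24%, and any estimate that reaches significance must overstate the true effect several-fold, because the significance threshold alone is about eight times the true effect.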

Source: Blog on rOpenSci - open tools for open science

Link: Community Call - Research Applications of rOpenSci Taxonomy and Biodiversity Tools

Our next Community Call, on March 27th, aims to help people learn about using rOpenSci’s R packages to access and analyze taxonomy and biodiversity data, and to recognize the breadth and depth of their applications. We also aim to learn from the discussion how we might improve these tools. Presentations will start with an introduction to the topic and details on some specific packages and we’ll hear from several people about their “use cases in the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Jim Thorpe (1) vs. the Japanese dude who won the hot dog eating contest

OK, now it starts to get interesting . . . Again, we’re trying to pick the best seminar speaker. Here are the rules and here’s the bracket so ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Junk science + Legal system = Disaster

Javier Benitez points us to this horrifying story from Liliana Segura: “Junk Arson Science Sent Claude Garrett to Prison for Murder 25 Years Ago. Will Tennessee Release ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Meryl Streep advances; it’s down to the quarterfinals!

As Manuel put it, as Stephen Hawking might put it: It’s duets with Pierce Brosnan all the way ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Political Polarization and Gender Gap: I Don’t Get Romer’s Beef with Bacon.

Gur Huberman writes: Current politics + statistical analysis, the Paul Romer v. 538 edition: https://paulromer.net/more-to-the-gender-gap/ Economist Paul Romer is criticizing a news article by Perry Bacon, Jr. entitled, “The Biggest Divides On The Kavanaugh Allegations Are By Party — Not Gender.” My reaction: I don’t get Romer’s beef. Bacon’s article seems reasonable to me: He ...

Source: Rob J Hyndman

Organizations such as government departments and financial institutions provide online service facilities accessible via an increasing number of internet connected devices which make their operational environment vulnerable to cyber attacks. Consequently, there is a need to have mechanisms in place to detect cyber security attacks in a timely manner. A variety of Network Intrusion Detection Systems (NIDS) have been proposed and can be categorized into signature-based NIDS and anomaly-based NIDS. The signature-based NIDS, which identify the misuse through scanning the activity signature ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: LeBron James (3) vs. Meryl Streep; Pele advances

Yesterday I was gonna go with Turing following Dalton’s eloquent argument: To his credit, Pele didn’t have to play in the era of VAR (video assistant referees). As last Tuesday’s Champions League game between Paris – Saint Germain and Manchester United (as well as the recent World Cup final between Croatia and France) demonstrate, there ...

Source: Statistical Modeling, Causal Inference, and Social Science

So. The other day I came across a link by Palko to this post from 2012, where he wrote: Pollsters had long tracked campaigns by calling random samples of potential voters. As campaigns became more drawn out and journalistic focus shifted to the horse-race aspects of elections, these phone polls proliferated. At the same ...

Source: Rob J Hyndman

Link: A brief history of forecasting competitions

Forecasting competitions are now so widespread that it is often forgotten how controversial they were when first held, and how influential they have been over the years. I briefly review the history of forecasting competitions, and discuss what we have learned about their design and implementation, and what they can tell us about forecasting. I also provide a few suggestions for potential future competitions, and for research about forecasting based on competitions. ...

Source: ewen

Link: ...xPoints?

Translating advanced player performance metrics into fantasy football points ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Alan Turing (4) vs. Pele; Veronica Geng advances

I gotta go with Geng, based on this from Jonathan: I was all in on Geng, as you know, but I have no idea what she sounded like. But it’s not the voice is it? It’s the content. And listen to what Geng could do (Remorse, April 7, 1986) “I will also spend one hundred ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Not Dentists named Dennis, but Physicists named Li studying Li

Charles Jackson writes: I was spurred to do this search by reading an article in the 30 Mar 2018 issue of Science. The article was: Self-heating–induced healing of lithium dendrites by Lu Li et al. Wikipedia says that more than 93 million people in China have the surname Li. I found 62 articles on Lithium ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The neurostatistical precursors of noise-magnifying statistical procedures in infancy

David Allison points us to this paper, The neurodevelopmental precursors of altruistic behavior in infancy, by Tobias Grossmann, Manuela Missana, and Kathleen Krol, which states: The tendency to engage in altruistic behavior varies between individuals and has been linked to differences in responding to fearful faces. The current study tests the hypothesis that this link ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Veronica Geng vs. Nora Ephron; Riad Sattouf advances

Not much going on in yesterday’s Past vs. Future battle. Maybe we should’ve brought in Michael J. Fox as a guest judge . . . Anyway, the best argument in the comments came from Ethan: Since we can’t have Mr P let’s have Mr B. Ahhh, but we can have Mr P. We can always ...

Source: Peng Zhao on Peng Zhao

Link: sinx: R fortunes in Chinese

One of the funniest things I found when I (as a Windows user) learnt Ubuntu was that there is a command `fortune`, which prints a random/pseudorandom message from a database of quotations. It is said that this old feature has been available since the 1970s. It was a pity that this feature was unavailable in the boring Windows OS, until the R community developed in 2012 a package called ‘fortunes,’ which displays funny messages taken from talks or communications in the R community. It supports external databases as well. Unfortunately, it does not support Chinese ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Abandon / Retire Statistical Significance”: Your chance to sign a petition!

Valentin Amrhein, Sander Greenland, and Blake McShane write: We have a forthcoming comment in Nature arguing that it is time to abandon statistical significance. The comment serves to introduce a new special issue of The American Statistician on “Statistical inference in the 21st century: A world beyond P < 0.05”. It is titled "Retire Statistical ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A corpus in a single survey!

This was something we used a few years ago in one of our research projects and in the paper, Difficulty of selecting among multilevel models using predictive accuracy, with Wei Wang, but didn’t follow up on. I think it’s such a great idea I want to share it with all of you. We were applying ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Riad Sattouf (1) vs. Mel Brooks; Bruce Springsteen advances

Dalton crisply solved yesterday’s problem right away with, The real test is would you rather have Bruce Springsteen cook for you or have Julia Child sing to you? To ask this question is to answer it. I didn’t even read any comments after that one. Today’s matchup is more exciting. It’s The Arab of the ...

Source: Econometrics and Free Software

In part 1 (https://www.brodrigues.co/blog/2019-03-03-historical_vowpal/) of this series I set up Vowpal Wabbit to classify newspapers content. Now, let’s use the model to make predictions and see how and if we can improve the model. Then, let’s train the model on the whole data. Step 1: prepare the data. The first step consists in ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: (back to basics:) How is statistics relevant to scientific discovery?

Someone pointed me to this remark by psychology researcher Daniel Gilbert: Publication is not canonization. Journals are not gospels. They are the vehicles we use to tell each other what we saw (hence “Letters” & “proceedings”). The bar for communicating to each other should not be high. We can decide for ourselves what to make ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Julia Child (2) vs. Bruce Springsteen (1); Dorothy Parker advances

Yesterday it was Dorothy Parker in a landslide. Commenters just couldn’t resist dissing the Wild and Crazy Guy. Noah came in with a limerick: There once was a Martin named Steven whose humor we used to believe in. His outlook got starker. He’s no Dorothy Parker. In this matchup, then, Steven be leavin’. And Dzhaughn ...

Source: Homepage on Yihui Xie | 谢益辉

Link: formatR

1. Installation. You can install formatR from CRAN (https://cran.r-project.org/package=formatR), or XRAN (https://xran.yihui.name) if you want to test the latest development version: install.packages("formatR", repos = "http://cran.rstudio.com") #' to ...

Source: Statistical Modeling, Causal Inference, and Social Science

The culinary athlete wins this round courtesy of this sonnet from Jeff: Does not the dude of Japanese descent Who won the eating contest have a name? He does, of course, but thanks to that event We ponder on the stuff of Nathan’s fame. What kind of man, in glint of morning, thinks “This day ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Yes on design analysis, No on “power,” No on sample size calculations

Kevin Lewis points us to this paper, “Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty,” by Samantha Anderson, Ken Kelley, and Scott Maxwell. My reaction: Yes, it’s reasonable, but I have two big problems with the general approach: 1. I don’t like talk of power ...

Source: Rob J Hyndman

Link: Time Series Data Library

The Time Series Data Library is no longer hosted on this website. You can get the data from DataMarket, or from the tsdl R package. ...

Source: Econometrics and Free Software

Link: Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit

Can I get enough of historical newspapers data? Seems like I don’t. I already wrote four (1: https://www.brodrigues.co/blog/2019-01-04-newspapers/, 2: https://www.brodrigues.co/blog/2019-01-13-newspapers_mets_alto/, 3: https://www.brodrigues.co/blog/2019-01-31-newspapers_shiny_app/ and 4: https://www.brodrigues.co/blog/2019-02-04-newspapers_shiny_app_tutorial/) blog ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Albert Brooks vs. the Japanese dude who won the hot dog eating contest; Jim Thorpe advances

Best argument yesterday came from Jonathan: Thorpe was played in the movies by Burt Lancaster. Caesar was played by Joseph Bologna. Ham vs bologna? Thorpe. Also, Thorpewards. Today the contest is between two people who, a commenter reminded us last round, both have names. Wit or creative eating? Your call. Again, here are the rules ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Journalist seeking scoops is as bad as scientist doing unreplicable research

Tom Scocca shares this dispiriting story: Yesterday, as a news day, was an even worse cascade of lies and confusion and gibberish than usual. Yet what stood out the most was a single word: “Clarification.” It appeared at the bottom of a very short Axios post by reporter Jonathan Swan, introducing a note that read, ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Round 3 begins: Jim Thorpe (1) vs. Sid Caesar

Today’s contest is a tough call. Caesar is the king of live TV, a real originator—but he did not write his own material, so it could be risky to invite him without getting Carl Reiner, Woody Allen, etc., as support staff. Thorpe is a legend but probably not much of a performer. Both Caesar and ...

Source: Statistical Modeling, Causal Inference, and Social Science

Someone pointed me with suspicion to a newspaper article that reported a cool-looking social science result, and asked me for my thoughts. I replied, Yes, not only am I suspicious of the claims in that article, I’m also suspicious of all the individual claims from these links. And I pointed to a bunch of links ...

Source: Statistical Modeling, Causal Inference, and Social Science

In a letter to the Journal of Nursing Research, Brown and Allison write: We question the conclusions that a health promotion model “was highly effective for gaining healthy life behaviors and the control of BMI of the participants” in an article recently published in The Journal of Nursing Research (Fidanci, Akbayrak, & Arslan, 2017). The ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Meryl Streep advances and the second round is over!

Best comment yesterday came from Dalton: Yakov Smirnoff’s relevance died with the Soviet Union. And he knows it, which is why he’s apparently now giving motivational talks entitled “Happily Ever Laughter: The Neuroscience of Romantic Relationships”. You can even watch it on PBS: https://www.pbssocal.org/programs/yakov-smirnoffs-happily-ever-laughter/ When I want to experience breathlessly credulous regurgitation of bullshit science ...

Source: L. Collado-Torres on L. Collado-Torres

Link: CDSBMexico: remember to apply for BioC2019 travel scholarships

<p><em>This blog post was first published at the <a href="http://www.comunidadbioinfo.org/cdsbmexico-remember-to-apply-for-bioc2019-travel-scholarships/">CDSBMexico website</a>.</em></p> <blockquote class="twitter-tweet"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/CDSBMexico?src=hash&ref_src=twsrc%5Etfw">#CDSBMexico</a>: remember to apply for BioC2019 travel scholarships!!<br><br>Due date is March 15th<a href="https://t.co/iegG0qQzwu">https://t.co/iegG0qQzwu</a><br><br>Let us help you! Here we give you some ideas 💡We can also give you feedback via Slack ✅<a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: George Orwell meets statistical significance: “Politics and the English Language” applied to science

1. Political writing: imprecision as a tool for obscuring the indefensible In his classic essay, “Politics and the English Language,” the political journalist George Orwell drew a connection between cloudy writing and cloudy content. The basic idea was: if you don’t know what you’re saying, or if you’re trying to say something you don’t really ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Meryl Streep vs. Yakov Smirnoff; LeBron James advances

Yesterday’s contest was lively. The two best arguments both favor LeBron. From Manuel: And King James said unto them, What seemeth you best I will do. And the king stood by the gate side, and all the people came out by hundreds and by thousands. And from Zbicyclist followed by Dalton: LeBron has been in ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: My talk this coming Monday in the Columbia statistics department

Monday 4 Mar, 4pm in room 903 Social Work Bldg: We’ve Got More Than One Model: Evaluating, comparing, and extending Bayesian predictions Methods in statistics and data science are often framed as solutions to particular problems, in which a particular model or method is applied to a dataset. But good practice typically requires multiplicity, in ...

Source: The R-Podcast

Link: Episode 28: Tidymodels with Max Kuhn (rstudio::conf 2019)

<h3 id="conversation-with-max-kuhn">Conversation with Max Kuhn</h3> <ul> <li>Tidy models: <a href="https://github.com/tidymodels">github.com/tidymodels</a></li> <li><code>parsnip</code> - A tidy interface to models: <a href="https://tidymodels.github.io/parsnip/">tidymodels.github.io/parsnip/</a></li> <li><code>caret</code> package: <a href="http://topepo.github.io/caret/index.html">topepo.github.io/caret/index.html</a></li> <li>Conventions for R Modeling Packages: <a href="https://tidymodels.github.io/model-implementation-principles/">tidymodels.github.io/model-implementation-principles/</a> ...

Source: The R-Podcast

Link: Max Kuhn

<p>Max was a nonclinical statistician for 12 years in the pharmaceutical industry and for 6 years in the medical diagnostic industry. His degrees are in Biostatistics (Ph.D.) and Mathematics (B.S.). He has released several R packages for predictive modeling and machine learning, including <a href="http://topepo.github.io/caret/index.html"><code>caret</code></a>, <code>C50</code>, and <code>Cubist</code>. He is the author of the Springer book <a href="http://appliedpredictivemodeling.com/">Applied Predictive Modeling</a> (with Kjell Johnson), which won the American Statistical Association’s ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Ellen DeGeneres vs. LeBron James (3); Pele advances

Not a lot yesterday; maybe Phil’s right that the formation of the brackets is sometimes more fun than the actual competitions. Anyway, best argument came from Ethan: We’ve not thought at all about language. I think modern Portuguese might do better in Columbia’s neighborhood than classic French. Laplace would write equations in a universal language ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Light Privilege? Skin Tone Stratification in Health among African Americans”

Kevin Lewis points us to this article by Taylor Hargrove, which states: Although skin color represents a particularly salient dimension of race, its consequences for health remain unclear. The author uses four waves of panel data from the Coronary Artery Risk Development in Young Adults study and random-intercept multilevel models to address three research questions ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Statistical-significance filtering is a noise amplifier.

The above phrase just came up, and I think it’s important enough to deserve its own post. Well-meaning researchers do statistical-significance filtering all the time—it’s what they’re trained to do, it’s what they see in published papers in top journals, it’s what reviewers for journals want them to do—so I can understand why they do ...

Source: Blog on rOpenSci - open tools for open science

Link: stats19: a package for road safety research

Introduction stats19 is a new R package enabling access to and working with Great Britain’s official road traffic casualty database, STATS19. We started the package in late 2018 following three main motivations: The release of the 2017 road crash statistics, which showed worsening road safety in some areas, increasing the importance of making the data more accessible. The realisation that many researchers were writing ad hoc code to clean the data, with a huge amount of duplicated (wasted) effort and potential for mistakes to lead to errors in the labelling of the data (more on that ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Data For Progress’s RuPaul-Predict-a-Looza

Data for Progress launched the RuPaul-Predict-a-Looza (and winner), the first ever RuPaul’s Drag Race prediction competition. Statistical models versus NYC Council Speaker Corey Johnson. The prize: bragging rights and the ability to add one policy question on the next Data for Progress survey. First predictions are due this Thursday (February 28). I made a notebook ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Pele vs. Pierre Simon Laplace (2); Alan Turing advances

Best comment yesterday came from Manuel: Turing did not know how to train a machine to pass the Turing test. I’m sure Oprah knows how to train a person to pass the Oprah test. But there is no Oprah test. So Turing will advance. Maybe next time we do this competition we can include Alison Bechdel. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “We’ve Got More Than One Model: Evaluating, comparing, and extending Bayesian predictions”

I was asked to speak at the American Association of Pharmaceutical Scientists Predictive Modeling Workshop, and a title was needed. This is what I came up with: We’ve Got More Than One Model: Evaluating, comparing, and extending Bayesian predictions It’s the Bayesian Workflow stuff we’ve been pushing for a while. But I like this new ...

Source: Robin Lovelace's website. Energy. Transport. Technology. Change the World.

Link: stplanr paper published

I am very happy to announce that the paper stplanr: A package for transport planning has been published in The R Journal (Lovelace and Ellison 2018) 🎉. This is the result of around 3 years of work: it took us (co-author Richard Ellison and me) over a year to get round to writing the paper after the stplanr package was first released on CRAN in November 2015 (see its archive on CRAN for details); it wasn’t until March 2017 that the paper was formally ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Don’t worry, the post will be coming . . . eventually

Jordan Anaya sends along a link and writes: Not sure if you’re planning on covering this, but I noticed this today. This could also maybe be another example of the bullshit asymmetry principle since the original paper has an altmetric of 1300 and I’m not sure the rebuttal will get as much attention. I replied ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Evidence distortion in clinical trials

After seeing our recent post, “Seeding trials”: medical marketing disguised as science, Till Bruckner sent me this message: I’ve been working on clinical trial transparency issues for over two years now, first for AllTrials and now for TranspariMED, and can assure you that this is only the tip of the iceberg. This report by Transparency ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Oprah Winfrey (1) vs. Alan Turing (4); Nora Ephron advances

Yesterday Diana gave an eloquent argument in favor of Voltaire, but then I came across this comment from Dzhaughn: I am concerned, as Dalton was earlier, about the risk of uniting the Ephron-Streep-Child triumvirate. This could lead to bad things. Julie vs. Julia, courtroom drama. Jules or Julia, cashing in on the franchise. Meatless in ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Does diet soda stop cancer? Two Yale Cancer Center docs have diametrically opposite views!

Check out these two quotes regarding a recent study, “Associations of artificially sweetened beverage intake with disease recurrence and mortality in stage III colon cancer.” First there’s the claim: Artificially sweetened drinks have a checkered reputation in the public because of the purported health risks that have never really been documented. Our study clearly shows ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Voltaire (4) vs. Nora Ephron; Veronica Geng advances

Jonathan informs us: Geng’s first contribution to the New Yorker was a July 12th 1976 parody of Martin Gardner. With Gardner eliminated, the Gardner partisans among us can jump on a new wagon. (Alternatively, Geng could simply be replaced by Gardner, but the structure of this contest is anarchic enough already.) In any case, here ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: George H. W. Bush (2) vs. Veronica Geng; Mel Brooks advances

Manuel writes: Old Frankenstein vs Young Frankenstein? Like Prior Frankenstein vs Posterior Frankenstein? Posterior Frankenstein would incorporate more information, so in my opinion it would be preferable for the seminar. But who is Posterior here? Old Frankenstein is older, it should incorporate more information. But he came first. If we exclude time travel, Young Frankenstein ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: HMC step size: How does it scale with dimension?

A bunch of us were arguing about how the Hamiltonian Monte Carlo step size should scale with dimension, and so Bob did the Bob thing and just ran an experiment on the computer to figure it out. Bob writes: This is for standard normal independent in all dimensions. Note the log scale on the x ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: Use docopt to write command line R utilities

<p>I was writing an R script to plot the ATACseq fragment length distribution and wanted to turn the R script into a command line utility.</p> <p>I then (re)discovered this awesome <a href="https://github.com/docopt/docopt.R" target="_blank">docopt.R</a>. One just needs to write the help message that you want to display and <code>docopt()</code> will parse the options and arguments and return a named list which can be accessed inside the R script. Check <a href="http://docopt.org/" target="_blank">http://docopt.org/</a> for more information as well.</p> <p>See below for an example. You can download ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Boris Karloff (3) vs. Mel Brooks; Riad Sattouf advances

In yesterday’s contest, Dalton asks: Lance Armstrong isn’t even a GOAT. Did he cheat to get included on the list at the expense of Eddy Merckx? But then Jrc points out: Lance isn’t in for Cycling GOAT, he’s in for NGO-bracelet GOAT. I’m pretty sure he didn’t juice the bracelets. Although now that I think ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Kevin Lewis has a surefire idea for a project for the high school Science Talent Search

Here’s his idea: If I were a student, I’d do a study on how Science Talent Search judges are biased. That way, they can’t reject it, otherwise it’s self-confirming. That’s a great idea! Maybe it’s possible to go meta on this one by adding some sort of game-theoretic model or simulation of talent search submission ...

Source: Home on Another Random Blog

Link: Differences of Rmd/Rmarkdown/md in blogdown

<blockquote> <p>Too many options is a curse.</p> </blockquote> <p>Today there’s a related <a href="https://github.com/rstudio/blogdown/issues/358">GitHub issue</a> that spurs some interesting discussion regarding the different options in blogdown. In particular, the three different formats (Rmd/Rmarkdown/md) when writing a blogdown post.</p> <p>In fact, the differences I’m going to discuss reflect the differences between two markdown processors with different “extended” features. One processor is <a href="https://pandoc.org/">pandoc</a>, which works under the hood of ...

Source: Rob J Hyndman

Link: Post-docs in wind and solar power forecasting

We currently have two postdoc opportunities together with an industry partner in the field of wind and solar power forecasting (full time, Level B). They are suitable for recently graduated PhD students that can start between now and June-July. The opportunities are as follows: Wind power forecasting: 1 year contract Good programming skills in R and/or Python Solid background in Machine Learning and/or Statistics Background in time series forecasting desirable Solar power forecasting: 6 months contract Good programming skills in R and/or Python Solid background in Machine Learning and/or ...

Source: Simply Statistics

Link: Open letter to journal editors: dynamite plots must die

<p>Statisticians have been pointing out the problem with dynamite plots, also known as bar and line graphs, for years. Karl Broman lists them as one of the <a href="https://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/">top ten worst graphs</a>. The problem has even been documented in the peer-reviewed literature. For example, <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3087125/">this British Journal of Pharmacology</a> paper titled <em>Show the data, don’t conceal them</em> was published in 2011.</p> <p>However, despite all these efforts, dynamite plots continue to be ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “News Release from the JAMA Network”

A couple people pointed me to this: Here’s the Notice of Retraction: On May 8, 2018, notices of Expression of Concern were published regarding articles published in JAMA and the JAMA Network journals that included Brian Wansink, PhD, as author. At that time, Cornell University was contacted and was requested to conduct an independent evaluation ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Riad Sattouf (1) vs. Lance Armstrong; Bruce Springsteen advances

Best comment yesterday came from Jan: Now we have opportunity to see in the next round whether Julia is really that much better than Python! But that doesn’t resolve anything! So to pick a winner we’ll have to go with Tom: Python foresaw the replication crisis with their scientific method of proving someone is a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Geoff Pullum, the linguist who hates Strunk and White, is speaking at Columbia this Friday afternoon

The title of the talk is Grammar, Writing Style, and Linguistics, and here’s the abstract: Some critics seem to think that English grammar is just a brief checklist of linguistic table manners that every educated person should already know. Others see grammar as a complex, esoteric, and largely useless discipline replete with technical terms that ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Monty Python vs. Bruce Springsteen (1); Julia Child advances

From Jeff: If they meet in the semi-final the Japanese dude will eat Frank for lunch: All vs. Nothing at All. Though it appears she also had a soft spot for hot dogs, if Julia makes it that far it would be a matchup of gourmet vs gourmand, which seems a better contest. Today it’s ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Statmodeling Retro

As many of you know, this blog auto-posts on twitter. That’s cool. But we also have 15 years of old posts with lots of interesting content and discussion! So I had this idea of setting up another twitter feed, Statmodeling Retro, that would start with our very first post in 2004 and then go forward, ...

Source: Homepage on Yihui Xie | 谢益辉

Link: The Ultimate Infinite Moon Reader for xaringan Slides

<p>Recently I have been playing with WebSocket, partly due to the <code>chrome_print()</code> function in <a href="https://github.com/rstudio/pagedown">the <strong>pagedown</strong> package</a>. Last Friday, it suddenly occurred to me that there could be a very interesting way to improve the user experience of the “Infinite Moon Reader” in the <strong>xaringan</strong> package (i.e., <code>xaringan::inf_mr()</code>). After three days’ work, I have finally become happy with it:</p> <p><img src="https://user-images.githubusercontent.com/163582/53144527-35f7a500-3562-11e9-862e- ...

Source: Rob J Hyndman

Link: Anomaly detection in streaming nonstationary temporal data

This article proposes a framework that provides early detection of anomalous series within a large collection of non-stationary streaming time series data. We define an anomaly as an observation that is very unlikely given the recent distribution of a given system. The proposed framework first forecasts a boundary for the system’s typical behavior using extreme value theory. Then a sliding window is used to test for anomalous series within a newly arrived collection of series.

Source: Statistical Modeling, Causal Inference, and Social Science

Link: I believe this study because it is consistent with my existing beliefs.

Kevin Lewis points us to ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Julia Child (2) vs. Frank Sinatra (3); Dorothy Parker

For yesterday’s contest, Jonathan gave a strong argument: First New Yorker showdown, just to see who will be taking on Veronica Geng in the finals. All the other contestants are just for show. I’m going with Liebling, because Parker wasn’t even the best New Yorker writer of her generation, being edged out by Benchley. Liebling ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: My talk today (Tues 19 Feb) 2pm at the University of Southern California

At the Center for Economic and Social Research, Dauterive Hall (VPD), room 110, 635 Downey Way, Los Angeles: The study of American politics as a window into understanding uncertainty in science Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University We begin by discussing recent American elections in the context of political ...

Source: Blog on Credibly Curious

Link: Announcing 'Just Three Things'

<p>A little while ago I showed <a href="http://inundata.org/">Karthik Ram</a> the <code>percent</code> function from <code>scales</code>, and he said something along the lines of:</p> <blockquote> <p>There should be a high quality screencast where someone shows a couple of rstats tricks and that’s it.</p> </blockquote> <p>So this is it! I just uploaded a screencast to YouTube called “Just Three Things”. I can’t promise that it is as high quality as I would have liked, but the idea is this:</p> <blockquote> <p>Just three #rstats things, in (ideally) under three ...

Source: Rob J Hyndman

Link: Hierarchical forecasting

Accurate forecasts of macroeconomic variables are crucial inputs into the decisions of economic agents and policy makers. Exploiting inherent aggregation structures of such variables, we apply forecast reconciliation methods to generate forecasts that are coherent with the aggregation constraints. We generate both point and probabilistic forecasts for the first time in the macroeconomic setting. Using Australian GDP we show that forecast reconciliation not only returns coherent forecasts but also improves the overall forecast accuracy in both point and probabilistic frameworks.
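To make "coherent with the aggregation constraints" concrete, here is a minimal sketch of one simple reconciliation method, ordinary least squares (OLS) projection, on a made-up three-series hierarchy where total = a + b. The function name, the hierarchy, and the numbers are illustrative assumptions; the excerpt does not say which reconciliation estimator the paper uses, so this should not be read as the authors' method.

```python
def reconcile_ols(total_hat, a_hat, b_hat):
    """OLS reconciliation for the toy hierarchy total = a + b.

    The base forecasts (total_hat, a_hat, b_hat) are projected onto the
    set of coherent forecasts spanned by the summing matrix
    S = [[1, 1], [1, 0], [0, 1]] (rows: total, a, b).
    """
    # Bottom-level estimates are (S'S)^{-1} S' y_hat. For this S:
    #   S'S = [[2, 1], [1, 2]], so (S'S)^{-1} = (1/3) * [[2, -1], [-1, 2]]
    #   S'y_hat = [total_hat + a_hat, total_hat + b_hat]
    sa = total_hat + a_hat
    sb = total_hat + b_hat
    a_tilde = (2 * sa - sb) / 3
    b_tilde = (2 * sb - sa) / 3
    # The reconciled total is the sum of the reconciled bottom series,
    # so the output is coherent by construction.
    return a_tilde + b_tilde, a_tilde, b_tilde


# Hypothetical incoherent base forecasts: 60 + 30 != 100.
total, a, b = reconcile_ols(100.0, 60.0, 30.0)
print(total, a, b)
```

Base forecasts that already add up are left unchanged by the projection, which is a quick sanity check on any reconciliation routine.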

Source: Simply Statistics

Link: Interview with Stephanie Hicks

<p><em>Editor’s note: For a while we ran an interview series for statisticians and data scientists, but things have gotten a little hectic around here so we’ve dropped the ball! But we are re-introducing the series, starting with Stephanie Hicks. If you have recommendations of a (junior) person in academics or industry you would like to see promoted, reach out to Jeff (@jtleek) on Twitter!</em></p> <p><em>Stephanie Hicks received her PhD in statistics in 2013 at Rice University and has already made major contributions to the analysis of single cell sequencing data and the theory ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A. J. Liebling vs. Dorothy Parker (2); Steve Martin advances

As Dalton wrote: On one hand, Serena knows how to handle a racket. But Steve Martin knows how to make a racket with some strings stretched taut over a frame. Are you really gonna bet against the dude who went toe-to-toe with Kermit the Frog in a racket-making duel? Today we have an unseeded eater ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: R fixed its default histogram bin width!

I remember hist() in R as having horrible defaults, with the histogram bars way too wide. (See this discussion: A key benefit of a histogram is that, as a plot of raw data, it contains the seeds of its own error assessment. Or, to put it another way, the jaggedness of a slightly undersmoothed histogram ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Update on that study of p-hacking

Ron Berman writes: I noticed you posted an anonymous email about our working paper on p-hacking and false discovery, but was a bit surprised that it references an early version of the paper. We addressed the issues mentioned in the post more than two months ago in a version that has been available online since ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Do you have any recommendations for useful priors when datasets are small?”

Someone who wishes to remain anonymous writes: I just read your paper with Daniel Simpson and Michael Betancourt, The Prior Can Often Only Be Understood in the Context of the Likelihood, and I find it refreshing to read that “the practical utility of a prior distribution within a given analysis then depends critically on both ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Serena Williams vs. Steve Martin (4); The Japanese dude who won the hot dog eating contest advances

We didn’t have much yesterday, so I went with this meta-style comment from Jesse: I’m pulling for Kobayashi if only because the longer he’s in, the more often Andrew will have to justify describing him vs using his name. The thought of Andrew introducing the speaker as “and now, here’s that Japanese dude who won ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: P-hacking in study of “p-hacking”?

Someone who wishes to remain anonymous writes: This paper [“p-Hacking and False Discovery in A/B Testing,” by Ron Berman, Leonid Pekelis, Aisling Scott, and Christophe Van den Bulte] ostensibly provides evidence of “p-hacking” in online experimentation (A/B testing) by looking at the decision to stop experiments right around thresholds for the platform presenting confidence that ...

Source: blog.sellorm.com

Link: My interview on the Datacast podcast

I was recently interviewed for James Le’s excellent Datacast podcast. I talked to James about my career: from how I got started all the way up to the present day, as well as my passion for data and helping organisations professionalise their data science capability. You can’t even tell how cold I was during the interview - the heating was out in my office! The podcast is only on its 9th episode, but it’s off to a great start and I’d encourage you to subscribe to the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: More on that horrible statistical significance grid

Regarding this horrible Table 4: Eric Loken writes: The clear point of your post was that p-values (and even worse the significance versus non-significance) are a poor summary of data. The thought I’ve had lately, working with various groups of really smart and thoughtful researchers, is that Table 4 is also a model of their ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The Japanese dude who won the hot dog eating contest vs. Oscar Wilde (1); Albert Brooks advances

Yesterday I was going to go with this argument from Ethan: Now I’m morally bound to use the Erdos argument I said no one would see unless he made it to this round. Andrew will take the speaker out to dinner, prove a theorem, publish it and earn an Erdos number of 1. But then ...

Source: Robin Lovelace's website. Energy. Transport. Technology. Change the World.

Link: Aggregating lines, part II

The previous post demonstrated a new method to aggregate overlapping lines. It showed how to combine 2 lines that have an area of overlap. More excitingly, it led to the creation of a new function in stplanr, overline_sf(), that lives in the development version of the package. The purpose of this post is to provide an update on the status of the work to refactor the overline() function, in a human friendly alternative to discussion in the relevant GitHub issue: ...

Source: Rob J Hyndman

Link: A new tidy data structure to support exploration and modeling of temporal data

Mining temporal data for information is often inhibited by a multitude of formats: irregular or multiple time intervals, point events that need aggregating, multiple observational units or repeated measurements on multiple individuals, and heterogeneous data types. On the other hand, the software supporting time series modeling and forecasting makes strict assumptions on the data to be provided, typically requiring a matrix of numeric data with implicit time indexes. Going from raw data to model-ready data is painful.

Source: DNA confesses Data speak on DNA confesses Data speak

Link: Split a 10xscATAC bam file by cluster

<p>I want to split the PBMC scATAC bam from 10x by cluster id. So, I can then make a bigwig for each cluster to visualize in <code>IGV</code>.</p> <p>The first thing I did was googling to see if anyone has written such a tool (Do not reinvent the wheels!). People have done that because I saw figures from the scATAC papers. I just could not find it. Maybe I need to refine my googling skills.</p> <p>I decided to write one myself. The following is my journey for this small task.</p> <p>download the 5k pbmc scATAC data from <a href="https://support.10xgenomics.com/single-cell-atac/datasets/1.0.1/ ...

Source: Statistical Modeling, Causal Inference, and Social Science

The Talk I’m going to be previewing the book I’m in the process of writing at the Ann Arbor R meetup on Monday. Here are the details, including the working title: Probability and Statistics: a simulation-based introduction Bob Carpenter Monday, February 18, 2019 Ann Arbor SPARK, 330 East Liberty St, Ann Arbor I’ve been to ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Paul Erdos vs. Albert Brooks; Sid Caesar advances

The key question yesterday was, can Babe Didrikson Zaharias do comedy or can Sid Caesar do sports. According to Mark Palko, Sid Caesar was by all accounts extremely physically strong. And I know of no evidence that Babe was funny. So Your Show of Shows will be going into the third round. And now we ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Simulation-based statistical testing in journalism

Jonathan Stray writes: In my recent Algorithms in Journalism course we looked at a post which makes a cute little significance-type argument that five Trump campaign payments were actually the $130,000 Daniels payoff. They summed to within a dollar of $130,000, so the simulation recreates sets of payments using bootstrapping and asks how often there’s ...
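The kind of resampling check described here can be sketched as follows. This is only an illustration under stated assumptions: the excerpt is cut off before it describes the exact procedure, so drawing payments with replacement, the function name, and every number below are hypothetical and are not taken from the linked post.

```python
import random


def share_within_tolerance(payments, k, target, tol, n_sims, seed=0):
    """Estimate how often k resampled payments sum to within tol of target.

    Assumption: each simulated set is drawn with replacement from the
    observed payments (a plain bootstrap), and every payment is equally
    likely to be drawn.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_sims):
        draw = [rng.choice(payments) for _ in range(k)]
        if abs(sum(draw) - target) <= tol:
            hits += 1
    return hits / n_sims


# Made-up payment amounts, purely for illustration.
observed = [12000.0, 18500.0, 26000.0, 31000.0, 42500.0, 55000.0, 9000.0]
share = share_within_tolerance(observed, k=5, target=130000.0, tol=1.0, n_sims=10000)
print(share)
```

A small share would suggest that landing within a dollar of the target by chance is unusual under this resampling scheme, though the conclusion depends heavily on which payments are treated as the pool to resample from.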

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Michael Crichton on science and storytelling

Javier Benitez points us to this 1999 interview with techno-thriller writer Michael Crichton, who says: I come before you today as someone who started life with degrees in physical anthropology and medicine; who then published research on endocrinology, and papers in the New England Journal of Medicine, and even in the Proceedings of the Peabody ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Sid Caesar vs. Babe Didrikson Zaharias (2); Jim Thorpe advances

Best comment from yesterday came from Dalton: Jim Thorpe isn’t from Pennsylvania, and yet a town there renamed itself after him. DJ Jazzy Jeff is from Pennsylvania, and yet Will Smith won’t even return his phone calls. Until I can enjoy a cold Yuengling in Jazzy Jeff, PA it’s DJ Jumpin’ Jim for the win. ...

Source: Rob J Hyndman

Outliers due to technical errors in water-quality data from in situ sensors can reduce data quality and have a direct impact on inference drawn from subsequent data analysis. However, outlier detection through manual monitoring is unfeasible given the volume and velocity of data the sensors produce. Here, we proposed an automated framework that provides early detection of outliers in water-quality data from in situ sensors caused by technical issues. The framework was used first to identify the data features that differentiate outlying instances from typical behaviours.

Source: L. Collado-Torres on L. Collado-Torres

Link: Why do you like living where you live?

<p>In the past months I’ve had a recurrent conversation with many people. This conversation is typically started with the question: why do you like living where you live? Some of them might be considering moving to the city I live in for work, some of them are thinking about leaving, some are happy here.</p> <p>Ultimately, everyone is different and what makes some happy might not be for the rest. Some friends want to live in larger cities, others want different climates, others want to move in with their long distance relationship partners, etc. Back in 2015 I was finishing my Ph.D. and ...

Source: Home on Sungpil Han, M.D/Ph.D

Link: Clinical Pharmacology: Phase 1 Clinical Trials and Early Drug Development

Clinical Pharmacology 1: Phase 1 Studies and Early Drug Development by Gerlie Gieser, FDA https://www.fda.gov/downloads/training/clinicalinvestigatortrainingcourse/ucm340007.pdf The slides give a well-organized summary of Phase 1 clinical trials and early drug development. They introduce SAD, MAD, renal & hepatic impairment, DDI, TQTc study, and BA/BE. Examples from the development of atazanavir, doripenem, raltegravir, and bosutinib also appear at the end, ...

Source: Blog on rOpenSci - open tools for open science

Link: Community Call Follow-up - Governance of Open Source Research Software Organizations

We tend to know a good open source research software project when we see it: The code is well-documented, users contribute back to the project, the software is licensed and citable, and the community interacts and co-produces in a healthy, productive fashion. The academic literature 1 and community discourse 2 around research software development offer insight into how to promote the technical best-practices needed to produce some of these project attributes; however, the management of non-technical, social components of software projects are less visible and therefore less often discussed in ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Halftime! And Jim Thorpe (1) vs. DJ Jazzy Jeff

So. Here’s the bracket so far: Our first second-round match is the top-ranked GOAT—the greatest GOAT of all time, as it were—vs. an unseeded but appealing person whose name ends in f. Again here are the rules: We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Should he go to grad school in statistics or computer science?

Someone named Nathan writes: I am an undergraduate student in statistics and a reader of your blog. One thing that you’ve been on about over the past year is the difficulty of executing hypothesis testing correctly, and an apparent desire to see researchers move away from that paradigm. One thing I see you mention several ...

Source: blog.sellorm.com

Link: Gartner-style charts in R with ggplot2

If you’re a regular reader of this blog you may have noticed the distinct lack of charts. I’ve written about my journey with R before, but not too much about what I do actually use R for. In short, I use it mainly for software engineering type work in order to automate other things in R, or to support the work of data scientists. There’s perhaps a bit more to it than that, but that’s the main thrust of ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Global warming? Blame the Democrats.

An anonymous blog commenter sends the above graph and writes: I was looking at the global temperature record and noticed an odd correlation the other day. Basically, I calculated the temperature trend for each presidency and multiplied by the number of years to get a “total temperature change”. If there was more than one president ...
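
The calculation the commenter describes (a fitted trend for each presidency, multiplied by the number of years) can be sketched roughly as follows. This is only an illustration: the anomaly values and term labels are made-up placeholders, not the commenter's data, and the helper names `linear_trend` and `total_change` are hypothetical.

```python
from statistics import mean

def linear_trend(years, temps):
    """Ordinary least-squares slope of temps against years (degrees per year)."""
    year_mean, temp_mean = mean(years), mean(temps)
    numerator = sum((y - year_mean) * (t - temp_mean) for y, t in zip(years, temps))
    denominator = sum((y - year_mean) ** 2 for y in years)
    return numerator / denominator

def total_change(years, temps):
    """Trend multiplied by the number of years in the term."""
    return linear_trend(years, temps) * len(years)

# Placeholder data (assumption): label, years in office, annual anomalies.
terms = [
    ("Term A", [2001, 2002, 2003, 2004], [0.10, 0.12, 0.14, 0.16]),
    ("Term B", [2005, 2006, 2007, 2008], [0.16, 0.15, 0.14, 0.13]),
]

changes = {label: total_change(years, temps) for label, years, temps in terms}
for label, change in changes.items():
    print(f"{label}: {change:+.2f}")
```

With only a handful of terms, a split like this can line up with almost any grouping variable, which is worth keeping in mind when reading such a correlation.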

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Yakov Smirnoff advances, and Halftime!

Best argument yesterday came from Yuling: I want to learn more about missing data analysis from the seminar so I like Harry Houdini. But Yakov Smirnoff is indeed better for this topic — both Vodka and the Soviet are treatments that guarantee everyone to be Missing Completely at Random, and as statisticians we definitely prefer ...

Source: The R-Podcast

Link: Episode 27: Get the {gt} tables! (rstudio::conf 2019)

<h3 id="conversation-with-rich-iannone">Conversation with Rich Iannone</h3> <ul> <li><code>DiagrammeR</code> package: <a href="http://visualizers.co/diagrammer/">visualizers.co/diagrammer/</a></li> <li><code>gt</code> package: <a href="https://gt.rstudio.com/">gt.rstudio.com/</a></li> <li><code>blastula</code> package: <a href="https://github.com/rich-iannone/blastula">github.com/rich-iannone/blastula</a></li> </ul> <h3 id="preparing-for-the-shiny-modules-talk">Preparing for the Shiny Modules talk</h3> <ul> <li>Karl Broman’s rstudio::conf 2019 resources repo: <a href="https://github.co ...

Source: Home on Sungpil Han, M.D/Ph.D

Link: Winning the Minister of Science and ICT Award

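[News] Clinical Pharmacology staff win award at competition. Residents Sungpil Han, Seokkyu Yoon, Yongsoon Cho, and Hyungsub Kim of the Department of Clinical Pharmacology (advisor: Professor Kyun-Seop Bae, Department of Clinical Pharmacology) presented a paper titled "Integration of non-compartmental analysis and bioequivalence analysis using an EDISON science app" in the computational medicine category of the 8th EDISON Software Utilization Competition, hosted by the EDISON Central Center on January 25, and won the grand prize, the Minister of Science and ICT Award. Related materials: Github: https://github.com/asancpt/edison-BE News article: http://news.hankyung.com/article/201901291491h ...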

Source: Econometrics and Free Software

Link: Manipulating strings with the {stringr} package

<div style="text-align:center;"> <p><a href="https://b-rodrigues.github.io/modern_R/descriptive-statistics-and-data-manipulation.html#manipulate-strings-with-stringr"> <img src="./img/string.jpg" title = "Click here to go the ebook"></a></p> </div> <p>This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free <a href="https://b-rodrigues.github.io/modern_R/">here</a>. This is taken from Chapter 4, in which I introduce the <code>{stringr}</code> package.</p> <div id="manipulate-strings-with-stringr" class="section level2"> <h2>Manipulate strings with ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Harry Houdini (1) vs. Yakov Smirnoff; Meryl Streep advances

Best argument yesterday came from Jonathan: This one’s close. Meryl Streep and Alice Waters both have 5 letters in the first name and 6 in the last name. Tie. Both are adept at authentic accents. Tie. Meryl has played a international celebrity cook; Alice has never played an actress. Advantage Streep. Waters has taught many ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Using 26,000 diary entries to show ovulatory changes in sexual desire and behavior”

Kevin Lewis points us to this research paper by Ruben Arslan, Katharina Schilling, Tanja Gerlach, and Lars Penke, which begins: Previous research reported ovulatory changes in women’s appearance, mate preferences, extra- and in-pair sexual desire, and behavior, but has been criticized for small sample sizes, inappropriate designs, and undisclosed flexibility in analyses. Examples of such ...

Source: The R-Podcast

Link: Rich Iannone

<p>My background is in programming, data analysis, and data visualization. Much of my current work involves a combination of data acquisition, statistical programming, tools development, and visualizing the results. I love creating software that helps people accomplish things. I regularly update several R package projects (all available on GitHub). One such package is called <a href="http://visualizers.co/diagrammer/">DiagrammeR</a> and it’s great for creating network graphs and performing analyses on the graphs. One of the big draws for open-source development is the collaboration that ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Alice Waters (4) vs. Meryl Streep; LeBron James advances

It’s L’Bron. Only pitch for Mr. Magic was from DanC: guy actually is ultra-tall, plus grand than that non-Cav who had play’d for Miami. But Dalton brings it back for Bron: LeBron James getting to the NBA Final with J.R. Smith as his best supporting cast member is a more preposterous escape than anything David ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Our hypotheses are not just falsifiable; they’re actually false.

Everybody’s talkin bout Popper, Lakatos, etc. I think they’re great. Falsificationist Bayes, all the way, man! But there’s something we need to be careful about. All the statistical hypotheses we ever make are false. That is, if a hypothesis becomes specific enough to make (probabilistic) predictions, we know that with enough data we will be ...

Source: Nan-Hung Hsieh on Nan-Hung Hsieh

Link: Sobol Sensitivity Analysis for PK Model

<div id="TOC"> <ul> <li><a href="#reproducible-analysis"><span class="toc-section-number">1</span> Reproducible analysis</a></li> <li><a href="#methods-and-defined-functions"><span class="toc-section-number">2</span> Methods and defined functions</a></li> <li><a href="#results-1"><span class="toc-section-number">3</span> Results</a></li> <li><a href="#take-away"><span class="toc-section-number">4</span> Take away</a></li> </ul> </div> <p>I found a great <a href="https://github.com/mrgsolve/gallery/blob/master/application/sobol.md">example</a> of performing Sobol sensitivity analysis within ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Fitting multilevel models when the number of groups is small

Matthew Poes writes: I have a question that I think you have answered for me before. There is an argument to be made that HLM should not be performed if a sample is too small (too small level 2 and too small level 1 units). Lots of papers written with guidelines on what those should ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: LeBron James (3) vs. Eric Antoine; Ellen DeGeneres advances

Optimum quip Thursday was from Dzhaughn: Mainly, that woman’s tag has a lot of a most common typographical symbol in it, which would amount to a big difficulty back in days of non-digital signs on halls of drama and crowd-laughing. Should that fact boost or cut a probability appraisal of said woman writing an amazing ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Ian McKellen (2) vs. Ellen DeGeneres; Pierre-Simon Laplace advances

The arguments yesterday in favor of Laplace were valid, earnest, and boring. Dalton reinforced the contrast with this comment: Belushi’s demons are a whole lot more interesting than Laplace’s demon. With the latter, you always know what you’re gonna get forever and ever evermore. The former offers heaps of exciting uncertainty, and if you remember ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Wanted: Statistics-related research projects for high school students

So. I sometimes get contacted by high school students who want to work on research projects involving statistics or social science. I’ve supervised several such students, and what works best is when they have their own idea, and I can read what they’ve written and give comments. I’m more of a sounding board than anything ...

Source: Rob J Hyndman

River water-quality monitoring is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values. However, anomalies caused by technical issues confound these data, while the volume and velocity of data prevent manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data. After identifying end-user needs and defining anomalies, we ranked their importance and selected suitable detection methods. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Pierre-Simon Laplace (2) vs. John Belushi; Pele advances

For yesterday I was leaning toward Penn and Teller based on Bobbie’s reasoning: Penn & Teller not only create interesting, often politically-relevant, magic. They are also visible skeptics who critique the over-claiming of magicians/mystics/paranormal advocates and they use empirical arguments/demonstrations when they speak to debunk pseudoscience. For those of us who care about such things ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The Stan Core Roadmap

Here’s the plan for Stan core development that Bob presented at Stancon last week (that is, back at the end of August, 2018): Part I. Rear-View Mirror Stan 2.18 Released Multi-core Processing has Landed! Multi-Process Parallelism Map Function New Built-in Functions Manuals to HTML Improved Effective Sample Size Foreach Loops Data-qualified Arguments Bug Fixes and ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: Evaluating single cell RNAseq cluster stability

<p>The goal of scclusteval (Single Cell Cluster Evaluation) is to evaluate single-cell clustering stability by bootstrapping/subsampling the cells and to provide many visualization methods for comparing clusters.</p> <p>For the theory behind the method, see Christian Hennig, “Cluster-wise assessment of cluster stability,” Research Report 271, Dept. of Statistical Science, University College London, December 2006.</p> <p>You can find the package at my <a href="https://github.com/crazyhottommy/scclusteval" target="_blank">github</a>.</p> <p><img src="https://divingintogeneticsandgenomics.rbind.io/im ...
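
The subsampling idea can be sketched in a few lines. The snippet below is not the scclusteval API. It is a plain-Python illustration that assumes Jaccard similarity as the agreement measure between an original cluster and its best match in each subsample, and the function names and toy cell IDs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of cell ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def stability(original_clusters, resampled_runs):
    """Mean best-match Jaccard per original cluster across subsampled runs.

    original_clusters: {label: set of cell ids}
    resampled_runs: list of (sampled cell ids, {label: set of cell ids})
    """
    scores = {label: [] for label in original_clusters}
    for sampled_cells, clusters in resampled_runs:
        sampled = set(sampled_cells)
        for label, members in original_clusters.items():
            kept = members & sampled  # only cells present in this subsample
            if not kept:
                continue
            best = max((jaccard(kept, c) for c in clusters.values()), default=0.0)
            scores[label].append(best)
    return {label: sum(v) / len(v) for label, v in scores.items() if v}

# Toy example (assumption): two original clusters and two subsampled re-clusterings.
original = {"A": {1, 2, 3, 4}, "B": {5, 6, 7, 8}}
runs = [
    ({1, 2, 3, 5, 6, 7}, {"x": {1, 2, 3}, "y": {5, 6, 7}}),
    ({1, 2, 4, 5, 6, 8}, {"x": {1, 2}, "y": {4, 5, 6, 8}}),
]
result = stability(original, runs)
for label, score in sorted(result.items()):
    print(f"{label}: {score:.3f}")
```

A cluster whose mean score stays near 1 is recovered in almost every subsample, while a low score suggests the cluster dissolves or merges when cells are removed.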

Source: Pat's blog (data science)

Link: i3wm: Introducing my Linux desktop setup

<div id="TOC"> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#the-advantages-of-i3wm">The advantages of i3wm</a></li> <li><a href="#the-disadvantages-of-i3wm">The disadvantages of i3wm</a></li> <li><a href="#my-i3wm-setup">My I3wm setup</a></li> </ul> </div> <div id="introduction" class="section level1"> <h1>Introduction</h1> <p>After switching from Windows/macOS to Linux about 1.5 years ago I tried many different Linux distributions and desktop environments. I do not want to go into details why I am running Arch Linux (Manjaro) in this post but rather talk about the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Facial feedback is back

Fritz Strack points us to this new paper, A multi-semester classroom demonstration yields evidence in support of the facial feedback effect, by Abigail Marsh, Shawn Rhoads, and Rebecca Ryan, which begins with some background: The facial feedback effect refers to the influence of unobtrusive manipulations of facial behavior on emotional outcomes. That manipulations inducing or ...

Source: Econometrics and Free Software

Link: Building a shiny app to explore historical newspapers: a step-by-step guide

<div style="text-align:center;"> <p><a href="https://brodriguesco.shinyapps.io/newspapers_app/"> <img src="./img/tf_idf.png" title = "Click here to go the app"></a></p> </div> <div id="introduction" class="section level2"> <h2>Introduction</h2> <p>I started off this year by exploring a world that was unknown to me, the world of historical newspapers. I did not know that historical newspapers data was a thing, and have been thoroughly enjoying myself exploring the different datasets published by the National Library of Luxembourg. You can find the data <a href="https://data.bnl.lu/data/historic ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: New estimates of the effects of public preschool

Tom Daula writes: You blogged about Heckman and the two 1970s preschool studies a year ago here and here. Apparently there are two papers on a long-term study of Tennessee’s preschool program. In case you had an independent interest in the topic, a summary of the most recent paper is here, and the paywalled paper ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Penn and Teller (3) vs. Pele; Alan Turing advances

Sorry, but did Turing ever have a chance of losing to David Blaine?? Forget about it. This contest is supposed to be Turing complete, no? Best argument in favor of the showman was from Jonathan: OK. Here’s a Blaine seminar. He delivers the entire lecture locked inside a trunk with 40 minutes of air. He ...

Source: Statistical Modeling, Causal Inference, and Social Science

I love writing textbooks; you get to explain the things that otherwise never get spelled ...

Source: L. Collado-Torres on L. Collado-Torres

Link: The power of tapping into your community for support

<p>This week the owner of my favorite Mexican restaurant in Baltimore, Rosalyn Vera, got death and arson<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a> threats. I could have been a bystander, but I tapped into my network and asked for help and she has received it. It’s been great to see the power of the community in action.</p> <div id="the-backstory" class="section level3"> <h3>The backstory</h3> <p>So, I use <a href="https://cran.r-project.org/">R</a> and <a href="https://support.bioconductor.org/">Bioconductor</a> for work and I get to witness the warmth and mostly ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Alan Turing (4) vs. David Blaine; Oprah Winfrey advances

Yesterday, Martin Gardner seemed like he’d be sailing in on a gentle wave of nostalgia, but then Dzhaughn brought us back to reality: I cannot believe we are having this conversation. Self-made multi-billionaire philanthropist African American warrior saint v. nerd game writer. Let. me. think. Copies of O per copies of Sci Am? I am ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: If you want to measure differences between groups, measure differences between groups.

Les Carter points to the article, Coming apart? Cultural distances in the United States over time, by Marianne Bertrand and Emir Kamenica, which states: There is a perception that cultural distances are growing, with a particular emphasis on increasing political polarization. . . . Carter writes: I am troubled by the inferences in the paper. ...

Source: Wannabe Rstats-fu

Link: Enhancing gather() and spread() by Using "Bundled" data.frames

<style> table { width: 50%; font-size: 80%; } </style> <p>Last month, I tried to explain <code>gather()</code> and <code>spread()</code> using the gt package (<a href="https://yutani.rbind.io/post/gather-and-spread-explained-by-gt/">https://yutani.rbind.io/post/gather-and-spread-explained-by-gt/</a>). But after I implemented experimental multi-<code>gather()</code> and multi-<code>spread()</code>, I realized that I needed a slightly different way to explain them… So, please forget the post, and read this with fresh eyes!</p> <h2 id="wait-what-is-multi-gather-and-multi-spread">Wait, what is ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Of multiple comparisons and multilevel models

Kleber Neves writes: I’ve been a long-time reader of your blog, eventually becoming more involved with the “replication crisis” and such (currently, I work with the Brazilian Reproducibility Initiative). Anyway, as I’m now going deeper into statistics, I feel like I still lack some foundational intuitions (I was trained as a half computer scientist/half experimental ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Oprah Winfrey (1) vs. Martin Gardner; Nora Ephron advances

For yesterday’s contest, Steve writes: I’m going with Gauss. Ephron would show up in his office, and say, “I’ve got this great idea for a screenplay”; she’d really lay on the charm and work on her sales pitch. After she’d finish, Gauss would go back to his filing cabinet, aimlessly rifle through his least interesting ...

Source: Blog on rOpenSci - open tools for open science

Link: rOpenSci Software Peer Review: Still Improving

rOpenSci’s suite of packages is comprised of contributions from staff engineers and the wider R community, bringing considerable diversity of skills, expertise and experience to bear on the suite. How do we ensure that every package is held to a high standard? That’s where our software review system comes into play: packages contributed by the community undergo a transparent, constructive, non-adversarial and open review process. Because that process relies mostly on volunteer work, associate editors manage the incoming flow and ensure progress of submissions; authors create, submit ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Carl Friedrich Gauss (1) vs. Nora Ephron; Voltaire advances

Yesterday I was all set to go with fractal-man, following Zbicyclist’s comment: Why go with a guy who’s most famous for something he didn’t say? Let’s go with a guy who can give a short, pithy lecture that can blossom into a whole structure of knowledge as we repeat it! But then I was persuaded ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Stan This Month

So much is going on with Stan that it can be hard to keep track, so we (the Stan project) are starting a monthly update and newsletter. If you want to be included in the monthly mailing list, just type in your email here. Charles Margossian is the editor of Stan This Month and indeed ...

Source: Econometrics and Free Software

Link: Using Data Science to read 10 years of Luxembourguish newspapers from the 19th century

<div style="text-align:center;"> <p><a href="https://brodriguesco.shinyapps.io/newspapers_app/"> <img src="./img/tf_idf.png" title = "Click here to go the app"></a></p> </div> <p>I have been playing around with historical newspaper data (see <a href="https://www.brodrigues.co/blog/2019-01-04-newspapers/">here</a> and <a href="https://www.brodrigues.co/blog/2019-01-13-newspapers_mets_alto/">here</a>). I have extracted the data from the largest archive available, as described in the previous blog post, and now created a shiny dashboard where it is possible to visualize the most common words per ...

Source: Peng Zhao on Peng Zhao

Link: rosr News: a Shiny GUI and RStudio addin for choosing and creating sub-projects

<h3 id="brief-intro-and-curriculum">Brief Intro and curriculum</h3> <p>‘rosr’ is an R package for creating reproducible academic projects that integrate various academic elements, including data, bibliography, code, images, manuscripts, dissertations, slides and so on. These elements are well connected so that they can be easily synchronized and updated. Users don’t have to repeatedly copy and paste their results and figures. It should be easy for scientific researchers to use, even if they are R beginners or non-R users.</p> <h3 ...

Source: About on Zhan, Likan | 战立侃

Link: A better cross-lagged panel model, from Hamaker et al. (2015)

NOTE: This post is copied from John C. Flournoy’s post A better cross-lagged panel model, from Hamaker et al. (2015) with minor modifications. Copyright belongs to the original author. This walk-through explains, briefly, why and how to run a RI-CLPM in R. Critique of cross-lagged panel models This post summarizes critiques of the traditional cross-lagged panel model (CLPM), and an improved model by Hamaker, Kuiper, and Grasman Hamaker, Kuiper, & Grasman ...

Source: Blog on rOpenSci - open tools for open science

Link: Announcing new software peer review editors: Melina Vidoni and Brooke Anderson

We are pleased to welcome Brooke Anderson and Melina Vidoni to our team of Associate Editors for rOpenSci Software Peer Review. They join Scott Chamberlain, Anna Krystalli, Lincoln Mullen, Karthik Ram, Noam Ross and Maëlle Salmon. With the addition of Brooke and Melina, our editorial board now includes four women and four men, located in North America, South America and Europe. Our open Software Peer Review system for community-contributed R tools is a key component of our mission to create technical infrastructure that lowers barriers to working with data sources on the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Principal Stratification on a Latent Variable (fitting a multilevel model using Stan)

Adam Sales points to this article with John Pane on principal stratification on a latent variable, and writes: Besides the fact that the paper uses Stan, and it’s about principal stratification, which you just blogged about, I thought you might like it because of its central methodological contribution. We had been trying to use computer ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Voltaire (4) vs. Benoit Mandelbrot; Veronica Geng advances

Yesterday’s contest was surprisingly tough. I thought of Santa-man and the inventor of the Monte Carlo method as both being strong candidates—but the best comments on both were negative. Phil argued convincingly that there’s no point in inviting Sedaris to speak at Columbia as there are lots of other opportunities to hear the guy, and ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Autodiff! (for the C++ jockeys in the audience)

Here’s a cool thread from Bob Carpenter on the Stan forums, “A new continuation-based autodiff by refactoring,” on the inner workings of Stan. Enjoy. P.S. Just for laffs, see this post from 2010 which is the earliest mention of autodiff I could find on the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: David Sedaris (3) vs. Stanislaw Ulam; George H. W. Bush advances

Best comment yesterday came from J Storrs Hall: I have eaten the money that was in the piggybank which you were probably saving for retirement Forgive me it was delicious so sweet read my lips But it’s not clear if this is an endorsement of Bush, for his economic policies, or Williams, for his poetry. ...

Source: Blog on Credibly Curious

Link: Introducing 'R-Miss-Tastic'

<p>Missing data has been something I’ve been working on in my research since I started my PhD in 2013. This was largely because I was clued in to the whole <em>idea</em> of missing data in my fourth year stats unit in Psychology where we had a few lectures on missing data. Data was always super clean for all of our pracs in SPSS. So, it was a pretty profound moment for me, missing data was the start of the whole idea that data could be <em>not clean</em>.</p> <p>When I started my PhD I had a pretty huge amount of missing data in the first dataset I looked at, which led me to write my ...

Source: Blog on rOpenSci - open tools for open science

Link: Interacting with The Demographic and Health Surveys (DHS) Program data

There seem to be a lot of ways to write about your R package, and rather than have to decide on what to focus on I thought I’d write a little bit about everything. To begin with I thought it best to describe what problem rdhs tries to solve, why it was developed and how I came to be involved in this project. I then give a brief overview of what the package can do, before continuing to describe how writing my first proper package and the rOpenSci review process ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: George H. W. Bush (2) vs. William Carlos Williams; Mel Brooks advances

All of yesterday’s comments favored Mr. Blazing Saddles. Jeff had a good statistics-themed comment: Mel Brooks created Get Smart (along with Buck Henry), which suggests a number of seminar topics of interest to readers of this blog. “Missed It By That Much: Why Predictive Models Don’t Always Pick the Winner” “Sorry About That, Chief: Unconscious ...

Source: Statistical Modeling, Causal Inference, and Social Science

Mark Tuttle points us to this project by Martijn Schuemie and Patrick Ryan: Large-Scale Population-Level Evidence Generation Objective: Generate evidence for the comparative effectiveness for each pairwise comparison of depression treatments for a set of outcomes of interest. Rationale: In current practice, most comparative effectiveness questions are answered individually in a study per question. This ...

Source: Peng Zhao on Peng Zhao

Link: rosr: Create academic R markdown projects for open science and reproducible research

<h3 id="what-is-the-project-about">What is the project about?</h3> <p>Weeks ago, I gave a short training course at one of the top institutes in the world. The course was called ‘<em>R, Open Science and Reproducible Research</em>’, abbreviated as <em>ROSS</em>. It was given to academic researchers who were interested in R and reproducible research. The R markdown family, including ‘rticles’, ‘bookdown’, ‘xaringan’ etc., was introduced. The audience was excited about the course. Afterwards, however, they felt confused by using these ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: How to make a transcript to gene mapping file

<p>I need a transcript to gene mapping file for <code>Salmon</code>. I am aware of annotation <code>bioconductor</code> packages that can do this job. However, I was working on a species which does not have the annotation in a package format (I am going to use Drosophila as an example for this blog post). I had to go and get the gtf file and make such a file from scratch.</p> <p>Please read the <a href="https://useast.ensembl.org/info/website/upload/gff.html">specifications</a> of those two file formats.</p> <div id="download-drosophila-gtf-file-from-ensemble-and-gff-file-from-ncbi" ...
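
Building such a mapping from a GTF file can be sketched in a few lines. The snippet below is an illustration in Python rather than the author's own code. It assumes the common layout in which the ninth tab-separated column holds `key "value";` attribute pairs that include `transcript_id` and `gene_id`; the function name and the sample records are hypothetical.

```python
import re

# Matches attribute pairs such as: gene_id "G1";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')

def transcript_to_gene(gtf_lines):
    """Return {transcript_id: gene_id} from an iterable of GTF lines."""
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        if "transcript_id" in attributes and "gene_id" in attributes:
            mapping[attributes["transcript_id"]] = attributes["gene_id"]
    return mapping

# Made-up records (assumption) standing in for a real GTF file.
sample = [
    "#!genome-build placeholder",
    '2L\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1";',
    '2L\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '2L\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '2L\tsrc\ttranscript\t20\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]

tx2gene = transcript_to_gene(sample)
for transcript_id, gene_id in sorted(tx2gene.items()):
    print(f"{transcript_id}\t{gene_id}")
```

For a real file, the same function can be fed an open file handle instead of the `sample` list, since it only iterates over lines.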

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The bullshit asymmetry principle

Jordan Anaya writes, “We talk about this concept a lot, I didn’t realize there was a name for it.” From the Wikipedia entry: Publicly formulated the first time in January 2013 by Alberto Brandolini, an Italian programmer, the bullshit asymmetry principle (also known as Brandolini’s law) states that: The amount of energy needed to refute ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Chris Christie (2) vs. Mel Brooks; Boris Karloff advances

We had some good arguments in favor of Karloff. If I had to choose just one, it would be from J Storrs Hall, who writes: Well, the main problem with Anastasia is … she’s dead. However, we can be relatively certain that 31 or so pretenders would show up in her place. One of them ...

Source: Rbind Support

Link: EnTyrely Too Much

Looking at the fantastic pages put together by everyone else leaves me agog at the great company here on rbind.io! My own efforts are sporadic and mostly serve as a vehicle for my own learning about the internet. And oh! what a learning process that is. Painfully slow, with many steps sideways and back. Where are the tags?!1 Why are the chapter links in my bookdown book suddenly ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Boris Karloff (3) vs. Anastasia Romanoff; Lance Armstrong advances

I’m still feeling bad about my ruling the other day. . . . I mean, sure, Robin Williams doing Elmer Fudd doing Bruce Springsteen was amazing, but Veronica Geng—she was one of a kind. Anyway, yesterday’s winner is another dark horse. There’s little doubt in my mind that Bobby Fischer, if in a good mood, ...

Source: Statistical Modeling, Causal Inference, and Social Science

OK, you all remember the story, arguably the single event that sent the replication crisis into high gear: the decision of the Journal of Personality and Social Psychology to publish a paper on extra-sensory perception (ESP) by Cornell professor Daryl Bem and the subsequent discussion of this research in the New York Times and elsewhere. ...

Source: Peng Zhao on Peng Zhao

Link: [New Features on beginr] Automatically generate a self-contained package

<p>Most R beginners think that developing an R package is a mission impossible. It is not true. With the new function <code>packr()</code>, users can easily create useful user-defined R packages. They can specify in <code>packr()</code> a group of packages (e.g. foo_1, foo_2, foo_x) which they often use and the new package name, say <code>foobar</code>, then a new package called <code>foobar</code> will be generated.</p> <p>When loading the <code>foobar</code> function, the package group, i.e. foo_1, foo_2, and foo_x are loaded simultaneously. Moreover, a few functions are available in the ...

Source: Peng Zhao on Peng Zhao

Link: mindr v.1.2.0 released: universal function and directory tree

<p>The new version 1.2.0 mainly brings four exciting features.</p> <ol> <li>Suggested by <a href="https://github.com/yihui" target="_blank">@yihui</a>, a ‘method’ argument was added to each of the main functions. Users can choose the method of regular expression or pandoc to pick out the outline of a markdown file.</li> <li>Suggested by <a href="https://github.com/pzhaonet/mindr/issues/11" target="_blank">the issue from the users</a>, now mindr can save the mind map as an HTML widget file and share the mind map on the web.</li> <li>A new function <code>tree()</code> can create a mind map ...

Source: Peng Zhao on Peng Zhao

Link: rmd: Easily Install, Load and Explore the R Markdown Family

<h3 id="introduction">Introduction</h3> <p>Since ‘rmarkdown’ and ‘knitr’, more and more members (rticles, bookdown, mindr…) have been joining the R Markdown family. Users can write elegant reproducible documents, manuscripts, dissertations, books, blog posts, posters, and slides within the framework of R Markdown. This is exciting, but installation and maintenance become annoying. Meanwhile, there are plenty of useful RStudio addins, which turn the RStudio IDE into a powerful markdown editor. However, these little tools are often hidden somewhere ...

Source: Statistical Modeling, Causal Inference, and Social Science

Forget Pizzagate. This is the stuff we really care about. John Ioannidis writes: Assuming the meta-analyzed evidence from cohort studies represents life span–long causal associations, for a baseline life expectancy of 80 years, eating 12 hazelnuts daily (1 oz) would prolong life by 12 years (ie, 1 year per hazelnut) [1], drinking 3 cups of ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Bobby Fischer (4) vs. Lance Armstrong; Riad Sattouf advances

Our best argument from the last one comes from Bobbie: I used to believe that Euler could draw circles around anyone but after some investigation I now believe that Sattouf could draw anything around anyone (and write about it beautifully as well). And today we have a battle of two GOATs, with Fischer seeded fourth ...

Source: Rob J Hyndman

Link: Advice to PhD applicants

For students who are interested in doing a PhD at Monash under my supervision. First, check that you satisfy the following criteria: You must have completed a degree in statistics that involved some research component (e.g., an honours or masters thesis). A degree in computer science, mathematics or econometrics might be acceptable if it contained a substantial amount of statistics. A degree in any other field is not sufficient background to work with me. ...

Source: Rob J Hyndman

Link: Feature-based forecasting algorithms for large collections of time series

Talk given at ACEMS workshop on “Statistical Methods for the Analysis of High-Dimensional and Massive Data Sets” I will discuss two algorithms used in forecasting large collections of diverse time series. Each of these algorithms uses a meta-learning approach with vectors of features computed from time series to guide the way the forecasts are computed. In FFORMS (Feature-based FORecast Model Selection), we use a random forest classifier to identify the best forecasting method using only time series features. ...

Source: Statistical Modeling, Causal Inference, and Social Science

This comes up from time to time. We were discussing a published statistical blunder, an innumerate overconfident claim arising from blind faith that a crude regression analysis would control for various differences between groups. Martha made the following useful comment: Another factor that I [Martha] believe tends to promote the kind of thing we’re talking ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Transforming parameters in a simple time-series model; debugging the Jacobian

So. This one is pretty simple. But the general idea could be useful to some of you. So here goes. We were fitting a model with an autocorrelation parameter, rho, which was constrained to be between 0 and 1. The model looks like this: eta_t ~ normal(rho*eta_{t-1}, sigma_res), for t = 2, 3, ... T ...
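The excerpt above stops before showing the transformation, so the following is only a minimal sketch of the general idea, not the post's own code. It assumes rho is mapped from an unconstrained value `u` through the inverse logit (one common choice for a (0, 1) constraint), assumes a starting distribution for the first value (the excerpt only gives t = 2, ..., T), and checks the analytic log-Jacobian against a finite difference, which is one simple way to debug a Jacobian term. All function names are hypothetical.

```python
import math
import random


def inv_logit(u):
    """Map an unconstrained real u to rho in (0, 1). Assumed transform."""
    return 1.0 / (1.0 + math.exp(-u))


def log_jacobian(u):
    """Analytic log |d rho / d u| for rho = inv_logit(u): log(rho) + log(1 - rho)."""
    rho = inv_logit(u)
    return math.log(rho) + math.log1p(-rho)


def numeric_log_jacobian(u, h=1e-6):
    """Central-difference check of the same derivative, used for debugging."""
    d = (inv_logit(u + h) - inv_logit(u - h)) / (2.0 * h)
    return math.log(d)


def simulate_ar1(rho, sigma_res, T, seed=0):
    """Simulate eta_t ~ normal(rho * eta_{t-1}, sigma_res) for t = 2..T.

    The first value is drawn from normal(0, sigma_res); that starting
    distribution is an assumption, since the excerpt does not specify it.
    """
    rng = random.Random(seed)
    eta = [rng.gauss(0.0, sigma_res)]
    for _ in range(1, T):
        eta.append(rng.gauss(rho * eta[-1], sigma_res))
    return eta


if __name__ == "__main__":
    u = 0.3
    print("rho:", inv_logit(u))
    print("analytic log-Jacobian:", log_jacobian(u))
    print("numeric  log-Jacobian:", numeric_log_jacobian(u))
    print("first draws:", simulate_ar1(inv_logit(u), 1.0, 5))
```

If the analytic and numeric values disagree, the Jacobian adjustment (or its sign) is the first place to look.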

Source: Homepage on Yihui Xie | 谢益辉

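Link: The 6th Kaopusi Award (第六届靠谱厮奖): Zhu Xuening (朱雪宁)

<p>Eight or nine months have passed since the <a href="https://yihui.name/cn/2018/03/copss-5/">last</a> Kaopusi award ceremony. This award should have been handed out long ago, but I am too much of a chatterbox and wrote about too many other topics on my blog, so this one was pushed back until today. In fact, the timing for this award post is no longer ideal, because more than two months ago I did not expect that Xuening would <a href="https://mp.weixin.qq.com/s?__biz=MzA5MjEyMTYwMg==&mid=2650243111&idx=1&sn=4eedb133c2ffb76b207c10c7c900fc42">add a note in front of the foreword</a> I wrote for them, which makes it look as if we are holding a session of mutual praise and self-praise.</p> <p>I no longer remember in which year Xuening took over the job from Taomei (涛妹) and became editor of the main site of the Capital of Statistics (统计之都); it feels as if ...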

Source: Homepage on Yihui Xie | 谢益辉

Link: Back from rstudio::conf 2019

<p>There was an obvious reason for me being quiet on my blog this month: rstudio::conf(2019) took place last week. I (co-)taught a workshop and gave a talk there, so I needed a lot of time to prepare. In this post, I want to share some observations and experiences. Note that this is not a summary of the conference. Many others have blogged about the conference, and you can find those links in <a href="https://github.com/kbroman/RStudioConf2019Slides">Karl’s GitHub repo</a>.</p> <h2 id="some-people-i-met">Some people I met</h2> <p>Usually I don’t have much ...

Source: Wannabe Rstats-fu

Link: A Survival Guide To Install rlang From GitHub On Windows

<p>I don’t have any strong feelings about OSs. They are just tools. I had been a Mac user for 10+ years since I was 10, and now I’m using Windows for no reason. All OSs have their pros and cons. For example, I like Mac, but, in the late 90s, I was very disappointed at Mac because it didn’t have fonts to display <a href="https://en.wikipedia.org/wiki/Shift_JIS_art">Shift_JIS art</a> nicely.</p> <p>Anyway, I’m using Windows and I need to survive. Here’s an error I often see when I try to install <a href="https://rlang.r-lib.org/">rlang</a> package from GitHub by ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: No, I don’t buy that claim that Fox news is shifting the vote by 6 percentage points

Tyler Cowen writes: This is only one estimate, from Gregory J. Martin and Ali Yurukoglu, but nonetheless it is backed by a plausible identification strategy and this is very interesting research: We find that in a hypothetical world without Fox News but with no other changes, the Republican vote share in the 2000 election would ...

Source: Statistical Modeling, Causal Inference, and Social Science

This is just one more sexual harassment story, newsworthy only in the man-bites-dog sense. But it reminded me of something that gets discussed from time to time, which is that we should stop using letters of recommendation to evaluate candidates for jobs or scholarships. Here’s a list of hoops that people recommend you jump through. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Riad Sattouf (1) vs Leonhard Euler; Springsteen advances

I really wanted to go with Geng, partly because I’m a big fan of hers and partly because of Dzhaughn’s Geng-tribute recommendation: In the way that many search their memories for significant aromas when they read Proust, re-reading Geng led me to recollect my youth in Speech Club, of weekends of interpretive readings and arguments ...

Source: Wannabe Rstats-fu

Link: gather() and spread() Explained By gt

<p>This is episode 0 of my long adventure to <a href="https://github.com/tidyverse/tidyr/issues/149">multi-spread</a> and <a href="https://github.com/tidyverse/tidyr/issues/150">multi-gather</a> (this is the homework I got at <a href="https://www.tidyverse.org/articles/2018/11/tidyverse-developer-day-2019/">the tidyverse developer day</a>…). This post might seem to introduce semantics different from tidyr’s current ones, but that is probably just because my idea is still vague. So, I really appreciate any ...

Source: Statistical Modeling, Causal Inference, and Social Science

We’ve been here before. Back in 2002, political scientists Chris Achen and Larry Bartels presented a paper “Blind Retrospection – Electoral Responses to Drought, Flu and Shark Attacks.” Here’s a 2012 version in which the authors trace “the electoral impact of a clearly random event—a dramatic series of shark attacks in New Jersey in 1916” ...

Source: The R-Podcast

Link: Episode 26: The Podcast Trifecta (rstudio::conf 2019)

<h3 id="conversation-with-hilary-parker-and-nick-tierney">Conversation with Hilary Parker and Nick Tierney</h3> <ul> <li>Credibly Curious podcast: <a href="https://soundcloud.com/crediblycurious">soundcloud.com/crediblycurious</a></li> <li>Not So Standard Deviations podcast: <a href="http://nssdeviations.com/">nssdeviations.com</a></li> <li>Apache Arrow: <a href="https://arrow.apache.org/">arrow.apache.org</a></li> <li>Tidy Evaluation online book: <a href="https://tidyeval.tidyverse.org/">tidyeval.tidyverse.org</a></li> <li>Tidy models family of packages: <a href="https://github.com/tidymodel ...

Source: Blog on rOpenSci - open tools for open science

Link: wateRinfo - Downloading tidal data to understand the behaviour of a migrating eel

Do you know what that sound is, Highness? Those are the Shrieking Eels — if you don’t believe me, just wait. They always grow louder when they’re about to feed on human flesh. If you swim back now, I promise, no harm will come to you. I doubt you will get such an offer from the Eels. Vizzini, The Princess Bride European eels (Anguilla anguilla) have it ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Darrell Huff (4) vs. Monty Python; Frank Sinatra advances

In yesterday’s battle of the Jerseys, Jonathan offered this comment: Sinatra is an anagram of both artisan and tsarina. Apgar has no English anagram. Virginia is from New Jersey. Sounds confusing. And then we got this from Dzhaughn: I got as far as “Nancy’s ancestor,” and then a Youtube clip of Joey Bishop told me, ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Science as an intellectual “safe space”? How to do it right.

I don’t recall hearing the term “safe space” until recently, but now it seems to be used all the time, by both the left and the right, to describe an environment where people can feel free to express opinions that might be unpopular in a larger community, without fear of criticism or contradiction. Sometimes a ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Frank Sinatra (3) vs. Virginia Apgar; Julia Child advances

My favorite comment from yesterday came from Ethan, who picked up on the public TV/radio connection and rated our two candidate speakers on their fundraising abilities. Very appropriate for the university—I find myself spending more and more time raising money for Stan, myself. A few commenters picked up on Child’s military experience. I like the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The butterfly effect: It’s not what you think it is.

John Cook writes: The butterfly effect is the semi-serious claim that a butterfly flapping its wings can cause a tornado half way around the world. It’s a poetic way of saying that some systems show sensitive dependence on initial conditions, that the slightest change now can make an enormous difference later . . . Once ...

Source: The R-Podcast

Link: Hilary Parker

<p>Hilary Parker is a data scientist at <a href="https://stitchfix.com">StitchFix</a> as well as co-host of the excellent <a href="http://nssdeviations.com/">Not So Standard Deviations</a> podcast. She is an R and statistics enthusiast determined to bring rigor to analysis wherever she goes. At Stitch Fix she works on teasing apart correlation from causation, with a strong dose of reproducibility. Formerly a Senior Data Analyst at Etsy, she received a PhD in Biostatistics from the Johns Hopkins Bloomberg School of Public ...

Source: The R-Podcast

Link: Nick Tierney

<p>Nick Tierney is a Research Fellow in statistics at Monash University working with Rob Hyndman and Di Cook, as well as co-host of the insightful <a href="https://soundcloud.com/crediblycurious">Credibly Curious</a> podcast. Nick earned his PhD from Queensland University of Technology in 2018. His research aims to improve the overall data analysis workflow, especially understanding a data set effectively with his <a href="http://visdat.njtierney.com/">visdat</a> package, as well as exploring and modeling missing data with his <a href="http://naniar.njtierney.com/">naniar</a> package. ...

Source: Homepage on Yihui Xie | 谢益辉

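Link: In Youth, One Does Not Know the Taste of Success (少年不识成功味)

<p>I don’t remember whether it was in my last year of high school or my first year of college that I came across this definition of success by Emerson. I was quite moved by it, so I memorized it at the time (of course I can no longer recite it now):</p> <blockquote> <p>To laugh often and much; To win the respect of intelligent people and the affection of children; To earn the appreciation of honest critics and endure the betrayal of false friends; To appreciate beauty, to find the best in others; To leave the world a bit better, whether by a healthy child, a garden patch, or a redeemed ...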

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Julia Child (2) vs. Ira Glass; Dorothy Parker advances

Yesterday we got this argument from Manuel in favor of Biles: After suffering so many bad gymnastics (mathematical, logical, statistical, you name it) at seminars, to have some performed by a true champion would be a welcome change. But Parker takes it away, based on this formidable contribution of Dzhaughn: Things I Have Learned From ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Moneyball for evaluating community colleges

From an interesting statistics-laden piece by “Dean Dad”: Far more community college students transfer prior to completing the Associate’s degree than actually complete first. According to a new report from the National Student Clearinghouse Research Center, about 350,000 transfer before completion, compared to about 60,000 who complete first. That matters in several ways. Most basically, ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Dorothy Parker (2) vs. Simone Biles; Liebling advances

I was surprised to see so little action in the comments yesterday. Sure, Liebling’s an obscure figure—I guess at this point he’d be called a “cult writer,” and I just happen to be part of the cult, fan as I am of mid-twentieth-century magazine writing—but I’d’ve thought Bourdain would’ve aroused more interest. Anyway, the best ...

Source: Statistical Modeling, Causal Inference, and Social Science

Daniel Lakeland points us to this news article by David Hambling from 2014, entitled “Nasa validates ‘impossible’ space drive.” Here’s Hambling: Nasa is a major player in space science, so when a team from the agency this week presents evidence that “impossible” microwave thrusters seem to work, something strange is definitely going on. Either the ...

Source: Rob J Hyndman

Link: forecast 8.5

The latest minor release of the forecast package has now been approved on CRAN and should be available in the next day or so. Version 8.5 contains the following new features: Updated tsCV() to handle exogenous regressors. Reimplemented naive(), snaive(), rwf() for substantial speed improvements. Added support for passing arguments to auto.arima() unit root tests. Improved auto.arima() stepwise search algorithm (some neighbouring models were missed previously). We haven’t done a major release for two years, and there is unlikely to be another one now. ...

Source: Simply Statistics

Link: The Tentpoles of Data Science

<p>What makes for a good data scientist? This is a question I asked <a href="https://simplystatistics.org/2012/05/07/how-do-you-know-if-someone-is-great-at-data-analysis/">a long time ago</a> and am still trying to figure out the answer. Seven years ago, I wrote:</p> <blockquote> <p>I was thinking about the people who I think are really good at data analysis and it occurred to me that they were all people I knew. So I started thinking about people that I don’t know (and there are many) but are equally good at data analysis. This turned out to be much harder than I thought. And I’m sure it’s ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Anthony Bourdain (3) vs. A. J. Liebling; Steve Martin advances

Yesterday’s decision was pretty easy, as almost all the commenters talked about Steve Martin, pro and con. Letterman was pretty much out of the picture. Indeed, the best argument in favor of Letterman came from Jonathan, who wrote: I’ll go with Letterman because he looks like he could use the work. Conversely, the strongest argument ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Google on Responsible AI Practices

Great and beautifully written advice for any data science setting: Google. Responsible AI Practices. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A ladder of responses to criticism, from the most responsible to the most destructive

In a recent discussion thread, I mentioned how I’m feeling charitable toward David Brooks, Michael Barone, and various others whose work I’ve criticized over the years, because their responses have been so civilized and moderate. Consider the following range of responses to an outsider pointing out an error in your published work: 1. Look into ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Steve Martin (4) vs. David Letterman; Serena Williams advances

Yesterday’s matchup featured a food writer vs. a tennis player, two professions that are not known for public speaking. The best arguments came in the very first two comments. Jeff wrote: Fisher’s first book was “Serve It Forth,” which seems like good advice in tennis, as well. So, you’d get a two-fer there. That was ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A thought on the hot hand in basketball and the relevance of defense

I was reading about basketball the other day and a thought came to me about the hot hand . . . There are a bunch of NBA players who could shoot with great accuracy even from long distance if they’re not guarded, right? For example, what would Steph Curry’s success rate be for 30-footers if ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Causal inference data challenge!

Susan Gruber, Geneviève Lefebvre, Tibor Schuster, and Alexandre Piché write: The ACIC 2019 Data Challenge is Live! Datasets are available for download (no registration required) at https://sites.google.com/view/ACIC2019DataChallenge/data-challenge (bottom of the page). Check out the FAQ at https://sites.google.com/view/ACIC2019DataChallenge/faq The deadline for submitting results is April 15, 2019. The fourth Causal Inference Data Challenge is taking place ...

Source: Statistical Modeling, Causal Inference, and Social Science

In a discussion of our stacking paper, the point came up that LOO (leave-one-out cross validation) requires a partitioning of data—you can only “leave one out” if you define what “one” is. It is sometimes said that LOO “relies on the data-exchangeability assumption,” but I don’t think that’s quite the right way to put it, ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: M. F. K. Fisher (1) vs. Serena Williams; Oscar Wilde advances

The best case yesterday was made by Manuel: Leave Joe Pesci at home alone. Wilde’s jokes may be very old, but he can use slides from The PowerPoint of Dorian Gray. As Martha put it, not great, but the best so far in this thread. On the other side, Jonathan wrote, “I’d definitely rather hear ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Does Harvard discriminate against Asian Americans in college admissions?

Sharad Goel, Daniel Ho and I looked into the question, in response to a recent lawsuit. We wrote something for the Boston Review: What Statistics Can’t Tell Us in the Fight over Affirmative Action at Harvard Asian Americans and Academics “Distinguishing Excellences” Adjusting and Over-Adjusting for Differences The Evolving Meaning of Merit Character and Bias ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Oscar Wilde (1) vs. Joe Pesci; the Japanese dude who won the hot dog eating contest advances

Raghuveer gave a good argument yesterday: “The hot dog guy would eat all the pre-seminar cookies, so that’s a definite no.” But this was defeated by the best recommendation we’ve ever had in the history of the Greatest Seminar Speaker contest, from Jeff: Garbage In, Garbage Out: Mass Consumption and Its Aftermath Takeru Kobayashi Note: ...

Source: Blog on rOpenSci - open tools for open science

Link: rOpenSci's new Code of Conduct

We are pleased to announce the release of our new Code of Conduct. rOpenSci’s community is our best asset and it’s important that we put strong mechanisms in place before we have to act on a report. As before, our Code applies equally to members of the rOpenSci team and to anyone from the community at large participating in in-person or online activities. What’s new? A Code of Conduct Committee: Stefanie Butland (rOpenSci Community Manager), Scott Chamberlain (rOpenSci Co-founder and Technical Lead) and Kara Woo (independent community ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Storytelling: What’s it good for?

A story can be an effective way to send a message. Anna Clemens explains: Why are stories so powerful? To answer this, we have to go back at least 100,000 years. This is when humans started to speak. For the following roughly 94,000 years, we could only use spoken words to communicate. Stories helped us ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Carol Burnett (4) vs. the Japanese dude who won the hot dog eating contest; Albert Brooks advances

Yesterday was a tough matchup, but ultimately John “von” Neumann was no match for a very witty Albert Einstein. The deciding argument, from Martha: I’d like to see Von Neumann given four parameters and making an elephant wiggle his trunk. And if he could do it, there would be the chance that Jim Thorpe could ...

Source: Econometrics and Free Software

Link: Making sense of the METS and ALTO XML standards

<p>Last week I wrote a <a href="https://www.brodrigues.co/blog/2019-01-04-newspapers/">blog post</a> where I analyzed one year of newspaper ads from 19th century newspapers. The data is made available by the <a href="https://data.bnl.lu/data/historical-newspapers/">national library of Luxembourg</a>. In this blog post, which is part 1 of a 2-part series, I extract data from the 257 GB archive, which contains 10 years ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: Understanding p value, multiple comparisons, FDR and q value

<p>This was an <a href="http://crazyhottommy.blogspot.com/2015/03/understanding-p-value-multiple.html">old post</a> I wrote 3 years ago after I took HarvardX: <a href="https://courses.edx.org/courses/course-v1:HarvardX+PH525.3x+1T2018/0b42cffa7c6e4c559bf74f93fb864a59/">PH525.3x Advanced Statistics for the Life Sciences on edx</a> taught by <a href="http://rafalab.github.io/">Rafael Irizarry</a>. It is still one of the best courses to get you started using R for genomics. I am very thankful to have those high quality classes available to me when I started to learn. I am reposting it here using ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Coursera course on causal inference from Michael Sobel at Columbia

Here’s the description: This course offers a rigorous mathematical survey of causal inference at the Master’s level. Inferences about causation are of great importance in science, medicine, policy, and business. This course provides an introduction to the statistical literature on causal inference that has emerged in the last 35-40 years and that has revolutionized the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: John van Neumann (3) vs. Albert Brooks; Paul Erdos advances

We had some good arguments on both sides yesterday. For Erdos, from Diana Senechal: From an environmental perspective, Erdos is the better choice; his surname is an adjectival form of the Hungarian erdő, “forest,” whereas “Carson” clearly means “son of a car.” Granted, the son of a car, being rebellious and all, might prove especially ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: How post-hoc power calculation is like a shit sandwich

Damn. This story makes me so frustrated I can’t even laugh. I can only cry. Here’s the background. A few months ago, Aleksi Reito (who sent me the adorable picture above) pointed me to a short article by Yanik Bababekov, Sahael Stapleton, Jessica Mueller, Zhi Fong, and David Chang in Annals of Surgery, “A Proposal ...

Source: Jay's Notes

Link: Coping with worst loss at home

<p>It’s been a tough weekend, not least because the Tar Heels lost at home. Sometimes I feel like I’m too invested in the outcome of the Heels’ basketball games, and if my emotional rollercoaster the rest of the weekend gives any clue, I probably am, just a little bit. I tried to shake it off, yet it wasn’t particularly easy, not only because we lost at home, but because the loss was the worst one at home under Roy Williams. So I turn to blogging, after 100+ days of hiatus, as a last resort to restore my inner peace and calm, before the new workday begins.</p> <p>First I look at a couple ...

Source: Homepage on Liechi | 張列弛

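Link: The Poems of Li Bai (李白的诗)

After writing about how the text of “Quiet Night Thought” (《静夜思》) changed across editions, I felt I still had more to say. Apart from the original version, which overturns what we learned in childhood, there is actually not much to say about the poem itself. “Quiet Night Thought” is of course a fine poem, but to us it is all too ...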

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Johnny Carson (2) vs. Paul Erdos; Babe Didrikson Zaharias advances

OK, our last matchup wasn’t close. Adam Schiff (unseeded in the “people whose name ends in f” category) had the misfortune to go against the juggernaut that was Babe Didrikson Zaharias (seeded #2 in the GOATs category). Committee chair or not, the poor guy never had a chance. As Diana Senechal wrote, “From an existential ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: This is one offer I can refuse

OK, so this came in the email today: Dear Contributor, ADVANCES IN POLITICAL METHODOLOGY [978 1 78347 485 1] Regular price: $455.00 Special Contributor price: $113.75 (plus shipping) We are pleased to announce the publication of the above title. Due to the limited print run of this collection and the high number of contributing authors, ...

Source: Homepage on Liechi | 張列弛

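Link: Beside the Bed, Gazing at the Moonlight (床前看月光)

If a Chinese person has heard of only one Tang poem, which poem is it most likely to be? Li Bai’s “Quiet Night Thought” (《静夜思》) is perhaps the answer many people have in mind. ...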

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Becker on Bohm on the important role of stories in science

Tyler Matta writes: During your talk last week, you spoke about the role of stories in scientific theory. On page 104 of What Is Real: The Unfinished Quest for the Meaning of Quantum Physics, Adam Becker talks about stories and scientific theory in relation to alternative conceptions of quantum theory, particularly between Bohm’s pilot-wave interpretation ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: New blog hosting!

Hi all. We’ve been having some problems with the blog caching, so that people were seeing day-old versions of the posts and comments. We moved to a new host and a new address, https://statmodeling.stat.columbia.edu, and all should be better. Still a couple glitches, though. Right now it doesn’t seem to be possible to comment. We ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: NYC Meetup Thursday: Under the hood: Stan’s library, language, and algorithms

I (Bob, not Andrew!) will be doing a meetup talk this coming Thursday in New York City. Here’s the link with registration and location and time details (summary: pizza unboxing at 6:30 pm in SoHo): Bayesian Data Analysis Meetup: Under the hood: Stan’s library, language, and algorithms After summarizing what Stan does, this talk will ...

Source: Wannabe Rstats-fu

Link: A Tip to Debug ggplot2

<p>Since <a href="https://www.tidyverse.org/articles/2018/11/tidyverse-developer-day-2019/">the tidyverse developer day</a> is near, I share my very very secret technique to debug ggplot2. Though this is a very small thing, hope this helps someone a bit.</p> <h2 id="ggplot2-is-unbreakable">ggplot2 is unbreakable!</h2> <p>You might want to <code>debug()</code> the methods of <code>Geom</code>s or <code>Stat</code>s.</p> <pre><code class="language-r">debug(GeomPoint$draw_panel) </code></pre> <p><del>But, this is not effective because the <code>geom_point()</code> generates different instances, ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Babe Didrikson Zaharias (2) vs. Adam Schiff; Sid Caesar advances

And our noontime competition continues . . . We had some good arguments on both sides yesterday. Jonathan writes: In my experience, comedians are great when they’re on-stage and morose and unappealing off-stage. Sullivan, on the other hand, was morose and unappealing on-stage, and witty and charming off-stage, or so I’ve heard. This comes down, […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: MRP (multilevel regression and poststratification; Mister P): Clearing up misunderstandings about

Someone pointed me to this thread where I noticed some issues I’d like to clear up: David Shor: “MRP itself is like, a 2009-era methodology.” Nope. The first paper on MRP was from 1997. And, even then, the component pieces were not new: we were just basically combining two existing ideas from survey sampling: regression […]

Source: Andriy V. Koval on Andriy V. Koval

Link: Time to gather, time to spread. Part 1.

<div id="tldr" class="section level2"> <h2>TL;DR</h2> <p>A quick demonstration of transforming data from wide to long format using <code>tidyr::gather()</code> and <code>tidyr::spread()</code> functions. Fully reproducible example with a simple dataset.</p> <p>Packages used in ...

Source: Blog on rOpenSci - open tools for open science

Link: vitae: Dynamic CVs with R Markdown

Why vitae? The process of maintaining a CV can be tedious. It’s a task I often forget about - that is until someone requests it and I find that my latest is woefully out of date. To make matters worse, these professional updates often need repeating across a variety of sites (such as ORCID and LinkedIn). Having seen several CVs put together into an R Markdown document (including my own, featuring a few quick and dirty hacks to make it work), the need for an R package was ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Ed Sullivan (3) vs. Sid Caesar; DJ Jazzy Jeff advances

Yesterday’s battle (Philip Roth vs. DJ Jazzy Jeff) was pretty low-key. It seems that this blog isn’t packed with fans of ethnic literature or hip-hop. Nobody in comments even picked up on my use of the line, “Does anyone know these people? Do they exist or are they spooks?” Isaac gave a good argument in […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Reproducibility and Stan

<p>Aki prepared these slides which cover a series of topics, starting with notebooks, open code, and reproducibility of code in R and Stan; then simulation-based calibration of algorithms; then model averaging and prediction. Lots to think about here: there are many aspects to reproducible analysis and computation in statistics.</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/09/reproducibility-and-stan/">Reproducibility and Stan</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Simply Statistics

Link: How Data Scientists Think - A Mini Case Study

<p>In episode 71 of <a href="http://nssdeviations.com/">Not So Standard Deviations</a>, Hilary Parker and I inaugurated our first “Data Science Design Challenge” segment where we discussed how we would solve a given problem using data science. The idea with calling it a “design challenge” was to contrast it with common “hackathon” type models where you are presented with an already-collected dataset and then challenged to find something interesting in the data. Here, we wanted to start with a problem and then talk about how data might be collected and analyzed to address the problem. While ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Did she really live 122 years?

<p>Even more famous than “the Japanese dude who won the hot dog eating contest” is “the French lady who lived to be 122 years old.” But did she really? Paul Campos points us to this post, where he writes: Here’s a statistical series, laying out various points along the 100 longest known durations of a […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/07/did-she-really-live-122-years/">Did she really live 122 years?</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “The Book of Why” by Pearl and Mackenzie

<p>Judea Pearl and Dana Mackenzie sent me a copy of their new book, “The book of why: The new science of cause and effect.” There are some things I don’t like about their book, and I’ll get to that, but I want to start with a central point of theirs with which I agree strongly. […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/08/book-pearl-mackenzie/">“The Book of Why” by Pearl and Mackenzie</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Philip Roth (4) vs. DJ Jazzy Jeff; Jim Thorpe advances

<p>For yesterday’s battle (Jim Thorpe vs. John Oliver), I’ll have to go with Thorpe. We got a couple arguments in Oliver’s favor—we’d get to hear him say “Whot?”, and he’s English—but for Thorpe we heard a lot more, including his uniqueness as greatest athlete of all time, and that we could save money on the […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/08/philip-roth-4-vs-dj-jazzy-jeff-jim-thorpe-advances/">Philip Roth (4) vs. DJ Jazzy Jeff; Jim Thorpe advances</a> appeared first on <a rel="nofollow" ...

Source: Blog on rOpenSci - open tools for open science

Link: Continuing to Grow Community Together at ozunconf, 2018

In late November 2018, we ran the third annual rOpenSci ozunconf. This is the sibling rOpenSci unconference, held in Australia. We ran the first ozunconf in Brisbane in 2016, and the second in Melbourne in 2017. Photos taken by Ajay from Fotoholics As usual, before the unconf, we started discussion on GitHub issue threads, and the excitement was building with the number of issues. The day before the unconf we ran “Day 0 training” - an afternoon explaining R packages and ...

Source: Homepage on Yihui Xie | 谢益辉

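Link: Old Before My Time

<p>A year ago <a href="https://yihui.name/cn/2018/01/fitness/">I said</a> that I could feel my physical fitness declining at a visibly rapid pace: a slight chill, or three hours of slicing meat, is enough to give me a cold. And I am only twenty-eight years old... plus seventy-eight months. <a href="https://yihui.name/cn/2005/08/ride-to-tianjin-in-16-hours/">In my reckless youth</a> I did at least once ride alone, on a beat-up bicycle, from Beijing to Tianjin, pedaling hard through the rain; now two hours of running around on the court leaves me feeling lifeless, as if I had reached the age where I need someone to <a href="https://yihui.name/cn/2018/07/lend-me/">lend me</a> some freshness and vigor.</p> <p>Physical fitness is only one part of it; what I fear more is my mind decaying. Today I saw an <a href="https://xkcd.com/2093/">XKCD comic</a> that struck me as very representative. Computers getting smarter is not the scary part; the scary part is that our brains seem to ...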

Source: Statistical Modeling, Causal Inference, and Social Science

Link: On deck for the first half of 2019

<p>OK, this is what we’ve got for you: “The Book of Why” by Pearl and Mackenzie Reproducibility and Stan MRP (multilevel regression and poststratification; Mister P): Clearing up misunderstandings about Becker on Bohm on the important role of stories in science This is one offer I can refuse How post-hoc power calculation is like a […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/07/on-deck-for-the-first-half-of-2019/">On deck for the first half of 2019</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The seminar speaker contest begins: Jim Thorpe (1) vs. John Oliver

<p>As promised, we’ll be having one contest a day for our Ultimate Seminar Speaker contest, first going through the first round of our bracket, then going through round 2, etc., through to the finals. Here’s the bracket: And now we begin! The first matchup is Jim Thorpe, seeded #1 in the GOATs category, vs. John […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/07/the-seminar-speaker-contest-begins-jim-thorpe-1-vs-john-oliver/">The seminar speaker contest begins: Jim Thorpe (1) vs. John Oliver</a> appeared first on <a rel="nofollow" ...

Source: 一路嘿嘿

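Link: ROC Curve and AUC Value

Confusion matrix: In machine-learning classification problems, the confusion matrix (called a matching matrix in unsupervised learning) is used to characterize an algorithm's performance, as shown in the table below. True condition Total population Condition positive Condition negative Predicted ...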
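To make the confusion-matrix and AUC ideas in this entry concrete, here is a small Python sketch. The labels, scores, threshold, and function names are all hypothetical and are not taken from the post; AUC is computed through its rank interpretation (the probability that a randomly chosen positive outscores a randomly chosen negative) rather than by integrating the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]

# Thresholding the scores at 0.5 gives hard predictions for the confusion matrix.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
tpr = tp / (tp + fn)  # true positive rate (one point on the ROC curve)
fpr = fp / (fp + tn)  # false positive rate
area = auc(y_true, scores)
```

Each threshold yields one (FPR, TPR) point; sweeping the threshold traces the ROC curve, and the pairwise-comparison formula gives the area under it without drawing the curve.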

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Announcing the ultimate seminar speaker contest: 2019 edition!

<p>Paul Davidson made the bracket for us (thanks, Paul!): Here’s the full list: Wits: Oscar Wilde (seeded 1 in group) Dorothy Parker (2) David Sedaris (3) Voltaire (4) Veronica Geng Albert Brooks Mel Brooks Monty Python Creative eaters: M. F. K. Fisher (1) Julia Child (2) Anthony Bourdain (3) Alice Waters (4) A. J. Liebling […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/06/announcing-the-ultimate-seminar-speaker-contest-2019-edition/">Announcing the ultimate seminar speaker contest: 2019 edition!</a> appeared first on <a rel="nofollow" ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Objective Bayes conference in June

<p>Christian Robert points us to this Objective Bayes Methodology Conference in Warwick, England in June. I’m not a big fan of the term “objective Bayes” (see my paper with Christian Hennig, Beyond subjective and objective in statistics), but the conference itself looks interesting, and there are still a few weeks left for people to submit […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/06/objective-bayes-conference-in-june/">Objective Bayes conference in June</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistica ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Dissolving the Fermi Paradox”

<p>Jonathan Falk writes: A quick search seems to imply that you haven’t discussed the Fermi equation for a while. This looks to me to be in the realm of Miller and Sanjurjo: a simple probabilistic explanation sitting right under everyone’s nose. Comment? “This” is an article, Dissolving the Fermi Paradox, by Anders Sandberg, Eric Drexler […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/05/dissolving-fermi-paradox/">“Dissolving the Fermi Paradox”</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistic ...

Source: Homepage on Yihui Xie | 谢益辉

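Link: The Kuleshov Effect and Reading Too Much Into Things

<p>Yesterday I came across a rather interesting Wikipedia entry on the <a href="https://en.wikipedia.org/wiki/Kuleshov_effect">Kuleshov effect</a>. It is probably not the earliest theory about the mind filling in the blanks, but it may be the earliest to show how differently people interpret (or fill in) the same image under different contrasting conditions. I take two lessons from it: first, our judgments of other people's emotions may be quite unreliable, and we over-interpret; second, film is just an artistic technique that stimulates our associations by arranging and combining visual fragments. Sometimes the intent of that stimulation is obvious, and sometimes it produces different effects in different minds.</p> <p>I am no artist, and I usually have little interest in this kind of shot splicing. I have not watched a movie in several years. Thinking back, the film that left the deepest impression on me in terms of editing and splicing is still Crazy Stone (《疯狂的石头》). I think the splicing in this film is truly exquisite ...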

Source: Econometrics and Free Software

Link: Looking into 19th century ads from a Luxembourguish newspaper with R

<div style="text-align:center;"> <p><a href="https://www.youtube.com/watch?v=0xzN6FM5x_E"> <img src="./img/Wales.jpg" title = "Sometimes ads are better than this. Especially if it's Flex Tape ® ads."></a></p> </div> <p>The <a href="https://data.bnl.lu/data/historical-newspapers/">national library of Luxembourg</a> published some very interesting data sets; scans of historical newspapers! There are several data sets that you can download, from 250mb up to 257gb. I decided to take a look at the 32gb “ML Starter Pack”. It contains high quality scans of one year of the <em>L’indépendence ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Robin Pemantle’s updated bag of tricks for math teaching!

<p>Here it is! He’s got the following two documents: – Tips for Active Learning in the College Setting – Tips for Active Learning in Teacher Prep or in the K-12 Setting This is great stuff (see my earlier review here). Every mathematician and math teacher in the universe should read this. So, if any of […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/03/robin-pemantles-updated-bag-of-tricks-for-math-teaching/">Robin Pemantle’s updated bag of tricks for math teaching!</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Back by popular demand . . . The Greatest Seminar Speaker contest!

<p>Regular blog readers will remember our seminar speaker competition from a few years ago. Here was our bracket, back in 2015: And here were the 64 contestants: – Philosophers: Plato (seeded 1 in group) Alan Turing (seeded 2) Aristotle (3) Friedrich Nietzsche (4) Thomas Hobbes Jean-Jacques Rousseau Bertrand Russell Karl Popper – Religious Leaders: Mohandas […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/04/back-by-popular-demand-the-greatest-seminar-speaker-contest/">Back by popular demand . . . The Greatest Seminar Speaker contest!</a> appeared first on <a ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: permutation test for PCA components

<p>PCA is a critical method for dimension reduction for high-dimensional data. High-dimensional data are data with many more features (p) than observations (n). However, this is changing with single-cell RNAseq data. Now, we can sequence millions (n) of single cells and each cell has ~20,000 genes/features (p).</p> <p>I suggest you read my <a href="https://divingintogeneticsandgenomics.rbind.io/post/pca-in-action/">previous blog post</a> on using <code>svd</code> to calculate PCs.</p> <div id="single-cell-expression-data-pca" class="section level3"> <h3>Single-cell expression data ...

Source: About on Likan Zhan | 战立侃

Link: An interactive learning widget for R

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJhIDwtIFwiSGVsbG9cIlxuYiA8LSBcIldvcmxkXCJcbmMgPC0gXCIhXCJcblxucGFzdGUoYSwgYiwgYykifQ== ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Published in 2018

<p>R-squared for Bayesian regression models. {\em American Statistician}. (Andrew Gelman, Ben Goodrich, Jonah Gabry, and Aki Vehtari) Voter registration databases and MRP: Toward the use of large scale databases in public opinion research. {\em Political Analysis}. (Yair Ghitza and Andrew Gelman) Limitations of “Limitations of Bayesian leave-one-out cross-validation for model selection.” {\em Computational Brain and […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/03/published-in-2018/">Published in 2018</a> appeared first on <a rel="nofollow" ...

Source: Homepage on Yihui Xie | 谢益辉

Link: 噜蚣

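<p>The day before yesterday I was at Old Mr. Wang's place, still tinkering with his Apple TV, when he started talking about some fun things from his childhood and blurted out the line “余忆童稚时，能张目对日” ("I remember that as a child I could stare straight at the sun"), saying it was just as described there. I thought to myself how strange this past year has been: Six Records of a Floating Life (《浮生六记》), which I had previously known nothing about (or had forgotten), kept turning up in front of me. I first saw the chapter 《闲情记趣》 in 《古文观止》, found it amusing, and only later tracked down the whole book. After finishing it, I happened to spot traces of the book in an interview with <a href="https://yihui.name/cn/2018/11/phd-or-work/">a foreign chef</a>. Then one night, while putting the kid to sleep with the podcast 《冬吴相对论》 playing, Wu Bofan again brought up the story of <a href="https://yihui.name/cn/2018/08/chen-yun/">Chen Yun</a> placing tea leaves inside lotus flowers, and what caught his attention happened to be the same as what caught mine. Then ...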

Source: Homepage on Yihui Xie | 谢益辉

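Link: There Is a Tree at the Foot of Mount Everest

<p>When I read poetry and prose, I have a long-standing pain point: I do not recognize the names of the plants and animals in it. I actually hinted at this <a href="https://yihui.name/cn/2017/01/blog/">on New Year's Day the year before last</a>. When I see phrases such as “楝花飘砌” (chinaberry blossoms drifting onto the steps), “<a href="https://yihui.name/cn/2014/06/on-writing/">白苹秋老</a>” (white duckweed aging in autumn), or “红蓼花疏” (red knotweed flowers growing sparse), I have no idea what these things are. Reading them is hard enough, let alone writing them. As soon as my writing touches on any plant or animal I am stuck, because I cannot name them. Today I read <a href="https://www.douban.com/note/702427221/">an article about an unknown poet</a> that happens to discuss this very problem:</p> <blockquote> <p>Later, led by the poet Pan Xichen, ...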

Source: Statistical Modeling, Causal Inference, and Social Science

<p>Someone writes: I would like to ask you for an advice regarding obtaining data for reanalysis purposes from an author who has multiple papers with statistical errors and doesn’t want to share the data. Recently, I reviewed a paper that included numbers that had some of the reported statistics that were mathematically impossible. As the […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/02/read-paper-full-errors-author-wont-share-data-open-analysis/">What to do when you read a paper and it’s full of errors and the author won’t share the data or be ...

Source: Homepage on Yihui Xie | 谢益辉

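Link: 2018: Confusion and Retreat

<p>It is once again that time of year for high-spirited resolution-making (literally, "erecting memorial arches"). As the poem goes: in resolution season the talk comes thick and fast; on next year's road, hearts are ready to break. I thought I wrote a year-end summary of some kind every year, but flipping back I found that in most years I did not; I do not know how I formed this illusion, perhaps from reading too many of other people's resolutions.</p> <p>If I had to sum up my state of mind over the past year in two words, they would be the title of this post: confusion and retreat. The more books and miscellaneous online articles I read, the more confused I become, because I find that the complexity of this tangled world increasingly exceeds my ability to understand it; for many things, if I really tried to get to the bottom of them, I would find it extraordinarily taxing on the brain, and even then I would not reach the root or the bottom. On the other hand, I feel that the world is so noisy that it fills me with more and more doubt; it is as if I were watching a roaring, surging tide of people crash day after day against different cliffs and reefs, while I would rather step back and get away from the clamor.</p> <p>Anxiety? To hell with anxiety. Wellness? Forget wellness. There are more and more invisible hands in the world, secretly setting off wave after wave of ...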

Source: Rob J Hyndman

Link: Macroeconomic forecasting for Australia using a large number of predictors

A popular approach to forecasting macroeconomic variables is to utilize a large number of predictors. Several regularization and shrinkage methods can be used to exploit such high-dimensional datasets, and have been shown to improve forecast accuracy for the US economy. To assess whether similar results hold for economies with different characteristics, an Australian dataset containing observations on 151 aggregate and disaggregate economic series as well as 185 international variables, is introduced. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Principles of posterior visualization”

<p>What better way to start the new year than with a discussion of statistical graphics. Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles: Principle 1: Uncertainty should be visualized Principle 2: Visualization of variability ≠ Visualization of uncertainty Principle 3: Equal probability = Equal […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2019/01/01/principles-posterior-visualization/">“Principles of posterior visualization”</a> appeared first on <a rel="nofollow" ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>Part 1 was here. And here’s Part 2. Jordan Anaya reports: Uli Schimmack posted this on facebook and twitter. I [Anaya] was annoyed to see that it mentions “a handful” of unreliable findings, and points the finger at fraud as the cause. But then I was shocked to see the 85% number for the Many […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/31/authority-figures-in-psychology-spread-more-happy-talk-still-dont-get-the-point-that-much-of-the-published-celebrated-and-publicized-work-in-their-field-is-no-good-part-2/">Authority figures in ...

Source: Homepage on Yihui Xie | 谢益辉

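Link: A Detail in Preparing Application Materials

<p>Over the past seven or eight years I have served several times as a judge for the John Chambers software award, and there is one small issue I think deserves to be singled out: file naming. It is a very minor detail, but I will shamelessly brag that so far I have not seen a single applicant put as much care into it as I did back then. According to the requirements on the website, applicants need to prepare a set of electronic materials, and what I usually see is a pile of PDF files that are named messily or with as few words as possible, leaving me as a judge unsure which one to read first. Sometimes the same document needs to be read several times, so after opening and reading it once, when I want to read it again I often cannot tell which file is the one I want, and have to guess or try again.</p> <p>Below is the list of files I packaged for my own application:</p> <pre><code class="language-md">1. Curriculum Vitae - Yihui XIE.pdf 2. Letter from academic ...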

Source: Econometrics and Free Software

Link: R or Python? Why not both? Using Anaconda Python within R with {reticulate}

<div style="text-align:center;"> <p><a href="https://youtu.be/I8vaCrVIR-Q?t=1h2m26s"> <img src="./img/why not both.png" title = "This literally starts playing when you run both R and Python in the same session"></a></p> </div> <p>This short blog post illustrates how easy it is to use R and Python in the same R Notebook thanks to the <code>{reticulate}</code> package. For this to work, you might need to upgrade RStudio to the <a href="https://www.rstudio.com/products/rstudio/download/preview/">current preview version</a>. Let’s start by importing <code>{reticulate}</code>:</p> <pre ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Combining apparently contradictory evidence

<p>I want to write a more formal article about this, but in the meantime here’s a placeholder. The topic is the combination of apparently contradictory evidence. Let’s start with a simple example: you have some ratings on a 1-10 scale. These could be, for example, research proposals being rated by a funding committee, or, umm, […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/30/combining-apparently-contradictory-evidence/">Combining apparently contradictory evidence</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>Timothy Brathwaite sends along this wonderfully-titled article (also here, and here’s the replication code), which begins: Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/29/check-wreck-assessing-discrete-choice-models-predictive-simulations/">“Check ...

Source: Mara Averick

Link: Top Tweets of 2018

Top two tweets for each month in 2018, according to Twitter analytics. January Such a 🆒 viz ⚒: "Intro to gghighlight: Highlight ggplot lines & points w/ predicates" by @yutannihilation https://t.co/SPVeKw0UdN #rstats #dataviz pic.twitter.com/VsSpXUZn88 — Mara Averick (@dataandme) January 28, 2018 ICYMI, .@StephdeSilva's tips for getting to know your data (also makes for pretty solid relationship advice) 🤓💔https://t.co/LUZe6tpPNq pic.twitter.com/KEdIot3IGo — Mara Averick (@dataandme) January 21, 2018 February Now on CRAN ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Using multilevel modeling to improve analysis of multiple comparisons

<p>Justin Chumbley writes: I have mused on drafting a simple paper inspired by your paper “Why we (usually) don’t have to worry about multiple comparisons”. The initial idea is simply to revisit frequentist “weak FWER” or “omnibus tests” (which assume the null everywhere), connecting it to a Bayesian perspective. To do this, I focus on […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/28/using-multilevel-modeling-improve-analysis-multiple-comparisons/">Using multilevel modeling to improve analysis of multiple ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: The end of 2018

<p>It is almost the end of 2018. It is a good time to review what I have achieved during the year and look forward to a brand new 2019. I wrote a similar post for 2017 <a href="http://crazyhottommy.blogspot.com/2017/12/" target="_blank">here</a>.</p> <h3 id="some-highlights-of-the-year-2018">Some highlights of the year 2018:</h3> <ul> <li>My son Noah Tang was born in April. He is so lovely and we love him so much. Can’t believe he is almost 9 months old. <img src="https://divingintogeneticsandgenomics.rbind.io/img/noah.jpg" alt="" /></li> <li><p>Our epigenomic project was selected by ...

Source: Homepage on Yihui Xie | 谢益辉

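Link: The Most Money-Saving Credit Card

<p>Two weeks ago a friend visited, and in casual conversation we got onto which credit card saves the most money. I am certainly no expert here; I have just two credit cards myself, both low-end cards with no annual fee. I get ads all the time pushing high-spend, high-cashback credit cards, and I throw them away without even opening the envelope, because as a Zen-minded (“佛系”) not-quite-young adult I simply do not have that much need to spend money, and I am afraid that once I opened such a card I would spend money just to reach the level of spending that justifies the annual fee. To avoid that itch, I might as well stick with a card that looks like a poor deal. I once heard of someone who, to save money and avoid buying unnecessary things, set a shopping rule: never buy anything on sale; buy only at full price. This counterintuitive rule should help impulsive, bargain-hungry shoppers, because buying at full price always feels like a loss, which curbs the urge to buy. Deliberately using a poor-value credit card should have a similar effect.</p> <p>In the view of this Buddha, the most money-saving credit card in the world is none other than ...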

Source: Econometrics and Free Software

Link: Some fun with {gganimate}

<div style="text-align:center;"> <video width="864" height="480" controls> <source src="./img/wiid_gganimate.webm" type="video/webm"> Your browser does not support the video tag. </video> </div> <p>In this short blog post I show you how you can use the <code>{gganimate}</code> package to create animations from <code>{ggplot2}</code> graphs with data from UNU-WIDER.</p> <div id="wiid-data" class="section level2"> <h2>WIID data</h2> <p>Just before Christmas, UNU-WIDER released a new edition of their World Income Inequality Database:</p> <blockquote class="twitter-tweet"><p lang="en" ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Back to the Wall

<p>Jim Windle writes: Funny you should blog about Jaynes. Just a couple of days ago I was looking for something in his book’s References/Bibliography (it along with “Godel, Escher, Bach” and “Darwin’s Dangerous Idea” have bibliographies which I find not just useful but entertaining), and ran across something I wanted to send you but I […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/27/back-to-the-wall/">Back to the Wall</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: What is probability?

<p>This came up in a discussion a few years ago, where people were arguing about the meaning of probability: is it long-run frequency, is it subjective belief, is it betting odds, etc? I wrote: Probability is a mathematical concept. I think Martha Smith’s analogy to points, lines, and arithmetic is a good one. Probabilities are […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/26/what-is-probability/">What is probability?</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Homepage on Yihui Xie | 谢益辉

Link: On Marketing (on Social Media)

<p>Both Mara and Hadley disagreed with me <a href="https://yihui.name/en/2018/11/dependency-winner/">when I talked about my concerns on marketing</a> last month. Marketing was not the main topic of that post, and I want to clarify my thoughts in this post.</p> <h2 id="the-medium-is-the-message">The medium is the message</h2> <p>I’m not against marketing. I do it myself, too. I definitely agree with Mara’s comment that meritocracy is a fantasy:</p> <blockquote> <p>I think there’s a tendency to imagine a platonic ideal of meritocracy in the absence of something we call ...

Source: ZedR Blog - Data Science . R . Stock Markets

Link: Deploying Metabase through Heroku App

In this mini project, I deployed my first Metabase through the Heroku App. Metabase is an open source business intelligence engine. It allows for company-wide sharing of business insights with very little demand for technical know-how. You can build powerful business data models using SQL and through the menu-based report ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>What better day than Christmas, that day of gift-giving, to discuss “loss aversion,” the purported asymmetry in utility, whereby losses are systematically more painful than gains are pleasant? Loss aversion is a core principle of the heuristics and biases paradigm of psychology and behavioral economics. But it’s been controversial for a long time. For example, […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/25/thus-loss-aversion-principle-rendered-superfluous-account-phenomena-introduced-explain/">“Thus, a loss aversion principle ...

Source: Econometrics and Free Software

Link: Objects types and some useful R functions for beginners

<div style="text-align:center;"> <p><a href="https://www.youtube.com/watch?v=M-1nTwiHxic"> <img width = "400" src="./img/santa_sanders.jpg" title = "The frydiest time of the year"></a></p> </div> <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"> </script> <p>This blog post is an excerpt of my ebook <em>Modern R with the tidyverse</em> that you can read for free <a href="https://b-rodrigues.github.io/modern_R/">here</a>. This is taken from Chapter 2, which explains the different R objects you can manipulate as ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: June is applied regression exam month!

<p>So. I just graded the final exams for our applied regression class. Lots of students made mistakes which gave me the feeling that I didn’t teach the material so well. So I thought it could help lots of people out there if I were to share the questions, solutions, and common errors. It was an […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/24/june-is-applied-regression-exam-month/">June is applied regression exam month!</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Zak David expresses critical views of some published research in empirical quantitative finance

<p>In honor of Ebenezer Scrooge, what better time than Christmas Eve to discuss the topic of liquidity in capital markets . . . A journalist asked, “I just wanted to know how bad the problem of data mining is in capital markets compared to other fields, and whether the reasons for false positives in finance […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/24/zak-david-expresses-critical-views-much-published-research-empirical-quantitative-finance/">Zak David expresses critical views of some published research in empirical quantitative finance</a> ...

Source: Homepage on Yihui Xie | 谢益辉

Link: On Disagreement

<p>There is a problem with being famous. When you become famous, you may lose some opportunities of hearing other people’s disagreements with you. There are a few possible reasons:</p> <ol> <li><p>They may not be confident enough with their own opinions, and think you must be correct when there is a disagreement.</p></li> <li><p>They may think you are wrong, but choose not to speak up anyway, because they are worried that everyone else would agree with you, which could make them unpopular.</p></li> <li><p>Out of courtesy, they don’t want to disagree. They know you often feel good ...

Source: Jan's Page

Link: Some Incomplete Papers

<p>Not sure if I will ever finish these, but the incomplete versions make it pretty clear what needs to be added to make them complete. It’s all percolating somewhere in my head, pretty much formulated, but writing it down is another matter. In the first paper the notion of sharp quadratic majorization is generalized using <em>fans</em> of majorizers. In the second paper formulas are given for the convergence rate of majorization algorithms with equality and/or inequality ...

Source: Jan's Page

Link: Convergence Rate of Majorization Algorithms with Constraints

NA ...

Source: Jan's Page

Link: Univariate Fans of Majorizers

NA ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “When Both Men and Women Drop Out of the Labor Force, Why Do Economists Only Ask About Men?”

<p>Dean Baker points to this column, where Gregory Mankiw writes: With unemployment at 3.8 percent, its lowest level in many years, the labor market seems healthy. But that number hides a perplexing anomaly: The percentage of men who are neither working nor looking for work has risen substantially over the past several decades. . . […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/23/men-women-drop-labor-force-economists-ask-men/">“When Both Men and Women Drop Out of the Labor Force, Why Do Economists Only Ask About Men?”</a> appeared first on <a ...

Source: Rob J Hyndman

Link: Network for early career researchers in forecasting

The International Institute of Forecasters has established interest group sections, devoted to specialized domains of forecasting. One of the first such sections will be for early career researchers. So if you are a PhD student, post-doc, or otherwise a relatively junior researcher working in forecasting, this is for you! The first events will be during the ISF in Thessaloniki in June 2019, including the following: ECR welcoming event. A meet and greet event prior to the ISF welcome reception. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Carol Nickerson explains what those mysterious diagrams were saying

<p>A few years ago, James Coyne asked, “Can you make sense of this diagram?” and I responded, No, I can’t. At the time, Carol Nickerson wrote up explanations for two of the figures in the article in question. So if anyone’s interested, here they are: Carol Nickerson’s explanation of Figure 2 in Kok et al. […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/22/carol-nickerson-explains-mysterious-diagrams-saying/">Carol Nickerson explains what those mysterious diagrams were saying</a> appeared first on <a rel="nofollow" ...

Source: Econometrics and Free Software

Link: Using the tidyverse for more than data manipulation: estimating pi with Monte Carlo methods

<div style="text-align:center;"> <p><a href="https://www.youtube.com/watch?v=kZJY15dyMig"> <img width = "400" src="./img/casino.jpg" title = "Audentes Fortuna Iuvat"></a></p> </div> <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"> </script> <p>This blog post is an excerpt of my ebook <em>Modern R with the tidyverse</em> that you can read for free <a href="https://b-rodrigues.github.io/modern_R/">here</a>. This is taken from Chapter 5, which presents the <code>{tidyverse}</code> packages and how to use them to ...
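The Monte Carlo idea named in this entry's title can be sketched in a few lines. This is an illustrative Python version, not the ebook's tidyverse code: it draws uniform points in the unit square and uses the fraction that lands inside the quarter circle of radius 1, which approximates π/4. The function name, seed, and sample size are assumptions chosen for the example.

```python
import math
import random


def estimate_pi(n, seed=0):
    """Estimate pi from n uniform points in the unit square.

    A point (x, y) lies inside the quarter circle when x^2 + y^2 <= 1;
    the hit fraction approximates pi / 4.
    """
    rng = random.Random(seed)  # local generator so the result is reproducible
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


estimate = estimate_pi(100_000)
error = abs(estimate - math.pi)
```

The standard error shrinks like 1/sqrt(n), so each additional decimal digit of accuracy costs roughly 100 times more samples.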

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The causal hype ratchet

<p>Noah Haber informs us of a research article, “Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): A systematic review,” that he wrote with Emily Smith, Ellen Moscoe, Kathryn Andrews, Robin Audy, Winnie Bell, Alana Brennan, Alexander Breskin, Jeremy Kane, Mahesh Karra, Elizabeth McClure, and Elizabeth Suarez, and […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/21/causal-hype-ratchet/">The causal hype ratchet</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical ...

Source: 一路嘿嘿

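Link: Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA) is a commonly used dimensionality-reduction method; its goal is to project the data into a lower-dimensional space, while preserving class separation, in order to reduce computational complexity. LDA vs. PCA ...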

Source: Statistical Modeling, Causal Inference, and Social Science

<p>Opher Donchin writes in with a question: We’ve been finding it useful in the lab recently to look at the histogram of samples from the parameter combined across all subjects. We think, but we’re not sure, that this reflects the distribution of that parameter when marginalized across subjects and can be a useful visualization. It […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/20/exploring-model-fit-looking-histogram-posterior-simulation-draw-set-parameters-hierarchical-model/">Exploring model fit by looking at a histogram of a posterior ...

Source: ewen

Link: listening, EOY 2018

Stuff that slapped / soothed this ...

Source: Rob J Hyndman

Link: Early classification of spatio-temporal events using time-varying models

This paper investigates early event classification in spatio-temporal data streams. We propose a framework for early classification that considers the relationship between the features of an event and its age. The framework incorporates an event extraction algorithm as well as two early event classification algorithms, which use a series of logistic regression classifiers with penalty terms and state space models. We apply this framework to synthetic and real world problems and demonstrate its reliability and broad applicability.

Source: Statistical Modeling, Causal Inference, and Social Science

Link: When “nudge” doesn’t work: Medication Reminders to Outcomes After Myocardial Infarction

<p>Gur Huberman points to this news article by Aaron Carroll, “Don’t Nudge Me: The Limits of Behavioral Economics in Medicine,” which reports on a recent study by Kevin Volpp et al. that set out “to determine whether a system of medication reminders using financial incentives and social support delays subsequent vascular events in patients following […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/19/nudge-doesnt-work-medication-reminders-outcomes-myocardial-infarction/">When “nudge” doesn’t work: Medication Reminders to ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: A tale of two heatmap functions

<p>You probably do not understand heatmaps! Please read <a href="http://www.opiniomics.org/you-probably-dont-understand-heatmaps/">You probably don’t understand heatmaps by Mick Watson</a></p> <p>In the blog post, Mick used the <code>heatmap</code> function in the <code>stats</code> package; I will try to walk you through comparing <code>heatmap</code> and <code>heatmap.2</code> from the <code>gplots</code> package.</p> <p>Before I start, I want to quote this:</p> <blockquote> <p>“The defaults of almost every heat map function in R does the hierarchical clustering first, then scales the rows then ...

Source: Simply Statistics

Link: The Netflix Data War

<p>A recent article in the Wall Street Journal, <a href="https://www.wsj.com/articles/at-netflix-who-wins-when-its-hollywood-vs-the-algorithm-1541826015?emailToken=43ff1b39ad606a5db59c9fcf2d69741fSCIKNr2MhQ2fDt14GnpJCnpmuOt4cIRNRVTmT3dVTRtcCRfo9MAfxHbyK7XQlCGz9nkhmaBGU/K/gkZ+EeG5tJ6k/mjzxfV4AzIWJiG6g529n+n9dS0XOrKDelzIe3qd&reflink=article_copyURL_share">“At Netflix, Who Wins When It’s Hollywood vs. the Algorithm?”</a> by Shalini Ramachandran and Joe Flint details some of the internal debates within Netflix between the Los Angeles-based content team, which is in charge of ...

Source: Homepage on Yihui Xie | 谢益辉

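Link: Longing (渴望)

<p>I have been meaning to write about longing for a while. The longing I mean here is nowhere near as lofty as Mao Amin's "<a href="https://yihui.name/cn/2018/09/traveller/">Longing</a>" (《渴望》); it is just some of the childish longings I had while growing up. The first time I thought about this topic explicitly was this January, when I wrote about <a href="https://yihui.name/cn/2018/01/craving-exploit/">the cravings manufactured by social media</a>, and the first time I vaguely considered it was probably <a href="https://yihui.name/cn/2011/05/i-hate-exams/">seven years ago</a>, when I talked about that copy of Treasure Island. I am writing this nostalgia piece mainly to compare my own ...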

Source: 一路嘿嘿

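Link: Principal Component Analysis

High-throughput data usually contain many variables. Principal component analysis is a dimensionality-reduction method that uses an orthogonal transformation to convert the original, possibly correlated variables into a set of new orthogonal variables, ...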

Source: Rob J Hyndman

Link: Why doesn't auto.arima() return the model with the lowest AICc value?

This question seems to come up frequently on crossvalidated.com or in my inbox. I have this time series, however it yields different results when I use the auto.arima and Arima functions. library(forecast) xd <- ts(c(23786, 25955, 54373, 21561, 14552, 13284, 12714, 11821, 15445, 21307, 17228, 20007, 23065, 32811, 43147, 15127, 13497, 12224, 11412, 11888, 14210,18978, 15782, 17216, 16417, 22861, 42616, 17057, 9741, 10503, 7170, 10686, 9762, 15773, 15280, 13212, 14784, 26104, 29947), frequency = 12, start=c(2014,1), end=c(2017,3)) fit1 <- auto. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Classifying yin and yang using MRI

<p>Zad Chow writes: I wanted to pass along this study I found a while back that aimed to see whether there was any possible signal in an ancient Chinese theory of depression that classifies major depressive disorder into “yin” and “yang” subtypes. The authors write the following, The “Yin and Yang” theory is a fundamental […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/18/classifying-yin-yang-using-mri/">Classifying yin and yang using MRI</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>TV commentator Carlson in 2018 recently raised a stir by saying that immigration makes the United States “poorer, and dirtier, and more divided,” which reminded me of this rant from literary critic Alfred Kazin in 1957: Kazin put it in his diary and Carlson broadcast it on TV, so not quite the same thing. But […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/18/if-only-tucker-carlson-had-been-around-in-the-1950s-he-couldve-been-a-new-york-intellectual/">Comparing racism from different eras: If only Tucker Carlson had been around in the 1950s he ...

Source: DNA confesses Data speak on DNA confesses Data speak

Link: PCA in action

<div id="pca-in-practice." class="section level3"> <h3>PCA in practice.</h3> <p>Principal Component Analysis (PCA) is a very important technique for dimension reduction when analyzing high-dimensional data. High-dimensional data are data with many more features (p) than observations (n). This type of data is very commonly generated from high-throughput sequencing experiments. For example, an RNA-seq or microarray experiment measures the expression of tens of thousands of genes for only 8 samples (4 controls and 4 treatments).</p> <p>Let’s use a microarray dataset for demonstration. One thing to note is that ...

Source: Blog on rOpenSci - open tools for open science

Link: rcites - The story behind the package

The Ecology Hackathon Almost one year ago now, ecologists filled a room for the “Ecology Hackathon: Developing R Packages for Accessing, Synthesizing and Analyzing Ecological Data” that was co-organised by rOpenSci Fellow, Nick Golding and Methods in Ecology and Evolution. This hackathon was part of the “Ecology Across Borders” Joint Annual Meeting 2017 of BES, GfÖ, NecoV, and EEF in Ghent. At different tables, different people joined each other to work on different ideas to implement as R ...

Source: Posts on datascienceblog.net: R for Data Science

Link: An Introduction to Forecasting

Forecasting is concerned with making predictions about future observations by relying on past measurements. In this article, I will give an introduction to how ARMA, ARIMA (Box-Jenkins), SARIMA, and ARIMAX models can be used for forecasting given time-series data. Preliminaries Before we can talk about models for time-series data, we have to introduce two concepts. The backshift operator Given the time series \(y = \{y_1, y_2, \ldots \}\), the backshift operator (also called lag operator) is defined ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>Fabio Rojas asks why the academic field of sociology seems so focused on the negative. As he puts it, why doesn’t the semester begin with the statement, “Hi, everyone, this is soc 101, the scientific study of society. In this class, I’ll tell you about how American society is moving in some great directions as […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/17/sociologists-bloggers-focus-negative-6-possible-explanations/">Why do sociologists (and bloggers) focus on the negative? 5 possible explanations. (A post in the style of Fabio ...

Source: 一路嘿嘿

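Link: Matrix Decomposition

Eigendecomposition: for a square matrix \(A\) and a nonzero vector \(x\), if \(Ax = \lambda x\), meaning that multiplying \(x\) by \(A\) does not change the direction of the vector (it is only scaled by \(\lambda\)), then \(x\) is called ...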
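
The defining relation \(Ax = \lambda x\) in the excerpt above can be checked numerically. The following is a minimal Python sketch, not code from the original post; the helper names `mat_vec` and `is_eigenpair`, the example matrix, and the tolerance are all assumptions chosen for illustration.

```python
def mat_vec(a, x):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def is_eigenpair(a, x, lam, tol=1e-9):
    """Return True if x is nonzero and A x equals lam * x within tol."""
    if all(abs(x_i) <= tol for x_i in x):
        return False  # the zero vector is excluded by definition
    ax = mat_vec(a, x)
    return all(abs(ax_i - lam * x_i) <= tol for ax_i, x_i in zip(ax, x))


# Hypothetical example: a symmetric 2x2 matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0],
     [1.0, 2.0]]

print(is_eigenpair(A, [1.0, 1.0], 3.0))   # A x = [3, 3] = 3 * x
print(is_eigenpair(A, [1.0, -1.0], 1.0))  # A x = [1, -1] = 1 * x
print(is_eigenpair(A, [1.0, 0.0], 2.0))   # A x = [2, 1], not a multiple of x
```

The check compares each component of \(Ax\) with \(\lambda x\), so it only confirms a candidate pair; it does not compute eigenvalues.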

Source: Statistical Modeling, Causal Inference, and Social Science

<p>Teppo Felin sends along this article with Mia Felin, Joachim Krueger, and Jan Koenderink on “surprise-hacking,” and writes: We essentially see surprise-hacking as the upstream, theoretical cousin of p-hacking. Though, surprise-hacking can’t be resolved with replication, more data or preregistration. We use perception and priming research to make these points (linking to Kahneman and priming, […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/16/surprise-hacking-the-narrative-of-blindness-and-illusion-sells-and-therefore-continues-to-be-the-c ...

Source: Econometrics and Free Software

Link: Manipulate dates easily with {lubridate}

<p>This blog post is an excerpt of my ebook <em>Modern R with the tidyverse</em> that you can read for free <a href="https://b-rodrigues.github.io/modern_R/">here</a>. This is taken from Chapter 5, which presents the <code>{tidyverse}</code> packages and how to use them ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>Youyou Wu writes: I’m a postdoc studying scientific reproducibility. I have a machine learning question that I desperately need your help with. My advisor and I disagree on how we should carry out repeated cross-validation. We would love to have a third expert opinion… I’m trying to predict whether a study can be successfully replicated […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/15/advisor-disagree-carry-repeated-cross-validation-love-third-expert-opinion/">“My advisor and I disagree on how we should carry out repeated ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A couple of thoughts regarding the hot hand fallacy fallacy

<p>For many years we all believed the hot hand was a fallacy. It turns out we were all wrong. Fine. Such reversals happen. Anyway, now that we know the score, we can reflect on some of the cognitive biases that led us to stick with the “hot hand fallacy” story for so long. Jason Collins […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/14/couple-thoughts-regarding-hot-hand-fallacy-argument/">A couple of thoughts regarding the hot hand fallacy fallacy</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>I have a sad story for you today. Jason Collins tells it: In The (Honest) Truth About Dishonesty, Dan Ariely describes an experiment to determine how much people cheat . . . The question then becomes how to reduce cheating. Ariely describes one idea: We took a group of 450 participants and split them into […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/13/oh-hate-work-criticized-case-fails-attempted-replications-original-researchers-dont-even-consider-possibility-maybe-original-work-w/">Oh, I hate it when work is criticized (or, in this case, fails in ...

Source: Rob J Hyndman

Link: Using ggplot2 for functional time series

This week I’ve been attending the Functional Data and Beyond workshop at the Matrix centre in Creswick. I spoke yesterday about using ggplot2 for functional data graphics, rather than the custom-built plotting functionality available in the many functional data packages, including my own rainbow package written with Hanlin Shang. It is a much more powerful and flexible way to work, so I thought it would be useful to share some examples.

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Time series of Democratic/Republican vote share in House elections

<p>Yair prepared this graph of average district vote (imputing open seats at 75%/25%; see here for further discussion of this issue) for each House election year since 1976: Decades of Democratic dominance persisted through 1992; since then the two parties have been about even. As has been widely reported, a mixture of geographic factors and […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/12/time-series-of-democratic-republican-vote-share-in-house-elections/">Time series of Democratic/Republican vote share in House elections</a> appeared first on <a ...

Source: Rob J Hyndman

Link: Data visualization for functional time series

Presentation to MATRIX Workshop on Functional Data Analysis and Beyond Any good data analysis begins with a careful graphical exploration of the observed data. For functional time series data, this area of statistical analysis has been largely neglected. I will look at the tools that are available such as rainbow plots and functional box plots, and propose several new tools including functional ACF plots, functional season plots, calendar plots, and embedded pairwise distance plots.

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Do you have any recommendations for useful priors when datasets are small?”

<p>A statistician who works in the pharmaceutical industry writes: I just read your paper (with Dan Simpson and Mike Betancourt) “The Prior Can Often Only Be Understood in the Context of the Likelihood” and I find it refreshing to read that “the practical utility of a prior distribution within a given analysis then depends critically […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/11/recommendations-useful-priors-datasets-small/">“Do you have any recommendations for useful priors when datasets are small?”</a> appeared first on <a rel="nofollow" ...

Source: Blog on rOpenSci - open tools for open science

Link: Generating reasonable starting trees for complex phylogenetic analyses

I never really thought I would write an R package. I use R pretty casually. Then, this year, I was invited to participate during the last week of the Analytical Paleobiology short course, an intensive month-long experience in quantitative paleontology. I was thrilled to be invited. But I got a slight sinking feeling in my stomach when I realized all the materials were in R. And so I, a Pythonista, decided I would spend some of my maternity leave writing R packages to try to blend in with students who had spent the month living and breathing ...

Source: Simply Statistics

Link: The Role of Theory in Data Analysis

<p>In data analysis, we make use of a lot of theory, whether we like to admit it or not. In a traditional statistical training, things like the central limit theorem and the law of large numbers (and their many variations) are deeply baked into our heads. I probably use the central limit theorem everyday in my work, sometimes for the better, and sometimes for the worse. Even if I’m not directly applying a Normal approximation, knowledge of the central limit theorem will often guide my thinking and help me to decide what to do in a given data analytic situation.</p> <p>Theorems like the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Prior distributions for covariance matrices

<p>Someone sent me a question regarding the inverse-Wishart prior distribution for covariance matrix, as it is the default in some software he was using. Inverse-Wishart does not make sense for prior distribution; it has problems because the shape and scale are tangled. See this paper, “Visualizing Distributions of Covariance Matrices,” by Tomoki Tokuda, Ben Goodrich, […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/10/prior-distributions-covariance-matrices/">Prior distributions for covariance matrices</a> appeared first on <a rel="nofollow" ...

Source: ewen

Link: understatr

An understated (*cough*) project to help folks get hold of tidy football ...

Source: Rob J Hyndman

Link: Seasonal functional autoregressive models

Presentation to ACEMS/MATRIX Conference on Functional Data Analysis Functional autoregressive models have been widely used in functional time series analysis, but no attention has been given to handling seasonality within this framework. I will discuss a proposed seasonal functional autoregressive model, and explore some of its statistical properties including stationarity conditions and limiting behaviour. I will also look at methods for estimation and prediction of seasonal functional autoregressive time series of order one.

Source: Statistical Modeling, Causal Inference, and Social Science

<p>Someone sent in a question (see below). I asked if I could post the question and my reply on the blog, and the person responded: Absolutely, but please withhold my name because this is becoming a touchy issue within my department. The boldface was in the original. I get this a lot. There seems to be […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/09/concerned-mrp-estimates-used-later-analyses-maybe-recommend-checking-using-fake-data-simulation/">Should we be concerned about MRP estimates being used in later analyses? Maybe. I recommend checking using ...

Source: Posts on datascienceblog.net: R for Data Science

Link: Prediction vs Forecasting

In supervised learning, we are often concerned with prediction. However, there is also the concept of forecasting. Here, I will discuss the differences between the two concepts so that we can answer the question of why weather forecasting is not called weather prediction. Prediction and forecasting Prediction is concerned with estimating the outcomes for unseen data. For this purpose, you fit a model to a training data set, which results in an estimator \(\hat{f}(x)\) that can make predictions for new samples ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: My footnote about global warming

<p>At the beginning of my article, How to think scientifically about scientists’ proposals for fixing science, which we discussed yesterday, I wrote: Science is in crisis. Any doubt about this status has surely been dispelled by the loud assurances to the contrary by various authority figures who are deeply invested in the current system […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/08/footnote-global-warming/">My footnote about global warming</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal ...

Source: Posts on datascienceblog.net: R for Data Science

Link: Interpreting ROC Curves, Precision-Recall Curves, and AUCs

Receiver operating characteristic (ROC) curves are probably the most commonly used measure for evaluating the predictive performance of scoring classifiers. The confusion matrix of a classifier that predicts a positive class (+1) and a negative class (-1) has the following structure, with predictions in rows and reference classes (+1, -1) in columns: prediction +1: TP, FP; prediction -1: FN, TN. Here, TP indicates the number of true positives (model predicts positive class correctly), FP indicates the number of false positives (model incorrectly predicts positive class), FN indicates the number of false negatives (model incorrectly predicts negative ...
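
To make the four confusion-matrix entries from the excerpt above concrete, here is a small Python sketch that counts them from paired predictions and reference labels. It is only an illustration under stated assumptions: the function name `confusion_counts` and the toy label vectors are hypothetical and do not come from the original post.

```python
from collections import Counter


def confusion_counts(predicted, reference):
    """Count TP, FP, FN, TN for labels coded +1 (positive) and -1 (negative)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    counts = Counter()
    for p, r in zip(predicted, reference):
        if p == 1 and r == 1:
            counts["TP"] += 1  # predicted positive, truly positive
        elif p == 1 and r == -1:
            counts["FP"] += 1  # predicted positive, truly negative
        elif p == -1 and r == 1:
            counts["FN"] += 1  # predicted negative, truly positive
        elif p == -1 and r == -1:
            counts["TN"] += 1  # predicted negative, truly negative
        else:
            raise ValueError("labels must be +1 or -1")
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


# Toy data (made up for this example).
predicted = [1, 1, -1, -1, 1, -1]
reference = [1, -1, 1, -1, 1, -1]
print(confusion_counts(predicted, reference))
```

The sketch applies to a single cutoff; a scoring classifier would produce one such set of counts per threshold.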

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Latour Sokal NYT

<p>Alan Sokal writes: I don’t know whether you saw the NYT Magazine’s fawning profile of sociologist of science Bruno Latour about a month ago. I wrote to the author, and later to the editor, to critique the gross lack of balance (and even of the most minimal fact-checking). No reply. So I posted my critique […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/07/40828/">Latour Sokal NYT</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Posts on datascienceblog.net: R for Data Science

Link: Inference vs Prediction

The terms inference and prediction both describe tasks where we learn from data in a supervised manner in order to find a model that describes the relationship between the independent variables and the outcome. Inference and prediction, however, diverge when it comes to the use of the resulting model: Inference: Use the model to learn about the data generation process. Prediction: Use the model to predict the outcomes for new data ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A parable regarding changing standards on the presentation of statistical evidence

<p>Now, the P-value Sneetches Had tables with stars. The Bayesian Sneetches Had none upon thars. Those stars weren’t so big. They were really so small. You might think such a thing wouldn’t matter at all. But, because they had stars, all the P-value Sneetches Would brag, “We’re the best kind of Sneetch on the Beaches. […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/06/parable-regarding-changing-standards-presentation-statistical-evidence/">A parable regarding changing standards on the presentation of statistical evidence</a> appeared first on <a ...

Source: Rob J Hyndman

Link: High-dimensional time series analysis

Presentation to the Australasian Actuarial Education and Research Symposium It is becoming increasingly common for organizations to collect huge amounts of data over time. Traditional time series methods are not well suited to this new paradigm. I will review some new tools that have been developed to analyse large collections of time series including visualization, anomaly detection and forecasting. Data visualization is essential for exploring and understanding structures and patterns, and to identify unusual observations.

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Niall Ferguson and the perils of playing to your audience

<p>History professor Niall Ferguson had another case of the sillies. Back in 2012, in response to Stephen Marche’s suggestion that Ferguson was serving up political hackery because “he has to please corporations and high-net-worth individuals, the people who can pay 50 to 75K to hear him talk,” I wrote: But I don’t think it’s just […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/05/niall-ferguson-perils-playing-audience/">Niall Ferguson and the perils of playing to your audience</a> appeared first on <a rel="nofollow" href="https://andrewgelm ...

Source: Blog on rOpenSci - open tools for open science

Link: Community Call - Governance strategies for open source research software projects

🎤 Dan Sholler, rOpenSci Postdoctoral Fellow 🕘 Tuesday, December 18, 2018, 10-11AM PST; 7-8PM CET (find your timezone) ☎️ Details for joining the Community Call. Everyone is welcome. No RSVP needed. Researchers use open source software for the capabilities it provides, such as streamlined data access and analysis and interoperability with other pieces of the scientific computing ecosystem. For most complex software, generating these technical capabilities requires building and governing a community via sound management practices, activities that are often less visible than code contributions ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>This is an abstract I wrote for a talk I didn’t end up giving. (The conference conflicted with something else I had to do that week.) But I thought it might interest some of you, so here it is: Bayes, statistics, and reproducibility The two central ideas in the foundations of statistics—Bayesian inference and frequentist […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/04/bayes-statistics-reproducibility-many-serious-problems-statistics-practice-arise-bayesian-inference-not-bayesian-enough-frequentist-evaluation-not-frequentist/">Bayes, ...

Source: Statistical Modeling, Causal Inference, and Social Science

<p>7pm in Fayerweather 310: Why is it more rational to vote than to answer surveys (but it used to be the other way around)? How does this explain why we should stop overreacting to swings in the polls? How does modern polling work? What are the factors that predict election outcomes? What’s good and bad […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/04/statistical-insights-into-public-opinion-and-politics-my-talk-for-the-columbia-data-science-society-this-wed-9pm/">“Statistical insights into public opinion and politics” (my talk for the ...

Source: Blog on rOpenSci - open tools for open science

Link: Detecting spatiotemporal groups in relocation data with spatsoc

spatsoc is an R package written by Alec Robitaille, Quinn Webber and Eric Vander Wal of the Wildlife Evolutionary Ecology Lab (WEEL) at Memorial University of Newfoundland. It is the lab’s first R package and was recently accepted through the rOpenSci onboarding process with a big thanks to reviewers Priscilla Minotti and Filipe Teixeira, and editor Lincoln Mullen. spatsoc started as a single function (what would eventually become group_pts) written by Alec in 2017 to help answer some of the questions that Quinn and Eric were asking about how animal social structure is related to ...

Source: Posts on datascienceblog.net: R for Data Science

Link: Performance Measures for Multi-Class Problems

For classification problems, classifier performance is typically defined according to the confusion matrix associated with the classifier. Based on the entries of the matrix, it is possible to compute sensitivity (recall), specificity, and precision. For a single cutoff, these quantities lead to balanced accuracy (sensitivity and specificity) or to the F1-score (recall and precision). To evaluate a scoring classifier at multiple cutoffs, these quantities can be used to determine the area under the ROC curve (AUC) or the area under the precision-recall curve ...
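
The single-cutoff quantities named in the excerpt above follow directly from the confusion-matrix counts. The Python sketch below uses the standard definitions of sensitivity, specificity, precision, balanced accuracy, and F1; the function names and the example counts are assumptions for illustration, not taken from the original post.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(tp, fp, fn, tn):
    """Compute single-cutoff metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)  # recall: share of positives found
    specificity = _ratio(tn, tn + fp)  # share of negatives found
    precision = _ratio(tp, tp + fp)    # share of positive calls that are right
    balanced_accuracy = (sensitivity + specificity) / 2
    # F1 is the harmonic mean of precision and recall.
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Hypothetical counts for a binary classifier at one cutoff.
for name, value in classification_metrics(tp=40, fp=10, fn=20, tn=30).items():
    print(f"{name}: {value:.4f}")
```

Returning NaN for undefined ratios is one possible design choice; raising an error would be equally reasonable when empty classes should not occur.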

Source: Peng Zhao on Peng Zhao

Link: R, Open Science, and Reproducible Research

NA ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: In which I demonstrate my ignorance of world literature

<p>Fred Buchanan, a student at Saint Anselm’s Abbey School, writes: I’m writing a paper on the influence of Jorge Luis Borges in academia, in particular his work “The Garden of Forking Paths”. I noticed that a large number of papers from a wide array of academic fields include references to this work. Your paper, “The […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/03/demonstrate-ignorance-world-literature/">In which I demonstrate my ignorance of world literature</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: My talk tomorrow (Tues) noon at the Princeton University Psychology Department

<p>Integrating collection, analysis, and interpretation of data in social and behavioral research Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University The replication crisis has made us increasingly aware of the flaws of conventional statistical reasoning based on hypothesis testing. The problem is not just a technical issue with p-values, nor can […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/03/my-talk-tomorrow-tues-at-the-princeton-university-psychology-department/">My talk tomorrow (Tues) noon at the Princeton ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: StanCon 2018 Helsinki talk slides, notebooks and code online

<p>StanCon 2018 Helsinki talk slides, notebooks and code have been available for some time in StanCon talks repository, but it seems we forgot to announce this. The StanCon 2018 Helsinki talk list includes also links to videos. StanCon’s version of conference proceedings is a collection of contributed talks based on interactive notebooks. Every submission is […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/03/stancon-2018-helsinki-talk-slides-notebooks-and-code-online/">StanCon 2018 Helsinki talk slides, notebooks and code online</a> appeared first on <a ...

Source: Econometrics and Free Software

Link: What hyper-parameters are, and what to do with them; an illustration with ridge regression

<p>This blog post is an excerpt of my ebook <em>Modern R with the tidyverse</em> that you can read for free <a href="https://b-rodrigues.github.io/modern_R/">here</a>. This is taken from Chapter 7, which deals with statistical models. In the text below, I explain what hyper-parameters ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The p-value is 4.76×10^−264

<p>Jerrod Anderson points us to Table 1 of this paper: It seems that the null hypothesis that this particular group of men and this particular group of women are random samples from the same population, is false. Good to know. For a moment there I was worried. On the plus side, as Anderson notes, the […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/02/p-value-4-76x10%e2%88%92264/">The p-value is 4.76×10^−264</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Posts on datascienceblog.net: R for Data Science

Link: Behind the Scenes: The First Month of datascienceblog.net

By now, datascienceblog.net has already existed for one month, with the first post dating back to the 16th of October, 2018. I would like to use this opportunity to reflect on how the blog has developed since its inception. Content I am quite happy with the amount of content I could produce over the last couple of weeks. Especially when starting a blog, high-quality content is the most important criterion for developing a user ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “James Watson in his own words”

<p>Here are some thoughts from the noted biologist and writer, collected by Lior Pachter. I’d seen a few of these Watson quotes before, but it’s kinda stunning to see them all in one place. Apparently he recommends never adopting an Irish kid. All right, then.</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/12/01/james-watson-words/">“James Watson in his own words”</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Rob J Hyndman

Link: Forecasting competitions

Presentation for the Monash Executive MBA students

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Stephen Wolfram explains neural nets

It’s easy to laugh at Stephen Wolfram, and I don’t like some of his business practices, but he’s an excellent writer and is full of interesting ideas. This long introduction to neural network prediction algorithms is an example. I have no idea if Wolfram wrote this book chapter himself or if he hired one of […]

Source: Homepage on Yihui Xie | 谢益辉

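Link: Stories of the Old (老人的故事)

I read a reader's letter and the editor's reply (https://www.thecut.com/2018/11/im-broke-and-friendless-and-ive-wasted-my-whole-life.html). It is about a despairing reader, a thirty-five-year-old woman, who feels that her whole life has been an utter failure: nothing accomplished, completely alone. After reading the letter I paused for two minutes and thought about how I would reply in the editor's place, but I could not come up with a good angle. I only noticed that at one point the woman says she used to be good at writing, poetic, passionate, and curious. That is the only place in the whole letter where she describes herself in a positive light, and even then only her past self. If I were to analyze this case, I might ask when and how that positive self was lost, because poetry, passion, and curiosity are all precious qualities. Just as Gaara, the Kazekage in Naruto, to ...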

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “And when you did you weren’t much use, you didn’t even know what a peptide was”

Last year we discussed the story of an article, “Variation in the β-endorphin, oxytocin, and dopamine receptor genes is associated with different dimensions of human sociality,” published in PNAS that, notoriously, misidentified what a peptide was, among other problems. Recently I learned of a letter published in PNAS by Patrick Jern, Karin Verweij, Fiona Barlow, […]

Source: Blog on rOpenSci - open tools for open science

Link: Community Call Summary - Code Review in the Lab

Although there are increasing incentives and pressures for researchers to share code (even for projects that are not essentially computational), practices vary widely and standards are mostly non-existent. The practice of reviewing code then falls to researchers and research groups before publication. With that in mind, rOpenSci hosted a discussion thread and a community call to bring together different researchers for a conversation about current practices, and challenges in reviewing code in the ...

Source: Homepage on Liechi | 張列弛

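Link: It Is Character That Makes a Great Scientist (是人格造就了伟大的科学家)

Since yesterday, one piece of news has flooded everyone's feeds: the first gene-edited baby has been born in China. When I saw the news yesterday morning I was shocked; after clicking through and reading the details, I felt a chill rise inside me ...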

Source: Rob J Hyndman

Link: M4 Forecasting Conference

Following the highly successful M4 Forecasting Competition, there will be a conference held on 10-11 December at Tribeca Rooftop, New York, to discuss the results. The conference will elaborate on the findings of the M4 Competition, with prominent speakers from leading business firms (Amazon, Uber, Google, Microsoft, SAS, and ProLogistica) and top universities. Nassim Nicholas Taleb will deliver a keynote address about uncertainty in forecasting and elaborate on his claims that “tail risks are much worse now than in 2007” while Spyros Makridakis will discuss how organizations can ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Another Stan related job in baseball!

This quick post is for any Stan users out there who are interested in working in baseball. The Los Angeles Angels are looking to hire a Director of Quantitative Analysis and they are particularly interested in candidates with experience fitting models with Stan. If you are interested please see the full job posting.

Source: Home on Another Random Blog

Link: My sublime text setup (and packages)

I’ve been using Sublime Text as my main editor for a long time. But sometimes when I upgrade my system or use it on a new machine, my perfect setup is lost. So that’s the aim of this post: to preserve my Sublime Text setup. Markdown: Based on this fantastic post by Marius Schulz (https://blog.mariusschulz.com/2014/12/16/how-to-set-up-sublime-text-for-a-vastly-better-markdown-writing-experience). Monokai Extended: good highlighting for Markdown editing. Markdown ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: $ vs. votes

Carlos Cruz writes: Here’s an economics joke. Two economists are walking along when they happen to end up in front of a Tesla showroom. One economist points to a shiny new car and says, “I want that!” The other economist replies, “You’re lying.” The premise of this joke is that if the one economist had […]

Source: Blog on rOpenSci - open tools for open science

Link: Co-localization analysis of fluorescence microscopy images

A few months ago, I wasn’t sure what to expect when looking at fluorescence microscopy images in published papers. I looked at the accompanying graph to understand the data or the point the authors were trying to make. Often, the graph represents one or more measures of so-called co-localization, but I couldn’t figure out how to interpret them. It turns out that reading the images is simple. Cells are simultaneously stained by two dyes (say, red and green) for two different ...

Source: Homepage on Yihui Xie | 谢益辉

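Link: Tuo Buhua's Three Lamps (脱不花的三盏灯)

Only one month of 2018 is left. This year I can say I read just one good article on WeChat: Tuo Buhua's own account, "The 3 Lamps That Lit Up My Career: The Ogilvy Master, the Yijin Jing, and the Great Monk" (https://www.sohu.com/a/277056018_100002975). As everyone knows (https://yihui.name/cn/2017/01/blog/), I have not listened to Luoji Siwei for more than two years, and I do not buy into the paid-knowledge business, but that does not stop me from following its behind-the-scenes CEO. The article's only flaw is the long-windedness of its author, Zhou Zuoluo: he really did not need to tack a dog's head onto the sable before the article, urging readers to read it several times, then tack a dog's tail onto the sable after the article, urging readers to read it several times again, and then, inside the article itself, take it upon himself to (https://yihui.name/cn/2018/11/moron-reader ...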

Source: Rob J Hyndman

Link: Feature-based time series analysis

Presentation to the Monash international workshop on time series and panel data. It is becoming increasingly common for organizations to collect very large amounts of data over time. Data visualization is essential for exploring and understanding structures and patterns, and to identify unusual observations. However, the sheer quantity of data available means that new time series visualisation methods are needed. I will demonstrate an approach to this problem using a vector of features on each time series, measuring characteristics of the series. ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Economic predictions with big data” using partial pooling

Tom Daula points us to this post, “Economic Predictions with Big Data: The Illusion of Sparsity,” by Domenico Giannone, Michele Lenza, and Giorgio Primiceri, and writes: The paper wants to distinguish between variable selection (sparse models) and shrinkage/regularization (dense models) for forecasting with Big Data. “We then conduct Bayesian inference on these two crucial parameters: model […]

Source: Homepage on Yihui Xie | 谢益辉

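Link: Civilization and Its Discontents (文明及其缺憾)

Recently I have been leafing through a book by Freud, Civilization and Its Discontents (https://en.wikipedia.org/wiki/Civilization_and_Its_Discontents), and I finished it this morning. (The Chinese title renders "discontents" as 缺憾, "shortcomings"; I feel 不满, "dissatisfaction", would be a little better.) The last time I read Freud must have been in my sophomore year, probably something like The Interpretation of Dreams, but I remember nothing of what it said, and I suspect I did not understand it at the time either. To this day I find European works of thought terribly hard to read; there is hardly one I can follow, so I generally stay away from such books. The reason I picked up this one, which I would otherwise not have touched, is that a while ago I watched a talk by a programmer who mentioned it. I was drawn to the book entirely by a passage he quoted (the book itself I still did not really understand), but strangely, I searched the whole book and could not find the corresponding Chinese text; I do not know whether the translator got lazy and left it out. I found the passage he quoted on Google Books and discovered that it was only a few sentences from a long paragraph. Here is the whole paragraph: One thing ...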

Source: Homepage on Yihui Xie | 谢益辉

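Link: More on How to Get Through Tasks You Dislike (再论如何完成自己不喜欢的事情)

A few months ago I mentioned an article on how to do things you do not like doing (https://yihui.name/cn/2018/06/dread-tasks/); today here is another one. This one was published in ACM Queue (https://queue.acm.org/detail.cfm?id=3280677). I have read countless articles of this kind on curing procrastination (https://yihui.name/en/2017/09/time-management/), so I can guess most of what it will say with my eyes closed (somewhat ironically, I left this article sitting in my browser for several weeks before reading it). Of its five strategies, only the fourth was at all new to me: talking openly about your stress and anxiety. What is new about this strategy is that research shows that if you can describe in words the challenge you are facing, then your amygdala ...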

Source: Econometrics and Free Software

Link: A tutorial on tidy cross-validation with R

Introduction: This blog post will use several packages from the {tidymodels} collection of packages (https://github.com/tidymodels), namely {recipes} (https://tidymodels.github.io/recipes/), https://tidymodels.github.io/rsa ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: These 3 problems destroy many clinical trials (in ...

Paul Alper points to this news article in Health News Review, which says: A news release or story that proclaims a new treatment is “just as effective” or “comparable to” or “as good as” an existing therapy might spring from a non-inferiority trial. Technically speaking, these studies are designed to test whether an intervention is […]

Source: Homepage on Yihui Xie | 谢益辉

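Link: Can an Article Not Be Read Unless It Is Bolded and Marked in Red? (文非加粗描红不能读也？)

Last year, when I said I was fading out of WeChat (https://yihui.name/cn/2017/05/wechat/), I left WeChat official accounts on my list of gripes without elaborating. In fact, when I later wrote about the problem of illustrations (https://yihui.name/cn/2017/06/illustration/), I had already more or less started complaining about them. These days I am more and more annoyed by the typesetting of official-account articles. I do not mean whether the layout is ugly or pretty. What I utterly detest is the current fashion in which every article contains text that the author or editor has deliberately bolded or marked in red, that is, highlighting the key points. Have our brains all turned into toilets? This highlighting phenomenon reminds me of Kenichi Ohmae's The Low-IQ Society (《低智商社会》), which I read earlier this year; it blasts Japan's culture of defeatism and the way people no longer think independently. I will summarize the book next year when I have time. I think that if every article has its author mark a string of key ...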

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The evolution of pace in popular movies

James Cutting writes: Movies have changed dramatically over the last 100 years. Several of these changes in popular English-language filmmaking practice are reflected in patterns of film style as distributed over the length of movies. In particular, arrangements of shot durations, motion, and luminance have altered and come to reflect aspects of the narrative form. […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Hey! There are mathematicians out there who’ve never read Proofs and Refutations. Whassup with that??

I ran into a colleague the other day who’d never read Proofs and Refutations (full title: Proofs and Refutations: The Logic of Mathematical Discovery). He’d never even heard of it!

Source: Wannabe Rstats-fu

Link: How To Convert A Human To Waves By Magick Package

I saw this tweet about Mathematica last year, which naturally prompted me to write the R version of this code. The tweet, from Wolfram Japan (@WolframJapan), June 15, 2017 (https://twitter.com/WolframJapan/status/875407701676785664?ref_src=twsrc%5Etfw): "A user has used Mathematica to turn Schrödinger's face into an animation like this. The code is posted as well. https://t.co/IDIzM8Xfy2 pic.twitter.com/ZTIbjtmXBm" ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “She also observed that results from smaller studies conducted by NGOs – often ...

Robert Wiblin writes: If we have a study on the impact of a social program in a particular place and time, how confident can we be that we’ll get a similar result if we study the same program again somewhere else? Dr Eva Vivalt . . . compiled a huge database of impact evaluations in […]

Source: Econometrics and Free Software

Link: The best way to visit Luxembourguish castles is doing data science + combinatorial optimization

Inspired by David Schoch’s blog post, Traveling Beerdrinker Problem (http://blog.schochastics.net/post/traveling-beerdrinker-problem/). Check out his blog, he has some amazing posts! Introduction: Luxembourg, as any proper European country, is full of castles. According to Wikipedia, “By some optimistic ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: A Bayesian take on ballot order effects

Dale Lehman sends along a paper, “The ballot order effect is huge: Evidence from Texas,” by Darren Grant, which begins: Texas primary and runoff elections provide an ideal test of the ballot order hypothesis, because ballot order is randomized within each county and there are many counties and contests to analyze. Doing so for all […]

Source: Homepage on Yihui Xie | 谢益辉

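Link: Blind R Users (盲人 R 用户)

At the R conference in Los Angeles in 2014, I met a blind R user for the first time (https://yihui.name/en/2014/07/a-few-notes-on-user2014/). More precisely, it was the first time I saw a blind person use a computer. His name is Jonathan Godfrey (https://github.com/ajrgodfrey), and he had come from New Zealand. It was during that encounter that I first realized the importance of the alt attribute of web images (https://yihui.name/cn/2017/04/img-alt/); ever since, I have never left the square brackets of my ![]() empty. Recently I met a second blind user, JooYoung Seo (https://github.com/jooyoungseo), who I believe is Korean. He was, for the first time, in rmarkdown ...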

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “The hype economy”

Palko writes: I have no idea whether it is real or apocryphal, but there’s an often referred to study with primates where they earned tokens that could be exchanged for food. According to the standard version, the subjects soon came to value those tokens more than the treats they could be exchanged for. The […]

Source: Blog on rOpenSci - open tools for open science

Link: Checklist Recipe - How we created a template to standardize species data

Imagine you are a fish ecologist who compiled a list of fish species for your country. 🐟 Your list could be useful to others, so you publish it as a supplementary file to an article or in a research repository. That is fantastic, but it might be difficult for others to discover your list or combine it with other lists of species. Luckily there’s a better way to publish species lists: as a standardized checklist that can be harvested and processed by the Global Biodiversity Information Facility ...

Source: Homepage on Yihui Xie | 谢益辉

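Link: Eating More (加餐)

I first came across the phrase 加餐 ("eat more") in poetry a few years ago, when reading the first of the Nineteen Old Poems, "On and On, Going On and On" (行行重行行), whose last line is 努力加餐饭 ("make an effort to eat more"). At the time I felt that such a modern-sounding, colloquial line suddenly appearing amid the thick classical atmosphere was out of place, even a little funny. Only after reading the annotation did I learn that this is the ancient version of today's 保重 ("take care"). Come to think of it, they mean the same thing: eating more is precisely a way of taking care of yourself. About ten years ago I often said to Yang Yang, "保重保重, 保持体重!" ("Take care, take care: keep up your weight!"), and he found my reading of 保重 funny. I really was only joking then and had not thought it through. Today I came across 加餐 a second time, while reading an article about Tuo Buhua (https://mp.weixin.qq.com/s/KrsyjziBOa1v9_TCi6hMQg), in which someone mentions a poem by Huang Tingjian to the tune "Partridge Sky" (鹧鸪天) (the article got one character wrong: 吹柳 should be 吹雨): Dawn chill rises on the yellow chrysanthemum branches. In life, never let the wine cup run dry. Facing the wind, flute held crosswise, I play slantwise into the rain; drunk, I pin flowers in my hair and wear my cap backwards. ...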

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Tom Wolfe

I’m a big Tom Wolfe fan. My favorites are The Painted Word and From Bauhaus to Our House, and I have no patience for the boosters (oh, sorry, “experts”) of modern art of the all-black-painting variety or modern architecture of the can’t-find-the-front-door variety who can’t handle Wolfe’s criticism. I also enjoyed Bonfire of the Vanities, […]

Source: Homepage on Yihui Xie | 谢益辉

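Link: The Tao of Programming (编程之道)

Some enterprising foreigner wrote "The Tao of Programming" (http://www.mit.edu/~xela/tao.html) in the style of the Tao Te Ching and the Analects. This blend of Chinese and Western styles is amusing to read. It uses quite a few allusions from Chinese culture, such as Zhuangzi dreaming of a butterfly and Cook Ding carving an ox, and I wonder whether Western readers get these references. The author is also deliberately being funny, often making a sharp turn in the last sentence that sends the reader off the road. ...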

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Graphs and tables, tables and graphs

Jesse Wolfhagen writes: I was surprised to see a reference to you in a Quartz opinion piece entitled “Stop making charts when a table is better”. While the piece itself makes the case that there are many kinds of charts that are simply restatements of tabular data, I was surprised that you came up as […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Using numbers to replace judgment”

Julian Marewski and Lutz Bornmann write: In science and beyond, numbers are omnipresent when it comes to justifying different kinds of judgments. Which scientific author, hiring committee-member, or advisory board panelist has not been confronted with page-long “publication manuals”, “assessment reports”, “evaluation guidelines”, calling for p-values, citation rates, h-indices, or other statistics in order to […]

Source: Econometrics and Free Software

Link: Using a genetic algorithm for the hyperparameter optimization of a SARIMA model

Introduction: In this blog post, I’ll use the data that I cleaned in a previous blog post (https://www.brodrigues.co/blog/2018-11-14-luxairport/), which you can download here (https://github.com/b-rodrigues/avia_par_lu/tree/master). If you want to follow along, download the monthly data. In my https://www.brodrigues.co/blog/2018-11- ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: 2018: How did people actually vote? (The real story, not the exit polls.)

Following up on the post that we linked to last week, here’s Yair’s analysis, using Mister P, of how everyone voted. Like Yair, I think these results are much better than what you’ll see from exit polls, partly because the analysis is more sophisticated (MRP gives you state-by-state estimates in each demographic group), partly because […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Hey, check this out: Columbia’s Data Science Institute is hiring research scientists and postdocs!

Here’s the official announcement: The Institute’s Postdoctoral and Research Scientists will help anchor Columbia’s presence as a leader in data-science research and applications and serve as resident experts in fostering collaborations with the world-class faculty across all schools at Columbia University. They will also help guide, plan and execute data-science research, applications and technological innovations […]

Source: Econometrics and Free Software

Introduction: In this blog post, I’ll use the data that I cleaned in a previous blog post (https://www.brodrigues.co/blog/2018-11-14-luxairport/), which you can download here (https://github.com/b-rodrigues/avia_par_lu/tree/master). If you want to follow along, download the monthly data. In the previous blog post, I used the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The State of the Art

Christie Aschwanden writes: Not sure you will remember, but last fall at our panel at the World Conference of Science Journalists I talked with you and Kristin Sainani about some unconventional statistical methods being used in sports science. I’d been collecting material for a story, and after the meeting I sent the papers to Kristin. […]

Source: Econometrics and Free Software

Link: Easy time-series prediction with R: a tutorial with air traffic data from Lux Airport

In this blog post, I will show you how you can quickly and easily forecast a univariate time series. I am going to use data from the EU Open Data Portal on air passenger transport. You can find the data here (https://data.europa.eu/euodp/en/data/dataset/2EwfWXj5d94BUOzfoABKSQ). I downloaded the data in the TSV format for Luxembourg Airport, but you could repeat the analysis for any ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Robustness checks are a joke

Someone pointed to this post from a couple years ago by Uri Simonsohn, who correctly wrote: Robustness checks involve reporting alternative specifications that test the same hypothesis. Because the problem is with the hypothesis, the problem is not addressed with robustness checks. Simonsohn followed up with an amusing story: To demonstrate the problem I [Simonsohn] […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Chocolate milk! Another stunning discovery from an experiment on 24 people!

Mike Hull writes: I was reading over this JAMA Brief Report and could not figure out what they were doing with the composite score. Here are the cliff notes: Study tested milk vs dark chocolate consumption on three eyesight performance parameters: (1) High-contrast visual acuity (2) Small-letter contrast sensitivity (3) Large-letter contrast sensitivity Only small-letter […]

Source: Andriy V. Koval on Andriy V. Koval

Link: Data Science Studio

DSS is a learning community for applied researchers in the areas of health and aging. Based at the Institute for Aging and Lifelong Health (IALH) at the University of Victoria, DSS provides a learning space and a platform for cross-disciplinary dialog between UVic researchers and Vancouver Island Health ...

Source: Andriy V. Koval on Andriy V. Koval

Link: Health System Impact Fellowship

The Health System Impact Fellowship (HSIF) program is founded on partnerships with health system and related organizations (e.g., public, not-for-profit, private for-profit organizations) that are committed to the program objectives, including providing enriching, stimulating and impact-oriented experiential learning opportunities for PhD trainees and/or post-doctoral fellows that accelerate their professional growth and career readiness and advance the organization’s impact goals regarding health system improvement. View (http://www.cihr-irsc.gc.ca/e/50660.html) ...

Source: Blog on rOpenSci - open tools for open science

Link: The Antarctic/Southern Ocean rOpenSci community

Antarctic/Southern Ocean science and rOpenSci: Collaboration and reproducibility are fundamental to Antarctic and Southern Ocean science, and the value of data to Antarctic science has long been promoted. The Antarctic Treaty (which came into force in 1961) included the provision that scientific observations and results from Antarctica should be openly shared. The high cost and difficulty of acquisition means that data tend to be re-used for different studies once ...

Source: L. Collado-Torres on L. Collado-Torres

Link: Asking for help is challenging but is typically worth it

Recently I’ve been thinking on the subject of asking for help. In short, it’s hard to ask for help. It involves admitting to yourself that you can’t solve the problem alone, opening yourself up, hoping that another person will understand you and guide you in the right direction. Thus it can be painful if your request for help is misunderstood, met with criticism or ignored. Regardless of these obstacles, I think that the potential rewards make it worth it. I mostly encounter the situation of asking for help in two scenarios. One is about work, mostly R programming. The other one is ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Law professor Alan Dershowitz’s new book claims that political differences have lately been criminalized in the United States. He has it wrong. ...

This op-ed by Virginia Heffernan is about politics, but it reminded me of the politics of science. Heffernan starts with the background: This last year has been a crash course in startlingly brutal abuses of power. For decades, it seems, a caste of self-styled overmen has felt liberated to commit misdeeds with impunity: ethical, sexual, […]

Source: Rbind Support

Link: Data Science Blog: My Experiences with Data Science, Blogging, and R

Between 2012 and 2017, the demand for data scientists increased by a whopping 650%. Why is data science such a hot topic? There are multiple factors at play: Emerging technologies are producing decidedly more data than previously and these data need to be dealt with. Recent advances in machine learning have produced new algorithms that are particularly suitable for big data. Digitization is steadily advancing and data analytics are an important aspect of this ...

Source: Homepage on Yihui Xie | 谢益辉

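Link: Accepting Your Own Ordinariness (接受自己的普通)

A while ago I read an article claiming that for most people, past a certain age (thirty?), the most important thing is to start accepting their own ordinariness. The view was attributed to the psychologist Jung. Incidentally, Jung is the first major figure to appear in the book Deep Work. Jung was among the psychologists I browsed as an undergraduate around the time of SARS, but I have since forgotten all of their views. Seeing this sentence, I felt it made some sense, so as usual I went searching for its source and the English version. I did not find it; I only found some of his other quotes (https://www.purposefairy.com/81925/38-life-changing-lessons-to-learn-from-carl-jung/). Here I copy a few that I found enlightening (for the English originals, right-click and view the page source; I put them in HTML comments): When we do not understand others, we tend to take them for fools. <!-- If one ...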

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Hey! Here’s what to do when you have two or more surveys on the same population!

This problem comes up a lot: We have multiple surveys of the same population and we want a single inference. The usual approach, applied carefully by news organizations such as Real Clear Politics and Five Thirty Eight, and applied sloppily by various attention-seeking pundits every two or four years, is “poll aggregation”: you take the […]

Source: ZedR Blog - Data Science . R . Stock Markets

Link: #DataHack4Fi twitter data

Source: Econometrics and Free Software

Link: Analyzing NetHack data, part 2: What players kill the most

Link to webscraping the data (https://www.brodrigues.co/blog/2018-11-01-nethack/). Link to Analysis, part 1 (https://www.brodrigues.co/blog/2018-11-03-nethack_analysis/). Introduction: This is the third blog post that deals with data from the game NetHack, and oh boy, did a lot of things happen since the last blog post! Here’s ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: 2018: What really happened?

We’re always discussing election results on three levels: their direct political consequences, their implications for future politics, and what we can infer about public opinion. In 2018 the Democrats broadened their geographic base, as we can see in this graph from Yair Ghitza. Party balancing: At the national level, what happened is what we expected […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: 2018: Who actually voted? (The real story, not the exit polls.)

Continuing from our earlier discussion . . . Yair posted some results from his MRP analysis of voter turnout: 1. The 2018 electorate was younger than in 2014, though not as young as exit polls suggest. 2. The 2018 electorate was also more diverse, with African American and Latinx communities surpassing their share of votes […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Matching (and discarding non-matches) to deal with ...

John Spivack writes: I am contacting you on behalf of the biostatistics journal club at our institution, the Mount Sinai School of Medicine. We are working Ph.D. biostatisticians and would like the opinion of a true expert on several questions having to do with observational studies, questions that we have not found to be well addressed […]

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Recapping the recent plagiarism scandal”

<p>Benjamin Carlisle writes: A year ago, I received a message from Anna Powell-Smith about a research paper written by two doctors from Cambridge University that was a mirror image of a post I wrote on my personal blog roughly two years prior. The structure of the document was the same, as was the rationale, the […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/11/09/recapping-recent-plagiarism-scandal/">“Recapping the recent plagiarism scandal”</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal ...

Source: Homepage on Yihui Xie | 谢益辉

Link: My Biggest Regret in the knitr Package

<p>The <strong>knitr</strong> package was written in late 2011, when I knew little about character encodings. I still don’t know much about them today, but I have had enough pain. At RStudio, I have become the (self-nominated) “Character Encoding Ambassador”, mainly because I’m the only native Chinese in the company, and Chinese characters are multibyte. It is common that problems related to character encodings are reported by Chinese users. If we can fix these problems, chances are problems for other languages will disappear at the same time.</p> <p>The default ...

Source: Homepage on Yihui Xie | 谢益辉

Link: On Cosmetic Changes in Pull Requests

<p>Programmers, including myself, make cosmetic changes in code all the time, such as adding or deleting white spaces or blank lines, or re-wrapping lines, or adding <code>{ }</code> to single-line <code>if</code> statements. That is usually fine. However, when making changes in other people’s code and sending pull requests on Github, I suggest you refrain from introducing cosmetic changes. There are two reasons:</p> <ol> <li><p>Such changes require unnecessary attention of the pull request reviewer. If you don’t know if the reviewer is busy or not, always assume he/she is ...

Source: Homepage on Yihui Xie | 谢益辉

Link: Some of My JS Tricks to Enhance the HTML Output of Markdown

<p>Just because Markdown is so simple (well, “<a href="https://yihui.name/en/2018/11/hard-markdown/">simple</a>”), there will definitely be useful features missing. I want to share some of my relatively simple JS tricks to fill the gaps. Please note that I’m only talking about the HTML output, and the Markdown flavor can be arbitrary (it doesn’t have to be Pandoc’s Markdown). In fact, all tricks below are general-purpose and will work on any web pages, no matter if the pages are generated from Markdown or not.</p> <h2 id="how-to-work-with-mathjax">How to work ...

Source: Homepage on Yihui Xie | 谢益辉

Link: Things are Getting Better and Better

<p>Last month <a href="https://twitter.com/_R_Foundation/status/1050791571514413057">I suddenly learned</a> that the R Core Team started a blog earlier this year. And of course, I was even more excited that the blog was based on <strong>blogdown</strong>. Given that the official R Project website had used <code>&lt;iframe&gt;</code> for nearly two decades, I was very delighted that they were finally embracing the newer world-wide web step by step.</p> <p>Although blogs and mailing lists are both open to the public, I think blogs are more public. For developers, there are always difficult ...

Source: Homepage on Yihui Xie | 谢益辉

Link: The Two Surprisingly Hard Things about the Otherwise Simple Markdown

<p>Markdown is simple in general. Over the years, however, I have observed two things that are surprisingly hard and many people have stumbled over them, including experts in my eyes.</p> <h2 id="1-how-to-write-verbatim-content-especially-text-that-contains-three-backticks">1. How to write verbatim content, especially text that contains three backticks</h2> <p>This is the thing that tortures me weekly, if not daily, because without knowing how to write verbatim text, the Markdown output will be poorly formatted. It is not rare for me to see Github issues like this:</p> <p><img ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “35. What differentiates solitary confinement, county jail and house arrest” and 70 others

<p>Thomas Perneger points us to this amusing quiz on statistics terminology: Lots more where that came from.</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/11/07/35-differentiates-solitary-confinement-county-jail-house-arrest-70-others/">“35. What differentiates solitary confinement, county jail and house arrest” and 70 others</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Homepage on Yihui Xie | 谢益辉

Link: Winners Take All in the Dependency World (or Hell)

<p>Yesterday I read <a href="https://twitter.com/sarah_edo/status/1059616001937960960">a tweet on the marketing of software</a>, which encourages software engineers to promote their good software packages. It seems reasonable at first glance. The problem, in my opinion, is that we are now living in this miserable age in which we are deeply manipulated by the <a href="https://en.wikipedia.org/wiki/Attention_economy">attention economy</a>.<sup class="footnote-ref" id="fnref:So-deeply-manipu"><a rel="footnote" href="#fn:So-deeply-manipu">1</a></sup> At the same time, there are just so ...

Source: ZedR Blog - Data Science . R . Stock Markets

Link: African Markets indices tracker

<p></p> ...

Source: L. Collado-Torres on L. Collado-Torres

Link: A knot of threads: from CSHL to LCG-UNAM to Aldo Barrientos to diversity scholarship opportunities

<p>I can’t tell you how many times I’ve started to write this post in my mind since May 2018. Today I’m finally typing it on the computer. This will be a rather long post that ties in several threads. I’ll talk about Cold Spring Harbor’s Biology of Genomes conference and its relationship to my undergrad in Mexico. I’ll also introduce you to Aldo Barrientos (198x-2011) who was my undergrad classmate. Then I’ll tell you about myself and how regardless of your situation, privileged or not, you should ask for help and get to know the people around you. Finally, I want to highlight that many ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Postdocs and Research fellows for combining probabilistic programming, simulators and interactive AI

<p>Here’s a great opportunity for those interested in probabilistic programming and workflows for Bayesian data analysis: We (including me, Aki) are looking for outstanding postdoctoral researchers and research fellows to work for a new exciting project in the crossroads of probabilistic programming, simulator-based inference and user interfaces. You will have an opportunity to work with […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/11/06/postdocs-and-research-fellows-for-combining-probabilistic-programming-simulators-and-interactive-ai/">Postdocs and ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “Statistical and Machine Learning forecasting methods: Concerns and ways forward”

<p>Roy Mendelssohn points us to this paper by Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos, which begins: Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/11/06/statistical-machine-learning-forecasting-methods-concerns-ways-forward/">“Statistical and Machine ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: The purported CSI effect and the retroactive precision fallacy

<p>Regarding our recent post on the syllogism that ate science, someone points us to this article, “The CSI Effect: Popular Fiction About Forensic Science Affects Public Expectations About Real Forensic Science,” by N. J. Schweitzer and Michael J. Saks. We’ll get to the CSI Effect in a bit, but first I want to share the […]</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/11/05/purported-csi-effect-retroactive-precision-fallacy/">The purported CSI effect and the retroactive precision fallacy</a> appeared first on <a rel="nofollow" ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: Why it can be rational to vote

<p>Just a reminder.</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/11/05/why-it-can-be-rational-to-vote-2/">Why it can be rational to vote</a> appeared first on <a rel="nofollow" href="https://andrewgelman.com">Statistical Modeling, Causal Inference, and Social ...

Source: Homepage on Liechi | 張列弛

Link: A Little Bit of Evolution, Part 3

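Early evolutionary thought. Anaximander, who lived in the sixth century BC, believed that humans in remote antiquity could not have been as they are now, because, compared with the young of other animals, which can run and jump as soon as they are born, humans in the wild ...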

Source: Econometrics and Free Software

Link: Analyzing NetHack data, part 1: What kills the players

<div style="text-align:center;"> <p><a href="https://www.youtube.com/watch?v=dpM2o4dRLto"> <img src="./img/deepfried_loss.png" title = "Click here to listen to epic music while reading"></a></p> </div> <div id="abstract" class="section level2"> <h2>Abstract</h2> <p>In this post, I will analyse the data I scraped and put into an R package, which I called <code>{nethack}</code>. NetHack is a roguelike game; for more context, read my previous blog <a href="https://www.brodrigues.co/blog/2018-11-01-nethack/">post</a>. You can install the <code>{nethack}</code> package and play around with the ...

Source: Statistical Modeling, Causal Inference, and Social Science

Link: “We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis”

<p>Brendan Nyhan and Thomas Zeitzoff write: The results do not provide clear support for the lack-of control hypothesis. Self-reported feelings of low and high control are positively associated with conspiracy belief in observational data (model 1; p</p> <p>The post <a rel="nofollow" href="https://andrewgelman.com/2018/11/03/reluctant-engage-post-hoc-speculation-unexpected-result-not-clearly-support-hypothesis/">“We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis”</a> appeared first on <a rel="nofollow" ...

Source: ewen

Link: ewenthemes (AKA how to mod hrbrthemes)

Achieve ggplot + website ...

Source: Homepage on Yihui Xie | 谢益辉

Link: English is Still Hard for Me, and Thoughts on (Computer) Language Wars

<p>My native language is Chinese (Mandarin). Although I started learning English in middle school 22 years ago, and even have been in the US for more than 9 years, I still struggle with English as a foreign language on a daily basis. I’m only slightly content with my English writing. For reading, speaking, and listening, they are all miserable. In particular, my listening sucks. For example, when someone tells a joke in a conversation or talk and everyone laughs, I often have to smile awkwardly like an idiot, not knowing what exactly the joke was.</p> <p>Those who know me may think ...

Source: Nan-Hung Hsieh on Nan-Hung Hsieh

NA ...

Source: Rob J Hyndman

A particular focus of water-quality monitoring is the concentrations of sediments and nutrients in rivers, constituents that can smother biota and cause eutrophication. However, the physical and economic constraints of manual sampling prohibit data collection at the frequency required to capture adequately the variation in concentrations through time. Here, we developed models to predict total suspended solids (TSS) and oxidized nitrogen (NOx) concentrations based on high-frequency time series of turbidity, conductivity and river level data from low-cost in situ sensors in rivers flowing into ...

Source: About on Likan Zhan | 战立侃

Link: R Used in Literature

1. A brief description: This post summarizes advances in the use of R to analyze data from behavioral and related fields. 2. Recent Applications: Benitez & Saffran (2018); To be added; To be added. 3. References: Benitez, V. L., & Saffran, J. R. (2018). Predictable events enhance word learning in toddlers. Current Biology. Journal ...

Source: R2

Link: Bibliometric analysis

This post summarizes part of a consulting project consisting of a bibliometric analysis of published work on the use of remote sensors in environmental applications. According to the OECD (Organisation for Economic Co-operation and Development 2006), bibliometrics is the statistical analysis of written publications. The application of quantitative and statistical techniques to the analysis of scientific publications and their respective bibliographic citations is widely used today to assess the growth and maturation of a scientific field, its ...

Source: EnTyrely Too Much on EnTyrely Too Much

Link: Computer Setup

<h3 id="r">R</h3> <p>Download and install the <a href="http://cran.rstudio.com/" target="_blank">R base system</a> for your operating system. I assume you use the <a href="http://www.rstudio.com/products/rstudio/download/" target="_blank">Rstudio Desktop system</a> to work with the base system. You have to scroll down to find the installer for your operating system. When installing these you can accept all the default options.</p> <p>You should also install package <code>tidyverse</code>. While connected to the internet, start up RStudio, go to the console prompt</p> <p><img ...

Source: Alison Presmanes Hill on Alison Presmanes Hill

Link: Skills

NA ...

Source:

Link: Daily R

Source:

Link: Daily R Archives

Source: Home on Emi's Blog

Link: About

This blog is where I host my trivial musings. I’m more of a listener in life. Blogging or being public doesn’t come naturally to me but I think sometimes I have something worthwhile to share. Sometimes. I have odd ideas and plans about life. One of my ideas is that you ought to work hard while you are young and able, so my current plan is to work hard now and, at 50+, switch to a managerial role and make room for the young ones to work ...

Source: Alison Presmanes Hill on Alison Presmanes Hill

Link: Experience

NA ...