(no subject)
May. 22nd, 2020 10:07 am
Addendum to yesterday's post:

"the data in Figure 2 shows a decrease in infection rates after countries eased national lockdowns with >99% statistical significance"

no subject
Date: 2020-05-23 10:07 pm (UTC)
Without a stated model for the null hypothesis, a figure like 99% makes no sense. And defining the null is non-trivial: for example, it does not make sense to compare this to iid data, since the rates in different states or countries are obviously correlated. Thus, giving a number like 99% seems manipulative, designed to give a scientific veneer to potentially problematic inferences.
no subject
Date: 2020-05-24 04:01 pm (UTC)
no subject
Date: 2020-05-24 04:18 pm (UTC)First we have to make sure that there is a baby. For that we need to know something about their methodology, which they seem to be hiding.
I actually find it hard to believe that those JPM guys do not understand statistics or how to properly deal with data. They seem to be solid quantitative types. Therefore it feels like they have some agenda.
no subject
Date: 2020-05-24 04:38 pm (UTC)But yes, the guy is very smart and silly mistakes are not to be expected.
no subject
Date: 2020-05-24 06:32 pm (UTC)
no subject
Date: 2020-05-24 07:03 pm (UTC)
no subject
Date: 2020-05-24 11:18 pm (UTC)Could you remind me: if we're testing a linear regression, what is the null hypothesis? I don't recall this being discussed in the last class I took almost 30 years ago, nor in anything I've read later. I may be wrong but I believe it is not necessary b/c we're hiding behind the CLT.
no subject
Date: 2020-05-25 01:53 pm (UTC)Take this with a grain of salt -- I have not looked carefully at such problems in the testing context.
no subject
Date: 2020-05-25 05:00 pm (UTC)
no subject
Date: 2020-05-25 11:38 pm (UTC)
no subject
Date: 2020-05-26 02:51 am (UTC)We test the alternative (that it does not - w/o specifying the nature of the alternative), and if we reject it then we choose to live with the opposite, i.e. the one that we want to accept in the first place. We know we can't prove it but we rejected the alternative, so we deem it good to go.
no subject
Date: 2020-05-26 03:40 pm (UTC)
More precisely, there are two ways to do hypothesis testing:
1. You can reject the null (i.e. show that data like yours would be very unlikely if the null held in your model). This is called Fisher testing.
2. You choose between two hypotheses A and B. This is called Neyman-Pearson hypothesis testing.
There is a nice discussion on wikipedia: https://en.wikipedia.org/wiki/Statistical_hypothesis_testing
It is pretty subtle stuff actually.
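To make the distinction between the two approaches concrete, here is a minimal sketch in Python. It assumes a toy setting that is not from this thread: n iid normal observations with known unit variance. The function names, the example means and alpha = 0.05 are illustrative assumptions only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fisher_p_value(sample_mean: float, n: int) -> float:
    """Fisher-style test: two-sided p-value under the single null mu = 0.

    Assumes n iid N(mu, 1) observations (toy assumption). No alternative
    hypothesis is specified; a small p-value only counts against the null.
    """
    z = sample_mean * sqrt(n)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def neyman_pearson_choice(sample_mean: float, n: int,
                          mu_a: float, mu_b: float,
                          alpha: float = 0.05) -> str:
    """Neyman-Pearson-style test: choose between A (mean mu_a) and B (mean mu_b).

    For two simple normal hypotheses with mu_b > mu_a, the likelihood-ratio
    test reduces to thresholding the sample mean. The threshold is set so
    that P(choose B | A is true) = alpha.
    """
    if mu_b <= mu_a:
        raise ValueError("this sketch assumes mu_b > mu_a")
    threshold = mu_a + STD_NORMAL.inv_cdf(1.0 - alpha) / sqrt(n)
    return "B" if sample_mean > threshold else "A"


def type_ii_error(n: int, mu_a: float, mu_b: float, alpha: float = 0.05) -> float:
    """P(choose A | B is true) for the rule above; undefined in the Fisher setup."""
    threshold = mu_a + STD_NORMAL.inv_cdf(1.0 - alpha) / sqrt(n)
    return NormalDist(mu_b, 1.0 / sqrt(n)).cdf(threshold)
```

In the first function only the null enters the calculation. In the second, both hypotheses must be written down before the data are looked at, which is what allows the second kind of error to be computed at all.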
no subject
Date: 2020-05-26 04:30 pm (UTC)
1. Compute from the observations the observed value t_obs of the test statistic T.
2. Calculate the p-value. This is the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed.
3. Reject the null hypothesis, in favor of the alternative hypothesis, if and only if the p-value is less than (or equal to) the significance level (the selected probability) threshold.
What exactly is the null hypothesis? That we got our "good fit" by accident, right? No distributional assumptions are required.
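One way to make "good fit by accident" into an explicit null is a permutation test on the regression slope. The sketch below follows the three steps quoted above. It is an illustration under stated assumptions, not anyone's actual methodology: the helper names are made up, and the null here is "the y values are exchangeable with respect to x". That needs no particular distribution, but it does assume the observations are exchangeable, which correlated data would violate.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """P-value for the slope under the null that y is exchangeable w.r.t. x."""
    rng = random.Random(seed)
    # Step 1: observed value of the test statistic (absolute slope).
    t_obs = abs(ols_slope(x, y))
    # Step 2: how often does a random relabelling give a statistic
    # at least as extreme as the observed one?
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(ols_slope(x, y_perm)) >= t_obs:
            extreme += 1
    # The +1 terms count the observed arrangement itself.
    return (extreme + 1) / (n_perm + 1)


def reject_null(p_value, alpha=0.01):
    """Step 3: reject if and only if the p-value is at most alpha."""
    return p_value <= alpha
```

The point of the sketch is that step 2 cannot be carried out until some null model has been chosen; here the choice is hidden inside the `rng.shuffle` call.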
I happened to come across an economist who decided to write a series of notes, something like an introduction to statistics.
https://hroniki-paisano.livejournal.com/118867.html
https://hroniki-paisano.livejournal.com/119264.html
Interestingly, he does not bother to specify the null hypothesis either.
no subject
Date: 2020-05-26 08:42 pm (UTC)
For example, what does the 99% in your original post refer to? I have no idea, but here is one possibility: assume that each point is equally likely to be above or below the line, and assume that the points are sampled iid. Under this model the picture shown in the graph is very unlikely and we can reject this hypothesis. However, iid is a key assumption here. If these data are not iid, this conclusion cannot be made in general.
For example, imagine that the rates are perfectly correlated between the states. Then either all of them will be below the line or all of them will be above the line.
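The two extremes described above can be checked with a small simulation. This is a sketch under the stated toy model only (each point below the line with probability 1/2); the function name, the number of points and the number of trials are illustrative assumptions.

```python
import random


def prob_all_below(n_points, correlated, n_trials=20000, seed=0):
    """Estimate P(all points fall below the line) under the toy null.

    correlated=False: points are iid, each below the line with probability 1/2.
    correlated=True:  points are perfectly correlated, so one coin flip
                      decides the side for all of them.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if correlated:
            below = [rng.random() < 0.5] * n_points
        else:
            below = [rng.random() < 0.5 for _ in range(n_points)]
        if all(below):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print("iid:       ", prob_all_below(10, correlated=False))
    print("correlated:", prob_all_below(10, correlated=True))
```

With 10 points the exact values are 0.5**10 (about 0.001) in the iid case and 0.5 in the perfectly correlated case, so the same picture is strong evidence against one null and no evidence at all against the other.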
no subject
Date: 2020-05-26 09:57 pm (UTC)
I thought your original complaint was that the model we hope to reject on the basis of a low p-value is not stated.
no subject
Date: 2020-05-26 11:50 pm (UTC)
no subject
Date: 2020-05-26 11:55 pm (UTC)99% is not the worst actually -- sometimes people have 99.9999% or something like that and you immediately suspect something is fishy.
no subject
Date: 2020-06-27 06:10 pm (UTC)
no subject
Date: 2020-07-08 05:00 pm (UTC)You can take a look at a simple synthetic model with linear regression, where the features of this curve can be seen: https://arxiv.org/abs/1903.07571