By Andy May
The argument about the proper way to estimate error in the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 weather reanalysis dataset between Nicola Scafetta and Gavin Schmidt has finally been published by Geophysical Research Letters. Schmidt, Jones, and Kennedy’s comment is here (Schmidt, Jones, & Kennedy, 2023), and Scafetta’s response is here (Scafetta, 2023a).
I first wrote about this dispute earlier in the year here. Nothing much has changed in the final versions.
Schmidt, Jones, and Kennedy’s assessment of the error in the ERA5 surface temperature dataset average still (incorrectly) assumes that the global surface temperature was constant from 2011 to 2021 and that its yearly variability is due to random noise. This is clearly a non-physical interpretation of Earth’s climate, since there are real systematic changes in the climate from year to year, whether one assumes they are due to natural or man-made forces, or both.
By conflating natural and man-made climatic forces with random noise, Schmidt, Jones, and Kennedy inflate the real error of the temperature mean by 5–10 times. In fact, a proper analysis of the ensemble of observed global surface temperature members yields a decadal-scale error of about 0.01–0.02°C, as reported in published records. BEST (the Berkeley Earth Land/Ocean temperature record) derives an error of ±0.018 to 0.020°C for the 11-year period 2011–2021 (1951–1980 anomalies, April 2023 version of the BEST dataset). Instead, Schmidt, Jones, and Kennedy assessed the error using the standard deviation of the mean (see Chapter 3 here) for the period 2011–2021. The equation they use applies only when there are multiple measurements of the same quantity, not eleven annual estimates for eleven different years. It cannot properly estimate the error of a quantity that changes from year to year, naturally and possibly due to human emissions, in this case the average surface temperature of the Earth.
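For readers who want to see the arithmetic, here is a minimal Python sketch of the standard-deviation-of-the-mean calculation at issue. The annual anomalies are made-up illustrative values, not the actual ERA5 record:

```python
import numpy as np

# Hypothetical global annual temperature anomalies for 2011-2021 (degrees C).
# These are illustrative numbers only, NOT the ERA5 values.
anomalies = np.array([0.58, 0.62, 0.64, 0.73, 0.87, 0.99, 0.90, 0.82, 0.95, 1.00, 0.84])

n = anomalies.size
mean = anomalies.mean()
s = anomalies.std(ddof=1)      # sample standard deviation of the 11 annual values
sem = s / np.sqrt(n)           # "standard deviation of the mean" (s / sqrt(N))

print(f"11-year mean anomaly: {mean:.3f} C")
print(f"sample std dev (s)  : {s:.3f} C")
print(f"s / sqrt(N)         : {sem:.3f} C")
# The spread s here includes real interannual climate variability (ENSO, volcanoes,
# the warming trend itself), not just repeated measurements of one fixed quantity;
# that is the crux of the dispute.
```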
Scafetta’s original paper, the reason for the dispute, can be downloaded here. In the paper Scafetta shows that all IPCC/CMIP6 climate models with an ECS^{[1]} greater than 3°C of warming per doubling of CO_{2} overestimate observed global warming at a statistically significant level. How to determine what is statistically significant is at the heart of the dispute. But statistics or not, Scafetta’s point is apparent in figure 1. When in doubt, look at the data.
In figure 1, the observations are from ECMWF ERA5. Clearly, if CO_{2} and other greenhouse gases are causing all the recent warming, as the IPCC AR6 report claims (IPCC, 2021, pp. 425 & 961–962), the climate sensitivity we are observing is lower than 3°C. Scafetta’s analysis of ECS is very compelling, but there is still more evidence that the higher AR6 ECS estimates are incorrect. For more on this subject, see my four-part series on the mysterious AR6 ECS: Part 1, Part 2, Part 3, and Part 4. There is also a very good summary of observational estimates of ECS, and a critique of the AR6 methods of determining ECS, in Chapter 7 of the Clintel volume on AR6, here.
Works Cited
Crok, M., & May, A. (2023). The Frozen Climate Views of the IPCC, An Analysis of AR6.
IPCC. (2021). Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. In V. Masson-Delmotte, P. Zhai, A. Pirani, S. L. Connors, C. Péan, S. Berger, . . . B. Zhou (Eds.), WG1. Retrieved from https://www.ipcc.ch/report/ar6/wg1/
Scafetta, N. (2022a). Advanced Testing of Low, Medium, and High ECS CMIP6 GCM Simulations Versus ERA5-T2m. Geophysical Research Letters, 49. doi:10.1029/2022GL097716
Scafetta, N. (2023a). Reply to “Comment on ‘Advanced testing of low, medium, and high ECS CMIP6 GCM simulations versus ERA5-T2m’”. Geophysical Research Letters, 50. doi:10.1029/2023GL104960
Schmidt, G. A., Jones, G. S., & Kennedy, J. J. (2023). Comment on “Advanced testing of low, medium, and high ECS CMIP6 GCM simulations versus ERA5-T2m”. Geophysical Research Letters, 50. doi:10.1029/2022GL102530
Taylor, J. (1997). An Introduction to Error Analysis, second edition. University Science Books. Retrieved from https://www.amazon.com/IntroductionErrorAnalysisUncertaintiesMeasurements/dp/093570275X/ref=monarch_sidesheet

ECS is the equilibrium climate sensitivity, or the ultimate change in global average surface temperature after an instantaneous doubling of CO_{2}. See here for more details. ↑
The issue here is deciding whether the difference between the earth observations and the model results is significant, for various classes of models. Well, there is the eyeball test, as Andy applies. But Scafetta claimed to use advanced methods. Here is an expanded version of that plot:
It’s unbelievably primitive. He just averages the models over the time period, and the observations, to give the result on the right, and then applies the eyeball test to that. There is a lot wrong with that; the variable that he is averaging is clearly not stationary, so it would indeed be hard to find a statistical test. But the eyeball cannot make up for that.
But the error that should be obvious here is that he allows no uncertainty in the observations. None at all. Now Andy objects that Schmidt et al have allowed too much, but zero has to be wrong. This matters, because the outcome is a claim that observations couldn’t have been in the group of model results. If you allow no uncertainty, of course that is more unlikely.
Normally the uncertainty of the mean would be derived from the standard deviation of yearly results, which as you can see is substantial.
The uncertainty is different from the one for records, as with BEST etc. That is uncertainty relative to given weather. But for this test, you also have to include weather uncertainty.
Suppose you had an ideal model, which would be a planet B (Earth is A), similar in all respects, including rising GHG. You want to know if the climate is different, so you watch both for a decade. In what ways might the results differ and still have the same climate? Well, measurement, for one, and also sampling error (different choices of locations give different results). But also they experience different weather. The ENSOs will happen at different times, for example. All aspects of weather will be different, and you have to allow for all that before you can say the climate is different. That is what Schmidt et al correctly did.
Correction, the numbers on the right are not the mean, but the “warming”, as determined by the trend. But the same objections apply, even more so. The models can be said to have uncertainty estimated by scatter, but the trend of observations (blue) most certainly has uncertainty too, as in any regression.
I’m shocked there’s any kind of uncertainty in climate science? But… but… I thought “the science is settled”.
Stokes has zero clues about what measurement uncertainty is, just like his noisy disciples have zero clues.
So do you think it is zero, as Scafetta used?
Uncertainty is not error, they are both wrong.
Both Models are lacking forecast skill and based on the bogus AGW conjecture which has long been shown to be crap.
1) No Hot Spot exists.
2) No Positive Feedback Loop exists and never existed in the last Billion years.
3) The Sun/Ocean effect is routinely ignored which is stupid.
All because of the stupid insistence of slavishly hanging onto a dead AGW conjecture.
This is why I don’t take these climate models seriously.
Nick never responded to the mistakes he made on the thread that shows that climate models are based on the wrong dynamical system of equations.
Ignoring the Sun’s cycles of solar output in the models is a big mistake
As is getting clouds wrong.
Whenever you box Nick in he goes silent running. Then he pops up somewhere else and launches torpedoes, “Fire!, range, mark” in his ready shoot aim fashion. Soros funds him by the post and not the word
Hi Nick,
Scafetta is not assuming zero error in the observations, he is assuming the error recommended by BEST:
Check the link behind “published records.” This is not the exact dataset that Scafetta used (he downloaded it in April, and it has been revised since), but it is very close. Do your own analysis of the error, and you will come up with Scafetta’s value, not Schmidt’s, as BEST did.
Estimating error properly means not including the quantity you want to estimate, which is what Schmidt did when he assumed all the variation over 11 years was random and not real climate variability.
“he is assuming the error recommended by BEST”
Where? You may have assumed that, but I can’t see any mention in Scafetta’s paper. As you see from Fig 1 that I posted, he just says for observed warming
ΔT = 0.56°C
No uncertainty quoted. Just a single point on the graph.
And as I said, the BEST uncertainty is inappropriate anyway, because it is for fixed weather. To decide if the climate is different, you have to allow for weather uncertainty as well.
“Do your own analysis of the error, you will come up with Scafetta’s value”
Scafetta’s value is 0.00000
Nick,
He explains the computation of error in Appendix 1 of his paper. He actually evaluated the error of a number of products, because ERA5 does not supply error for their products.
Scafetta’s error is not zero; as stated, it is between 0.01 and 0.02°C, and conforms to many other estimates. I bet you cannot find one that matches Schmidt’s.
Andy,
Scafetta’s paper is here. There is no appendix. No error is considered in the paper.
As I said, the BEST estimate for a year would not be right anyway, because it is for fixed weather. Since he is talking about warming, based on regression trend, the first error to consider would be the normal uncertainty of the trend. For BEST, for warming from 1980 to 2022, that would be ±0.16°C.
that would be ±0.16°C.
Oops, should be ±0.08°C. The range is 0.16.
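For context, a minimal sketch of where a trend-based warming uncertainty like that comes from, using made-up annual anomalies rather than the actual BEST series:

```python
import numpy as np

# Hypothetical annual anomalies, 1980-2022, for illustration only (not the BEST record).
rng = np.random.default_rng(0)
years = np.arange(1980, 2023)
anoms = 0.018 * (years - 1980) + rng.normal(0.0, 0.12, years.size)  # trend + "weather" noise

# Ordinary least squares trend and its standard error
coeffs, cov = np.polyfit(years, anoms, 1, cov=True)
slope, slope_se = coeffs[0], np.sqrt(cov[0, 0])

span = years[-1] - years[0]
warming = slope * span        # warming implied by the trend over 1980-2022
warming_se = slope_se * span  # 1-sigma uncertainty of that warming

print(f"warming 1980-2022 : {warming:.2f} +/- {warming_se:.2f} C (1 sigma)")
```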
Nick you have the wrong paper. The error analysis is in this one:
I’m talking about the paper Gavin et al commented on (your link, in your first para):
and to which Scafetta replied – again your link, first para. It is the one listed in your works cited
Scafetta, N. (2022a). Advanced Testing of Low, Medium, and High ECS CMIP6 GCM Simulations Versus ERA5T2m. Geophysical Research Letters, 49. doi:10.1029/2022GL097716
This other paper is not cited in your article.
Nick, you appear to be commenting on a research letter published in March 2022. Scafetta’s paper appears to have actually been published in September of 2022 and looks a little different to the letter that Schmidt et al commented on.
I could be very confused here as its difficult to follow and I haven’t followed it closely but it looks like
Research letter by Scafetta in March 2022
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2022GL097716
Reply to that letter by Gavin et al, September 2023
https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2022GL102530
Actual paper published by Scafetta in September 2022
https://link.springer.com/article/10.1007/s00382-022-06493-w
It now has a different name so that doubles how confusing it is but you can see from the abstract that they’re still grouping the GCMs for analysis.
eg from the letter
“Scafetta (2021a) tested several CMIP6 GCMs against some temperature records and found that the data-model agreement improves for the models with lower ECS. Herein, we provide a complementary and more robust statistical approach by grouping the same models into three sub-ensembles according to their ECS: low-ECS, 1.8–3.0°C; medium-ECS, 3.01–4.50°C; and high-ECS, 4.51–6.0°C.“
and from the paper
“The Coupled Model Intercomparison Project (phase 6) (CMIP6) global circulation models (GCMs) predict equilibrium climate sensitivity (ECS) values ranging between 1.8 and 5.7°C. To narrow this range, we group 38 GCMs into low, medium and high ECS subgroups and test their accuracy and precision in hindcasting the mean global surface warming observed from 1980–1990 to 2011–2021 in the ERA5-T2m, HadCRUT5, GISTEMP v4, and NOAA-GlobTemp v5 global surface temperature records.”
I could be wrong but it looks like Gavin et al are commenting on an earlier version of the analysis.
“accuracy and precision in hindcasting ” ???
With so many parameters and fudge factors…
… surely they can get a hindcast to agenda-fabricated data correct !!
You have the links and sequence correct, and they are also the links given by Andy. It’s very clear what Gavin commented on; it is spelt out in the title. The paper Andy mentions has not previously been referenced here.
Nick writes “It’s very clear what Gavin commented on; it is spelt out in the title.”
Yes, but why? Why comment on a research letter now which I’m taking to be a preliminary analysis from well over a year ago when the actual paper has been available for a year too?
Good point. To me it is very clear that Schmidt, Jones and Kennedy are being disingenuous.
“Why comment on a research letter”
A long story, set out here. They actually wrote and submitted their response within a few days. The holdup was that the journal policy was in some confusion, and when they sorted that out, it required that the comment and author’s response be published together. But Scafetta’s response ran into trouble with the reviewers.
Yes, it has, both in my original post and in Scafetta’s reply.
Tim,
This argument has been going on for a very long time. But, the most recent comment and reply articles cited in the post have been reviewed and updated. They should include all of the various papers on the subject.
Nick,
The paper on the statistics and the appendix you seem so concerned about are mentioned in the Abstract of Scafetta’s reply and in my original post on this dispute. This is from my original post:
This short blog post was just an update and summary of a very long discussion on this subject. Schmidt, Jones, and Kennedy’s high school mistake in statistics should be obvious to everyone, but having been caught out they have now dug in their heels. You shouldn’t even need the Appendix. And if you are going to argue this rather obvious and trivial point to such an extreme, you should have at least read Scafetta’s reply or my original post that went into much more detail.
Andy,
this is a total switcheroo. Gavin et al described the errors in the paper you cited. They are glaring, especially the total absence of uncertainty for observations. Gavin et al did not say anything about this other paper. It is not the subject of the dispute you mention. Whether it is wrong or not, I am not going to chase up now. The fact is that the paper you cited, about which the comments were written, was wrong.
Nick writes
But why? If the “other paper” is the most recent analysis on the subject by Scafetta, why comment on the previous analysis over a year later?
This whole thing looks suspicious to me.
If the newer paper is statistically sound then they would appear to have pulled a “Streisand effect” on it.
Now, you decide what papers to include in the argument and which to ignore. You are no better than Gavin. The paper I cited is correct. Schmidt had ample time to consider Scafetta’s reply and his other papers and chose to ignore everything, just like you. Childish.
“Now, you decide what papers to include in the argument and which to ignore.”
No, you decided. You listed them explicitly in your “works cited”.
Scafetta, N. (2022a). Advanced Testing of Low, Medium, and High ECS CMIP6 GCM Simulations Versus ERA5-T2m. Geophysical Research Letters, 49. doi:10.1029/2022GL097716
Scafetta, N. (2023a). Reply to “Comment on ‘Advanced testing of low, medium, and high ECS CMIP6 GCM simulations versus ERA5-T2m’”. Geophysical Research Letters, 50. doi:10.1029/2023GL104960
Schmidt, G. A., Jones, G. S., & Kennedy, J. J. (2023). Comment on “Advanced testing of low, medium, and high ECS CMIP6 GCM simulations versus ERA5-T2m”. Geophysical Research Letters, 50. doi:10.1029/2022GL102530
You even said, explicitly
“Scafetta’s original paper, the reason for the dispute can be downloaded here”
linking to that paper. Now, when the faults in that paper are very evident, you want to talk about some paper that the article didn’t mention or allude to anywhere.
“ Schmidt had ample time to consider Scafetta’s reply”
No, they wrote and sent in their comment within a few days. The rest of the time was journal processes and waiting for Scafetta to produce a response that would pass refereeing. But anyway, it stands as a valid comment on a published paper. It was wrong.
“Schmidt, Jones, and Kennedy’s high school mistake in statistics should be obvious to everyone, but having been caught out they have now dug in their heels.”
Schmidt et al are using high school (well, Stats 101) methods correctly. You and Scafetta are talking nonsense. I see now that Scafetta is measuring warming by subtracting the mean of the last decade from the mean of the first. Regression would be better, but no matter. SJK say that the uncertainty of each mean is just the standard error (SEM). That is basic, as is the way of combining them in a difference (by quadrature).
It’s no use putting up smokescreens about BEST, sources of error etc. The SEM derives from the observed variability, as measured by the sd. You can’t get less than that. You might get more if there is autocorrelation.
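For what it’s worth, a minimal sketch of that recipe (SEM of each decadal mean, combined in quadrature for the difference), with made-up annual anomalies standing in for the real record:

```python
import numpy as np

# Illustrative made-up decadal samples of annual anomalies (degrees C), not real data.
decade_1980s = np.array([0.10, 0.05, 0.02, 0.18, 0.03, 0.04, 0.06, 0.21, 0.25, 0.12, 0.20])
decade_2010s = np.array([0.58, 0.62, 0.64, 0.73, 0.87, 0.99, 0.90, 0.82, 0.95, 1.00, 0.84])

def sem(x):
    """Standard error of the mean: s / sqrt(N)."""
    return x.std(ddof=1) / np.sqrt(x.size)

warming = decade_2010s.mean() - decade_1980s.mean()
# Combine the two standard errors in quadrature for the difference of means.
warming_u = np.hypot(sem(decade_1980s), sem(decade_2010s))

print(f"warming = {warming:.2f} +/- {warming_u:.2f} C (1 sigma)")
```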
None are so blind as those who will not see. I’ve nothing more to add.
He is thread fogging again because he is commonly in a state of confusion himself.
True, Nick doesn’t know which end is up. You always include all the relevant papers; ignoring them only makes you look a fool. I know for a fact that Scafetta’s second paper was emailed directly to Schmidt well in advance, and he deliberately ignored it. Totally high school.
Yeah, I see that a lot over the years when they don’t provide all the relevant papers.
They did that to Bob Tisdale and John McIntire too.
Oh dear, seems Nick is only partially educated.
And quite unable to learn what he doesn’t want to understand.
It is sad, but I don’t think Nick, Schmidt, Jones, or Kennedy realize what they are doing. Climate science is now so corrupt, everyone in it thinks it is OK to ignore data, and just choose the data they want, simply because it supports their personal agenda. Everyone is just AR6 these days.
If Nick, Schmidt, Jones, Kennedy, etc. don’t realize what they are doing then neither does NIST, JCGM, UKAS, and every other standards body and statistical texts.
Wrong, it is you and Nick that don’t understand what those standards bodies do. Or basic statistics texts.
You do not know what uncertainty really means.
The GUM defines experimental standard uncertainty as the dispersion of the values that could reasonably be attributed to the measurand. The center of the uncertainty interval is the expected value, that is, the mean of the data. This makes the Standard Deviation the proper measure of uncertainty.
The experimental standard uncertainty of the mean is a measure of where the mean may lie, taking into account sampling error. It is not a measure of uncertainty in the observed data.
The experimental standard uncertainty of the mean is a measure of the distribution around q̅, the estimated mean. In other words, the distribution of the sample means, not of the data. (See GUM B.2.17 and B.2.18)
Lastly, anomalies are the result of subtracting two random variables, a monthly value and a baseline value. Common statistical practice holds that the anomaly inherits the sum of the variances of the two parent random variables. This variance must be carried through, not replaced by the variance of the much smaller numbers representing anomalies. Don’t forget these are not temperatures, they are ΔT values that have their own variance.
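A quick numerical sketch of that last point, assuming for illustration that the monthly value and the baseline behave as independent random variables:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monthly value and baseline treated as independent random variables (illustrative only).
monthly  = rng.normal(15.0, 0.5, 100_000)   # variance 0.25
baseline = rng.normal(14.2, 0.3, 100_000)   # variance 0.09

anomaly = monthly - baseline

# The anomaly's variance is roughly 0.25 + 0.09 = 0.34: it inherits both parent variances,
# even though the anomaly values themselves are small numbers.
print(np.var(anomaly))
```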
Trendology at its nadir, this.
What mistake?
The mistake is using a 10-year standard deviation to calculate measurement error. A 10-year span includes actual climate change, not just measurement error.
They use the standard deviation divided by the sqrt of N which is how Taylor, Bevington, NIST, JCGM, UKAS, etc. all say to do it. And if there is any question there is an example provided by NIST that is pretty close to the use case in question here.
If N is very large, then there’s little difference between N and N−1. However, you’re supposed to do the math correctly. One is the SD of a population and the other is the SD of a sample–not exactly the same thing.
Nobody is saying you shouldn’t use the sample SD nor is anyone saying you should assume N equals DOF. The point Nick, Bellman, and I are making, which is supported by Taylor, Bevington, NIST, JCGM, UKAS, etc. is that you apply the law of propagation of uncertainty which results in the division by a square root when the measurement model is an average. A lot of people here erroneously assume the division by a square root is optional. That’s all that is being said here. We can’t get into the minutia of details until everyone accepts the law of propagation of uncertainty first.
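As a concrete illustration of the calculation being described (not a ruling on which side of the dispute is right), this is what the law of propagation of uncertainty gives when the measurement model is an average of N inputs with equal, uncorrelated standard uncertainties:

```python
import numpy as np

# Sketch: propagating equal, uncorrelated standard uncertainties u through the
# measurement model y = (x1 + ... + xN) / N. Each sensitivity coefficient is 1/N,
# so u(y) = sqrt(N * (u/N)^2) = u / sqrt(N).
def u_mean_uncorrelated(u, n):
    sens = 1.0 / n
    return np.sqrt(n * (sens * u) ** 2)

u_single = 0.5   # assumed standard uncertainty of one reading, degrees C (illustrative)
for n in (1, 10, 100):
    print(n, round(u_mean_uncorrelated(u_single, n), 3))   # 0.5, 0.158, 0.05
```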
No, the real truth is that this lot of trendologists you list here ABUSE all the metrology texts and pound them into your square hole.
The division by the square root gives you the SEM. That is only applicable when the distribution is random and Gaussian and you can assume that all measurement uncertainty is random, Gaussian, and cancels.
If the distribution is skewed, which the temperature data set surely is, then the SEM is not the proper measure of uncertainty. If measurement uncertainty is not random and Gaussian, and the temperature data set is assuredly not random and Gaussian, then the SEM is not the proper measure of the uncertainty.
Why does everyone in climate science want to assume that combining winter temps with summer temps does not produce a multimodal distribution? They have different average temps and different variances. It’s like saying the average value of Shetland ponies and quarter horses gives you a meaningful average and the uncertainty is the SEM!
Again, you have to know what the prerequisites for using this formula are. In the case of the standard deviation divided by the square root of the number of samples, the prerequisite is “…N measurements of the same quantity…,” to use the words of Taylor. The meaning of “same quantity” is what you need to understand.
For temperature measurements of free open space air with varying humidity and pressure the meaning of this prerequisite is that you need N measurements at the same place and time in order to use the formula. In weather measurements you clearly do not fulfil that requirement.
Taylor also states quite clearly that systematic errors are not affected by the number of measurements.
Just not true. You need to quote the full context of Taylor. Here is an example from Possolo at NIST doing just what you say they can’t do. They used exactly that formula to get uncertainty of a monthly average of daily thermometer readings. Key section:
Wrong, Nitpick.
An air temperature measurement has a sample size of exactly ONE; your holy and precious SEM is a red herring that DOES NOT APPLY.
Nick,
You have no clue.
Same location, same device, same month. You’ll notice that only Tmax is involved. Have you ever heard of repeatability conditions?
GUM
“””B.2.15
repeatability (of results of measurements) “””
“””closeness of the agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement
NOTE 1 These conditions are called repeatability conditions.
NOTE 2 Repeatability conditions include:
— the same measurement procedure
— the same observer
— the same measuring instrument, used under the same conditions
— the same location
— repetition over a short period of time.
NOTE 3 Repeatability may be expressed quantitatively in terms of the dispersion characteristics of the results. “””
“Same location, same device, same month.”
So OK, now it doesn’t have to be a measurement of the same thing, just taken in the same month. Where is that in GUM?
How is measuring the max temperatures on successive days a repetition?
It is the same thing. Can you not read?
The measurand is declared as the average Tmax for one month, at one station. Notice it is not Tavg because that involves two things.
From NIST 1900:
“””Questions are often asked about whether it is meaningful to qualify uncertainty evaluations with uncertainties of a higher order, or whether uncertainty evaluations already incorporate all levels of uncertainty. A typical example concerns the average of n observations obtained under CONDITIONS OF REPEATABILITY and modeled as outcomes of independent random variables with the SAME MEAN µ and the SAME STANDARD DEVIATION σ, both unknown a priori. “””
“””EXAMPLES: Examples E2, E20, and E14 involve multiple observations made under conditions of repeatability.”””
“””variance (squared standard deviation) of a sum or difference of uncorrelated random variables is equal to the sum of the variances of these random variables. Assuming that the weighings are uncorrelated, we have u²(mP) = u²(cP) + u²(cE) + u²(cR,2 − cR,1) exactly.”””
B.2.15 repeatability (of results of measurements)
closeness of the agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement
NOTE 1 These conditions are called repeatability conditions.
NOTE 2 Repeatability conditions include:
— the same measurement procedure
— the same observer
— the same measuring instrument, used under the same conditions
— the same location
— repetition over a short period of time.
NOTE 3 Repeatability may be expressed quantitatively in terms of the dispersion characteristics of the results. [VIM:1993, definition 3.6]
Please note that TN 1900 was created very carefully to meet these conditions.
”EXAMPLES: Examples E2, E20, and E14 involve multiple observations made under conditions of repeatability.”
You give just half the para. It goes on:
“EXAMPLES: Examples E2, E20, and E14 involve multiple observations made under conditions of repeatability. In Examples E12, E10, and E21, the same measurand has been measured by different laboratories or by different methods.”
BFD, what’s your point, Nitpick?
I didn’t leave anything out! We were discussing E2. E12, E10, and E21 do not apply.
As a word of caution, are you familiar with interlaboratory testing procedures? I’ll bet not. I have nothing but a passing knowledge of how it is done so I am not an expert on that.
Yep! Like his disciple Bellman, he just poked around and found a random quote to generate another Stokes Patented Red Herring.
It appears that the example that you are using is for one station, one thermometer, not thousands of stations and thermometers. It therefore meets the requirement of one measurand being measured by the same instrument. One month is short enough that any trend is probably negligible. Therefore, we can define the time series as also meeting the requirement of stationarity:
https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc442.htm
[I consider the above URL to be a citation. If you don’t agree, please state why.]
However, longer temperature time series will not meet the requirement of stationarity without processing that needs to be explicitly noted, which I don’t recollect ever seeing in anything that you have posted.
What requirement? I want a link to text that says the inputs into a measurement model must be of the same thing. I then want you to reconcile that with the fact that every single example in JCGM 100:2008 and NIST 1900 are of measurement models that not only accept inputs of different things, but usually of things with completely different units measured by completely different instruments.
BTW keep in mind that NIST TN 1900 E2 takes 28 different temperature measurements and uses them as inputs into the measurement model that computes the monthly average. NIST then does a type A evaluation of uncertainty of that measurement model using those 28 different temperature measurements. Schmidt did something very similar.
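To make the procedure being referenced concrete, here is a sketch in the spirit of NIST TN 1900 Example E2 (a Type A evaluation of a monthly mean of daily Tmax). The daily readings below are invented; E2 uses actual May 2012 Stevenson-shelter data:

```python
import numpy as np
from scipy import stats

# Invented daily Tmax readings (degrees C) standing in for one station, one month.
tmax = np.array([25.1, 27.3, 24.8, 26.0, 29.5, 31.2, 28.4, 26.7, 25.9, 30.1,
                 27.8, 26.3, 28.9, 32.0, 30.6, 27.1, 25.4, 26.8, 29.0, 28.2,
                 27.5, 26.1])

m = tmax.size
t_bar = tmax.mean()                    # estimate of the monthly mean Tmax
s = tmax.std(ddof=1)                   # experimental standard deviation
u = s / np.sqrt(m)                     # standard uncertainty of the mean (Type A)
k = stats.t.ppf(0.975, m - 1)          # coverage factor from Student's t
U = k * u                              # expanded uncertainty, ~95 % coverage

print(f"mean Tmax = {t_bar:.1f} C, s = {s:.1f} C, u = {u:.2f} C, U(95%) = {U:.1f} C")
```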
The time series in NIST TN 1900 E2 is not stationary.
If one is dealing with nonstationary data, it reflects that both the mean and standard deviation are changing with time and there is no single value for them or any parameter derived from them. One cannot make comparisons and determine statistical significance and make inferences about whether they represent the same population.
NIST computed the average of a non-stationary time series of temperatures and its corresponding uncertainty, so I’m not understanding the relevance of stationarity here.
Did you not read what Clyde said?
Also,
From TN 1900:
Look at that measurement equation closely. It links the data to the measurand. It is what lets NIST determine the monthly average Tmax.
To use several months one must meet the short time repeatability condition. As Clyde says, longer periods cause means and uncertainty to vary. Maybe you have a way to get around that. If so, state it.
“It therefore meets the requirement of one measurand being measured by the same instrument”
So what is the one measurand? The temperature on the 1st, or the temperature on the 21st? Do you expect them to be the same?
That is the giveaway. When you measure on the 2nd, no one sees that as reducing the uncertainty of what you measured on the 1st. It is not a repetition.
I just cannot see the point of your link to the definition of stationarity. It isn’t even about measurements.
The one measurand is what TN 1900 said, the monthly average of Tmax.
It is a set of measurements from different experiments, one for each day recorded.
Do you understand what experimental means?
The measurand is what you define it to be! It may be the result of 10 different chemical reactions involving 5 different reactants. You may weigh everything carefully, you may use micropipettes, but there will be differences, the GUM calls them influences, such that each experiment will result in a different value. The SEM is nice to know, but it is very unlikely that anyone trying the same experiment will ever get the value ± SEM. The important fact for someone duplicating the experiment is to know the range of values they may get, i.e., the Standard Deviation.
“The one measurand is what TN 1900 said, the monthly average of Tmax.”
So it has to be measured with the same instrument. What instrument measures the monthly average?
How do you have repetitions?
You truly don’t know much about measurements, do you? I’m sure that to you numbers are exact.
From TN 1900:
“””The daily maximum temperature τ in the month of May, 2012, in this Stevenson shelter, may be defined as the mean of the thirty-one true daily maxima of that month in that shelter.”””
“””This so-called measurement error model (Freedman et al., 2007) may be specialized further by assuming that ε1, …, εm are modeled as independent random variables with the same Gaussian distribution with mean 0 and standard deviation σ. In these circumstances, the {ti} will be like a sample from a Gaussian distribution with mean τ and standard deviation σ (both unknown).”””
These are declaring the measurand to be the mean of Tmax for the days in the month and the standard deviation to be the dispersion in the experimental measurements. Because the distribution is ASSUMED to be Gaussian and of the same thing, an expanded experimental standard deviation of the mean can be used as a measure of the uncertainty in the mean estimate.
While I don’t necessarily agree with the use of the expanded experimental standard deviation of the mean as the appropriate uncertainty to use, it is very well defined as to what is being used and is a choice the scientist can make.
Whether an interval of 1.8 °C is used or 4.1 °C is immaterial. They are both far above what climate scientists use to justify anomalies quoted to one one-thousandth of a degree.
You need to answer why you think running multiple experiments are not repetitive measurements of the same thing.
Bellman calls any uncertainty other than the little wiggles in an anomaly graph “hypothetical”.
This says everything one needs to know about this lot.
You really are obsessed with me. Even when I’m taking no part in the discussion you still insist on dragging my name into it, without a link or quote to what I actually said.
The fact is any uncertainty estimate will be hypothetical. That’s not a bad thing, it’s how science works. You have a hypothesis and then try to find evidence to support or falsify it.
In this case what I think I said was that your and Tim Gorman’s accusation that UAH had uncertainties of multiple degrees was contradicted by the evidence that, across the entire 500+ monthly anomalies, the standard deviation was only about 0.25°C. I may have been commenting on Tim quoting the old “a beautiful hypothesis killed by an ugly fact” meme.
You, of course, argue that empirical evidence is irrelevant as your calculations cannot be wrong. You will claim that all the uncertainty is caused by a hypothesized systematic error, which cannot be detected and so is unfalsifiable. Which is convenient, but also difficult to accept when we are talking about anomalies.
I never comment on UAH uncertainty. They use a totally different system to make measurements and calculations. I am not knowledgeable of all the satellite uncertainties from the measuring devices to orbital variances and any judgment would be pure conjecture. I doubt you are either.
I am familiar with terrestrial physical measurements using typical SI units.
So do you think measuring successive daily maxima are repetitive measurements of the same thing? They certainly are not repetitive measurements of what you now say is the measurand, the monthly average.
Nick,
“””They certainly are not repetitive measurements of what you now say is the measurand, the monthly average.”””
Geez, have you ever done repetitive experiments in a basic physics or chemistry lab to determine something like the force of gravity or the mass of a product isolated by filtering?
There isn’t a gravity meter you can use or a product mass meter you can use to weigh a product while in solution.
You take repetitive measurements of measurable quantities that can be averaged to obtain a mean μ and a standard deviation σ that are used to state a value and the dispersion of data around the mean.
If you want the average Tmax temperature for a period of time, i.e., a month, you take daily Tmax measurements to determine the measurand you desire.
I really don’t know why you are asking these basic questions of measurement while questioning the analysis of experimental data.
The unstated assumption is that the gravitational acceleration for a specific location doesn’t change with time. Thus, while one may not actually be measuring the ‘same’ force every time, being unchanging, it is equivalent to measuring the same force, for all practical purposes. This speaks to the issue of stationarity. It doesn’t matter when one measures a physical constant, but it does matter when one measures a temperature. That is why the Tobs (time of observation) became an issue in correcting temperatures derived from satellites with degrading orbits.
The measurand is implicitly defined as the daily samples of Tmax during a single month, where the unstated assumption is that a single month has a negligible trend, and the mean for the month represents the average monthly high temperature, with the uncertainty represented by random variations caused by clouds, changing wind directions, and low/high pressure systems moving through the station site.
As I have pointed out before, one isn’t strictly measuring the same thing multiple times, but it is a good approximation for the impossible. It reflects on the carelessness of researchers that they don’t bother to explicitly state all their assumptions and compromises. But then, I suspect they haven’t given a lot of thought to what they are doing and why.
It is always about measurements, because that is how a timeseries is obtained.
“It reflects on the carelessness of researchers that they don’t bother to explicitly state all their assumptions and compromises. But then, I suspect they haven’t given a lot of thought to what they are doing and why.”
You pretty much nailed it. You would think that the realization that different Tmax and Tmin values can result in the same midrange value would be an indication that the midrange value is not a good index for climate.
That is the ultimate in actual measurement uncertainty. If climate is what you are trying to measure and you can’t separate one climate from another based on your measurement then exactly what do you really know about the climates you are measuring? Ans: Nothing.
The trend in NIST TN 1900 E2 is 0.22 °C/day. That is hardly what I’d call negligible.
Yep. Exactly as Nick, Bellman, and I have pointed out numerous times. Not that we should have needed to since NIST explicitly states it right there in the example. BTW…there is a 3rd source of uncertainty included in the example as well…the time of day of the measurement.
Likewise, Schmidt’s uncertainty includes the variation caused by ENSO, solar cycles, and other heat flux oscillations into and out of the atmosphere plus a bunch of other components. Not only does Scafetta not account for any of that, but he doesn’t even account for measurement uncertainty.
“””BTW…there is a 3rd source of uncertainty included in the example as well…the time of day of the measurement.”””
NIST captures this in the following:
“””The {εᵢ} capture three sources of uncertainty: natural variability of temperature from day to day, variability attributable to differences in the time of day when the thermometer was read, and the components of uncertainty associated with the calibration of the thermometer and with reading the scale inscribed on the thermometer.”””
Funny how you bring up the trend but not other statistical items. How about the distribution? See the images I have attached. Is it skewed, and does a Student’s t match it well enough?
These are the things textbooks in statistics don’t teach you about the real world.
That should be greatest in March and September, and least in June and December.
Nick writes
And then they combine all those averages, area-weighted, and call that a global average.
But every day the first of those readings at say 10pm at 0deg longitude happens nearly 24 hours from the last of the readings at say 10pm at 345deg longitude and there is a lot of weather between those two.
Where does that error live in the calculation?
That’s part of the simplicity of max/min. There is only one a day.
But also phase issues fade when you take a monthly average.
Nick writes
But temperatures aren’t independent like that. For example the same cold front can influence and be measured multiple times at multiple locations if it’s moving in the right direction.
This isn’t about “fading”, it’s about uncertainty and error.
So throwing away information increases your knowledge.
Huh? This is climate “science”.
All errors are Gaussian and cancel with averaging, right?
They (partly) cancel because some are positive and some are negative. That has nothing to do with being Gaussian.
So a skewed distribution will have errors completely cancel when calculating a mean?
For any distribution, a finite number of numbers drawn from it won’t have a mean equal to the population mean. But as the sample grows, the mean will converge to the population mean. Law of large numbers, and there is no requirement that the distribution be symmetric.
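A quick simulation of that law-of-large-numbers point, using a deliberately skewed (lognormal) population for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# A strongly skewed (lognormal) population: no symmetry is required for the
# sample mean to converge to the population mean as the sample grows.
pop_mean = np.exp(0.5)   # mean of lognormal(mu=0, sigma=1) is exp(sigma^2 / 2)

for n in (10, 1_000, 100_000):
    sample = rng.lognormal(0.0, 1.0, n)
    print(n, round(sample.mean(), 3), "population mean =", round(pop_mean, 3))
```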
You totally missed the point! If the errors don’t totally cancel then how precisely you have located the population mean doesn’t tell you what the measurement uncertainty is.
Why do so many in climate science ALWAYS assume that all measurement uncertainty is random, Gaussian, and cancels? Because that’s the only way you can justify using the SEM instead of the propagated measurement uncertainty from the component data elements onto the average.
There is something overwhelmingly ironic about their constant yammering about the SEM; I will save it for next week (see if you can guess what it is!).
I find it ironic that a sampling distribution, i.e., temperatures, is treated as a population so you can divide σ by √N to get an SEM. Technically, the standard deviation of the sample means IS the SEM. I see each temperature as a sample whose size is 1 (one) and the mean of each sample IS the temperature. That makes √N = √1 = 1.
Exactly, and they run away from this problem every time it is raised.
Not a single one of them, not bdgwx, Stokes, bellman, mosher, etc can give a cogent, coherent explanation of how they compensate for the undoubted existence of systematic bias in temperature measurements taken from multiple measurement stations.
They can’t even explain how they compensate for the microclimate systematic bias changes caused by grass below the station changing from green in the summer to brown in the winter.
They just assume that somehow systematic bias always cancels but they can’t explain how. And they simply aren’t willing to even admit that systematic bias exists in the temp measurement data – for if they admit that then they would also have to admit that the SEM is not a valid way to specify the measurement uncertainty of the average.
It’s telling that none of them are even willing to use the term “measurement uncertainty”, preferring instead to use the words “uncertainty of the mean”. It’s an argumentative fallacy known as Equivocation – changing the meaning of vague words and hoping no one will notice.
The truth can only be that they really don’t care about measurement uncertainty, given the amount of verbal gymnastics they generate trying to support these bizarre claims. Uncertainty is a roadblock along the golden road to CAGW.
Does it involve anomaly baselines?
Nick,
That is a stock sampling assumption.
The problem is that the temperatures being read ARE the samples and form the sample means distribution.
Have you ever checked to see what that sample means distribution looks like? Even Tmax and Tmin are taken from different shaped functions. Is the SEM of Tavg very small?
It is why TN 1900 ASSUMES a Student’s t distribution will work for a monthly average. It is why they say that other methods may have a wider interval.
Remember the CLT you are quoting does say the sample means distribution should be Gaussian regardless of the population distribution. I would advise checking to see if monthly, or 30 year baselines are normal distributions to verify that the SEM is an appropriate statistic.
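A sketch of the kind of check being suggested: draw sample means from a deliberately bimodal “winter plus summer” parent and test whether they look Gaussian (all numbers here are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Crude bimodal "parent": a mix of winter-like and summer-like daily temperatures.
winter = rng.normal(-2.0, 5.0, 50_000)
summer = rng.normal(24.0, 4.0, 50_000)
parent = np.concatenate([winter, summer])

# Distribution of means of samples of size n drawn from that parent.
n = 30
means = np.array([rng.choice(parent, n).mean() for _ in range(5_000)])

# A normality check on the sample means; the CLT says this should look roughly
# Gaussian for large enough n, whatever the parent distribution looks like.
stat, p = stats.shapiro(means[:500])   # Shapiro-Wilk on a subset of the means
print(f"sample-means sd = {means.std(ddof=1):.2f}, Shapiro-Wilk p = {p:.3f}")
```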
Nick writes
Not for weather, if the weather has a built-in bias. Like the polar vortex. Or the roaring 40s. Or any number of features that cause a commonly expected result that can change with no reason to expect that change to be randomly distributed in timeframes we care about.
If they PARTLY cancel then you add them in quadrature. You don’t assume total cancellation! Adding in quadrature still means the measurement uncertainty increases with every data element added to the data set. The larger the number of stations with measurement uncertainty the larger the measurement uncertainty becomes.
You simply cannot assume total cancellation and substitute the SEM for the measurement uncertainty.
That is patently false. Taylor does not say that. Neither does Bevington, NIST, JCGM, UKAS, etc.
And all examples in JCGM 100:2008 or NIST TN 1900 are of different things. Every single one of them. And yet the law of propagation of uncertainty or one of its derivatives is applied equally to each one.
And literally E2 in NIST TN 1900 is of measurements of different temperatures.
All the usual lies from bozox.
Turning the angle on a transit sighting multiple times to increase precision has been SOP for land surveyors for a very long time. Note however, that averaging all the readings for every line gives a number, although it is a meaningless number.
One has to be very careful to define just what the measurand is, as noted in the link provided by Stokes (Possolo, 2015):
I know. I provided the link to that exact text in the post you just responded to. Now can you help Nick and I convince AndersV and the other WUWT participants here who think the inputs to a measurement model must be of the same thing?
Obtaining the average of a set of numbers is a mechanistic process that can be done for any set of numbers. The issue, as I remarked above, is does the measurand make logical sense? Averaging all the telephone numbers in the world will provide a result, but to what end? You could use it as a proxy for God’s telephone number, but I doubt He/She/It will pick up.
The other issue is the precision of the resulting calculation(s). One could define a measurand as being the weight of all the dogs in the world and all the dog fleas in the world. There is such a large difference in the weight of the two that different methods would have to be used to weigh them, and if a couple fleas jumped off a dog during the weighing process, it might be observed, but could probably not be measured. So, it is important to define a measurand as something that is practical, useful, and measurable. The error in dog weights for a dog shedding hair or drooling would be larger than the weight of the fleas!
What many of us are arguing about/for, is known errors be handled properly so that one knows whether the fleas make a difference or not, rather than just ignoring them. That is, we want a good estimate of what the real global average temperature is, taking into account all known uncertainties so that statistical tests can be performed to determine if historical differences are statistically significant.
So, let me ask you a practical question regarding whether measurements have to be of the same thing. If you are trying to measure the angle turned by a transit to improve the precision, what good does it do to conflate two angles? If measuring the diameter of a ball bearing to determine if a particular machine is producing ball bearings within tolerance, why would you also measure ball bearings from a second machine producing different size ball bearings? It all comes down to the definition of one’s measurand and the purpose to which it will be applied.
“ If measuring the diameter of a ball bearing to determine if a particular machine is producing ball bearings within tolerance, why would you also measure ball bearings from a second machine producing different size ball bearings?”
It would be a very reasonable thing to measure the diameter of the product of a certain firm with same nominal size, and get the mean and sd, and SEM, even if produced by many different machines.
And exactly what would you look for in these?
The SD or the SEM?
What does each one tell you about the product?
Would you buy the product if the SEM was 1″ ±0.001″ (SEM) and the SD was 1″ ±0.2″ (SD)?
You have obviously never dealt with real world measurements and informing people that pay money what they are actually paying for. The SEM may have a sacred place for statisticians dealing with numbers, but for those of us who deal with this with some accountability, the Standard Deviation is what matters.
If I am a contractor buying a truck load of 8′ 2×4’s I don’t want to know how accurately the mean was calculated, I want to know how many won’t work for 8′ walls and how many I’ll have to work over because some are too long. That means I want to know the SD and NOT the SEM.
If I’m using a voltmeter I want to know the interval I can expect around the voltage I am currently reading, i.e., the SD. The SEM means little to me since I would need to take 100 readings under repeatable conditions to calculate the SEM to see if it matched your specification.
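To put rough numbers on the 2×4 example (all values invented, and a normal distribution of lengths assumed), this is the question the SD answers and the SEM does not:

```python
from scipy import stats

# Sketch: what the SD tells a buyer that the SEM does not (normality assumed).
mean_len = 96.0      # inches (8 ft), nominal
sd = 0.2             # spread of individual boards (illustrative)
sem = 0.001          # how well the batch mean is known (illustrative)

low, high = 95.75, 96.25   # assumed usable range for an 8 ft wall (illustrative)
frac_unusable = stats.norm.cdf(low, mean_len, sd) + stats.norm.sf(high, mean_len, sd)

print(f"about {frac_unusable:.1%} of boards fall outside the usable range")
# The SEM (0.001 in) says nothing about this; only the SD of individual boards does.
```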
Neither of them will even try to answer, because they can’t.
“Would you buy the product if the SEM was 1″ ±0.001″ (SEM) and the SD was 1″ ±0.2″ (SD)?”
All that tells you is that you looked at 40000 samples. It doesn’t tell you anything new about the product.
You really skipped over the issue. So let’s try again.
If a salesman offered to sell you a truck load of 2×4’s with specs of 8′ ±0.03125″, would you buy them for making 8′ walls?
He does that quite frequently.
Note that the question I asked was “why would you also measure ball bearings from a second machine producing different size ball bearings?” You changed that to, “It would be a very reasonable thing to measure the diameter of the product of a certain firm with same nominal size?” Basically, your response is a non sequitur. Not untypical of you, which is why I have accused you of being a sophist.
Most of the SEM clique have reading comprehension problems.
Because you may be interested in doing an analysis on two different types of ball bearings.
This would not be unlike measuring two different types of temperatures: one at the surface and one at the TLT layer. Like the ball bearings one is larger (surface) and one is smaller (TLT layer). Yet we are still interested in doing an analysis on both.
It is intuitive to hypothesize that a common effect, like the planetary energy imbalance, could influence both temperatures in a similar way even though they are of different magnitude. Likewise, it is intuitive to hypothesize that a common effect, like the composition of the alloy, could influence both ball bearings in a similar way even though they are of different magnitude.
Your word salad has little to do with measurement uncertainty and more to do with pushing numbers into some analysis.
Before you get involved with a second experiment to determine the temperature at a different elevation, you should be certain that you are doing it correctly and getting the right answer. Your suggestion is a distraction to the main point, which is whether one obtains the precision of a measurement by dividing by the sq rt of the number of measurements, or adds the uncertainty of measurements in quadrature when it isn’t the same thing measured multiple times with the same measuring device.
You and the others are basically saying that measuring all chickens (air parcels) is the same as measuring a single chicken (air parcel) the same number of times as the size of the flock. In the first case, one gets the variance of a measurand of a flock of chickens, while in the second case one gets the precision of the measurand of a particular chicken. Why is that so hard to grasp?
It’s not hard to grasp. The problem is that grasping it would also mean recognizing that the uncertainty of the GAT is so large that you can’t distinguish differences in the hundredths digit and likely in the tenths and unit digits. Meaning that climate science would have to find a different way to support claims of global warming.
You can measure one chicken 1000 times, find the SEM, AND ASSUME THAT ALL CHICKENS ARE THE SAME. That is, the same μ and the same σ.
Dr. Taylor covers this in his spring example of finding the “k” factor. Once you measure a spring that is outside the interval of the test function, you can no longer rely on the statistics you have assumed apply to everything. That is, all the springs are not similar.
His proof of dividing by √n requires all “samples” have the same μ and σ.
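For reference, a small simulation of the √n result under exactly those assumptions (every draw from the same distribution, i.e. the same μ and σ), where the spread of the sample means does come out as σ/√n:

```python
import numpy as np

rng = np.random.default_rng(4)

# Every draw comes from the SAME distribution (same mu and sigma), i.e. repeated
# sampling of one quantity, which is the premise of the sqrt(n) derivation.
mu, sigma, n = 10.0, 2.0, 25
means = rng.normal(mu, sigma, (100_000, n)).mean(axis=1)

print(f"observed sd of the sample means: {means.std(ddof=1):.3f}")
print(f"sigma / sqrt(n)                : {sigma / np.sqrt(n):.3f}")
```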
“You can measure one chicken 1000 times, find the SEM, AND ASSUME THAT ALL CKICKENS ARE THE SAME. That is, the same μ and the same σ.”
Make your mind up. Normally you are insisting that measuring the same thing hundreds of times is the only time you can use statistics.
In reality, of course, measuring the same chicken 1000 times will tell you no more about all chickens than measuring one chicken once. That’s because it is not a random sample. All that measuring it multiple times will do is allow you to determine the measurement uncertainty.
“Make your mind up. Normally you are insisting that measuring the same thing hundreds of times is the only time you can use statistics.” (bolding mine, tpg)
Your lack of reading comprehension skills is showing again.
Assuming the measurand you measured 1000 times represents *all* measurands is simply not logical – yet this is what climate science does.
“In reality, of course, measuring the same chicken 1000 times will tell you no more about all chickens than measuring one chicken once. That’s because it is not a random sample. All that measuring it multiple times will do is allow you to determine the measurement uncertainty.”
Why then does climate science assume that their GAT represents all climates?
“Why then does climate science assume that their GAT represents all climates?”
It doesn’t.
Your lack of reading comprehension is showing again.
The term “global” has a meaning. You can’t just ignore it like you do measurement uncertainty.
As does the word “average”.
No one is saying otherwise.
You were the one that asked why someone would want to take measurements of different sized ball bearings. I gave you a plausible reason and even related it back to the topic of temperature.
Let me repeat…again. You cannot increase the precision of an individual measurement by taking more measurements. It is only the precision of the average that improves as you increase the number of measurements that went into that average and only if the measurements have a correlation coefficient r < 1. This is true regardless of whether the measurements are of the same thing or different things like in the case of NIST TN 1900 E2. And at no time is summation in quadrature a valid procedure for assessing the uncertainty of the average.
No we are not.
If you’re trying to convince me that the uncertainty of the average is not less than the uncertainty of the individual elements that went into it then reason that is hard to grasp is because it is wrong.
It’s not unlike how some of the contrarians here try to convince me that averages and sums are interchangeable and are then incredulous when I don’t grasp that either. Not only is it wrong to suggest that averages are the same thing as sums, but it is absurdly wrong. That’s why I don’t grasp it.
“Let me repeat…again. You cannot increase the precision of an individual measurement by taking more measurements. It is only the precision of the average that improves as you increase the number of measurements that went into that average and only if the measurements have a correlation coefficient r < 1. “
But it does nothing for estimating the accuracy of the mean you calculate unless systematic uncertainty is either insignificant or zero.
“This is true regardless of whether the measurements are of the same thing or different things like in the case of NIST TN 1900 E2. And at no time is summation in quadrature a valid procedure for assessing the uncertainty of the average.”
You are equivocating again. The issue at hand is the MEASUREMENT uncertainty of the average, not how many digits your calculator uses in calculating the average.
Why do you never use the term “MEASUREMENT” uncertainty of the average?
“”””This is true regardless of whether the measurements are of the same thing or different things like in the case of NIST TN 1900 E2.””””
How many times does it need repeating that this example IS measuring the same thing? NIST declares the measurand to be the monthly average. The daily Tmax temps are experimental measurements of this measurand.
Ultimately the experimental standard deviation and the expanded standard deviation of the mean are not terribly different, roughly 2 versus 4 for the experimental uncertainty.
The real question is how you propagate this uncertainty.
I don’t know. I’m having a hard time visualizing this scenario.
Each widget (ball bearing or whatever) is a different thing. It is still perfectly reasonable to compute an average of them and assess the uncertainty of that average.
And this number tells you absolutely nothing.
Put some numbers to this example.
Does the experimental standard deviation of the mean provide you any information about the variance of the ball bearings?
Remember it uses √N to reduce the value, so the more you measure, the smaller the number becomes, because σ doesn’t change.
So, does the experimental standard deviation of the mean tell you more about the mean than it does about the dispersion of possible values surrounding μ? Or, does σ tell you more about the values that ball bearings may have?
But, you are not justified in conflating them and increasing the claimed precision of all of them just because you measured more of them!
They simply can’t accept the fact that if there is *any* systematic bias in the measurements that the measurement uncertainty of the mean will grow with every data element you add to the data set.
Even if the systematic bias is in the thousandths digit, by the time you add 1000 data elements the measurement uncertainty will have grown into the hundredths digit.
The systematic bias in field temperature measurements is most assuredly greater than the thousandths digit once the measurement device has been in place for any period of time. Even the newest PTC sensors are only precise to the thousandths digit and precision is not accuracy.
“They simply can’t accept the fact that if there is *any* systematic bias in the measurements that the measurement uncertainty of the mean will grow with every data element you add to the data set.”
No. If a thermometer is reading 1C too high, then the mean of many readings will be 1C too high.
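A minimal R sketch of that point, with invented numbers: averaging shrinks the random scatter of the mean with √N, but a constant bias carries straight through to the mean.

set.seed(42)
true_temp <- 20
readings  <- true_temp + 1 + rnorm(1000, mean = 0, sd = 0.5)  # hypothetical: +1 C systematic bias plus random scatter
mean(readings)             # about 21: still 1 C too high, no matter how many readings
sd(readings) / sqrt(1000)  # about 0.016: only the random part shrinks with sqrt(N)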
And just where do you find this applied in any of the traditional surface temperature data sets? Show us what these data sets use for combined standard uncertainty that has all the components like systematic error.
“No. If a thermometer is reading 1C too high, then the mean of many readings will be 1C too high.”
Exactly! And when you average that with a different station whose systematic uncertainty is different what happens to the measurement uncertainty of that average?
And what if a year from now the systematic uncertainty of that station is +1.1C? How do you discern a temperature difference in the hundredths digit?
You *really* don’t expect all measurement station calibration to always be the same from month to month or year to year do you?
If the calibration drifts then anomalies less than the drift can’t be discerned – they would fall in the UNKNOWN.
How do you KNOW it is 1C too high?
Nick writes
If instead of a thermometer, the readings are from a compass with a systematic bias…after lots of readings, what is the uncertainty in your heading?
And for bonus points, why is this different?
Oh WOW! This is a good one! Hope you don’t mind if I use it!
Yes it is; consider another example, sending a probe to another planet, which requires mid-course corrections: do the mission controllers have the probe make 100 different position measurements and average them to get a higher “accuracy”?
Absolutely not, they design the probe to have the necessary measurement resolution and precision built in.
The uncertainty in each measurement will be the combination of the components of uncertainty arising from both systematic and random effects. The uncertainty of the average will depend on how much of the combined uncertainty is systematic vs random or the correlation coefficient r between any two measurements.
For example, if r = 0.5 and the combined uncertainty of systematic and random effects is u then when y = (a+b) / 2 the general solution is u(y) = sqrt[ 3/4*u(a)*u(b) ]. Note that r = 0.5 is a statement that the systematic and random effects are in equal magnitude.
Let’s put some hard values on it. Let’s say the random effect has an uncertainty u_r = 5 degrees and the systematic effect has an uncertainty u_s = 5 degrees. The combined uncertainty for a single measurement is thus u = sqrt[ u_r^2 + u_s^2 ] = sqrt[ 5^2 + 5^2 ] = 7.1 degrees. And since u_r = u_s then r = 0.5. Now let’s say we have two measurements a = 90 degrees and b = 120 degrees. The average is y = (a + b) / 2 = (90 + 120) / 2 = 105. The uncertainty in that value is u(y) = sqrt[ 3/4*u(a)*u(b) ] = sqrt[ 3/4 * 7.1 * 7.1 ] = 6.1 degrees. That’s your answer…105 ± 6.1 degrees.
Refer to JCGM 100:2008 equation 16 for details on how I did this. The derivation of u(y) = sqrt[ 3/4*u^2 ] when y = (a+b)/2 is a bit tricky, but I can walk you through it step-by-step if you like.
You can verify the solution with the NIST uncertainty machine.
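For anyone who wants to check the arithmetic without the web tool, here is a minimal R sketch of the same propagation (the correlated form of the law of propagation applied to y = (a + b)/2, using the numbers in the example above):

u_r <- 5; u_s <- 5                     # assumed random and systematic components, degrees
u_a <- sqrt(u_r^2 + u_s^2)             # combined standard uncertainty of reading a (~7.1)
u_b <- u_a                             # same instrument, same components
r   <- u_s^2 / (u_a * u_b)             # correlation induced by the shared systematic effect = 0.5
c_a <- 0.5; c_b <- 0.5                 # sensitivity coefficients of y = (a + b)/2
u_y <- sqrt(c_a^2 * u_a^2 + c_b^2 * u_b^2 + 2 * c_a * c_b * r * u_a * u_b)
a <- 90; b <- 120
c(mean = (a + b) / 2, u = u_y)         # 105 and ~6.1 degrees, matching the worked example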
Other than being a compass and the measurements being in different units there is nothing else different. It is all handled the same way.
So for a compass that has a systematic bias of x degrees then all measurements from it are biased by x degrees which means that an average of many measurements will be biased by x degrees as well.
Oh Lord, please have mercy, not the NIST uncertainty machine spam AGAIN.
Do you not find it amazing that the two numbers you chose could also be two temps on the globe and end up with an uncertainty of 6°?
“ Note that r = 0.5 is a statement that the systematic and random effects are in equal magnitude.”
Do *YOU* know what the u(random) and u(systematic) are for the measurement station at Forbes Air Force Base in Topeka, KS?
If you don’t then you have no way to separate them to do the calculation you just went through.
If you *do* know then show us the math to determine the uncertainty in the midrange value for yesterday, 9/27/23.
If you don’t know then show us the math to determine the uncertainty in the midrange value for yesterday, 9/27/23.
You are getting old, you just pulled a Biden and repeated yourself!
Damn, hit the wrong button.
Let’s see what the GUM says about some of this.
Your functional description requires two measurements “a” and “b” to obtain a Y value!
If your functional relationship is
y = (a + b)/2
then you have only one measurement to use with the assumption of input quantities of 90 and 120.
Your functional description is a simple addition of similar SI units, so a simple RSS calculation will give a value of 7.1 as you indicated. Your measurement IS 105. That makes your value 105 ± 7.1°.
You need two additional measurements “c” and “d” to obtain a measurement quantity for another measurand. Only then can you find a mean and begin a statistical analysis of two measurements made under repeatable conditions.
Let me add that a measurement quantity of a measurand that consists of an AVERAGE of two physical measurements is probably not a good relationship. I can’t think of where this might occur in the real world. You might enlighten the folks here as to just what your example might be.
Lastly, when you add additional measurements from new experiments to your data, you need to use Section 4. Section 5 is for input quantities that determine a single measurement.
Yes, if one hasn’t dialed in the local magnetic declination, no amount of readings is going to give one the correct heading.
If you are walking 400 miles across the Kansas prairie you would totally miss the town you are looking for. Or if you are in the CO mountains you could totally miss the valley you are looking for with only a 10 mile walk!
There *are* physical consequences that go along with measurement uncertainty. Climate science, at least as advocated for by the likes of bdgwx, refuses to accept this.
Strawman. Nick, Bellman, myself, NIST, JCGM, etc. never said you could. I’m going to tell you what I tell everyone else. That is your argument and yours alone. Don’t try to pin that on us. And don’t expect any of us to defend your arguments especially when they are absurd.
What we have been saying is that the uncertainty of the average (not the uncertainty of the individual elements) decreases as the number of elements increases when those elements are correlated at r < 1.
Read that statement carefully. Read it multiple times and burn it into your brain.
Still bullshit, and a distinction without a difference.
What *can* be pinned on you is your use of the unjustified assumption that the temperature measurements have no systematic uncertainty so you can do a statistical analysis of the data and use the SEM as the measurement uncertainty of the data.
No one cares how precisely you locate the average if the average has measurement uncertainty that overwhelms the differences you are trying to identify. If you precisely calculate the average to 100.000001 but the measurement uncertainty of the average is 100.0 ± 0.1 you are only fooling yourself that you can tell the difference between 100.000001 and 100.000002. Both of those values in the millionths digit are part of the great UNKNOWN. It’s the location of the diamond in a fish tank full of milk.
It’s why those of us who understand metrology see climate scientists trying to identify temp differences in the hundredths digit as no different than a circus fortune teller gazing into a cloudy crystal ball.
Cue Johnny Carson and Carnac.
Your age is showing!
Heh.
When you say the measurements don’t have to be of the same thing, then that is exactly what you are saying.
Saying that measurements do not have to be of the same thing to apply the procedure in NIST TN 1297 and JCGM 100:2008 is not a statement that the uncertainty of the individual measurements decreases as the number of measurements increases. The first is true. The second is not. Those are two completely different concepts. Neither Nick, Bellman, or myself are conflating them. And it is not unreasonable to demand that you not conflate them either.
And don’t hear what I didn’t say. I did not say that the average of the individual uncertainties is the same as the uncertainty of the average. It isn’t. I did not say that error is the same thing as uncertainty. It isn’t. I did not say that the uncertainty of individual measurements decreases as the number of measurements increases. It doesn’t. There are likely to be countless strawmen that some here are going to want to pin on NIST, JCGM, Nick, Bellman, and me that none of us said or advocated for. I repeat again…we are not going to defend the arguments made by others especially when they are absurd.
The only thing that is being said is that the law of propagation holds in all cases. That necessarily means that for a measurement model that computes the average of inputs whose correlation coefficient r is less than 1 the uncertainty of the output (not the inputs) decreases as the number of inputs increases. And it works regardless of whether the inputs are themselves repetitions of a single measurand or different measurands all together measured by different instruments. That’s it. Nothing else is being said.
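A hedged R sketch of how that plays out numerically, under the simplifying assumption that every input has the same standard uncertainty u and every pair of inputs shares the same correlation r (the values below are made up): the uncertainty of the mean falls as n grows, but when r > 0 it floors near u·√r rather than going to zero.

u_mean <- function(n, u, r) u * sqrt((1 + (n - 1) * r) / n)  # mean of n inputs, common pairwise correlation r
u_mean(10, 0.5, 0)      # r = 0: the familiar u/sqrt(n), ~0.16
u_mean(10, 0.5, 0.5)    # r = 0.5: ~0.37, far less reduction
u_mean(1e6, 0.5, 0.5)   # very large n: ~0.35, it never reaches zero when r > 0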
The first part of the law of propagation is that you cannot assume all measurement uncertainty is random, Gaussian, and cancels. You have to *prove* that the data is random, Gaussian, and cancels.
“That necessarily means that for a measurement model that computes the average of inputs whose correlation coefficient r is less than 1 the uncertainty of the output (not the inputs) decreases as the number of inputs increases.”
You keep forgetting to state the assumptions under which this is true. It is *ONLY* true for random, Gaussian distributions. Primarily where you have multiple measurements of the same thing using the same calibrated instrument and under the same environmental conditions.
“And it works regardless of whether the inputs are themselves repetitions of a single measurand or different measurands all together measured by different instruments.”
Using *your* logic, you can combine the heights of Shetland ponies with the heights of the quarter horses and the measurement uncertainty becomes the SEM – and *NOT* the propagated measurement uncertainty from the individual elements.
In fact, such a thing would give you a multimodal distribution which is *not* adequately described by the population mean – no matter how precisely you locate that mean by adding more and more height measurements of the different breeds. It’s exactly the same for temperature measurements.
Why do you continue to use the words “uncertainty of the output” or “uncertainty of the mean” when it is the measurement uncertainty of the mean that is at issue?
I’ll ask you again. If your functional relationship is an average of all the inputs, how do you calculate a mean of experimental standard deviations, or even an experimental standard deviation of the mean? You have one experiment (one sample) with one mean as your only data point.
See the image. Where do you get multiple “k” experiments? If you average all your data into one data point, you can’t even find a mean from multiple experiments, because you only have one data point calculated from X₁,₁, …, Xₙ,₁; you won’t have any X₁,₂, …, Xₙ,₂ or other experiments because you have already used the information you have available.
That you lump your gang of flat earth trendology nutters in with the GUM is beyond ironic.
You don’t even know what the “same thing” really means do you?
Define the measurand like in TN 1900. The monthly average of Tmax for a single month at a given station.
Formulate the measurement model that relates the values of the output to the input values, i.e., the daily Tmax’s recorded for that month.
Tell us which one of the following repeatability conditions was violated?
GUM
Experimental measurements don’t work like the old “single thing measured with the same device multiple times.” Under those conditions the “true value” plus or minus the SEM may be applicable. That is why the GUM includes an experimental standard deviation of the mean in its text. Under some conditions that may be an applicable statement of a value. It is up to the experimenter to publish an adequate statement of value, i.e. the experimental standard deviation, to let readers understand that there is a dispersion of values around the mean.
Here is what Dr. Taylor says in Section 5.7.
“””Because x1, … , xn are all measurements of the same quantity x, their widths are all the same and are equal to σₓ,
σₓ₁ = ••• = σₓₙ = σₓ”””
“””We imagined a large number of experiments, in each of which we make N measurements of x and then computed the average x̅ of those N measurements. We have shown that after repeating this experiment many times, our many answers will be normally distributed, that they will be centered on the true value of X, and that the width of their distribution is
σₓ̅ = σₓ/√n.”””
In case you don’t notice, this is exactly describing a sampling distribution, where the sample mean estimates the “true value” made up of the many x̅ values from multiple samples.
Since temperatures are single measurements of temperature, N = 1, even when averaging a month’s worth of experimental data points. In essence, σₓ̅ = σₓ/√n = σₓ/√1 = σₓ.
TN 1900 gets around this by making assumptions about a Student’s t distribution. TN 1900 also says that:
“””A coverage interval may also be built that does not depend on the assumption that the data are like a sample from a Gaussian distribution. The procedure developed by Frank Wilcoxon in 1945 produces an interval ranging from 23.6 °C to 27.6 °C (Wilcoxon, 1945; Hollander and Wolfe, 1999). The wider interval is the price one pays for no longer relying on any specific assumption about the distribution of the data. “””
Note the last sentence. NIST notices that the assumptions may not be correct. They admit that another test provides an interval of ±2.
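For concreteness, a minimal R sketch of the TN 1900 E2 recipe; the daily Tmax values below are invented for illustration (they are not NIST’s data), and the recipe only holds under the example’s assumption that systematic effects are negligible.

tmax <- c(24.1, 26.3, 25.0, 27.8, 23.5, 25.9, 26.7, 24.8, 25.2, 26.1)  # hypothetical daily Tmax, deg C
n <- length(tmax)
m <- mean(tmax)
u <- sd(tmax) / sqrt(n)          # experimental standard deviation of the mean
k <- qt(0.975, df = n - 1)       # Student's t coverage factor for ~95 %
c(mean = m, u = u, U95 = k * u)  # expanded uncertainty, valid only if systematic error is negligible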
It is all about definitions. If you repeatedly measure the diameter of a single ball bearing, you have the means of describing the random variation resulting from measurement error (assuming perfect sphericity or at least negligible ellipticity). Thus, you can state an estimate of the probable diameter, ± a measurement uncertainty. You are justified in increasing the precision estimate by a factor of the square root of the number of measurements: the same thing, measured multiple times, with the same instrument.
If you measure the diameter of a sample of 100 ball bearings from a particular batch, you have the means of describing the average diameter of the ball bearings from that batch as a distribution, plus the inherent measurement error from the single ball bearing experiment, above. You aren’t justified in claiming an increase in precision of the individual bearings by a factor of ten: Different things, measured once each, with the same instrument.
Do you understand how that relates to measuring air masses?
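A short R illustration of that distinction, with invented numbers for both cases:

set.seed(7)
one_bearing <- rnorm(25, mean = 10.000, sd = 0.002)   # 25 repeat measurements of one bearing (instrument scatter)
batch       <- rnorm(100, mean = 10.000, sd = 0.020)  # 100 different bearings measured once each
sd(one_bearing) / sqrt(25)  # uncertainty of that one bearing's estimated diameter
sd(batch)                   # spread of the batch itself; dividing it by 10 says nothing about any single bearing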
They don’t understand anything about real world measurements. They have their Stat 101 for non-math majors that describes the SEM as the error of the mean and by Jiminy they are going to stick to that! Anything else is just an unnecessary distraction so it’s easier to just assume it all cancels.
(Note: (population standard deviation)/sqrt(n) IS the SEM; it is *not* the measurement uncertainty of the population mean.)
No one is saying otherwise. And if you think Nick, Bellman, and I are saying that then you haven’t been reading our posts, because we have been clear and unequivocal on the point that it is only the uncertainty of the average of elements correlated at r < 1 that is lower with a higher element count, and not the uncertainty of the individual elements themselves. We have stated that repeatedly and concisely numerous times.
Yes I do. And I see no reason to challenge NIST’s understanding on the matter either.
The SEM, which is what you are calling the “uncertainty of average”, is *NOT* the measurement uncertainty of the average. The measurement uncertainty of the average is the important factor determining what you know and what you *can’t* know.
Calculating the average to the thousandths digit, which is what the SEM tells you, tells you NOTHING about how accurate that mean is. The mean could be off by 100% while the SEM is zero. You simply can’t tell anything about accuracy of the mean from the SEM.
Why climate science refuses to accept that measurement uncertainty is so large in the temperature data that you simply cannot identify differences in the hundredths digit is beyond me.
It *is* assuming that if you can measure a crankshaft journal enough times using a yardstick marked in 1/8″ increments that you can determine the diameter of the crankshaft journal to the .001″. It *is* assuming that you can identify the location of a diamond in a fish tank of milk if you just stare at the tank long enough!
Their religious dogma is that the Earth is warming dangerously from human activities, and the ONLY way they can even make a case for any warming is to make claims such as:
“Further analysis also indicates that if the surface temperature in the last five months of 2023 approaches the average level of the past five years, the annual average surface temperature anomaly in 2023 of approximately 1.26°C will break the previous highest surface temperature, which was recorded in 2016 of approximately 1.25°C …”
DOI: 10.1007/s00376-023-3200-9
If they showed the justifiable significant figures and the correct uncertainty, they would have to conclude that there is no statistically significant difference between the two measurements, and no basis for the recent warming claim — Game Over!
My pet peeve is that after seeing numerous individual stations with little to no warming, not one warmist has ever had the temerity to post some stations that could average out to 1.25°C with a station that has 0°C warming.
One has to wonder what it takes to have some evidence of stations that are warming at 2.5°C or better that aren’t affected by UHI!
“not one warmist has ever had the temerity to post some stations that could average out to 1.25°C with a station that has 0°C warming.”
Try looking through GHCN. Here are a few from the unadjusted data, using annual averages (a sketch of how such trends can be computed follows the list).
SIE00115076 – POSTOJNA
1962 – 2022. Warming rate 0.48°C / decade. Total warming 2.9°C.
IR000407660 – KERMANSHAH
1951 – 2022. Warming rate 0.42°C / decade. Total warming 3.0°C.
SWE00139498 – HOLJES
1961 – 2022. Warming rate 0.49°C / decade. Total warming 3.0°C.
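For anyone who wants to reproduce that kind of number, a minimal R sketch of how a per-decade trend can be computed from annual averages (the series below is invented, not GHCN data):

set.seed(1)
years <- 1962:2022
anns  <- 8 + 0.048 * (years - 1962) + rnorm(length(years), sd = 0.5)  # invented annual means, deg C
fit   <- lm(anns ~ years)                 # ordinary least-squares trend
unname(coef(fit)["years"]) * 10           # slope expressed as deg C per decade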
Did you do any research on these? I realize you answered the question, but maybe these aren’t the appropriate stations.
I have included an image of a graph for Slovenia. Your graph seems to be anomalous, at least for the country.
The Tmax temps for KERMANSHAH appear to have some changes. ~1980 and ~1994 both show the possibility of a station change of some kind with relatively flat temps afterward.
I guess you should realize that the graphs on WUWT in the past were pretty well researched and not just random finds.
And there go the goal posts. First you complain nobody shows you stations that show anomalous warming trends, then complain that the trends you are shown are anomalous.
As I said, I deliberately used unadjusted data, as I knew there would be whining if it were adjusted. But that inevitably means there will be all sorts of reasons why a particular station will show more or less warming than the rest of the area. I just looked at the trend for every station, and randomly selected stations that looked plausible, rejecting any which showed obvious problems or discontinuities.
“I guess you should realize that the graphs on WUWT in the past were pretty well researched and not just random finds.”
Which ones would those be? The one Tim insisted was perfect despite a big chunk showing mean temperatures rather than maximums? The one in Tokyo that keeps being used to prove there’s been no warming despite the fact it was moved to a cooler location several years ago?
I’m not going to find the actual graphs. A few are from Japan, Australia, U.S. CRN, Great Britain and other places.
I didn’t move the goalposts. I provided a Google located graph for Slovenia. It doesn’t show the warming you have. Do you think this might be indicative of the problem with the temperature database?
“I provided a Google located graph for Slovenia.”
For 10 years, ending in 2021.
Here are the same years for the GHCN Postojna. Not that different.
You are lying every time you claim the SEM is the “uncertainty of the average”.
The understanding problem is on YOUR end, not NIST’s.
Bevington: “The accuracy of an experiment, as we have defined it, is generally dependent on how well we can control or compensate for systematic errors, errors that will make our results different from the “true” values with reproducible discrepancies. Errors of this type are not easy to detect and not easily studied by statistical analysis.”
Taylor: “As noted before, not all types of experimental uncertainty can be assessed by statistical analysis based on repeated measurements. For this reason, uncertainties are classified into two groups: the random uncertainties, which can be treated statistically, and the systematic uncertainties, which cannot.”
The SEM is a *statistical* treatment – AND IT IS NOT JUSTIFIED WHEN SYSTEMATIC BIAS EXISTS IN THE MEASUREMENTS.
These guys keep trying to justify using the SEM as a measurement uncertainty of the average when it simply doesn’t apply at all to temperature measurements.
You are correct, the understanding problem is *NOT* with NIST, Taylor, Bevington, or Possolo. It is an understanding problem of those who think you can statistically analyze data that has systematic bias (and time-varying systematic bias at that) in the face of metrology experts saying that you can *NOT* do so.
“You are lying every time you claim the SEM is the “uncertainty of the average”.”
Here’s NIST’s E2 again
One potential source of uncertainty is model selection: in fact, and as already mentioned, a model that allows for temporal correlations between the observations may very well afford a more faithful representation of the variability in the data than the model above. However, with as few observations as are available in this case, it would be difficult to justify adopting such a model.
The {εi} capture three sources of uncertainty: natural variability of temperature from day to day, variability attributable to differences in the time of day when the thermometer was read, and the components of uncertainty associated with the calibration of the thermometer and with reading the scale inscribed on the thermometer.
Assuming that the calibration uncertainty is negligible by comparison with the other uncertainty components, and that no other significant sources of uncertainty are in play, then the common endpoint of several alternative analyses is a scaled and shifted Student’s t distribution as full characterization of the uncertainty associated with r.
A lot of assumptions about uncertainty here. You can wave them away as NIST did in the EXAMPLE, but for scientific work, you can’t just do this.
Again, this example has been set up to allow one to use the SEM in terms of the same thing, multiple times with the same device, and in the same location. Do you think NIST is unaware of the requirement of repeatability conditions?
Do you understand what the difference is between the same thing, multiple times with the same device, and in the same location and single measurements of experimental standard deviation?
If you are using this example as an exemplary way of calculating uncertainty, what do you have to say about the end result of said uncertainty being ±1.8°C? Exactly what causes this large uncertainty to not propagate throughout following calculations?
Why don’t you list out the assumptions in E2?
Such as:
Then tell us how these assumptions apply to the global temperature data set.
Here is GUM E.4:
E.4 Standard deviations as measures of uncertainty
E.4.1 Equation (E.3) requires that no matter how the uncertainty of the estimate of an input quantity is obtained, it must be evaluated as a standard uncertainty, that is, as an estimated standard deviation. If some “safe” alternative is evaluated instead, it cannot be used in Equation (E.3). In particular, if the “maximum error bound” (the largest conceivable deviation from the putative best estimate) is used in Equation (E.3), the resulting uncertainty will have an ill-defined meaning and will be unusable by anyone wishing to incorporate it into subsequent calculations of the uncertainties of other quantities (see E.3.3).
E.4.2 When the standard uncertainty of an input quantity cannot be evaluated by an analysis of the results of an adequate number of repeated observations, a probability distribution must be adopted based on knowledge that is much less extensive than might be desirable. That does not, however, make the distribution invalid or unreal; like all probability distributions, it is an expression of what knowledge exists.
E.4.3 Evaluations based on repeated observations are not necessarily superior to those obtained by other means. Consider s(q̄), the experimental standard deviation of the mean of n independent observations qₖ of a normally distributed random variable q [see Equation (5) in 4.2.3]. The quantity s(q̄) is a statistic (see C.2.23) that estimates σ(q̄), the standard deviation of the probability distribution of q̄, that is, the standard deviation of the distribution of the values of q̄ that would be obtained if the measurement were repeated an infinite number of times.
It is the standard deviation that quantifies uncertainty, not your precious SEM.
And where are the repeated observations in air temperature measurements? THERE AREN’T ANY.
And as Jim pointed out, why did you not include the assumptions in the NIST example? More of your sophistry.
Dr. Taylor covers this in his spring constant “k” example. You may use a single ball bearing to calculate an uncertainty for the 99 remaining ball bearings. However, if you measure a ball bearing that exceeds the first μ + SEM, you must measure it multiple times to find the uncertainty.
Why? Because the SEM is an interval around the mean describing where the mean may lie. If your measurement exceeds this interval, you have a problem. This is one reason to use an expanded SEM.
In TN 1900 Possolo SPECIFICALLY states the assumption that the measurements are of the same thing. He also specifically states the assumption that systematic bias is insignificant. Meaning he assumed all error was random, Gaussian, and canceled thus leaving the SEM as the measurement uncertainty.
You simply cannot do that in reality. Neither assumption holds in reality, especially when you are looking at a midrange value.
The very fact that different Tmax and Tmin values can result in the same Tmidrange should be a clue that the midrange value is *NOT* a good index for assessing climate!
The law of propagation requires the addition of all measurement uncertainty, in quadrature if there is partial cancellation, in all cases. The addition of the measurement uncertainties cannot be substituted for by using the SEM – except in one specific case. And that one specific case is why Possolo made the assumptions he did in TN1900.
It’s the same for the JCGM and TN1900. You are wanting to shoehorn the reality of the temperature data into the same assumptions that Possolo made – all measurement uncertainty is random, Gaussian, and totally cancels leaving the SEM as the measurement uncertainty.
The SEM and the law of large numbers are ONLY good for determining how precisely you have located the population average. Except in the one specific case, the SEM tells you nothing about the accuracy of the population mean. The population mean could be off by 100% while the SEM is zero, all because of systematic bias in the individual measurements. That is why you HAVE to account for systematic bias in the real world. It never goes away. It can partially cancel, which is why you add in quadrature – but it *NEVER* goes away in real-world measurements, and that is why the SEM can’t be used as the measurement uncertainty in the real world of temperature measurement.
Bevington specifically says that measurements containing systematic bias are not amenable to statistical analysis. He then goes on to analyze distributions, like Gaussian, assuming no systematic bias. His entire book is on distributions whose measurements are assumed to have no systematic bias, only random error.
Almost the same thing applies to Taylor. His Chapter 4 is even titled “Statistical Analysis of Random Uncertainties”. Note carefully the words “Random Uncertainties”.
In the text he says: “As noted before, not all types of experimental uncertainty can be assessed by statistical analysis based on repeated measurements. For this reason, uncertainties are classified into two groups: the random uncertainties, which can be treated statistically, and the systematic uncertainties, which cannot.” (italics are in the text, tpg)
In TN1900, Possolo specifically states that systematic uncertainty has to be assumed to be insignificant in order to use the methodology he proceeds with.
Global temperature data HAS widespread systematic uncertainty. A priori that means the data is *NOT* amenable to statistical analysis. The way it should be treated is how Taylor treats measurement uncertainty in his Chapter 13 – i.e. measurement uncertainty ADDS, either directly or in quadrature.
The use of the SEM *is* based on assuming that all measurement uncertainty is random, Gaussian, and cancels. NO SYSTEMATIC UNCERTAINTY.
It’s an assumption that is simply not justified for the real world of temperature measurements using field measuring devices whose calibration is not guaranteed.
It’s why Hubbard and Lin found in 2002 that regional adjustments to measurement devices are simply wrong. Local microclimates are so different that the random error/systematic bias has to be handled on a station-by-station basis. That alone should have been a warning to climate science that assuming all measurement uncertainty is random and Gaussian is wrong. But it seems to have made no impression at all – especially on you, Stokes, and bellman.
Remember that bgw is a big proponent of fraudulent data manipulations.
They’ve been told this very point many, many times yet refuse to acknowledge the truth.
Did you see the remark that Stokes made over on Climate Etc. on the 24th about “That’s a common mantra among the uncertainty cranks at WUWT, who never quote any authority for it”?
You can follow that down to his facts on why, and then Andy May’s subsequent silence.
Still waiting for Bellcurveman to “school” me, blob.
I think you are incapable of being schooled.
Which thread Clyde?
Judith Curry’s blog, Comment and Reply to GRL on evaluation of CMIP6 simulations by Scafetta, https://judithcurry.com/2023/09/24/comment-and-reply-to-grl-on-evaluation-of-cmip6-simulations/#comment-993573
Thank you.
Nick doesn’t understand measurement uncertainty and shouldn’t be commenting on it.
As I have pointed out to Nick, as have you, the assumptions in TN 1900 are entirely set up to follow the traditional same thing, multiple measurements, same device, repeatable conditions whereby the SEM is an appropriate statistic.
The GUM is definite about this. Experimental measurements of different stations do not meet the repeatability conditions requirement.
“Together with Gareth Jones and John Kennedy, he [Gavin] wrote a letter to the Editorial Board of GRL asking them to retract my paper.”
The #1 tactic of leftists — censorship.
“The topic of discussion in this subthread is not the correct way to handle data, but instead, your willful lying.” — CS
Amen!
No I missed it, don’t read CE. Typical Stokes, who thinks he is the world’s foremost expert on absolutely everything.
Exactly!
Bingo! Someone that understands metrology!
The JCGM does not say that. It says:
“””B.2.18 uncertainty (of measurement)”””
“””parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand. “””
“””C.3.2 Variance”””
“””The variance of the arithmetic mean or average of the observations, rather than the variance of the individual observations, is the proper measure of the uncertainty of a measurement result.”””
The variance of the average of the observations is the proper measure of the uncertainty in a measurement. That is the statistical parameter that characterizes the dispersion of the values that could reasonably be attributed to the measurand.
I have yet to see anyone define what the measurand is, the procedure for determining the observation of the measurand, or the measurement model. I refer you to NIST TN 1900 for a tutorial on how to define an observation equation and a measurement error model.
I’ll say it again, anomalies are not measurements of temperature; they are a delta T.
One more criticism is that all this work should be done in Kelvin.
No Stokes.
The real issue is that Schmidt proved many years ago (15?) that he is either completely incompetent or happy to write anything for a price.
Witness his laughable paper on ‘warming’ Antarctica with stations in wildly incorrect locations, others long buried under meters of snow, in addition to smearing warmth from the volcanic peninsula as far around as he dared.
Exposed in detail by Steve McIntyre and others.
I think you are thinking of Steig (and have it all wrong).
Nick,
You are correct, it was Steig’s work.
Geoff S
Agreed. It was Steig’s work and IIRC Nic Lewis criticised the method and effectively proved Steig’s analysis wrong.
Tim,
You are correct about the response(s) to Eric Steig around 2009–10, by Ryan O’Donnell, Nic Lewis, Stephen McIntyre and Jeff Condon. Geoff S
This is the Nick of old that I remember! At least, in any serious reply to his comment, some thoughtfulness is warranted. I myself don’t dig down to the minutiae of temperature readings because, as I speak, an algorithm is busy adjusting T down before 1940 and up after 1980. And that was after the Father of Global Warming Hysteria pushed the 1930s–40s 20th-century highstand down over half a degree C. In doing so he got rid of the century’s high and the deep 40-yr cooling period (late 1940s to 1979) that followed, both of which falsified the CO2 control knob hypothesis. Jim Jumanji Climate then retired, as is the wont of the climate changers (remember T. Karl and his Karlization of ocean surface T on the eve of his taking his pension).
So how many angels are dancing on the head of the T algorithm stylus doesn’t interest me. Were I a scientist from the Dark Side, I would be delighted that sceptic scientists had legitimized the Big T jiggering by arguing about tenths of a degree on a bogus record.
Gary, I think it important that when an alarmist says the following:
that we hold their feet to the fire and make them defend their claims because people who vote read the above and are impressed by their credentials.
“Suppose you had an ideal model, which would be a planet B (Earth is A), similar in all respects, including rising GHG.”
So use two FAKE and ERRONEOUS models and compare the output…
…. and assume the difference is real.
Seriously , Nick. !!!
If that is the sort of antiscience you need to rely on….
YOU GOT NOTHING !!!
It is actually worse than Nicola Scafetta paints it.
This is a comparison between HadCRUT5 and CMIP6 from:
climexp.knmi.nl/CMIP6/Tglobal/global_tas_mon_ens_ssp245_192_ave.dat
Baseline: 1961–1990
Models are getting wronger by the day. There’s no fix for their problem. And the AMO hasn’t even started going down. This is going to be fun.
That data says it is from 2019. It’s from an early time when almost all the data came from CanESM5, which did indeed run rather hot. A more complete sample will show a different picture.
So you are saying that all earlier models RAN HOT
And now you say they don’t (which is of course BS)
Trouble is, the whole AGW scam is built around those earlier models.
Now take you left foot out, and put your right foot in !
That is certainly not correct. CanESM5 has 28 members out of 175. Since when is 16% of the data “almost all the data”?
That CMIP6 runs hotter than CMIP5 has made it to Science:
Voosen, P., 2021. Science, 373 (6554) pp.474–475. doi.org/10.1126/science.373.6554.474
Since when are you a Science denier? Since about the same time when 16% of the data became almost all the data?
In any case, that the IPCC has gone from highlighting the scariest scenarios and the models that produced the most warming to doing the opposite speaks volumes about this supposed climate emergency and the skill of models in predicting it.
“That is certainly not correct. CanESM5 has 28 members out of 175. Since when is 16% of the data “almost all the data”?”
It is certainly enough to create a large bias. But why go back to the very early days in 2019, when only a few results were in?
So.. 28 WRONG MODELS
Hilarious that you even pretend any are accurate. 🙂
28 runs from 1 model
“28 runs from 1 model”
So the model was WRONG “at least 27 times out of 28”
WOW how reassuring is that !!
Change feet again, Nick !!
So.. wait for the URBAN data to be fabricated,
Then say, “these models are close to the urban adjusted temperature fabrication”
When all the rest of the models are wayyyyy off !!
And pretend it is nothing more than an accident.
That is “climate science™” for you
No. If I removed CanESM5 the two curves would change by far less than 16%; you wouldn’t be able to tell the difference.
That’s what’s available in KNMI, and it was before the IPCC started cherry-picking the coolest models and changing the baseline to the 21st century to hide the issue.
I missed the important part. Since when is climate model output “data”, as Nick asserts?
Javier,
Some annotation on the graphs would help. What are the red lines, the black lines?
In both graphs the black are observations and the red are models. The top graph is the anomalies and the bottom is rate of change.
Thanks, Andy & Javier.
Geoff S
Yes, thank you, Andy. The black thin line is HadCRUT5 13-month running average. The black thick line is a Gaussian smoothing. The dashed red line is the average of the 175 ensemble members (40 models) for CMIP6 available at KNMI Explorer. All curves in anomaly with respect to the 1961–1990 baseline.
The second graph is the 15-year rate of change of the curves in the first one, expressed as °C per decade.
About half of the time the models appear clueless about what is going on in the real climate. That indicates more chance than skill.
Thank you for the clarification.
The inevitable coming cooling trend will utterly destroy the forecast skill of their models, which will never predict the cooling at all.
The Internet now has too many pictures of what is currently being assumed. It will eliminate too much changing. Which means any cooling is going to look outlandish and at the same time destroy CO2 being the boogeyman.
Nice spaghetti, Stokes.
Not mine, Scafetta’s.
You posted it.
Nick, you are wrong about the eye. Millions of years of evolution have worked wonders. See my remarks from years ago about the uni prof and his mark one eyeball.
I agree when you say “But the error that should be obvious here is that he allows no uncertainty in the observations. None at all. Now Andy objects that Schmidt et al have allowed too much, but zero has to be wrong.”
Scafetta (2023, “Reply to “Comment on…””) seems to “want his cake and eat it too” by assuming the time series 2011–2021 is completely deterministic (“the 2011–2021 ERA5-T2m interannual variability—which represents the actual climatic chronology that occurred—cannot be replaced by random data”) so he can hand-wave away any stochasticity in each series by grid cell, BUT in contrast he assumes (by the statistical method he applies of t-tests by grid cell) that the set of CGMs applied is a simple random sample of some superset of CGMs. You cannot have it both ways. You either treat both as deterministic (i.e. no stochasticity and thus no variance estimation and hypothesis testing) or consider both (at least partly) stochastic.
My suggestion would be to fit a thin-plate regression spline to each grid cell’s time series of observed ERA5-T2m records by fitting the appropriate linear mixed model (incorporating the thin-plate spline as linear plus random effect terms) to the ERA5-T2m records minus the corresponding prediction of each CGM, as an 11 x N catenated vector representing the response variable, and including a random effect for each CGM, a random effect for each year in the series, and finally the residual error. The test of no difference would then be based on the support interval for the intercept parameter, which under the null hypothesis of zero difference between the population-level mean of observations and the population-level mean of model predictions (assuming both observations and predictions are sets of random samples within each grid cell) would be expected to include zero. This support interval would be based on the random CGM variance component, the residual variance about the fitted splines for the ERA5-T2m series, and the residual representing the interaction of CGM_factor and the time series factor (adjusting for other terms). These last two variances would be greater than zero but less than those obtained by not fitting the spline (i.e. the equivalent of the SJK2023 approach in this last case).
e.g. in R using MCMCglmm for each grid cell:

library(MCMCglmm)

# Weakly informative inverse-Wishart priors for the three random effects (G) and the residual (R)
prior1 <- list(G = list(G1 = list(V = 1, nu = 0.002),
                        G2 = list(V = 1, nu = 0.002),
                        G3 = list(V = 1, nu = 0.002)),
               R = list(V = 1, nu = 0.002))

# Response: ERA5 T2m minus the corresponding CGM prediction; spline plus year and CGM random effects
m5d.1 <- MCMCglmm(T2m_minus_CGM_pred ~ 1 + Years_centred,
                  random = ~ spl(Years_centred) + Year_factor + CGM_factor,
                  data = data,
                  nitt = 130000, thin = 100, burnin = 30000,
                  prior = prior1, family = "gaussian", pr = TRUE, verbose = FALSE)

summary(m5d.1)
Is there such a thing as “random noise” in climate science? I have no clue.
There is random noise in all measurements, each measurement of any particular thing will be slightly different, which is why we repeat measurements and take an average.
Estimating an average annual temperature for one year twice and getting a different value is truly an estimate of error. Measuring the average temperature for two (or 11) consecutive years, then averaging those, is not error, since the underlying climate trends are also in the differences. This is the problem with what Schmidt, Jones, and Kennedy did. They mixed real climate change variations with random error. That is a no-no.
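A minimal R sketch of the point, with invented numbers: even a noise-free series with a real trend produces a nonzero “standard error of the mean”, so treating that spread as random measurement error mixes trend with error.

years <- 2011:2021
temps <- 0.58 + 0.02 * (years - 2011)   # invented anomalies with a pure linear trend and zero random error
sd(temps) / sqrt(length(temps))         # ~0.02, produced entirely by the trend, not by measurement error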
Andy,
I disagree that there is random noise in all measurements.
We could argue what random means.
But every measurement variation has a cause and effect. We get into trouble when we cannot or will not discern the direction, magnitude and abundance of all of the causes of variation. It is simply not good enough, but seductively appealing, to hit the “too hard basket” and say it is “noise”. It is worse to next assume that positive and negative noises cancel out. It is easy to find comfort by fiddling with strings of synthetic numbers that appear to support your assumptions, especially when it makes it so much easier. Geoff S
On point, Geoff.
Every single field calibration of a surface air temperature sensor has revealed considerable systematic measurement error. The error wasn’t merely random in any one of them.
The pervasive assumption of strictly random temperature measurement error is the crack cocaine of AGW climatology.
And the fake equating of model precision with accuracy is their operating fantasy.
Yet the climate science warriors rant on and on about how non-random error transmogrifies into random error, which they then ignore.
Yeah, they used temperature data that have themselves been adjusted to climate models that fail to show valid forecast skill, thus the whole thing is a waste of time.
Thus, the errors are magnified while they pretend to themselves, they are too small to matter.
Andy,
Noise is a term that is too casually bandied about. Noise is a signal that is detected along with and emulates the intended signal.
Do temperatures have noise, of course. Wind, humidity, surface types, clouds, or any of a number of environmental conditions can externally change measurements. However, this is what uncertainty intervals are designed to account for.
However, too many people also call the variations in temperature from day to day, month to month, and year to year at any given station noise. The variations ARE the signal! The fact that they don’t lie on a simple trend doesn’t mean they are noise.
Consider each measurement as one from repeating experiments. Those experimental measurements will vary and their variance will be a measure of the spread in uncertainty. Repeatability conditions are important and are discussed in the GUM. One is measurements occurring in a short time. NIST in TN 1900 uses a month of data. To me, longer periods begin to add in seasonal changes that increase variance.
The uncertainties I see mentioned here are far too small to represent a dispersion of measured values around a mean.
Lastly, the experimental standard uncertainty of the mean is a statistic describing an interval surrounding a mean that tells one where the mean may lie. It is not a measurement uncertainty informing one of the variance (spread) in the measurements themselves.
I would be very interested to know what the experimental standard uncertainty in this data is.
It is a thing in all disciplines of science. They all utilize measurements that contain a component of uncertainty arising from a random effect.
“”D.5.2 Uncertainty of measurement is thus an expression of the fact that, for a given measurand and a given result of measurement of it, there is not one value but an infinite number of values dispersed about the result that are consistent with all of the observations and data and one’s knowledge of the physical world, and that with varying degrees of credibility can be attributed to the measurand. “””
INFINITE NUMBER OF VALUES DISPERSED ABOUT THE RESULT THAT ARE CONSISTENT WITH ALL THE OBSERVATIONS AND DATA.
Tell us how an experimental standard uncertainty of the mean tells anyone about the dispersion of observations and data.
There can also be systematic variations that have to be identified and removed. One cannot assume that all variation is random.
I’m addressing Joseph Zorin’s question “Is there such a thing as ‘random noise’ in climate science?” I say the answer is yes. I extend my answer to all disciplines of science. Are you challenging the answer I have given?
From the GUM.
Of course there are influences that can cause measurement uncertainty. Is that considered noise? Not really; wind, clouds, etc. are not temperature. Their influence may change a temperature reading, but they are really part of the weather that temperature is measuring. You might as well ask if nighttime temperatures are noise.
It looks like Schmidt cherry picked which models to include. Also reanalysis data includes model error as it uses models to create it.
So given reanalysis data is generated by models doesn’t it stand to reason that it would be closer to climate model forecasts?
The fact that they don’t match suggests one is wrong. I strongly suspect it is the climate models that are wrong, both BEST and ERA5 incorporate as much data as possible. One can argue about the models used and the processing, as I have in the past, but they are based on data and much closer to it than climate models.
So they are showing the effect of massive surface urbanisation.
There is no possible way that using urban tainted surface data can give you a true representation of the Earth’s warming.
I can say confidently that prognostic models (CMIP6) are less correct regarding the global average temperature than diagnostic models (BEST, GISTEMP, ERA5, etc.). The obvious reason is because the prognostic models have to make a prediction of a future state from measurements of a past state whereas the diagnostic models only have to assess the present state using present measurements.
BEST, GISS et al are built using mostly URBAN and AIRPORT temperatures.
They cannot possibly give a correct global view of the temperature.
They’re built on air temperature measurements riddled with massive amounts of systematic error.
Yep, well aware of the measurement issues.. and how they have changed with different equipment.
… but apart from that, there is also a lot of spurious local urban and other warming over time.
There is absolutely zero possibility that the surface station data fabrications can give even a remotely true representation of any planetary warming.
Pat,
I have nearly finished an article on UHI using 45 “pristine” Aussie stations and a comparison set of 44 “urban” stations. Would you like a preview copy?
It is difficult even to select a matching set of stations because anything above about 10 stations of each departs Goldilocks territory and I end up rejecting station after station because of errors and noise (undefined).
My initial expectation was dashed, that Aust would have numerous pristine stations that group into regions like Koppen and provide an estimate of real climate trend without UHI. Multi-year trends for pristine stations over comparable periods like 1970 to 2020 range in Tmax and Tmin from minus 1 °C to above 4 °C per century equivalent, with no apparent clustering around a plausible pristine value.
Yet the “experts” allege they can calculate global averages for a year to numbers like ± 0.1 °C since 1910 or whatever.
By any definition, junk science. Geoff S
Unfortunately, even rural stations are very often corrupted by local factors that may not be apparent until visited physically.
And of course, Australia is a VAST country, with many different EVER-changing weather patterns.