By Andy May

You can read this post in German here, courtesy of Christian Freuer.

Here we go again, writing on the proper use of statistics in climate science. Traditionally, the most serious errors in statistical analysis are made in the social sciences, with medical papers coming in a close second. Climate science is biting at their heels.

In this case we are dealing with a dispute between Nicola Scafetta, a Professor of Atmospheric Physics at the University of Naples, and Gavin Schmidt, a blogger at RealClimate.org, a climate modeler, and director at NASA’s Goddard Institute for Space Studies (GISS).

Scafetta’s original 2022 paper in *Geophysical Research Letters* is the origin of the dispute (downloading a pdf is free). The essence of the paper is that CMIP6 global climate models (GCMs) that produce an ECS (Equilibrium Climate Sensitivity) higher than 3°C/2xCO₂ (“°C/2xCO₂” means °C per doubling of CO₂) are statistically significantly different (they run too hot) from observations since 1980. This result is not surprising and is in line with the recent findings by McKitrick and Christy (2020). The fact that the AR6/CMIP6 climate models run too hot and that it appears to be a function of too-high ECS is acknowledged in AR6:

“The AR5 assessed with low confidence that most, though not all, CMIP3 and CMIP5 models overestimated the observed warming trend in the tropical troposphere during the satellite period 1979-2012, and that a third to a half of this difference was due to an overestimate of the SST [sea surface temperature] trend during this period. Since the AR5, additional studies based on CMIP5 and CMIP6 models show that this warming bias in tropospheric temperatures remains.”

(AR6, p. 443)

And:

“Several studies using CMIP6 models suggest that differences in climate sensitivity may be an important factor contributing to the discrepancy between the simulated and observed tropospheric temperature trends (McKitrick and Christy, 2020; Po-Chedley et al., 2021)”

(AR6, p. 443)

The AR6 authors tried to soften the admission with clever wording, but McKitrick and Christy showed that the AR5/CMIP5 models are too warm in the tropical troposphere and fail to match observations at a statistically significant level. Yet, regardless of the evidence that AR5 was already too hot, AR6 is hotter, as admitted in AR6 on page 321:

“The AR5 assessed estimate for historical warming between 1850–1900 and 1986–2005 is 0.61 [0.55 to 0.67] °C. The equivalent in AR6 is 0.69 [0.54 to 0.79] °C, and the 0.08 [-0.01 to 0.12] °C difference is an estimate of the contribution of changes in observational understanding alone (Cross-Chapter Box 2.3, Table 1).”

(AR6, p. 321).

So we see that AR6 itself concedes that the climate sensitivity to CO₂ in both the AR5 and AR6 models may be too high, and that AR6 is worse than AR5. This supports the work that Scafetta, McKitrick, and Christy have done in recent years.

Now let’s look at the dispute on how to compute the statistical error of the mean warming from 1980-1990 to 2011-2021 between Scafetta and Schmidt. Schmidt (2022)’s objections to Scafetta’s error analysis are posted on his blog here. Scafetta’s original *Geophysical Research Letters* paper was later followed by a more extended paper in *Climate Dynamics* (Scafetta N., 2022b) where the issue is discussed in detail in the first and second appendix.

**Scafetta (2022a)’s analysis of climate model ECS**

The essence of Scafetta’s argument is illustrated in figure 1.

In figure 1 we see that when ECS is greater than 3°C/2xCO₂ the models run hot. The righthand plots show a comparison of the mean difference between the observations and models between the 11-year periods of 1980-1990 and 2011-2021. Scafetta’s 2022a full analysis is contained in his Table 1 where 107 CMIP6 GCM average simulations for the historical + SSP2-4.5, SSP3-7.0, and SSP5-8.5 IPCC greenhouse emissions scenarios provided by Climate Explorer are analyzed. The ERA5-T2m[1] mean global surface warming from 1980-1990 to 2011-2021 was estimated to be 0.578°C from the ERA5 worldwide grid. The IPCC/CMIP6 climate model mean warming is significantly higher for all the models plotted when ECS is greater than 3°C/2xCO₂.

**Schmidt’s analysis**

The plots shown on the right in figure 1 are the essence of the debate between Scafetta and Schmidt. The data plotted by Schmidt (shown in our figure 2) is slightly different but shows the same thing.

In figure 2 we see that the only model ECS ensemble mean estimates (green dots) that equal or fall around the ERA5 weather reanalysis mean difference between 1980-1990 and 2011-2021 are ECS estimates of 3°C/2xCO₂ or less. All ensemble ECS estimates above 3°C/2xCO₂ run too hot. Thus, on the basic data Schmidt agrees with Scafetta, which is helpful.

**The Dispute**

The essence of the dispute is how to compute the 95% uncertainty (the error estimate) of the 2011-2021 ERA5 weather reanalysis mean relative to the 1980-1990 period. This error estimate is used to decide whether a particular model result is within the margin of error of the observations (ERA5) or not. Scafetta computes a very small ERA5 error range of 0.01°C (Scafetta N., 2022b, Appendix) from similar products (HadCRUT5, for example) because ECMWF (the European Centre for Medium-Range Weather Forecasts) provides no uncertainty estimate with its weather reanalysis product (ERA5), so it must be estimated. Schmidt computes a very large ERA5 margin of error of 0.1°C using the ERA5 standard deviation for the period. It is shown with the pink band in figure 2. This is the critical value in deciding which differences between the climate model results and the observations are statistically significant.

If we assume that Scafetta’s estimate is correct, figures 1 and 2 show that all climate model simulations (the green dots in figure 2) for the 21 climate models with ECS >3°C and the great majority of their simulation members (the black dots) are obviously too warm at a statistically significant level. If instead we assume that Schmidt’s estimate is correct, figure 2 suggests that three climate models with ECS >3°C partially fall within the ERA5 margin of error while the other 18 climate models run too hot.

Although Schmidt’s result does not appear to significantly change the conclusion of Scafetta (2022a, 2022b) that only the climate models with ECS<3.01°C appear to best hindcast the warming from 1980-1990 to 2011-2021, it is important to discuss the error issue. I will refer to the standard stochastic methods for the evaluation of the error of the mean discussed in the classical textbook on error analysis by Taylor (1997).

In the following I repeat the calculations made by Schmidt and comment on them using the HadCRUT5.0.1.0 annual mean global surface temperature record instead of the ERA5-T2m because it is easier to get, it is nearly equivalent to ERA5-T2m, and especially because it also reports the relative stochastic uncertainties for each year, which, as already explained, are a crucial component in evaluating the statistical significance of any differences between reality and the climate models.

Schmidt’s estimate of the error of the mean (the pink bar in Figure 2) is ± 0.1°C (95% confidence). He obtained this value by assuming that the interannual variability in the ERA5-T2m from 2011 to 2021 around the decadal mean is random noise. In practice, he calculated the average warming (0.58°C) from 2011 to 2021 using the ERA5-T2m temperature anomalies relative to the 1980-1990 mean. That is, he “baselined” the values to the 1980-1990 mean. He then estimated the error of the mean by computing the standard deviation of the baselined values from 2011 to 2021, dividing this standard deviation by the square root of 11 (because there are N=11 years) and, finally, multiplying the result by 1.96 to get the 95% confidence. Download a spreadsheet performing Schmidt’s and Scafetta’s calculations here.
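The three steps described above (standard deviation, divide by the root of N, multiply by 1.96) can be sketched in a few lines of Python. This is only an illustration and not Schmidt’s own code: the function names are hypothetical, and the use of the sample standard deviation (N-1 denominator) is an assumption. No ERA5 values are included; instead, the second function works backward from the ± 0.1°C quoted above to show the interannual standard deviation that figure implies.

```python
import math
import statistics

Z_95 = 1.96  # two-sided 95% multiplier for a normal distribution


def sdom_half_width(values, z=Z_95):
    """95% half-width of the mean, treating the scatter of `values` as noise.

    Assumption: the sample standard deviation (N-1 denominator) is used.
    """
    n = len(values)
    sample_sd = statistics.stdev(values)
    return z * sample_sd / math.sqrt(n)


def implied_sd(half_width, n, z=Z_95):
    """Invert the SDOM recipe: which standard deviation gives this half-width?"""
    return half_width / z * math.sqrt(n)


if __name__ == "__main__":
    # +/- 0.1 C at 95% confidence over N = 11 years, as quoted in the text
    print(round(implied_sd(0.1, 11), 3))  # 0.169
```

In other words, an error bar of ± 0.1°C is what one gets if the eleven baselined annual values scatter about their mean with a standard deviation of roughly 0.17°C and all of that scatter is counted as noise.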

Figure 3 shows Schmidt’s equation for the error of the mean. When this value is multiplied by 1.96, to get the 95% confidence, it gives an error of ± 0.1°C.
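For readers without the figure, the standard deviation of the mean described in the text can be written out as follows. This is a reconstruction from the description and from the standard form in Taylor (1997), not a copy of the figure; the N-1 denominator in the sample standard deviation is an assumption.

```latex
\bar{T} = \frac{1}{N}\sum_{i=1}^{N} T_i , \qquad
\sigma_T = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(T_i - \bar{T}\right)^2} , \qquad
\sigma_{\bar{T}} = \frac{\sigma_T}{\sqrt{N}}
```

With N = 11, the 95% interval is then approximately $\pm 1.96\,\sigma_{\bar{T}}$.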

The equations used by Schmidt are those reported in Taylor (1997, pages 100-102). The main concern with Schmidt’s approach is that Taylor clearly explains that the equation in figure 3 for the error of the mean only works if the N yearly temperature values (*T_i*) are random “*measurements of the same quantity x.*” For example, Taylor (pages 102-103) uses the above equation to estimate the error of the mean for the elastic constant k of “one” spring by using repeated measurements with the same instrument. Since the true elastic constant is only one value, the variability of the repeated measurements can be interpreted as random noise around a mean value whose standard deviation is the Standard Deviation of the Mean (SDOM).

In using the SDOM, Schmidt et al. implicitly assume that each annual mean temperature datum is a measurement of a single true decadal value and that the statistical error for each datum is given by its deviation from that decadal mean. In effect, they assume that the “true” global surface temperature does not vary between 1980 and 1990 or 2011-2021 and all deviations from the mean (or true) value are random variability.

However, the interannual variability of the global surface temperature record over these two decades is not random noise around a decadal mean. The N yearly mean temperature measurements from 2011 to 2021 are not independent “measurements of the same quantity x”; each year is a different physical state of the climate system. This is easily seen in the plot of both decades in this spreadsheet. The x-axis is labeled 2010-2022, but for the orange line it is actually 1979-1991; I did it this way to show the differences between the two decades. Thus, according to Taylor (1997), SDOM is not the correct equation to adopt in this specific case.

As Scafetta (2022b) explains, the global surface temperature record is highly autocorrelated because it contains the dynamical interannual evolution of the climate system produced by ENSO oscillations and other natural phenomena. These oscillations and trends are a physical signal, not noise. Scafetta (2022b) explains that given a generic time series (*y_t*) affected by Gaussian (randomly) distributed uncertainties ξ with standard deviation σ_ξ, the mean and the error of the mean are given by the equation in figure 4.

The equation in figure 4 gives an error of 0.01°C (at the 95% confidence level, see the spreadsheet here for the computational details). If the standard deviation of the errors is not strictly constant for each datum, the standard error to be used in the above equation is the square root of the mean of the squared uncertainties for each datum.

Scafetta’s equation derives directly from the general formula for error propagation discussed by Taylor (1997, pp. 60 and 75). Taylor explains that the equations on pages 60 and 75 must be adopted for estimating the error of a function of “several” independent variables, each affected by an individual stochastic error and corresponding to different physical states, such as the average of a global surface temperature record of N “different” years. The uncertainty of the function (e.g., the mean of N different quantities) only depends on the statistical error of each quantity, not on the variability of the various quantities around their mean.
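The step from the general propagation formula to the error of the mean can be sketched as follows. The notation follows the text above (time series $y_t$, per-datum uncertainty $\sigma_{\xi,t}$); the per-year subscript $t$ on the uncertainty is introduced here for illustration, and the derivation assumes the yearly errors are independent.

```latex
\bar{y} = \frac{1}{N}\sum_{t=1}^{N} y_t
\quad\Rightarrow\quad
\frac{\partial \bar{y}}{\partial y_t} = \frac{1}{N}

\sigma_{\bar{y}}
  = \sqrt{\sum_{t=1}^{N}\left(\frac{\partial \bar{y}}{\partial y_t}\,\sigma_{\xi,t}\right)^{2}}
  = \frac{1}{N}\sqrt{\sum_{t=1}^{N}\sigma_{\xi,t}^{2}}
  = \frac{\sigma_{\xi}}{\sqrt{N}} ,
\qquad
\sigma_{\xi} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\sigma_{\xi,t}^{2}}
```

Note that the values $y_t$ themselves do not appear in $\sigma_{\bar{y}}$; only the per-datum uncertainties do. When every year has the same uncertainty, $\sigma_{\xi}$ is simply that common value.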

We can use an error propagation calculator tool available on the internet to check our calculations. I uploaded the annual mean ERA5 temperature data and the respective HadCRUT5 uncertainties and had the calculator evaluate the mean with its relative error. The result is shown in Figure 5.

Schmidt’s calculation of the standard deviation of the mean (SDOM) is based on the erroneous premise that he is making multiple measurements of the same thing, using the same method, and that, therefore, the interannual variability from the decadal mean is some kind of random noise that can be considered stochastic uncertainty. None of these conditions are true in this case. The global yearly average surface temperature anomaly is always changing for natural reasons, although its annual estimates are also affected by a small stochastic error such as those incorporated into Scafetta’s calculation. According to Taylor, it is only the errors of measure of the yearly temperature means that can determine the error of the 11-year mean from 2011 to 2021.

As Scafetta writes in the appendix to Scafetta 2022b, HadCRUT5’s global surface temperature record includes its 95% confidence interval estimate and, from 2011 to 2021, the uncertainties for the monthly and annual averages are monthly ≈ 0.05°C and annual ≈ 0.03°C. Berkeley Earth land/ocean temperature record uncertainty estimates are 0.042°C (monthly), 0.028°C (annual), and 0.022°C (decadal). The longer the time period, the lower the error of the mean becomes.

Each of the above values, year-by-year, must be averaged and divided by the square root of the number of years (in this case 11) to determine the error of the mean. In our case, the HadCRUT5 error of the mean for 2011-2021 is 0.01°C. Scafetta’s method allows for the “true” value to vary in each year; Schmidt’s method does not.
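As a rough check on the arithmetic, the calculation can be sketched in Python. This is an illustration under stated assumptions, not the spreadsheet itself: the function name is hypothetical, the averaging is done as a root mean square of the per-year uncertainties, and every year from 2011 to 2021 is assigned the approximate annual HadCRUT5 uncertainty of 0.03°C quoted above rather than its actual year-by-year value.

```python
import math


def propagated_error_of_mean(sigmas):
    """Error of the mean of N different quantities with independent errors.

    Uses the root mean square of the per-datum uncertainties divided by
    sqrt(N), which equals sqrt(sum of squares) / N.
    """
    n = len(sigmas)
    rms = math.sqrt(sum(s * s for s in sigmas) / n)
    return rms / math.sqrt(n)


if __name__ == "__main__":
    # Assumption: a constant 0.03 C annual uncertainty for all 11 years
    annual_uncertainty = [0.03] * 11
    print(round(propagated_error_of_mean(annual_uncertainty), 3))  # 0.009
```

The result, about 0.009°C, rounds to the 0.01°C given in the text. The yearly temperature values never enter the function; only their uncertainties do.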

The observations used for the ERA5 weather reanalysis are very nearly the same as those used in the HadCRUT5 dataset (Lenssen et al., 2019; Morice et al., 2021; Rohde et al., 2020). As Morice et al. note, the MET Office Hadley Centre uses ERA5 for quality control.

Lenssen et al., which includes Gavin Schmidt as a co-author, does an extensive review of uncertainty in several global average temperature datasets, including ERA5. Craigmile and Guttorp provide the plot in figure 6 of the estimated yearly standard error in several global surface temperature records: GISTEMP, HadCRUT5, NOAA, GISS, JMA and Berkeley Earth.

Figure 6 shows that from 1980 to 2021, at the annual scale and at 95% confidence, the standard error of the uncertainties is much less than Schmidt’s error of the mean of 0.10°C, which, furthermore, is calculated on a time scale of 11 years. The uncertainties reported in Figure 6 are *not* given by the interannual temperature variability around a decadal mean. This result clearly indicates that Schmidt’s calculation is erroneous because at the 11-year time scale the error of the mean must be significantly smaller (by the root of 11 = 3.3) than the annual value.

Scafetta (2022b) argues that the errors for the annual mean of the ERA5-T2m should be of the same order of magnitude as those of other temperature reconstructions, like the closely related HadCRUT5 dataset. Thus, the error at the decadal scale must be negligible, about ±0.01°C, and this result is also confirmed by the online calculator tools for estimating the error of given functions of independent variables as shown in figure 5.

The differences between Scafetta and Schmidt are caused by the different estimates of ERA5 error. I find Scafetta’s much more realistic.

*Patrick Frank helped me with this post, but any errors are mine alone.*

*Download the bibliography **here**.*

[1] ERA5-T2m is the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis 2-meter air temperature variable.

Irrespective of trivial quibbling about the surface, even Schmidt concedes the mid troposphere is way off. So there can be no dispute the physics has been misrepresented. Schmidt likes to zero in on surface quibbling because some models kinda-sorta get it right. But those same kinda-sorta-right models get other layers totally wrong. What gives?

Any model that shows any ocean surface waters sustaining more than 30C is wrong.

The INM model does not exceed the 30C but it lowers the present by 4C to get its warming trend. So WRONG now rather than by the end of the century.

They run too hot on top of inflated temperature readings. As they say, a twofer.

or maybe a toofer.

From a Climate Twoofer? a.k.a. Truther

“on top of inflated temperature readings”

Yes, although I do notice poor old 1998 has been emasculated.

Hansen dethroned 1934 as the warmest year recorded in the United States, and then he did the same for 1998, because both years did not fit the “hotter and hotter and hotter” narrative of the climate change alarmists.

If it’s just as warm in the recent past as it is today, yet there is much more CO2 in the atmosphere today, then logic would tell you that increased amounts of CO2 have not resulted in higher temperatures.

So to stifle that argument, the Temperature Data Mannipulators like Hansen went into their computers and changed the temperature profile from a benign temperature profile (not too cold and not too hot), into a “hotter and hotter and hotter” temperature profile.

It’s all a BIG LIE and our radical Leftwing politicians are using this BIG LIE to destroy Western Democracy.

That’s a pretty big lie you perpetrated, James. Proud of yourself? You shouldn’t be.

Never mind that a global mean is a meaningless concept with regards to Earth’s environment which is characterized by wild swings locally and regionally. Extreme variation is the norm.

Hacks like Gavin need to pretend there was stability unbalanced by human emissions to push their phony narrative littered with errors.

If you look at the atmosphere, there are clearly at least TWO indicators of the energy content present, one being temperature the other being wind.

The global average published only includes one of these energy systems. And hence can never be a measure of the energy total in the system.

We all know that a difference in temperature can cause wind and most would assume that this can also work in reverse. So the energy in the system is capable of being in either of two modes. You could consider the wind to be the kinetic energy form and the temperature to be the potential energy.

Imagine trying to define the energy of a park full of swings, all with kids on board, some big kids, some light kids some moving quickly some moving slowly and yet you only define the energy present by the height of the swings at any one instant.

I’d argue that any sense of an average is meaningless without reference to the inclusion of the kinetic energy at the same instant.

So when will we see a plot of the total energy in the atmosphere and not just some measure of the temperature, which is otherwise missing large portions, if not most of the data.

“which is otherwise missing large portions, if not most of the data.”

Yes, there are thousands of variables in continuous flux regulating Earth’s temperature. Because we lack the grid cell resolution to capture most of those variables accurately they must be modeled. The error bars in model assumptions are greater than the effect they are trying to isolate. Pretending that model output has any statistical significance for real world conditions is a primary error.

Averages tell you nothing about the magnitude of individual fluxes which can be determinant for future conditions.

Actually, in measuring atmospheric energy content, you also have to take into account moisture content (latent energy), pressure/altitude (potential energy), and not only temperature (dry-static energy), and wind (kinetic energy).

It makes no sense to compute temperature anomalies in the winter Arctic, where the atmosphere is extremely dry, together with temperature anomalies in the rest of the world. The result is that Arctic warming is driving most of the warming seen in the GSAT anomaly when the energy change involved is very small. That’s how they got HadCRUT 5 to show a lot more warming and absence of pause than HadCRUT 3. Expanding Arctic coverage, as the Pause was accompanied by an increase in Arctic warming.

The real measure of atmospheric energetics would require the use of enthalpy, but it is a lot more difficult to compute than temperature, and it is a concept that most people don’t handle.

Would it be difficult to calculate enthalpy? I think the relative humidity and air pressure are recorded as well as temperatures.

It would be interesting to just weight the HadCRUT grid cells by average humidity in calculating the HadCRUT global mean time series. They already weight the grid cells in proportion to area to do the calculation.

Nice suggestion! Wonder why the so-called “climate scientists” have never done this? It shouldn’t be hard to do!

Perhaps they have done it but didn’t like the results?

It has been done many times. Song et al. 2022 is a more recent example. They found that equivalent potential temperature (theta-e) increased by 1.48 C even though the dry bulb temperature increased by only 0.79 C.

The paper is BS. They claim hurricane frequency and strength has increased.

The deal killer: “Here we examine the Thetae_sfc changes under the Representative Concentration Pathways 8.5”

Can you post a link to another study that shows values of temperature and equivalent potential temperature for the period 1980 to 2019 that are significantly different from 0.79 C and 1.48 C respectively?

“Can you post a link to another study that shows values of temperature and equivalent potential temperature for the period 1980 to 2019 that are significantly different from 0.79 C and 1.48 C respectively?”

Can you understand that using RCP 8.5 invalidates their alarmist claptrap? The paper oozes with bias.

I had the opinion you had better chops than this.

RCP8.5 (or any RCP for that matter) was not used to provide the 0.79 C and 1.48 C temperature and theta-e values respectively. Those come from HadCRUT.

They designed a study to show the greatest amount of future warming. This is activism not science.

Surface equivalent potential temperature (Thetae_sfc) doesn’t match surface air temperature observations.

“The magnitude of the Thetae_sfc trends is significantly different from that of SAT. Thetae_sfc has much larger temporal variations than SAT, and the linear trend (1.48 °C) is roughly double that of SAT (0.79 °C) in the observations.”

Yep. It would be very unusual if they did match.

Yep. That’s what happens when there is a positive trend in temperature and humidity.

Potential temperature isn’t real. It would be seen in observations if it was. Since observations didn’t give them the warming rates desired they had to make up something that did.

If the objective is to be as obtuse as possible then why not indict all thermodynamic metrics as fake? Anyway, theta-e is included in observations. You see it in most upper-air station observations. Furthermore, theta-e is not used to assess the warming rate. However, it is used to assess the enthalpy rate. That’s partly what this subthread is about. The other part is kinetic energy. And as I’ve said, ERA already provides a total integrated energy product that combines enthalpy and kinetic energy.

It would be even easier to just use the total vertically integrated energy product provided by ERA already.

“Extreme variation is the norm.”

Yes, we need to keep reminding people of this, as the climate change alarmists claim every extreme weather event is abnormal. No, they are not. Extreme weather should be expected.

And there has never been any connection made between any extreme weather event and CO2, although you wouldn’t know that listening to climate change alarmists who see CO2 in everything.

Extreme weather is normal weather.

The extreme level of propaganda people are exposed to is the problem. Because blatant lies are presented as unquestionable fact people can’t even conceive that it could be wrong. They cling to their “trusted news source’s” lies over empirical evidence because they have heard the lies hundreds of times and can’t accept new information because the belief is so ingrained.

It’s an interesting article that should encourage some good conversation.

It looks like Scafetta did a type B evaluation while Schmidt did a type A evaluation. The Schmidt type A evaluation includes the component of uncertainty caused by natural variation whereas the Scafetta type B evaluation only includes the component of uncertainty caused by measurement variation.

Both evaluations are correct for their respective intents. The question is…which one is best suited for comparison with CMIP6? I think there are valid arguments both ways. However, given that the comparisons are sensitive to the timing of natural variation within the decade I think it makes sense to include the natural variation component. Case in point…we are currently in a Monckton Pause lasting 8.75 years primarily because the early part of the period was marked by a very strong El Nino while the later part of the period is marked by a triple dip La Nina. If in the future Schmidt compared 2023-2033 with 1980-1990 he might get a very different result due to 2023 starting off as an El Nino (assuming one does indeed form later this year) even though long term forcings might remain unchanged.

BTW…your correct statements about the uncertainty of the average being assessed with the 1/sqrt(N) rule are going to absolutely trigger some of the familiar posters here, causing hundreds of posts. As always…it will be interesting.

BTW #2…ERA5 actually does include uncertainty. It’s just really hard to work with because you have to download the grids and type B it spatially and temporally up to global values and then daily, monthly, and annual values.

Great Scott! A hypocrisis to the nth degree. Tangential yawnfest.

The earth’s climate has been warming for three hundred years or more. The odds are that it will continue to warm for another hundred years or more, regardless of what anyone tries to do to stop it.

For those who are more interested in the complexities of public policy analysis than we are in the complexities of IPCC climate modeling, the energy marketplace and the impacts of public policy decision making on that energy marketplace are much more interesting and much more rewarding as topics to be spending serious personal time in discussing.

Not sure who the proverbial “we” is there, but I can guarantee that 100 comments re-litigating the 1/sqrt(N) rule just isn’t going to get you very far.

JCM, I’m presuming that I’m not the only one who posts comments on WUWT who is more interested in public policy decision making than in the low level details of climate-related temperature measurement and prediction.

Hence the use of ‘we’ assuming there is one more person who has the same outlook as I do.

However, as the number of comments on this article continues its steep rise, just as Mr. Bdgwx said would happen, the temptation to ask Nick Stokes a question or two concerning natural variability in GMT versus anthropogenic effects on GMT is becoming almost too powerful to resist.

The intent of both Schmidt and Scafetta was to evaluate at what point ECS became too large. That is, when is greenhouse gas warming statistically higher than observed warming between the two decades.

How do we treat natural oscillations inside the two decades being compared? Are they part of the random noise and included in the uncertainty? Or not.

Since the IPCC has declared in AR6, and previous reports, that natural forcings are zero and greenhouse gases have caused 100% of modern warming, I don’t think it is justified to include natural forces in the uncertainty. It is only appropriate to include measurement error and biases.

If we include it in the uncertainty here, it must also be included in the calculation of ECS, which would lower the ECS, since ECS is determined by subtracting natural “forcings” from the computed greenhouse gas warming. You can’t have it both ways and both ways, if done consistently, lower ECS.

“Since the IPCC has declared in AR6, and previous reports, that natural forcings are zero and greenhouse gases have caused 100% of modern warming, I don’t think it is justified to include natural forces in the uncertainty.”

This is the primary error which is the source of all the false attributions and the reason models run hot. Ignoring what actually drives climate is the only way they can elevate CO2 to a dominant position. Natural forces are the uncertainty being swept under the rug.

Always appreciate Gavin being shown up but arguing about error bars from fictional models feels like validation that they represent Earth’s actual climate system.

“Always appreciate Gavin being shown up but arguing about error bars from fictional models feels like validation that they represent Earth’s actual climate system.”

Sweeter words have not been written. It’s like arguing that fairies would beat leprechauns at poker.

Don’t be silly. How else do you think the leprechauns got their pots of gold?

old cocky:

“Don’t be silly. How else do you think the leprechauns got their pots of gold?”

I was told by my Irish ancestors, usually on or about the 17th of March of every year, that it was alchemy.

“Ignoring what actually drives climate is the only way they can elevate CO2 to a dominant position. Natural forces are the uncertainty being swept under the rug. ”

Absolutely right.

It’s kind of ridiculous, isn’t it. Natural forces (Mother Nature) drive the climate until the 1980’s and then, all of a sudden, CO2 takes control, and Mother Nature goes away.

We should keep in mind that today’s CO2 levels are at about 415ppm, and climate change alarmists say this is too much and will cause a runaway greenhouse effect that will overheat the Earth with dire consequences for all life.

But CO2 levels have been much higher in the past, 7,000ppm or more, yet no runaway greenhouse occurred then, so why should we expect one to occur now?

CO2 Scaremongering is the problem. A little cooling from Mother Nature might fix that problem.

Andy

“since ECS is determined by subtracting natural “forcings” from the computed greenhouse gas warming”

Absolutely untrue. It makes no sense. ECS for models is usually determined by direct experiment – ie double CO2 and see what happens.

First you need to know what happens before you double co2.

“Absolutely untrue. It makes no sense. ECS for models is usually determined by direct experiment – ie double CO2 and see what happens.”

In practice this IS true. The GCMs cant produce natural forcings beyond 17 years (by Santer) and then average out over the timescales applicable to ECS. By using GCMs to investigate ECS, they’re effectively ruling out natural forcings.

However, natural forcings produced a habitable Greenland, for example. The GCMs won’t do that.

“The GCMs cant produce natural forcings beyond 17 years (by Santer)”

Santer said nothing like that.

The relevant paper is

https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2011JD016263

where a key point is named as

And they use control runs (no forcings) to establish that the model doesn’t show climate change beyond 17 years and therefore any observed climate change longer than 17 years must be forced (by CO2). Namely

“The GCMs cant produce natural forcings beyond 17 years (by Santer)”

The paper you linked says nothing about GCMs at all. It is about surely identifying a trend from data. It doesn’t say that a model, or anything else, can’t reproduce natural variation. It says you need that length of time to say that trend was probably due to CO2. Then you can distinguish it from natural variation; that doesn’t mean there wasn’t any.

That’s hilarious. I recommend you read it.

And when you do, remember that GCM control runs by definition don’t produce climate change. Beyond 17 years apparently. They’re comparing control runs to observed warming trends.

Yes, my apologies, my memory of the paper was faulty. Santer does use GCMs to calculate an error structure for determining the significance of trends in measured tropospheric temperature.

But the claim

“The GCMs cant produce natural forcings beyond 17 years (by Santer)” is still nonsense. What he says is that in models, pauses (periods under forcing with zero or less trend) of greater than 17 years are very infrequent (there are important extra bits about removing known sources of variation). That is just a frequency observation. It doesn’t say they can’t do anything. And it has nothing to do with “produce natural forcings”. I don’t even know what that could mean.

I gave you the relevant quote.

“[30] On timescales longer than 17 years, the average trends in RSS and UAH near-global TLT data consistently exceed 95% of the unforced trends in the CMIP-3 control runs (Figure 6d), clearly indicating that the observed multi-decadal warming of the lower troposphere is too large to be explained by model estimates of natural internal variability.”

The unforced trends are control runs where GCMs produce no climate change. It’s by definition. It’s how they’re built. And the implication is clearly that GCMs don’t model natural forcings resulting in climate change, like most climate change over most of history. They’re just not capable.

So ECS is essentially a result of the forcing as modelled by a GCM.

There is no issue of capability. That is a frequency observation. 95% of control periods do not reach the current trend level over 17 years. Therefore it is unlikely that the trend we see happened without forcing. It doesn’t say it couldn’t happen.

What it says is that after 17 years the trend reverses. And over time it averages to nothing. It’s by definition. So from an ECS perspective, GCMs describe ECS purely from a forcing point of view. There is no natural variability built in over the timescales needed for ECS in GCMs.

Unlike the real world where climate change persists longer than 17 years and results in a habitable Greenland or frozen Thames.

Nick the first line says he used models:

Nick,

ECS is not a scientific value since it can never be measured in the real world. An instantaneous doubling of CO2 can never happen outside the model world. And, even if it did, everything else would not stay the same while the world came into equilibrium with the doubled CO2. It is a model fantasy number. And ECS, in the model world, is determined by subtracting natural only models from all forcings models. The problem is they assume natural only is zero, and it clearly isn’t.

See figure 3 here:

https://andymaypetrophysicist.com/2020/11/10/facts-and-theories-updated/

TCR is a value that can probably be tested and falsified. A 1% rise per year, until doubled, is a decent hypothesis. And we are on track to show TCR is at the very low end of the IPCC range.

While STILL assuming all of the warming is caused by CO2 – which it is not.

You need no model for that. There is sufficient reverse correlation between temperature and atmospheric CO2 in the Earth’s climate history to confirm ECS as zero.

I believe I understand what the point of the analysis is. My point is that the observed temperature is subject to both natural variation and forced change. If the forced change is +0.6 C natural variation could cause us to observe +0.5 C to +0.7 C assuming the natural variation is ± 0.1 C (2σ). In other words, an observation of 0.58 C is consistent with a forced change of 0.48 to 0.68 C. It would be 0.48 C if we happen to be in an extreme warm phase of the natural oscillation and 0.68 C if we happen to be in an extreme cool phase of the natural oscillation.

The question: do we conclude that a model under/overestimated if the observation was only 0-0.1 below/above the prediction? I don’t think so, because the observation could be high/low because ENSO (or whatever) happened to be high/low.
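The consistency check described in this comment can be sketched in a few lines. This is only an illustration using the numbers quoted above (an observation of 0.58 C and an assumed 2σ natural variation of ± 0.1 C); the function names are hypothetical and not taken from any of the papers under discussion.

```python
# Numbers from the comment above; helper names are hypothetical.
NATURAL_2SIGMA = 0.1  # assumed 2-sigma natural variation, deg C


def forced_range(observed, natural=NATURAL_2SIGMA):
    """Range of forced changes consistent with a single observation."""
    return observed - natural, observed + natural


def consistent(observed, predicted, natural=NATURAL_2SIGMA):
    """True if the predicted forced change lies inside that range."""
    low, high = forced_range(observed, natural)
    return low <= predicted <= high


low, high = forced_range(0.58)
print(f"Forced change consistent with 0.58 C: {low:.2f} to {high:.2f} C")
print("Prediction of 0.60 C consistent?", consistent(0.58, 0.60))
print("Prediction of 0.75 C consistent?", consistent(0.58, 0.75))
```

Under these assumptions a prediction within 0.1 C of the observation cannot be called an over- or underestimate, which is the point the comment is making.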

I think you mean “the observed temperature is subject to the natural variation, forced change and adjustments in search of fitting it into the required result.”

If you assume the temperature trend line is entirely due to a constant monotonic increase in forcing, the natural variation is the excursions from that trend line. From the graphs, those excursions appear to be along the lines of +/- rand(0.5)

Yeah. Exactly. I sometimes use the example y = x + sin(x). The x term is like the forced change while the sin(x) term is like the natural variation.

BTW…using the NIST uncertainty machine we see that if u(x) = 1 then when Y = x it is the case that u(Y) = 1 as well. But if Y = x + sin(x) then u(Y) = 1.63. The point…adding variation adds uncertainty.
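For readers who want to check the figure without the NIST tool, here is a minimal Monte Carlo sketch. It assumes x is normally distributed with mean 0 and standard deviation 1; the comment only states u(x) = 1, so the mean and the distribution are assumptions, and the result changes with the mean. Under those assumptions the simulated u(Y) comes out close to the 1.63 quoted above.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
N = 200_000     # number of Monte Carlo draws

# Assumption: x ~ Normal(mean=0, sd=1). The comment gives only u(x) = 1.
xs = [random.gauss(0.0, 1.0) for _ in range(N)]

# Case 1: Y = x, so u(Y) should simply equal u(x).
u_identity = statistics.stdev(xs)

# Case 2: Y = x + sin(x), the "trend plus variation" toy model.
u_with_sine = statistics.stdev([x + math.sin(x) for x in xs])

print(f"u(Y) for Y = x          : {u_identity:.2f}")
print(f"u(Y) for Y = x + sin(x) : {u_with_sine:.2f}")
```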

I see where you’re coming from as far as noise adding to the uncertainty. It was more a case that the trend and noise in your example were reversed.

It can take quite a difference in trend to allow noisy time series to be distinguished at a sensible level of significance.

“If we include it in the uncertainty here, it must also be included in the calculation of ECS, which would lower the ECS, since ECS is determined by subtracting natural “forcings” from the computed greenhouse gas warming.”

No. Even if this were true (https://wattsupwiththat.com/2023/04/13/the-error-of-the-mean-a-dispute-between-gavin-schmidt-and-nicola-scafetta/#comment-3708016), since they both have expected values (like ’em or not), and approximately normally distributed standard errors*, that are either independent (worst case) or correlated (which would tend to increase the difference), then, assuming worst case – independence – the expected value of the subtraction would be taken from the 2 expected values. Now, the standard error of it would then be (u_natural^2 + u_GHG^2)^0.5, which is certainly greater.

*per modern CLT.
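The quadrature rule in this comment can be shown with a short sketch. The numbers below are hypothetical and chosen only so the arithmetic is easy to follow; the function name is likewise made up for illustration, and independence of the two errors is assumed, as in the comment.

```python
import math


def difference_with_uncertainty(a, u_a, b, u_b):
    """Return (a - b, standard uncertainty of a - b), assuming independent errors."""
    return a - b, math.hypot(u_a, u_b)  # hypot(x, y) = sqrt(x^2 + y^2)


# Hypothetical values: a "GHG" term and a "natural" term with their standard errors.
diff, u_diff = difference_with_uncertainty(1.0, 0.3, 0.2, 0.4)
print(f"difference = {diff:.2f} +/- {u_diff:.2f}")
```

The combined standard error is larger than either input, which is the commenter's point: subtracting an uncertain term widens the uncertainty of the result rather than narrowing it.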

Dear bdgwx,

Heck, I thought the science was settled and that is why here in Australia they are blowing-up power stations, destroying electricity generation networks, flushing perfectly good fresh water down the rivers to irrigate the ocean, bringing in carbon taxes (run by ‘the market’) and spending zillions of dollars they don’t have in order to make everybody poorer and more dependent, in a grovelling sort of way, on China.

The absurdity of the claim that computer games can be used to predict anything is beyond scientific belief. Which particular scientists at the University of NSW, or ANU or anywhere else can be held to account for the wreckage created out of thin-air by their green-dreams? They forget the atmosphere is still 99.96% CO2-free.

Against this background, I don’t see the point of arguing the toss about uncertainty, when in fact all the data being used is modeled. All of it is modeled, and each model within models suffers from the same problem – they take specks of data from weather balloons, ships, planes, temperatures measured in Stevenson screens or not, rub them together and come up with a trend. Then they squabble about which trend is ‘real’ and which models are best. Meanwhile there is a war going on, most nations are broke, many have been furiously back-pedalling, and the rest will eventually, once all this is revealed to be nonsense.

Two years ago we presented a study on WUWT, about sea surface temperature along the Great Barrier Reef (https://wattsupwiththat.com/2021/08/26/great-barrier-reef-sea-surface-temperature-no-change-in-150-years/). A detailed report downloadable at https://www.bomwatch.com.au/wp-content/uploads/2021/10/GBR_SST-study_Aug05.pdf outlined our methods, results and discussed the findings. It was clear subsequently that a journal version of the report submitted to Frontiers in Marine Science Coral Reef Research would never be published, probably because the apple-cart was too-full of rotten apples, the same ones running the peer review process, and upsetting the cart would not be ‘collegiate’ as they say.

The paper presented a substantial amount of data and raised valid questions about biases embedded in HadISST that caused trend, which was not measurable in Australian Institute of Marine Science temperature timeseries. There is also no long-term trend in maximum temperature observed at Cairns, Townsville and Rockhampton that is not explained by site moves and changes. Confirmed by aerial photographs and archived documents, some site changes were fudged by Bureau of Meteorology (BoM) scientists, most recently Blair Trewin, to imply warming in the climate, which actually does not exist.

So here in Australia, the BoM massages the data, which is used to underpin CSIRO State of the Climate Reports which are then used by the government to lecture and eventually screw the populace into believing in something that has been manufactured for the last 30-years.

My question is if there is no warming in surface temperature, and no warming in SST, irrespective of uncertainties amongst them, how can any of the models be correct? And if the data (HadISST and homogenised maxima) have been fudged to support the models, how can such data be used to verify the models?

Science in Australia has been totally corrupted, top-down by governments, intent on ‘proving’ warming to drive their absurd WEF-UN-agendas. The same seems true in Canada, New Zealand and the USA.

Yours sincerely,

Dr Bill Johnston

http://www.bomwatch.com.au

They can’t

Those who claim problems with surface temperature measurements can’t dispute that the world has been warming, unless they say UAH v6 TLT is also wrong by showing warming.

Dear donklipstein,

I am not claiming anything, I am explaining what I’ve found (there is a difference). I have also only published a small proportion of all the datasets I have examined over the last decade.

The “can’t dispute that the world has been warming” thing, reflects success of a marketing strategy, not unambiguous scientific study.

The original thesis that temperature is increasing, goes back to before the satellite era. For Australia, to before 1990 when the government started on about “greenhouse”. From that point, in order to support the thesis they (the Bureau of Meteorology and CSIRO) had to find warming in surface temperature data and they did that using various forms of homogenization, including arbitrary ‘adjustments’.

Their main trick is to adjust changes in data that made no difference, and/or, not mention/not adjust changes that did.

I have used objective statistical methods and aerial photographs and documents/pictures from the National Archives to track such changes and verify station metadata.

Check out my Charleville report for example, where for 13 years they made the data up (the full report is here: https://www.bomwatch.com.au/wp-content/uploads/2021/02/BOM-Charleville-Paper-FINAL.pdf). Or Townsville, where they forgot they moved the Stevenson screen to a mound on the western side of the runway in 1970 (https://www.bomwatch.com.au/wp-content/uploads/2021/02/BOM-Charleville-Paper-FINAL.pdf). There are also many examples of where they did not know where the original sites were or that they had moved (https://www.bomwatch.com.au/wp-content/uploads/2020/08/Are-AWS-any-good_Part1_FINAL-22August_prt.pdf).

Your faith is misplaced. While you say “Those who claim problems with surface temperature measurements can’t dispute that the world has been warming”, it is no joke that scientists within the Bureau of Meteorology have cheated and changed data in order to homogenise-in the ‘trend’.

Most of the individual site reports I have published on http://www.bomwatch.com.au are accompanied by the datasets used in the study, so anyone or any of the multitude of fake fact-checkers is welcome to undertake replicate analysis.

Regardless of what UAH has to say (and they don’t estimate surface T), no medium to long-term datasets in Australia are capable of detecting small trends that could be attributed to the ‘climate’.

The outdoor ‘laboratory’ is not controlled enough, instruments are too ‘blunt’ in their response, and observers too careless, for data to be useful, and no amount of patching and data wrangling can change that.

Data-shopping using Excel, as practiced by some, and the use of naive timeseries methods – technical for “add trendline” to an Excel plot – has confused themselves and many others into ‘believing’ in trend that in reality is due to site changes, not the climate.

Yours sincerely,

Dr. Bill Johnston

scientist@bomwatch.com.au

‘It’s an interesting article that should encourage some good conversation.’

Agreed. I thought it was interesting that no matter which method is used to test for significance, it’s very clear that a hefty proportion of the models are unfit for the purpose. Do you have any insight as to when Gavin will be advising the Administration to perhaps defund some of the worse performers and/or let up on the alarmism?

That should be “El Niño,” or at least “El Nino.”

Doh. That is embarrassing. You are, of course, correct.

It is just the new name for a transgender El Niño.

Wouldn’t that be Niñx?

Not. It is Ellos Ninos

The issue is *not* using 1/sqrt(N). The issue is what that tells you. It is *NOT* a measure of the accuracy of anything. It is only a measure of the interval in which the population average can lie. That does *NOT* tell you that the average you calculate is accurate at all.

Remember, the examples mentioned from Taylor are from Chapter 4.

Taylor specifically states in the intro to Chapter 4:

“We have seen that one of the best ways to assess the reliability of a measurement is to repeat it several times and examine the different values obtained. In this chapter and Chapter 5, I describe statistical methods for analyzing measurements in this way.

As noted before, not all types of experimental uncertainty can be assessed by statistical analysis based on repeated measurements. For this reason, uncertainties are classified into two groups: The random uncertainties, which can be treated statistically, and the systematic uncertainties, which cannot. This distinction is described in Section 4.1. Most of the remainder of this chapter is devoted to random uncertainties.”

In Section 4.1 Taylor states: “The treatment of random errors is different from that of systematic errors. The statistical methods described in the following sections give a reliable estimate of random errors, and, as we shall see, provide a reliable procedure for reducing them. ….. The experienced scientist has to learn to anticipate the possible sources of systematic error and to make sure that all systematic errors are much less than the required precision. Doing so will involve, for example, checking meters against standards and correcting them or buying better ones if necessary. Unfortunately, in the first-year physics laboratory, such checks are rarely possible, so treatment of systematic errors is often awkward.”

It seems to be an endemic meme in climate science, and that includes you bdgwx, that all measurement uncertainty is random, Gaussian, and cancels leaving the stated values as 100% accurate. Therefore the variation in the stated values can be assumed to be the uncertainty associated with the data. The problem is that when it comes to temperature records climate science is in the same boat as the first-year physics class Taylor mentions. Systematic error in the climate record *is* awkward to handle – but that is *NOT* a reason to ignore it.

Taylor discusses this in detail in Section 4.6. (when systematic bias exists)

“As a result, the standard deviation of the mean σ_kavg can be regarded as the random component ẟk_ran of the uncertainty ẟk but is certainly not the total uncertainty ẟk.”

If you have an estimate of the systematic bias component then you can add them either directly or in quadrature – let experience be your guide.

About adding in quadrature Taylor says: “The expression (4.26) for ẟk cannot really be rigorously justified. Nor is the significance of the answer clear; for example we probably cannot claim 68% confidence that the true answer lies in the range of kavg +/- ẟk. Nonetheless, the expression does at least provide a reasonable estimate of our total uncertainty, given that our apparatus has systematic uncertainties we could not eliminate.”

YOU keep on wanting to use the standard deviation of the mean calculation as the measure of uncertainty in the record. That is the typical climate science assumption: that all error is random, is Gaussian, and cancels out.

Again, the issue is what 1/sqrt(N) tells you. And it does *NOT* tell you the accuracy of the mean you have calculated since systematic bias obviously exists in the temperature measurement devices and, according to all stated values for acceptable uncertainty of the devices, can range from +/- 0.3C for the newest measurement devices up to +/- 1.0C (or more) for older devices.
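A small simulation may help show the distinction being drawn here between the standard deviation of the mean and the accuracy of the mean. All values are hypothetical (a true value of 20.0, a constant instrument bias of 0.3, and Gaussian random noise with a standard deviation of 0.5); it is a sketch of the argument, not a model of any real station.

```python
import math
import random
import statistics

random.seed(1)    # fixed seed so the sketch is repeatable

TRUE_VALUE = 20.0  # hypothetical true temperature, deg C
BIAS = 0.3         # hypothetical constant systematic bias, deg C
NOISE_SD = 0.5     # hypothetical random error (1 sigma), deg C


def simulate(n):
    """Return (mean, standard error of the mean) for n biased, noisy readings."""
    readings = [TRUE_VALUE + BIAS + random.gauss(0.0, NOISE_SD) for _ in range(n)]
    mean = statistics.fmean(readings)
    sem = statistics.stdev(readings) / math.sqrt(n)
    return mean, sem


for n in (10, 100, 10_000):
    mean, sem = simulate(n)
    print(f"N={n:>6}: SEM={sem:.4f}  actual error of mean={mean - TRUE_VALUE:+.4f}")
```

In this setup the standard error shrinks as N grows, while the error of the mean settles near the bias rather than near zero. The readings alone give no way to see that bias.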

Nowhere in any climate science study I have seen, and that includes those referenced here, is there any provision for propagating measurement uncertainty onto the averages calculated from the stated values of the measurements. Doing so would mean that trying to identify differences in the hundredths digit is impossible due to the uncertainties inherent in the measurements.

I am just dismayed beyond belief when I see statisticians that know nothing of the real world trying to justify the standard deviation of the mean as the actual uncertainty in temperature measurement averages by assuming that all measurement uncertainty is random, Gaussian, and cancels. Even Possolo in TN1900, Example 2, had to assume that all measurement uncertainty cancelled or was insignificant in order to use 1/sqrt(N).

No one is ever going to convince me that all temperature measuring devices in use today have 100% accurate stated values for their measurement. Not even you bdgwx.

Example Tim.

Systematic error could be a maximum thermometer with a bubble or break; a scale with a loaded pan; or a measuring jug with sludge in the bottom.

Because they occur across sequential measurements they have the property of being non-random, and also that they are additive to the thing being measured. The simple solution is to detect and deduct their effect in situ using unbiased methods, is it not?

All the best

Bill Johnston

This is the mantra of the data manipulators, who assume it is possible to determine and remove all “biases” from historic air temperature data.

It is not possible.

Bias errors are typically unknown and can also change over time.

They cannot be reduced by subtraction or by averaging.

They are best handled by estimating their bounds and constructing a corresponding uncertainty limit.

“Systematic error could be a maximum thermometer with a bubble or break; a scale with a loaded pan; or a measuring jug with sludge in the bottom.”

How do you detect these effects remotely? As Taylor states they are not amenable to identification using statistical means. Bevington and Possolo agree.

If you mean they should be detected *at the site* then how often are those site inspections done, especially considering the global dispersion of the measurement devices?

Consider a station whose air intake is clogged by ice in late December. That ice may stay there for two months or more till air temps will be high enough to melt it. If the site inspection is done in the summer time, you’ll find nothing yet the winter temps for that period will contain a systematic bias.

I could give you numerous examples. Ant infestations, decaying organic matter, brown grass in winter vs green grass in summer, a nearby lake that is frozen in winter but not in summer, a nearby soybean/corn field whose evapotranspiration changes with season, and on and on.

These are all systematic biases that would be undetectable using statistical analysis – but they *are* real. And they can’t be eliminated by “homogenization” with stations up to 1200km away.

It’s why uncertainty intervals are important for the temperature measurements. I know a lot of climate scientists just assume that systematic biases will cancel just like random errors but that just isn’t a good assumption for an old mechanic/machinist/carpenter/physical-scientist like me. Similar electronics tend to have the same temperature coefficient (TC), e.g. many resistors, even SMD types, have the same TC, usually positive. That means that you will see a consistent systematic bias introduced – no cancellation across different stations of the same type. Heck, some resistors, esp ones using nichrome, have a non-linear TC.

It just seems that in climate science the overriding assumption is that measurement uncertainty *always* cancels leaving the stated values as 100% accurate. It’s why variance (or sd) represented by measurement uncertainty is *NEVER* propagated throughout the statistical analyses. First off, it is difficult to do and second, it would obviate the ability to identify differences in the hundredths digit! Instead, uncertainty is just assumed to be the standard deviation of the sample means even though that tells you nothing about the accuracy of the population mean you are trying to identify.

After giving yet another dose of measurement reality, all the trendologists can do is push the downvote button.

They well and truly can’t handle the truth.

Tim Gormon and karlomonte (below),

In the context of climate timeseries, Taylor is incorrect. For a start, if you can’t measure systematic bias in data, or don’t look for it, you would not know it was there. Further, if you could measure it you can also explain it, allow for it or correct for it. I don’t want to get bogged-down in semantics or foggy concepts. If you have some clear examples, preferably something you have published, then why not use those to support your argument.

To be clear Tim and karlomonte, with colleagues I undertook week-about, 9AM standard weather observations at a collaborative site for about a decade. So I know the routine and I know many of the pitfalls.

As most people don’t know much about how data arrives at a data portal such as BoM’s climate data online, in situ in the context I am meaning is “as you find the data”.

Exploratory data analysis is about quality assurance of data, and before embarking on any analysis it is an essential first step.

An effective QA protocol includes descriptive statistics as well as analysis of inhomogeneities that could indicate systematic bias.

Contrary to your claims, tests do exist and I invite you to read the downloadable reports available on http://www.bomwatch.com.au, where I have given example after example of an objective method for undertaking such analyses. The methodology is all there in methods case studies, backed-up by hundreds of individual analyses.

There are methods case studies here: https://www.bomwatch.com.au/data-quality/part-1-methods-case-study-parafield-south-australia-2/; here: https://www.bomwatch.com.au/data-quality/part-1-methods-case-study-parafield-south-australia-2/; and here: https://www.bomwatch.com.au/data-quality/part-1-methods-case-study-parafield-south-australia-2/.

Examples of detailed studies are here: https://www.bomwatch.com.au/bureau-of-meterology/part-6-halls-creek-western-australia/; and here: https://www.bomwatch.com.au/bureau-of-meteorology/charleville-queensland/.

If you think the methods are wrong, or want to contest what I am saying, I invite you to grab some daily data and analyse it and thereby demonstrate the things you talk about.

Finally, as I think we have discussed before, measurement uncertainty (strictly 1/2 the interval scale) is constant and it does not propagate. For two values having known uncertainty, to be different, their difference must exceed the sum of their uncertainties. Thus for a meteorological thermometer, where instrument uncertainty = 0.25 degC (rounds to 0.3), the difference between two values must be greater than 0.6 degC.

Tim, your argument goes around and round when you talk about 1/100ths of a digit. Nothing is measured to 1/100ths of a unit. I think you are also confused about uncertainty of a mean (typically SD, but also 1.96*SE) and accuracy which relates to an observation as discussed above.

Yours sincerely,

Bill Johnston

I’ve looked at some of your papers at bomwatch, Bill. They do not mention field calibrations against an accurate temperature standard, such as a PRT in an aspirated screen.

Field calibrations against an accurate measurement sensor are the only way to detect the systematic error that attends naturally ventilated shields due to the environmental variables of irradiance and insufficient wind speed.

Published work from the 19th century shows that meteorologists of that time were well-aware of this problem, but latterly (post-WWII) attention to this critical detail seems to have evaporated.

No statistical inter-comparison of regional sensors will detect weather-related systematic error in an air temperature record.

I have published several papers on the neglect of systematic error in constructing the global air temperature record, most recently here.

The field measurement uncertainty width of any naturally ventilated meteorological air temperature sensor is not better than 1σ = ±0.3 C, which means the anomaly record is not better than 1σ = ±0.42 C.

The 95% uncertainty in the anomaly record is then on the order of ±0.8 C. There’s no getting away from it.
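The figures in this comment are consistent with one simple reading, sketched below. The assumptions are an editorial interpretation, not something stated in the comment: that an anomaly is the difference of two quantities each carrying a 1σ uncertainty of ±0.3 C, that the two are combined in quadrature, and that the 95% interval uses a normal coverage factor of 1.96.

```python
import math

U_SENSOR = 0.3  # 1-sigma field uncertainty quoted in the comment, deg C

# Assumed: anomaly = reading - baseline, each with the same 1-sigma uncertainty,
# combined in quadrature.
u_anomaly = math.hypot(U_SENSOR, U_SENSOR)

# Assumed: normal distribution, so 95% coverage uses a factor of 1.96.
u_anomaly_95 = 1.96 * u_anomaly

print(f"1-sigma anomaly uncertainty: +/-{u_anomaly:.2f} C")
print(f"95% anomaly uncertainty    : +/-{u_anomaly_95:.2f} C")
```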

That’s strange because 1) type A evaluations of the anomaly record show significantly lower values and 2) if it were on the order of ±0.8 C then we wouldn’t be able to see natural variation like ENSO cycles in the record yet we clearly do and 3) we wouldn’t be able to predict it with any better skill than an RMSD of 0.4 C yet even a complete novice like me can do so with far better skill than that.

Thanks Pat,

Australian screens are not aspirated. PRT probes are lab calibrated. They are also field checked at least once/year and the Bureau has a range of cross checks they apply to assess inter-site bias and drift. They also employ a local error-trapping routine. Details are contained in a range of publications on their website. Although I have worked with PRT probes, I have not focused on probes per se on BomWatch.

Measurement uncertainty (the uncertainty of an observation) is 1/2 the index interval, which for a Celsius met-thermometer is 0.25 degC, which rounds up to 0.3 degC. I don’t know how that translates in anomaly units, neither do I see how 1σ applies to an individual datapoint.

There is a need I think to write a note about metrology that sorts out and clarifies issues surrounding measurements.

All the best,

Bill Johnston

Bill,

I’m not sure who you are trying to snow. I looked at the first case study you listed. Nowhere in there was there a study done to actually identify any systematic bias due to measurement devices or microclimate. If the rest of them are the same then there is no use even looking at them.

Hubbard and Lin showed in 2002 that measurement station corrections *must* be done on a station-by-station basis. Regional adjustments, like those based on homogenization, only serve to spread systematic biases around to other stations.

“Tim, your argument goes around and round when you talk about 1/100ths of a digit. Nothing is measured to 1/100ths of a unit. I think you are also confused about uncertainty of a mean (typically SD, but also 1.96*SE) and accuracy which relates to an observation as discussed above.”

Unfreakingbelievable. What do you think I have been trying to get across in all these long threads? How close you are to the population mean is *NOT* the accuracy of that mean. The uncertainties of the measuring stations *must* be propagated onto the population mean. That propagated uncertainty is *NOT* the variation of the stated value of the measurements either.

Systematic uncertainty is FAR more than the resolution of the measuring device. Calibration drift is a fact of life and no amount of statistical analysis can identify it. Learn it, love it, live it.

Or if you are only concerned with the change in temperature you don’t need to detect and deduct it since that happens via algebra. That, of course, applies only if the bias is truly systematic and not random and remains unchanged between measurements.

“not random and remains unchanged between measurements”

The main point about bias error that you refuse to acknowledge is that there is no way for you to know if this is true or not.

Systematic bias is always there in measurement devices. Period. Anomalies do not get rid of it. You do not know what it is and no amount of statistical analysis can tell you what it is.

Anomalies carry the systematic biases with them. If you use them with measurements from other stations, or even with measurements from the same station, you simply do not know if your anomalies are accurate or not.

Systematic bias CAN change between measurements, even using the same device. As a statistician I’m sure you are totally unaware of the impacts hysteresis can have in the real world. A device can read differently when the temperature is falling than it does when it is rising. It’s just a plain fact of real world devices. Do you even have a clue as to what snap-type micrometers are for?

Someday you really should join us here in the real world and leave your blackboard world behind.

The whole landscape experiences hysteresis, it is measurable and therefore can be quantified using data. However, hysteresis is not a property of data per se, but a property of the ‘thing’ being measured.

b.

This is nonsense.

It is also the property of the device doing the measuring!

You can *only* measure hysteresis through calibration procedures.

You say: Or if you are only concerned with the change in temperature you don’t need to detect and deduct it since that happens via algebra.

Tell us what algebra. Systematic bias is NOT factors affecting the background ambience of a site, which for a good unchanging site should be constant.

Cheers,

Bill

Let T = Mi + Ei where T is the measurand, Mi is a measurement of T, and Ei is a random variable affecting all measurements Mi that causes them to differ from T. If we then add a systematic bias B to all measurements we have T = Mi + Ei + B. Then if we want to know the difference between two measurands Ta and Tb we have ΔTab = Ta – Tb = (Mai + Eai + B) – (Mbi + Ebi + B). Simplifying this equation via standard algebra we have ΔTab = (Mai – Mbi) + (Eai – Ebi). Notice that the systematic bias B cancels out. This leaves us with ΔTab = Ma – Mb ± √(Ea^2 + Eb^2).
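The algebra above, and the condition attached to it in the thread (the bias must be the same in both measurements), can be illustrated with a short simulation. All numbers are hypothetical. Note the sign convention here is measurement = true value + bias + noise, which differs from the comment's T = M + E + B only in the sign of the error terms.

```python
import random
import statistics

random.seed(2)   # fixed seed so the sketch is repeatable

BIAS = 0.5       # hypothetical systematic bias, deg C
NOISE_SD = 0.1   # hypothetical random error (1 sigma), deg C
TRUE_A, TRUE_B = 15.0, 14.2  # hypothetical true values; true difference is 0.8
N = 10_000


def measure(true_value, bias):
    """One simulated reading: true value plus bias plus random noise."""
    return true_value + bias + random.gauss(0.0, NOISE_SD)


# Case 1: the same bias in both measurements, so it cancels in the difference.
same_bias = [measure(TRUE_A, BIAS) - measure(TRUE_B, BIAS) for _ in range(N)]

# Case 2: the bias changed by 0.2 between measurements, so it does not cancel.
changed_bias = [measure(TRUE_A, BIAS + 0.2) - measure(TRUE_B, BIAS) for _ in range(N)]

mean_same = statistics.fmean(same_bias)
mean_changed = statistics.fmean(changed_bias)
print(f"true difference            : {TRUE_A - TRUE_B:.2f}")
print(f"mean diff, constant bias   : {mean_same:.2f}")
print(f"mean diff, bias shifted 0.2: {mean_changed:.2f}")
```

The sketch shows both sides of the exchange: a truly constant bias drops out of the difference, while a bias that changes between measurements passes straight into it.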

I know. Or at least I wasn’t assuming that was the case here. I assumed you were talking about a bias in the measurement only. Though, in reality, it doesn’t really matter the source of the systematic bias. If it can be expressed as B then it cancels regardless of how it arises. The more interesting scenario is when there is a systematic bias that changes over time. But that’s a more complex scenario.

“The more interesting scenario is when there is a systematic bias that changes over time. But that’s a more complex scenario.”

Only if you can credibly claim that such a bias trend has a significant impact on the trends under discussion. We have yet to see any such claim, with reality based documentation.

Are you aware of the Temperature Coefficient of electronic devices?

The TC can be either negative or positive depending on the exact components used and how they are used.

If *every* identical temperature measurement device has a similar systematic bias due to calibration drift then how can that *NOT* affect the trend? The trend will be made up of the *real* change in the measurand PLUS the change in the calibration of the measuring device.

Did you *really* think about this before posting?
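The arithmetic behind this claim can be sketched with a toy series. The numbers are entirely hypothetical (a true trend of 0.010 per year, a calibration drift of 0.005 per year, 40 years of data) and say nothing about whether drift of that size exists in real instruments, which is the point the other commenters dispute. The sketch only shows that a least-squares fit cannot separate the two contributions.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


YEARS = list(range(40))
TRUE_TREND = 0.010  # hypothetical real change, deg C per year
DRIFT = 0.005       # hypothetical calibration drift, deg C per year
NOISE_SD = 0.05     # hypothetical random error (1 sigma), deg C

truth = [TRUE_TREND * t for t in YEARS]
measured = [v + DRIFT * t + random.gauss(0.0, NOISE_SD) for t, v in zip(YEARS, truth)]

slope_true = ols_slope(YEARS, truth)
slope_measured = ols_slope(YEARS, measured)
print(f"true trend    : {slope_true:.4f} C/yr")
print(f"measured trend: {slope_measured:.4f} C/yr")
```

With these inputs the fitted trend lands near the sum of the real trend and the drift. Whether such drift is large enough to matter in practice is an empirical question the code cannot answer.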

Please show me where your claim results in the referenced “significant impact on the trends under discussion”. I.e., global temperature and sea level trends over physically/statistically significant time periods w.r.t. ACC. This is what you continuously fail to demonstrate.

Please read completely, slowly, for comprehension…

From the main post:

“Now let’s look at the dispute on how to compute the statistical error of the mean warming from 1980-1990 to 2011-2021 between Scafetta and Schmidt.”

These methods are useless for dealing with bias, Type B errors. Climate astrology assumes that subtracting a baseline (the anomaly calculation) removes all bias error so they can safely ignore it.

But this assumption is quite false.

Another in the never ending series of posts that claims – with no proof – that there’s a Big Foot trail of systemic measurement/evaluation errors that qualitatively influence global average temperature and sea level trends over physically/statistically significant periods w.r.t. ACC.

Will you remain Dan Kahan System 2 hard wired to the fallacy that the world is upside down, and that you have (internally inconsistent) alt.statistical theories that superterranea denies? Because, after all, they are all part of a Dr. Evil conspiracy against you!!

That you can’t grasp what I wrote is yet another indication that you have zero real metrology experience, so you resort to fanning from the hip, hoping to land a round on those who expose your abject ignorance.

So, once again, no proof or even evidence. Therefore, you deflect to a fact free authority appeal to your supposed – but undemonstrated – “real metrology experience”.

I really doubt there is any “proof” that would convince blob that he is completely wrong.

And I can push the downvote button too, blob! It’s fun!

Ah, the predictable, convenient excuse for failing to provide any such proof. And a snowflake, pearl clutch whine about down votes in the bargain….

What exactly are you asking for? A list of all possible bias errors?

As Rumsfeld said, there are known knowns, known unknowns, and unknown unknowns (or something like that).

BoB wants us to list out all the unknown unknowns.

It’s why there is an *interval* associated with uncertainty. You try to include all the possible unknown unknowns. BoB and climate science wants to just assume they are all ZERO’S.

“BoB wants us to list out all the unknown unknowns.”

No, I acknowledge both systematic and random sources of error. I merely remind you that the chance of systematic errors both sufficiently large and lining up just right over statistically/physically significant time periods to significantly change any of the most likely trends is TSTM.

Again, you’re counting on your chimps to accidentally type the Encyclopedia Britannica. Might happen soon. Probably won’t….

Ok blob, here’s your chance to shine:

Using the manufacturer’s error specification, construct a standard uncertainty interval for the Fluke 8808A Digital Multimeter.

Wayback a few weeks, and ponder. We’ve already done that. You conveniently forgot where it landed…

No evidence, blob? hahahahahahah

Hypocrite.

That you have not the first clue about how to proceed is noted.

And who are “we”?

“that there’s a Big Foot trail of systemic measurement/evaluation errors that qualitatively influence global average temperature and sea level trends over physically/statistically significant periods w.r.t. ACC.”

What do you think the measurement uncertainty of the ARGO floats being +/- 0.5C consists of? Purely random error?

No, which is why it has been accounted for.

“– with no proof –”

Says the guy who clearly has never done the research.

From only the most recent of field calibrations: Yamamoto, et al (2017) “Machine Learning-Based Calibration of Low-Cost Air Temperature Sensors Using Environmental Data”

Abstract, first two sentences: “The measurement of air temperature is strongly influenced by environmental factors such as solar radiation, humidity, wind speed and rainfall. This is problematic in low-cost air temperature sensors, which lack a radiation shield or a forced aspiration system, exposing them to direct sunlight and condensation.” (my bold)

The negative impact of environmental variables on the accuracy of naturally ventilated air temperature sensors has been known since the field calibrations of the 19th century.

No proof indeed. Your dismissive insouciance disrespects past scientists and the hard careful work they did.

First off, per your linked paper, these are all errors that both tend to be random and can be reduced. Second, you are still deflecting from demonstrating any such steadily changing sources of systemic error that woulda coulda result in significant change in either temperature or sea level trends over physically/statistically significant time periods.

You are essentially making the Chimps Who When Left Alone With Typewriters Will Eventually Type The Encyclopedia Britannica argument. Yes, the chances are greater than zero. Just not much….

Another fine word salad, blob, totally sans any meaning.

You get a green!

What kind of malarky is this?

Do you think that all the water impoundments the Corps of Engineers has built over the years didn’t have an impact on the temperature measurements at locations surrounding them? Over physically/statistically significant time periods?

Jeesssh, you *really* need to think things through before spouting nonsense.

Evidently dyslexia is a critical part of your armamentarium, bob.

When you add (subtract) random variables the variances add they don’t cancel! Why is this so hard to understand?

Anyone that has ever built a staircase should have a basic understanding of this!
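The variance claim above is easy to check by simulation, with one important condition: it holds for independent variables. The sketch below is a toy example with assumed numbers (two independent normal variables, each with a hypothetical standard deviation of 0.5). For correlated variables a covariance term enters, which is what the correlation discussion elsewhere in this thread is about.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

random.seed(0)
N = 50_000
SD = 0.5   # hypothetical standard deviation of each variable

x = [random.gauss(0.0, SD) for _ in range(N)]
y = [random.gauss(0.0, SD) for _ in range(N)]   # drawn independently of x

var_x = variance(x)
var_y = variance(y)
var_sum = variance([a + b for a, b in zip(x, y)])
var_diff = variance([a - b for a, b in zip(x, y)])

# For independent x and y, both the sum and the difference have a
# variance close to var_x + var_y (about 0.5 here), not zero.
print(var_x, var_y, var_sum, var_diff)
```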

I can’t grok where they are, at all. Makes no sense to me.

“We have yet to see any such claim”

Aitken, 1884.

Lin, 2005

Two among many, since the 19th century.

The first link errored out. The second link is paywalled, but the abstract ends up in predictable coulda’ woulda.

“The debiasing model could be used for the integration of the historical temperature data in the MMTS era and the current US CRN temperature data and it also could be useful for achieving a future homogeneous climate time series.”

Maybe if you tried harder. After all, this is just “Two among many…”

Apologies for the bad link, Cambridge apparently changed it. Try this.

From Aitken page 681: “It is only within the last few days that I have been in possession of a Stevenson screen, and been able to make comparative trials with it. During these tests its action was compared with the fan apparatus, and readings were taken at the same time of the thermometers in the draught tube screens. These latter, as already stated, had an average error of about 0.6° too high. The smallest error recorded in more than thirty readings of the thermometer in the Stevenson screen was 1.3°, and it only fell to that on two occasions. The excess error was generally more than 2°, and was as high as 2.8 on two occasions.”

For Lin, 2005, you ignored this part of the Abstract:

“The results indicate that the MMTS shield bias can be seriously elevated by the snow surface and the daytime MMTS shield bias can additively increase by about 1 °C when the surface is snow covered compared with a non-snow-covered surface.”

Naturally ventilated MMTS sensors dominated at US stations after 1990.

The debiasing method is real-time filtering. It provides the only way to remove most of the systematic error produced by unventilated screens, and is a technique in use at no meteorological stations.

Once again, you carelessly dismiss.

You’ve got no clue that you’ve got no clue, bob.

Bob and bdgwx are both the same. They think systematic error ALWAYS CANCELS.

It seems to be some kind of a plague among climate scientists and statisticians.

All uncertainty cancels, always. So it can always be ignored.

Is that why none of the statistics textbooks I own have any examples using “stated value +/- uncertainty” to teach the students how to handle uncertainty?

I don’t think that.

The way I figure it, Tim, is that if standard scientific rigor was applied, none of them would have anything to talk about.

Hence the refractory ignorance.

Jeez guy, just how do you conclude that the temperatures are subtracted? What if they are added like for calculating an average?

You need to show a reference that “B” cancels out when averaging measurements. That isn’t even logical regardless what your made up algebra example proves.

If I use a micrometer with a bent frame, each and every measurement will carry that error and you can not recognize that error by using algebra or statistics. Calibration is the only way to observe systematic error and correct it.

“Let T = Mi + Ei “

ROFL!!

It is T = Mi ± Ei

Please note carefully the ±

It then progresses to T = Mi ± (Ei ± B)

What this results in is that the uncertainty must be evaluated on its own, not as part of Mi. The possible range of T is *expanded* as you add elements, they do *NOT* cancel!

First, that model comes from NIST TN 1900. Second, ± isn’t a mathematical operation so what you wrote is nonsensical.

So + is not a mathematical operator? So – is not a mathematical operator?

The use of ± is just a short hand for not having to write it out:

T = Mi – Ei to T = Mi + Ei

I am not surprised that you can’t make sense of this. In TN1900 Ei can be either negative OR positive; Possolo just doesn’t state that.

You can have Mi + (-Ei) or Mi + (+Ei)

Why you would think that all error terms are positive is just beyond me. It goes hand in hand with your lack of knowledge of metrology!

+ and – are mathematical operators.

ALGEBRA MISTAKE #24: ± is not a mathematical operator.

Like NIST 1900 Ei is the random variable representing the difference between the measurement M and the true value T. It already takes on both positive and negative numbers. There’s no need for two equations.

That’s what I said!

I don’t think that.

ALGEBRA MISTAKE #25: A plus (+) operator in an equation does not imply that the rhs operand is positive.

Clown.

KM says you are a clown. He’s right!

“ALGEBRA MISTAKE #24: ± is not a mathematical operator.”

I told you: ± is short hand for X + e and X-e. You just write it as X ± e. Shorter, sweeter, and just takes up less space in the text!

“Like NIST 1900 Ei is the random variable representing the difference between the measurement M and the true value T. It already takes on both positive and negative numbers. There’s no need for two equations.”

If that difference can be either positive or negative then why do you always write it as positive and claim it always cancels?

You *should* write your equation as T + (+e, -e) to make clear what is going on. Then you wouldn’t make the mistake of thinking e always cancels when subtracting two T values!

If T1 is X1 – e and T2 is X2 + e and you subtract T2 from T1 you get:

X1 – e1 – X2 – e2 ==> X1 – X2 – 2e. NO CANCELLATION!

It appears that *YOU* are the one that can’t do simple algebra.

“ALGEBRA MISTAKE #25: A plus (+) operator in an equation does not imply that the rhs operand is positive.”

That’s the only possible conclusion when you claim the errors cancel! You *have* to be considering that both errors are either positive or negative. If one can be positive and one negative then you don’t get cancellation!

You are hoist on your own petard!

You can defend and rationalize your position until you are blue in the face. ± still isn’t a mathematical operator and the rhs operand of the plus (+) operator is not assumed to always be positive. That’s not debatable. And no, I’m not going to start using some made up symbology to write algebra equations. The established standard developed over the last hundred years or so is perfectly fine for the job as-is.

Here is what yer fav reference has to say (you know, one of the bits you skipped over):

“7.2.4 When the measure of uncertainty is U, it is preferable, for maximum clarity, to state the numerical result of the measurement as in the following example. […]

m_S = (100,021 47 ± 0,000 79) g, where the number following the symbol ± is the numerical value of (an expanded uncertainty) U = ku_c […].”

Looks like you need to contact the BIPM PDQ and tell them they don’t know algebra.

Get cracking…

I have no problem with the BIPM or the statement of uncertainty using the ± symbol. I do it all of the time myself.

What I have a problem with is someone 1) using ± as an operator in an algebra equation and 2) assuming that the rhs operand in a + operation is always positive and 3) inventing a new paradigm for the expression of algebra.

Nutter.

When you assume that “e” cancels in an anomaly you *ARE* assuming that the operand is always positive ( or negative).

(A + e) – (B + e) = (A-B) + (e – e).

This is YOUR math. And (e-e) only cancels if they are equal and of the same sign.

You can whine about the fact you didn’t specify if “e” is only positive (or negative) but that is the only way your math works.

I’m still waiting for an explanation as to how an error that is sometimes positive and sometimes negative can also be systematic.

Duh!

For one, because you don’t know how it changes over time!

Why is this so hard to grasp?

Because if an error changes over time it is not a systematic error

Uncontrolled environmental variables. Systematic, deterministic, may vary through either sign, not random.

Because it is an INTERVAL containing a value that is unknown and therefore uncertain!

Look at a normal distribution. A standard deviation tells you that 68% of the values of that distribution lie within that interval, some plus, some minus. A normal distribution’s standard deviation has values of μ + ε and μ – ε, and ±ε defines the interval.

Uncertainty can define that interval that surrounds the measurement, but you do not know, AND CAN NOT KNOW, where within that interval the actual value truly is.

You are searching for a method of showing that you can eliminate uncertainty in measurements. Eliminating uncertainty can not be done. Even computing via RSS has a basic assumption that all measurements have a normal distribution.

What if they aren’t normal but one is skewed to the minus side and the other to the positive side? Do they cancel as they would using RSS? You end up with results that are incorrect and you can’t even identify why!

You want to convince everyone that uncertainty is reduced beyond an RSS value thru dividing by “n”! Show a general mathematic proof for that.

Let’s do a standard quality assurance problem. My assembly line creates 1000 widgets a day. Each widget is designed to weigh 5 units and is made up of two materials of equal weight with an uncertainty in each of 0.5 units. What should I quote the uncertainty of the average weight to be for any two widgets?

“Because it is an INTERVAL containing a value that is unknown and therefore uncertain!”

That describes random uncertainty.

“Uncertainty can define that interval that surrounds the measurement, but you do not know, AND CAN NOT KNOW, where within that interval the actual value truly is.”

Again, what’s the difference between random and systematic uncertainty? If you are defining a systematic error in terms of a probability distribution (that is probability in terms of a personal belief) the error is assumed to be the same for each measurement, as opposed to a random uncertainty where the value will be different each time.

“You are searching for a method of showing that you can eliminate uncertainty in measurements.”

I absolutely am not. The only certainty is there will always be uncertainties.

Can’t see what the rest of your comment has to do with systematic errors.

Because you have no clues about the subject.

Are you joking 😭! You just sealed the fact that you know nothing of physical measurements.

What experience do you have using sophisticated measuring equipment?

Why do you think there are calibration intervals required for using devices?

You would know devices drift both up and down over time. Some do go the same direction over time but you NEVER know.

Read E.3 in the GUM again and try to understand there is no difference between the treatment of Type A and Type B. The intervals are all based on variance and standard deviations.

^^^

+100

“You would know devices drift both up and down over time.”

Yes, though I doubt that’s the issue here. What I think Frank wants to say is that there are systematic errors that will change over time due to environmental factors. I don’t dispute that, and I think looking at systematic biases that change over time is a useful thing to assess.

But it doesn’t have anything to do with claiming the uncertainty of the global anomaly as very large due to all the errors for all instruments being treated as systematic, and then turning round and saying the error can change randomly from year to year.

Just quite while you are behind, this is not working for you.

Not quite yet. I’m quite enjoying it.

Oh look, a spelling lame. How lame.

No Tim. That is not my math. That is your math. You and you alone wrote.

Here is my math using the notation Pat Frank preferred from Vasquez.

M_k = T + β + ε_k

M_k is a measurement. T is the true value of the measurand. β is the bias or systematic error which is “fixed” and “a constant background for all the observable events” according to Vasquez. ε_k is the random error which is different for each of the k measurements. Note that this is similar to the model Possolo used in NIST TN 1900 except here we have added a systematic bias term. Now consider two measurements: a and b. The difference M_b – M_a is then M_b – M_a = (T + β + ε_b) – (T + β + ε_a) = ε_b – ε_a. Notice that β is eliminated from the equation via trivial algebra.
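The algebra in the model M_k = T + β + ε_k can be spelled out with numbers. This is only a minimal sketch: the true value, the bias, and the noise level are hypothetical, and the second half adds a contrasting case (different biases for the two measurements, named `BETA_A` and `BETA_B` here purely for illustration) that the model above does not cover.

```python
import random

random.seed(42)
T = 15.0       # hypothetical true value of the measurand
BETA = 0.7     # hypothetical constant bias shared by both measurements
SIGMA = 0.2    # hypothetical standard deviation of the random error

eps_a = random.gauss(0.0, SIGMA)
eps_b = random.gauss(0.0, SIGMA)

# Both measurements share the same bias.
m_a = T + BETA + eps_a
m_b = T + BETA + eps_b
diff_shared_bias = m_b - m_a            # beta drops out, leaving eps_b - eps_a

# Contrast case: each measurement has its own bias.
BETA_A, BETA_B = 0.7, 0.4
diff_unequal_bias = (T + BETA_B + eps_b) - (T + BETA_A + eps_a)

print(diff_shared_bias - (eps_b - eps_a))     # ~0, up to floating-point rounding
print(diff_unequal_bias - (eps_b - eps_a))    # ~BETA_B - BETA_A
```

The cancellation is exact only when the bias is identical in both terms. When the biases differ, their difference remains in the result, and the random terms ε_a and ε_b never cancel in either case.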

And like I told Pat this can actually be proven with the law of propagation of uncertainty using GUM (16) and the model Vasquez suggested in (1) which is u_k = β + ε_k. Because the correlation term in GUM (16) is signed via the product of the partial derivatives it is necessarily the case that it is negative for a function involving a subtraction. And because β is the same for each measurement then the correlation matrix r(xi, xj) > 0 with a magnitude proportional to the effect β has on the measurements thus leading to the cancellation of the β component of uncertainty. I encourage you to prove this out for yourself using the NIST uncertainty machine.
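For the measurement model y = a – b, the sensitivity coefficients are +1 and –1, so GUM equation (16) reduces to u_c² = u_a² + u_b² – 2·r·u_a·u_b. The sketch below evaluates that expression. The numbers are hypothetical, and treating the shared bias as a common component with standard uncertainty `u_beta` (so that the covariance between the two inputs equals `u_beta` squared) is an assumption made here for illustration.

```python
import math

def combined_u_difference(u_a, u_b, r):
    """Combined standard uncertainty of y = a - b per GUM eq. (16),
    with sensitivity coefficients +1 and -1 and correlation r."""
    return math.sqrt(u_a**2 + u_b**2 - 2.0 * r * u_a * u_b)

u_beta = 0.5   # hypothetical standard uncertainty of the shared bias
u_eps = 0.2    # hypothetical standard uncertainty of the random component

# Each input carries both components.
u_each = math.sqrt(u_beta**2 + u_eps**2)

# Assumption: the only thing the two inputs share is the bias, so the
# covariance is u_beta**2 and the correlation coefficient follows from it.
r_shared = u_beta**2 / (u_each * u_each)

u_correlated = combined_u_difference(u_each, u_each, r_shared)
u_uncorrelated = combined_u_difference(u_each, u_each, 0.0)

print(u_correlated)     # equals sqrt(2) * u_eps: only the random part remains
print(u_uncorrelated)   # equals sqrt(2) * u_each: nothing cancels
```

The result depends entirely on the value of r. With r = 0 (a different, unrelated bias in each input) the bias component stays in the combined uncertainty, which is the case the other side of this argument has in mind.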

Garbage-In-Garbage-Out…

From TN 1900.

“””””The equation, tᵢ = τ + εᵢ, that links the data to the measurand, together with the assumptions made about the quantities that figure in it, is the observation equation. The measurand τ is a parameter (the mean in this case) of the probability distribution being entertained for the observations. “””””

In TN 1900

“tᵢ” is not a true value to start with. A true value comes from establishing a normal distribution by measuring THE SAME THING, MULTIPLE TIMES, WITH THE SAME DEVICE.

“tᵢ” is a mean of measurements of different things under repeatable conditions. It is an expanded EXPERIMENTAL determination of a mean and standard deviation of what values may be expected at that location, with that device, and that time of year.

εᵢ is not a measurement error term. εᵢ is an interval describing the spread of the experimental observed data over time.

TN 1900 specifically says:

“””””Assuming that the calibration uncertainty is negligible by comparison with the other uncertainty components, and that no other significant sources of uncertainty are in play,”””””

That is, measurement ERROR plays no part in the analysis.

You have really gone off the tracks and overturned!

The use of “±” is to designate an interval surrounding a mean, mode, or median in a distribution. It is convenient because it defines a given interval of values in a normal distribution and because the square root of variance is a ± value!

There are two unique numbers defined by each. They must remain connected to define those numbers.

A ± does signify a symmetrical interval of (μ – ε) (μ + ε). It is not designed to be an algebraic operator.

You may subtract the two defined values, just as you do to find the distance between two points on a number line.

Where (μ + ε) > (μ – ε)

D = |(μ + ε) – (μ – ε)| = 2ε

The total size of the interval!

I know.

I’m glad you and I agree. Now can you convince TG of that for me? He won’t listen to me.

Here is what TG said.

“””””X1 -e1 -X2 -e2. ==> X1 – X2 – 2e. NO CANCELLATION"""""

It is exactly correct! There is no cancellation. The interval’s size is always 2ε.

“I told you: ± is short hand for X + e and X-e. You just write it as X ± e.”

Is this where Karlo gets all his ± is illegal nonsense from?

That isn’t what ± means. At least not in this context. It’s simply expressing an interval.

https://wattsupwiththat.com/2022/12/09/plus-or-minus-isnt-a-question/

It does mean that in, say, the quadratic formula, where the square root can be both positive and negative.

“If that difference can be either positive or negative then why do you always write it as positive and claim it always cancels?”

When writing an error term, the error can be either positive or negative so you only need to write +. If the error is positive you are adding a positive number, if it’s negative you are adding a negative number.

It’s the fact it can be positive or negative that results in cancellation.

“If one can be positive and one negative then you don’t get cancellation!”

If you are talking about systematic errors, then they both have to have the same sign or they wouldn’t be systematic.

“a systematic bias that changes over time.”

It’s called weather.

It could be depending on what you want to include in your uncertainty budget. I was thinking more of the biases caused by changing the time-of-observation, changing the station location, changing the station instrumentation, instrument aging, etc.

Yep. The grass turns brown in the winter when the weather gets cold. The grass turns green in the spring when it warms up. The grass turns brown again during the summer heat. The grass turns back green in the fall when the temperature goes down.

All of that is systematic bias in the temperature readings because of changes in the microclimate at the measurement site.

I could go on – like trees losing leaves changing the wind impacting the stations, etc. None of it is random. The trees don’t just drop their leaves on a whim!

“which for a good unchanging site should be constant”

This is the nonsense — you have no basis on which to make this assumption. Bias errors are unknown, and change with time. All you can do is estimate them.

I simply can’t tell you how dismayed I am to see people in climate science saying that calibration drift doesn’t impact the trend identified by the measurement device and that they think a field measurement device can be “unchanging”.

Calibration drift is one of the major reasons to point to DMM uncertainties; if you study the error specs it is clear that it is a very real factor. And at the core of all modern temperature measurements is a digital voltmeter.

The GUM makes an interesting statement about Type B uncertainties in 4.3.7:

But this really doesn’t tell you much at all. What it is really saying is that it has to be studied and evaluated on a case-by-case basis, there is no one-size-fits-all.

I seem to recall another statement in the GUM that I can’t find ATM which goes on to say that if a standard uncertainty is dominated by Type B factors, you need to do more work to get rid of them.

—————————————————————–

From the International Vocabulary of Metrology:

calibration:

operation performed on a measuring instrument or a measuring system that, under specified conditions, 1. establishes a relation between the values with measurement uncertainties provided by measurement standards and corresponding indications with associated measurement uncertainties and 2. uses this information to establish a relation for obtaining a measurement result from an indication (bolding mine, tpg)

————————————————————-

Even a lab calibration should provide a measurement uncertainty statement.

——————————————————————–

From SOP 1 Recommended Standard Operating Procedure for Calibration Certificate Preparation:

2.10 A statement of the measurement uncertainty, and corresponding measurement unit, coverage factor, and estimated confidence interval shall accompany the measurement result.

———————————————————–

Even lab calibration winds up with an uncertainty factor so that measurements taken immediately after calibration will have an uncertainty interval. Calibration problems just proceed from that point!

Site ambience is exactly as variable as the weather.

“since that happens via algebra”

The religious myth that all systematic error is a constant offset; widespread in climate so-called science and fiercely there embraced.

I didn’t say that all systematic error is constant. What I said is that when systematic error is constant it cancels via standard algebraic steps when converting to anomalies. I’ve also said repeatedly that time-varying systematic error is among the most troublesome problems in the quantification of the global average temperature change. I’d love to discuss it. The problem is that not many people can make it past the trivial idealized case of a constant systematic bias B. And as I always say…if you can’t understand the trivial idealized case then you won’t have any better luck with the vastly more complex real world.

It does *NOT* cancel. I showed you that it doesn’t cancel. It *is* simple algebra.

If T1 = X1 +/- e and T2 = X2 +/- e then the difference can range from

T1 – T2 = X1 – X2 + 2e or X1 – X2 – 2e.

NO CANCELLATION. Your algebra assumes all systematic bias is constant in the same amount and in the same direction!

After all this blather and whining about plus/minus signs, he forgot they are used about a hundred places in the GUM!

That’s not my algebra. Don’t expect me to defend it.

It *is* your algebra when you make the assertion that systematic bias always cancels in an anomaly!

The only way for that to happen is if the “e” term for both are the same and in the same direction. You have no justification for assuming that.

No Tim. This..

“If T1 = X1 +/- e and T2 = X2 +/- e then the difference can range from T1 – T2 = X1 – X2 + 2e or X1 – X2 – 2e.”

…is not my algebra. It is your algebra. You and you alone wrote it.

I never used “e” in my algebra.

Just freaking great!

You used “Ei” instead of “e”.

ROFL!!!! Your algebra still sinks.

Are you still asserting systematic uncertainty *ALWAYS* cancels in an anomaly?

He keeps looking for loopholes to justify his tiny numbers, but doesn’t grasp Lesson One about the subject. There is a reason it is called Uncertainty, it deals with what isn’t known. But he thinks he does know it!

The distinction is important because the i signifies that it is a random variable. And just like Possolo did in NIST TN 1900 it is the random effect. It’s not even the bias I was speaking of. You literally switched the entire meaning of “e” and created a completely different scenario to build an absurd strawman that you and you alone created.

It’s not my algebra. It’s the established standard. Specifically ± is not a valid operator. And the valid operators + and – make no assumptions about the sign of the right hand side operand. You can challenge basic algebra and make up as many absurd strawmen as you want and ± still won’t be a valid operator and the valid + still doesn’t make assumptions about the sign of the rhs operand.

No. I’m not. And I never have. What I’m saying is that when the systematic uncertainty B is fixed and constant for both components of the anomaly then it cancels. Notice what I didn’t say. I didn’t say systematic uncertainty always cancels independent of the measurement model y. I didn’t even say it always cancels for the measurement model y = f(a, b) = a – b. It obviously won’t if it is different such that there are Ba and Bb terms.

You forgot the link to the NIST Uncertainty Machine.

HTH

And of course, you are the world’s foremost expert on measurement uncertainty, surpassing even the Great Nitpick Nick Stokes.

“when systematic error is constant”

Yes, and when pigs fly they won’t need to be trucked to market.

Pat Frank said: Yes, and when pigs fly they won’t need to be trucked to market.

And yet this is what you are effectively doing with your calculations. Your calculations are accidentally equivalent to an assumption of correlation of r(xi, xj) = 1 using GUM 16. I say accidentally here because the use of Bevington 4.22 at all is incorrect. It just happens to produce the same result as Bevington 3.13 with full correlation. A correlation of 1 between inputs is a statement that their errors are exactly the same (ie the bias is constant and the same for all stations).

A farm animal reference!

You really aren’t just a blackboard scientist, are you?

Now you are just posting lies and propaganda.

My guess is that they have not a clue as to what temperature coefficients of electronic parts are.

Of course not, which are typically given as ±X ppm!

Yeah. That’s a good one. I’ve used it before. The NIST uncertainty machine is really good too. One advantage of the NIST tool is that it works with non-linear measurement models since it does a monte carlo simulation in addition to the GUM method.

That is a good reference. JCGM 100:2008 is also good and more comprehensive. There are other equally good references as well. They are all rooted in the law of propagation of uncertainty which uses the partial derivative technique.

Since I am reading some older papers right now, I thought this quote from Ramanathan et al 1987 might show where “the science” comes from..

Although they talk about equilibrium(!) surface warming, we are pretty close to 2030, but not to 1.5K. I think it is funny.

“For the 180-year period from 1850 to 2030”

Written in 1987 as if 2030 were known … and people seem okay with it.

When I first saw fig2, looking at the pink vertical bar I said to myself “Schmidt has assumed a constant mean”

Which directly contradicts the entire notion of ECS. And this contradiction proves Schmidt’s method is wrong.

You cannot have constant mean if you have increasing CO2, unless ECS=0.

That contradiction occurred to me also, his whole approach makes absolutely no sense.

It makes sense, when you understand Schmidt’s purpose to be ‘pimping CO2 as the bringer of climate doom,’ no matter how many times reality shows that notion to be nonsense.

If you are dealing with data that are not stationary, the mean and standard deviation will drift over time. That means it will exhibit a trend. One can calculate a mean for a data set that has a trend, but then the question should be asked, “Of what utility is it, and what does the standard deviation tell us?” To get a sense of the ‘natural variation’ one should probably de-trend the data.

Agreed, or use the residuals of lag 1, meaning subtract the previous values from all values and compute the residuals to remove some or all the autocorrelation.
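The lag-1 differencing idea can be illustrated on a synthetic series. This is a sketch under stated assumptions: the series is a hypothetical linear trend plus AR(1) noise with a persistence of 0.9, and none of the numbers come from ERA5 or any other dataset. The helper `lag1_autocorr` is written out here rather than taken from a library.

```python
import random

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

random.seed(3)
N = 500
PHI = 0.9      # hypothetical AR(1) persistence of the noise
TREND = 0.02   # hypothetical trend per time step

# Build AR(1) noise, then add a linear trend to make the series non-stationary.
noise = [0.0]
for _ in range(N - 1):
    noise.append(PHI * noise[-1] + random.gauss(0.0, 0.1))
series = [TREND * t + e for t, e in enumerate(noise)]

# First difference: subtract the previous value from each value.
diffs = [series[i] - series[i - 1] for i in range(1, N)]

r1_raw = lag1_autocorr(series)
r1_diff = lag1_autocorr(diffs)
mean_diff = sum(diffs) / len(diffs)

print(r1_raw)      # close to 1: the trend and the persistence dominate
print(r1_diff)     # much closer to 0 after differencing
print(mean_diff)   # close to TREND, the average step-to-step change
```

One caution on the design choice: differencing suits strongly persistent or trending data. Applied to a series that is already close to white noise, it introduces a negative lag-1 autocorrelation of its own, so fitting and subtracting a trend line may be the better way to de-trend in that case.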

It would be difficult to learn with a million people checking your work. Then again, he volunteered.

“When I first saw fig2, looking at the pink vertical bar I said to myself “Schmidt has assumed a constant mean””

Well, you didn’t understand the figure. Fig 2 plots the difference between averages for two different decades. ERA5 returns just one figure for that, with the uncertainty shown. There is no progression in time.

Sorry Nick, the method Schmidt uses only applies to repeated measurements of the same quantity with the same instrument, see the Taylor reference in the post. The average annual global surface temperature changes each year and so does the measurement error. The entire climate state changes each year.

This makes no sense. Firstly, that isn’t the method Schmidt used. Secondly, his diagram described a single fixed number, the difference between the 1980-1990 mean and the 2011-2021 mean, per ERA5. It doesn’t vary over years.

But the real absurdity is that you keep saying that Schmidt underestimated the error, at 0.1°C, but applaud Scafetta’s estimate of 0.01°C, for all years.

I didn’t say that. I said that Schmidt used the standard deviation of ERA5 over the decade in question to compute the error (0.1). This will include natural variability (both El Ninos and La Ninas) to compare ECS with observations.

Wrong! ECS is computed assuming that natural variability is zero. It is only valid to include measurement error, not natural variability.

“ECS is computed assuming that natural variability is zero.”

That’s just not true. But it’s also irrelevant to the task of measuring a decadal average.

What is happening is that you have GCM trajectories which diverge from ERA5. The statistical test requirement is to rule out any possibility that this is due to something other than wrong model. So you need to include all the ways that ERA5 might be legitimately different from what is quoted. That includes measurement error, sampling error, and weather variations, eg ENSO. If none of those can account for the difference, the difference is significant and the model is probably wrong.

Of course, it is true, see attached, from page 961, AR6.

That image is not showing or describing either the ECS or the natural variation. What it is showing is the contribution from various forcing agents to the global average temperature. You might be able to make some inferences about ECS from this graph, but I’m not sure how reliable they will be.

Look closer at the graph. The bottom panel is labeled “natural” and it is zero.

That is natural forcing. We’re not talking about forcing here. We’re talking about variation. Remember, variation is caused by things that affect the ebb and flow of energy as it is moved around the climate system. Forcing is caused by things that perturb the energy balance of the climate system.

These things are unnatural?? This word salad is total nonsense.

Well said. I think these two sentences explain the intent of Schmidt better than anything I’ve said so far.

Speaking of Gavin Schmidt, GISTEMP’s Land Ocean Temperature Index (LOTI) for March just came out. In a comparison to February’s LOTI, all of the changes since 1974 were positive and most prior to 1974 were negative. All told, there were 345 changes made to the 1718 monthly entries since 1880. This goes on month after month, year after year; it’s a steady drone.

There could have been 1718 changes this month. In fact, there likely were. I think you’re only seeing a subset because that file is limited to two decimal places. If you change the gio.py and then run the GISTEMP code yourself you can output the same file with more digits and you’ll likely see more changes. Remember, all it takes is for a single observation to get uploaded to the GHCN or ERSST repositories between 1951-1980 for all 1718 monthly values to change.

Meaningless round-off errors? What does the precision of the input temperatures say about the valid number of significant figures that should be retained in the final answer?

Typically yes. That’s why the fewer digits shown the less likely you are to notice a change.

Steve,

You just need to get with the program and understand that the number of coincidences in mainstream climate science is nearly infinite. Data tampering to obliterate unseemly cooling trends is one example. Incorrectly calculating error bars so that at least a few of the crummy models appear plausible is another.

I’ll start off with this.

https://www.ncei.noaa.gov/pub/data/uscrn/documentation/program/NDST_Rounding_Advice.pdf

“Multi-month averages, such as seasonal, annual, or long term averages, should also avoid rounding intermediate values before completing a final average calculation:

Seasonal MeanT = (Seasonal MaxT.xxx… + Seasonal MinT.xxx…)/2 = XX.x

Annual MeanT = (Annual MaxT.xxx… + Annual MinT.xxx…)/2 = XX.x

For final form, also round Seasonal or Annual MaxT and MinT to one place: XX.x”
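A small sketch of why the quoted advice matters: rounding intermediate values before averaging can shift the final rounded result. The three temperatures below are made-up numbers chosen only to show the effect; they are not from any station record.

```python
# Hypothetical monthly maximum temperatures (deg C), kept at full precision.
monthly_max = [20.44, 20.44, 20.48]

# Recommended order: average the unrounded values, then round once at the end.
mean_then_round = round(sum(monthly_max) / len(monthly_max), 1)

# Discouraged order: round each intermediate value first, then average and round.
rounded_first = [round(t, 1) for t in monthly_max]
round_then_mean = round(sum(rounded_first) / len(rounded_first), 1)

print("average, then round:", mean_then_round)
print("round, then average:", round_then_mean)
```

With these inputs the two orders differ by 0.1 in the final reported value (20.5 versus 20.4), even though every input is identical.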

The adage I learned, and it is in almost all college lab manuals, was that averages could not exceed the precision of the actual measurement. IOW, you could not add decimal places through mathematical calculations.

NOAA lists CRN accuracy as ±3°C. NOAA shows one decimal as the final form. Somehow claiming 0.01 uncertainty just looks like mathburtation.

There are other issues we can address. Variance is one. Both monthly averages and baseline averages are random variables with a distribution with variances. When subtracting random variables to obtain an anomaly the variances add. If those variances exceed the “measurement uncertainty” then again, the measurement uncertainty is meaningless.
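The claim that variances add when one random variable is subtracted from another can be checked with a quick simulation. This is only a sketch: the means and standard deviations are invented for illustration, and the result holds as written only if the monthly value and the baseline are independent.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical, independent random variables (values are assumptions):
monthly = rng.normal(15.0, 0.8, n)    # "monthly average", SD 0.8
baseline = rng.normal(14.0, 0.3, n)   # "baseline average", SD 0.3

anomaly = monthly - baseline

var_sum = monthly.var(ddof=1) + baseline.var(ddof=1)
var_anom = anomaly.var(ddof=1)

print("Var(monthly) + Var(baseline):", round(var_sum, 4))
print("Var(monthly - baseline):     ", round(var_anom, 4))
```

If the two inputs were correlated, a covariance term would have to be subtracted, so the simple sum is an upper bound only for positively correlated inputs.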

Last for now, the SDOM/SEM must be calculated from a sample mean that is normal. Have any of the involved data distributions been checked for this? Skewed distributions can really hose stuff up.

All the series are heavily autocorrelated, they are not normally distributed. This is probably the most important reason why Schmidt’s technique is invalid. Using the standard deviation as a stand in for error only works with a very small variance that is due to measurement errors. When nature is involved, as it is here, it is an inappropriate way to measure error/uncertainty. You need to back out the natural part. This can be done by detrending the data or simply lagging it by one sample and working with the residuals. What Schmidt did was break all the rules.

This post can supply more details:

https://andymaypetrophysicist.com/2021/11/13/autocorrelation-in-co2-and-temperature-time-series/

“All the series are heavily autocorrelated, they are not normally distributed. This is probably the most important reason why Schmidt’s technique is invalid.”

I’m not sure I see the logic there. Usually you expect auto-correlation to increase uncertainty.

As I said below, it seems this is simply a difference between only looking at measurement uncertainty versus looking at year to year variance.

“I’m not sure I see the logic there. Usually you expect auto-correlation to increase uncertainty.”

Indeed there is no logic. Andy is reeling off all the stuff usually put to claim uncertainty is underestimated, and then saying that Schmidt has overestimated.

Bellman,

Quite the opposite. Autocorrelation decreases the uncertainty because the previous value in the time series determines most of the next value. It does inflate the R^2 and the other statistical metrics and they have to be corrected for the degree of autocorrelation. But the next value has more certainty than if all the y values were independent of one another.

Schmidt’s error value requires that each measurement be independent, but of the same quantity and with the same tool. It cannot be a constantly changing quantity like the global average surface temperature.

“Schmidt’s error value requires that each measurement be independent”

You still don’t seem to have any idea where Gavin’s error value came from. It’s nothing like that. He simply reasoned that ERA5, being derived from basically the same data, should have the same uncertainty of the global mean as GISS and other measures.

And that is not derived from measurement uncertainty, which is indeed small. Scafetta’s figure 0.01C may even be about right for that (do you still commend it?). But the main source of error in the global average is spatial sampling. You have measures at a finite number of places; what if they had been somewhere else?

“Autocorrelation decreases the uncertainty”

Autocorrelation increases the uncertainty of the mean. It does so because, to the extent that each reading is predictable, it offers less new information.
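The effect of autocorrelation on the uncertainty of a mean can be illustrated with a simulation. This is a sketch under stated assumptions, not anything taken from ERA5 or from either paper: the series is a simple AR(1) process, and the length (11 values) and lag-1 coefficient (0.7) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_series(n, phi, rng):
    """Generate a stationary AR(1) series x[t] = phi * x[t-1] + noise."""
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1.0 - phi**2)  # draw from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

n, phi, trials = 11, 0.7, 5000   # assumed values, for illustration only
means = np.empty(trials)
naive_sem = np.empty(trials)

for k in range(trials):
    x = ar1_series(n, phi, rng)
    means[k] = x.mean()
    # "Naive" SEM: treats the n values as if they were independent.
    naive_sem[k] = x.std(ddof=1) / np.sqrt(n)

actual_sd_of_mean = means.std(ddof=1)   # how much the mean really varies
typical_naive_sem = naive_sem.mean()    # what the s / sqrt(n) formula reports

print("actual SD of the mean:", round(actual_sd_of_mean, 3))
print("typical naive SEM:    ", round(typical_naive_sem, 3))
```

In this setup the naive s/√n figure comes out well below the actual spread of the mean, which is the sense in which positive autocorrelation requires the calculated uncertainty to be increased.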

“And that is not derived from measurement uncertainty, which is indeed small.”

Upon what foundation do you make this claim? Even the Argo floats were assessed at having an uncertainty interval of +/- 0.5C.

The measurement uncertainty is *NOT* the standard deviation of the sample means. Measurement uncertainty is not the same as trying to calculate how close you are to the population mean. How close you are to the population mean is irrelevant if that population mean is inaccurate.

“Upon what foundation do you make this claim?”

Well, Andy is telling us that it is 0.01C.

“Autocorrelation decreases the uncertainty because the previous value in the time series determines most of the next value.”

That makes no sense. If you are using the standard deviation of annual values to estimate the SEM, then the fact that the previous value determines most of the next value means you have less variation in annual values. That in turn means your calculated uncertainty is smaller, but spurious. You need to correct for the autocorrelation by increasing your calculated uncertainty.

That’s not consistent with JCGM 100:2008 equations (13) and (16). You can also test this out with the NIST uncertainty machine, which allows you to enter the correlation matrix.

bdgwx,

Both of those equations are “valid only if the input quantities Xi are independent or uncorrelated.”

They do not apply to autocorrelated time series, like ERA5 obviously.

That’s not correct. Equations (13) and (16) are generalized to handle both correlated and uncorrelated (independent) input quantities.

BTW…the quote in its entirety is as follows.

It is possible that you read this paragraph and missed the verbiage talking about equations (10), (11a), and (12), which are special cases that apply only to uncorrelated inputs, and conflated them with equations (13) and (16), which apply to both uncorrelated and correlated inputs.

“the correlations must be taken into account.”

And where is this taken into account exactly when it comes to climate science?

It isn’t, and certainly not by Schmidt. Regarding eq. 13 and 16, u is the variance, representing uncertainty. These equations are suspiciously similar to those used by Scafetta. Except they are for 2 inputs.

Scafetta assumed the years were independent/uncorrelated.

Being pedantic…u is the standard uncertainty and u^2 is the variance.

Scafetta’s equation can be derived from (13) or (16) if you assume uncorrelated inputs such that u(x_i, x_j) = 0 for (13) or more conveniently r(x_i, x_j) = 0 for (16). Notice that in both cases (13) and (16) reduce to equation (10). And since the partial derivative ∂f/∂x_i = 1/N when f = Σ[x_i, 1, N]/N, equation (10) reduces to σ_ς / √N when σ_ς = u(x_i) for all x_i.

Equations (10) (13) and (16) are all for an arbitrary number of inputs. The inputs don’t have to be of the same thing or even of the same type. They may even have different units. For example, f(V, I) = V*I where X_1 = V is voltage and X_2 = I is the current.

The correlations will only increase the uncertainty. The equations are literally just the equation for data with no correlation plus another term based on the correlation.

“These equations are suspiciously similar to those used by Scafetta. Except they are for 2 inputs.”

They are for as many inputs as required. And there’s nothing suspicious about the equations being similar – they are all based on the same concepts.

I’m not following that argument. If there is autocorrelation that will increase the uncertainty in this case.

Using the NIST uncertainty machine with r(x_i, x_j) = 0 we get 0.578 ± 0.010 C. But adding some correlation of say r(x_i, x_j) = 1 – ((j – i) / 10) we get 0.578 ± 0.027 C

It might be helpful to point out that I chose r(x_i, x_j) = 1 – ((j – i) / 10) such that a value is 90% affected by the value 1 year ago, 80% for 2 years ago, and so forth. You can certainly plug in any correlation matrix you want. I just thought this one was a reasonable illustration of the point.
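For anyone who wants to see the arithmetic without the online tool, here is a rough sketch of the general propagation formula applied to a simple average, with and without that correlation matrix. It is not the NIST uncertainty machine and does not reproduce its exact inputs: the 11 annual values, the per-year standard uncertainty of 0.033 C, and the function name `mean_uncertainty` are all assumptions made for illustration.

```python
import numpy as np

def mean_uncertainty(u, r):
    """Standard uncertainty of the mean of N inputs.

    Uses u_c^2 = sum_i sum_j c_i c_j u_i u_j r_ij with sensitivity
    coefficients c_i = 1/N (the partial derivatives of a simple average).
    """
    u = np.asarray(u, dtype=float)
    n = u.size
    c = np.full(n, 1.0 / n)
    cov = np.outer(u, u) * r          # covariance matrix from u and r
    return float(np.sqrt(c @ cov @ c))

n = 11            # assumed number of annual values
sigma = 0.033     # assumed standard uncertainty of each annual value (C)
u = np.full(n, sigma)

idx = np.arange(n)
r_indep = np.eye(n)                                          # r(x_i, x_j) = 0 for i != j
r_corr = 1.0 - np.abs(idx[:, None] - idx[None, :]) / 10.0    # r = 1 - |j - i| / 10

u_indep = mean_uncertainty(u, r_indep)
u_corr = mean_uncertainty(u, r_corr)

print("uncorrelated:", round(u_indep, 4))   # equals sigma / sqrt(N)
print("correlated:  ", round(u_corr, 4))
print("ratio:       ", round(u_corr / u_indep, 3))
```

With the identity matrix the result collapses to σ/√N, and with this particular correlation matrix and N = 11 the uncertainty grows by a factor of √7, about 2.65, which is in line with the 0.010 versus 0.027 figures quoted above.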

bdgwx, Again, autocorrelated time series are not “independent nor are they uncorrelated”

I know.

Andy, perhaps a review of sampling theory is worthwhile.

1) Sampling is done when you are unable to measure the entire population. Sampling can use the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) to ESTIMATE or infer the properties of the population.

2) Sampling requires one to determine two things, the size of each sample and the number of samples to obtain.

3) Both the LLN and the CLT require two things:

a. the samples must be independent, and,

b. the samples must have a distribution like the population, i.e., an identical distribution.

c. This is known as IID.

4) If done properly the mean of each sample taken together will form a normal distribution called the sample means distribution. This distribution is important since it determines the accuracy of statistics obtained from it.

5) If the sample means distribution is normal,

a. The sample means average “x_bar” will estimate the population mean μ.

b. The sample means distribution variance, s², and the standard deviation of the sample means distribution is √s² = s

c. The population Standard Deviation, σ, can be inferred by the formula s • √n = σ. Where “n” is the sample size and NOT the number of samples.

6) The common term Standard Error of the sample Means (SEM) IS the standard deviation of the sample means distribution, “s”.
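The relationship in items 5 and 6 can be checked numerically. The sketch below is an illustration under assumptions, not a model of any station network: the population is an exponential distribution (deliberately skewed, with σ = 1), the sample size is 30, and 10,000 independent samples are drawn. It compares the standard deviation of the sample means with σ/√n, and also with the s/√n estimate available from a single sample.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 30               # sample size (values per sample)
num_samples = 10_000 # number of independent samples drawn
sigma = 1.0          # population SD of an exponential with scale 1

# Each row is one IID sample of size n from the same population.
samples = rng.exponential(scale=1.0, size=(num_samples, n))

# Standard deviation of the sample means distribution (the SEM).
sample_means = samples.mean(axis=1)
sd_of_means = sample_means.std(ddof=1)

# Theoretical SEM, and the estimate each single sample gives on its own.
theory = sigma / np.sqrt(n)
single_sample_sem = samples.std(axis=1, ddof=1) / np.sqrt(n)
typical_single = single_sample_sem.mean()

print("SD of sample means:       ", round(sd_of_means, 4))
print("sigma / sqrt(n):          ", round(theory, 4))
print("typical s / sqrt(n):      ", round(typical_single, 4))
print("sd_of_means * sqrt(n):    ", round(sd_of_means * np.sqrt(n), 4))  # recovers sigma
```

Under these assumptions all three SEM figures agree to within a few percent, so the SEM can be obtained either from the spread of many sample means or from one sample's standard deviation divided by √n.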

What does all this mean?

First, if the stations are considered samples, then all the above applies. Importantly, do the means of the samples form a normal frequency distribution? If not, then all bets are off. The sampling was not done properly. None of the inferences about the population statistical parameters made from the sample means distribution statistics are valid.

Second, a very important point about the SEM, which may affect the perceived error of the mean! See item 6) above. THE STANDARD DEVIATION OF THE SAMPLE MEANS DISTRIBUTION IS THE SEM.

YOU DO NOT DIVIDE THE SEM BY ANYTHING TO OBTAIN A SMALLER NUMBER. You certainly don’t divide by the number of samples (stations)!

The following documents discuss this issue:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1255808/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2959222/#

I have never seen any paper or blog where the “global” data used has been checked to see if there is a normal distribution. This makes the inferences suspect.

I’ll address your references from Dr. Taylor’s book in another post.

And the number of samples for just a single time series is always exactly one.

“Sampling requires one to determine two things, the size of each sample and the number of samples to obtain.”

How many more times. There is usually only one sample. There is no point in taking multiple samples from the same population because if you do you can merge them into one bigger sample.

Holy crap guy, you just denigrated every pollster, medical researcher, marketing manager and hundreds of math professors. Have you ever searched the Internet for sampling theory?

That is beside the fact that no climate scientist does what you say. Why do we track and calculate by station? Just dump every temperature reading into a big bucket! Sounds good to me actually.

“Holy crap guy, you just denigrated every pollster, medical researcher, marketing manager and hundreds of math professors.”

No, I was denigrating you.

“That is beside the fact that no climate scientist does what you say.”

How on earth do you take multiple samples of the same planet over the same time period? As you keep saying, you can only measure the temperature once.

The issue here isn’t combining different averages, it’s your claim that in order to determine the SEM you repeatedly take samples.

It’s this misunderstanding that means that most of the rest of what you say is either self evident or wrong.

e.g.

“If done properly the mean of each sample taken together will form a normal distribution called the sample means distribution. This distribution is important since it determines the accuracy of statistics obtained from it.”

How many different samples do you want to take to determine the sampling distribution of the mean in this way?

You can do it in a simulation, maybe you do it in your workshop, but if, say, you are measuring the heights of trees in a forest, and you take a sample of size 30, why on earth would you then take another 100 samples of the same size, just to determine the sampling distribution?

As you say, the point of sampling is that it’s too expensive or impossible to measure every tree in the forest, so you only take a sample of sufficient size to give you a good enough estimate of the population mean. If you had to expand your sample of 30 into a sample of 3000, how is that practicable? And if you could measure 3000 random trees, why not just combine them into a single sample of 3000, and get a more accurate estimate?

Maybe you are thinking of bootstrapping or Monte Carlo techniques that do simulate taking multiple samples, but that doesn’t seem to be what you are saying.

“If the sample means distribution is normal, a. The sample means average “x_bar” will estimate the population mean μ.”

The mean of a sample will estimate the population mean regardless of the shape of the sampling distribution. Though with a sufficiently large sample size it should be normal.

“b. The sample means distribution variance, s², and the standard deviation of the sample means distribution is √s² = s”

Gosh, the square root of the square of something is equal to the original value.

But this is where you start to confuse yourself.

s is usually used for the standard deviation of the sample. But here you are trying to use it for the standard deviation of the sampling distribution – that is SEM. And why would you be interested in the variance of the sampling distribution?

“c. The population Standard Deviation, σ, can be inferred by the formula s • √n = σ. Where “n” is the sample size and NOT the number of samples.”

You keep saying this as if it was a useful result. Usually you have one sample, and you infer the population standard deviation from that sample standard deviation. If you have taken many different samples, you could infer the population standard deviation by looking at the standard deviation of the pooled results. There’s zero point in taking the sampling distribution of the multiple samples, and multiplying by root n.

“First, if the stations are considered samples, then all the above applies.”

Samples of what? You keep making these vague statements without ever specifying exactly what you are attempting to do. If you want to know, say, the global average anomaly for one specific month, how are you considering stations as samples? Do you mean treat the 30 daily values for any one station as one sample, or do you mean take each monthly average from an individual station as one value from the sample that is all stations?

“YOU DO NOT DIVIDE THE SEM BY ANYTHING TO OBTAIN A SMALLER NUMBER.”

Who has ever suggested you do? And why did this trivial point need to be written in capitals?

It seems obvious that this simply illustrates that you don’t know what the SEM is or how it is calculated.

“You certainly don’t divide by the number of samples (stations)!”

If you want to treat each individual station as being a sample unto itself over a particular period, fine. You have 1 single sample per instrument, and you could calculate its SEM just as in Exercise 2 of TN1900. But if you then average all of the instrument averages over the globe to get a global average, you are not producing a sampling distribution. Each station is a completely different sample. If you want to use multiple samples to estimate the sampling distribution they all have to come from the same population (IID remember).

So if what you are trying to say, is you can take the standard deviation of all the individual station means, and treat it as the SEM, you really haven’t understood anything. The global average is a single sample of all the station means. The standard deviation of all the station means is the standard deviation of that sample, and the SEM is that standard deviation divided by root N, where N is the number of stations.

And to be clear, none of this is how you would calculate the uncertainty in an actual global anomaly. Stations are not random observations from the surface. There are multiple ways of creating global averages from them, and all will require specific estimates of the uncertainty involved in each procedure.

Diminishing returns can bite in a big way with increasing sample sizes. It’s often useful to compare statistics (including variance/sd) for multiple samples to see whether they could come from the same population. That gives more information than pooling them.

Comparing samples seems to be what this whole topic is about 🙂

Your comment makes no sense. Multiple samples is why there are so many stations.

Once again, individual stations are not sampling the population. If you want a global average the population is the entire globe. One individual station can give you a sample from one point on the globe.

A sampling distribution requires multiple samples, each taken from the entire population. Unless your sample from an individual station is somehow sampling random points around the globe for each value it records, it cannot be considered a random sample from the global population.

Maybe it would be a fun exercise to simulate that. Take all the daily station data, and create simulated stations where every daily value is taken from a random station. Then each of these synthetic stations could be treated as a random sample of the global average for that month, and the distribution of all such stations would give you the standard error of the mean. But it would only be the error of the mean of a sample of size 30, so not much certainty.

“NOAA lists CRN accuracy as ±3°C.”

I think that should be ±0.3°C.

I laughed when I saw the output line of that on-line error propagation tool in figure 5 — ten digits!

Meaningless digits! I was teaching when hand-held calculators first came out in the ’70s. I’d give a problem on a quiz with all the inputs having 3 significant figures, and I’d get answers with all the digits displayed on their calculator. It was surprisingly difficult to get the students to understand that I expected them to be smarter than their calculators.