The further a society drifts from the truth, the more it will hate those who speak it. ... In a time of deceit, telling the truth is a revolutionary act. George Orwell

Sunday, July 11, 2010

China Study - Raw Data - more plant food = more heart disease!

At last! The raw data behind the infamous "The China Study..." book by Dr. TC Campbell of Cornell University has finally emerged from some obscure "unobtainium" publication and become available online on the web site of the Clinical Trial Service Unit at Oxford University!

Below are the links to blogs and sources.

#1. Denise Minger on the China Study - a long, in-depth analysis of the raw data, with graphs. See also her article on Tuoli county, the only county in the China Study that consumed a high-fat, medium-carb diet:

According to our prominent vegan theorists such as Drs Campbell, Ornish, McDougall, Esselstyn et al., the Tuoli people ought to have been very sick or dead. As you can read in Denise's analysis, nothing could be further from the truth. The Tuoli seem to be healthier than people in most other Chinese counties!

#2. Fantastic comment by Richard Kroeker on the Amazon forum,
- giving his own analysis of the raw data, similar to and corroborating the analysis by Denise Minger.  Note: you should start reading from that post and then move on to #1 above, since Kroeker's article is much shorter.


... This is not at all what Campbell's book implied the data said. As I said above, I am an engineer (with a PhD) with heart disease simply trying to find out what to eat. You do the math...

My day-job is analyzing hard drive failure statistics that result from usage and stress testing; I get paid to make the problems being studied "go away". I have also recently had a triple bypass, ...

For instance, the people who ate the most animal protein had 68.9% less heart disease (at 95% confidence) than those people who ate the least animal protein. The people who ate the most plant protein had 64.9% more heart disease (at 89% confidence) than those people who ate the least plant protein.

I am quoting here some interesting figures from Kroeker's post. These are not correlations as such but the risk ratios between the extreme sample bins for a given variable ('-' means improvement, '+' means harm). The first number on each line is the univariate (single-variable, uncorrected against possible confounders) risk ratio in %: the most negative numbers (blue) = low mortality, the most positive numbers (red) = high mortality. The second number, in brackets, is the "confidence" estimate in % as per Kroeker's definition (see here in his methodology document). All figures are for mortality from all vascular disease, age 35-69.


-55.6% (90%) - ANIMAL FOOD INTAKE (g/day/ref)
-55.1% (94%) - FOLATE plasma FOLATE (ng/mL)
-54.8% (89%) - ANIMAL PROTEIN INTAKE (g/day/ref)
-54.1% (90%) - FISH INTAKE (g/day/ref)

-49.5% (84%) - TOTAL LIPID INTAKE (g/day/reference man)

-49.1% (87%) - PERCENTAGE ANIMAL FOOD INTAKE (for refere
-48.4% (83%) - MEAT INTAKE (red meat and poultry) (g/day
-48.0% (83%) - CHOLESTEROL INTAKE (mg/day/reference man)
-46.6% (81%) - RED MEAT (pork, beef, mutton) INTAKE (g/d
-42.2% (82%) - SATURATED FATTY ACID INTAKE (g/day/ref)
-40.7% (89%) - RICE INTAKE (g/day/reference man, air-dry
-38.0% (84%) - TOTAL CAROTENOID INTAKE (retinol equivale
-36.0% (84%) - POULTRY INTAKE (g/day/reference man, as-c
-42.9% (82%) - Se plasma SELENIUM (ug/dL)
-42.8% (85%) - TOTPROT plasma 1989 TOTAL PROTEIN (g/dL)
-42.6% (86%) - APOA1 plasma APOLIPOPROTEIN A1 (mg/dL) (non-pooled analysis
-40.7% (88%) - Zn plasma ZINC (mg/dL)
-38.7% (76%) - B-CAROT plasma BETA CAROTENE (ug/dL)
-38.0% (82%) - ANHYDLUT plasma ANHYDRO LUTEIN (ug/dL)
-34.6% (81%) - TOTCHOL plasma TOTAL CHOLESTEROL (mg/dL)
-34.1% (79%) - NON-HDL plasma CHOLEST.(mg/dL)[=LDL+Trig/5]

32.4% (79%) - plasma LDL to HDL ratio
35.6% (75%) - PLANT FOOD INTAKE (g/day/reference man)
37.5% (82%) - POTASSIUM INTAKE (mg/day/ref)
39.3% (76%) - SPICE INTAKE (g/day/ref)
40.0% (84%) - MAGNESIUM INTAKE (mg/day/ref)
42.2% (80%) - MANGANESE INTAKE (mg/day/ref)
43.0% (90%) - OTHER CEREAL INTAKE (g/day/ref)
46.4% (93%) - TOTAL PROTEIN INTAKE (g/day/ref)
47.7% (91%) - COPPER INTAKE (mg/day/ref)

50.5% (87%) - IRON INTAKE (mg/day/ref)
58.9% (95%) - PLANT PROTEIN INTAKE (g/day/reference man)
62.4% (97%) - WHEAT FLOUR INTAKE (g/day/reference man)
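
For readers who want to see what a "risk ratio between the extreme bins" might look like in practice, here is a minimal Python sketch. It assumes a simplified definition: sort the counties by the diet variable, split them into six equal bins (sextiles), and compare the mean mortality in the highest bin with that in the lowest. The function name, the definition and the numbers are all illustrative assumptions; this is not Kroeker's actual procedure and not China Study data.

```python
from statistics import mean

def extreme_bin_risk_ratio(intake, mortality, bins=6):
    """Percent change in mean mortality, top bin vs. bottom bin of intake.

    Hypothetical simplified definition, not Kroeker's actual procedure.
    """
    if len(intake) != len(mortality):
        raise ValueError("intake and mortality must have the same length")
    size = len(intake) // bins
    if size == 0:
        raise ValueError("need at least one county per bin")
    # Sort counties by the intake variable, keeping mortality paired with it.
    paired = sorted(zip(intake, mortality))
    lowest = [m for _, m in paired[:size]]
    highest = [m for _, m in paired[-size:]]
    # Negative result = lower mortality at high intake, positive = higher.
    return (mean(highest) / mean(lowest) - 1.0) * 100.0

# Made-up numbers for 12 imaginary counties (not China Study data).
intake = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
mortality = [20, 18, 19, 17, 15, 16, 14, 12, 13, 11, 10, 10]
print(round(extreme_bin_risk_ratio(intake, mortality), 1))
```

A figure computed this way only compares the two ends of the intake range and, like the numbers listed above, is univariate: it says nothing about confounders.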

#3. Richard Nikoley's blog where I found the original links (thanks):

Stan (Heretic)


Update 13-July-2010

HILARIOUS response (see also comment #505 here (*)) by Dr. TC Campbell of Cornell University to Denise Minger!

No discussion of the data; instead, plenty of ad hominem attacks: pointing out her age, questioning her character and integrity, and weaving a conspiracy theory that implies backing by some lobbying organization with "untold financial resources", such as the Weston A. Price Foundation!   :)

Dr. TC Campbell of Cornell U. (probably) wrote:

"I find it very puzzling that someone with virtually no training in this science can do such a lengthy and detailed analysis in their supposedly spare time. I know how agricultural lobbying organizations do it–like the Weston A Price Foundation with many chapters around the country and untold amounts of financial resources. Someone takes the lead in doing a draft of an article, then has access to a large number of commentators to check out the details, technical and literal, of the drafts as they are produced. I have no proof, of course, whether this young girl is anything other than who she says she is, but I find it very difficult to accept her statement that this was her innocent and objective reasoning, and hers alone. If she did this alone, based on her personal experiences from age 7 (as she describes it), I am more than impressed."
- I am not!


*) If someone has figured out how to link to a comment by its number on a WordPress blog, please let me know. Nothing obvious, such as ?comment=505, seems to work.

Update 17-July-2010

Reordering and reformatting. It is interesting to notice that in the China Study, higher total cholesterol and higher non-HDL cholesterol (LDL + Triglycerides/5) both correlated with LOWER cardiovascular mortality, while a higher HDL level correlated very strongly with lower cardiovascular mortality!

Update 29-July-2010

Added confidence levels in brackets (%) and a link to Rich Kroeker's methodology document.



Peter said...

-70.7, calories from fat! I just love it. Fat rules. Only observational but this is likely to be causal too, not that I'm biased at all!


Stan (Heretic) said...

Hi Peter,

I agree, though unlike you I _am_ slightly biased :)

Logically, strong negative correlations are probably a lot more indicative of some protective effect of the food being correlated.

Strong positive correlations, on the other hand, may be more readily explained away by our vegan friends: plant-correlated pollutants, infections & parasites carried by plants (but not meat!!!), etc. I am sure they will try!


Stan (Heretic) said...

Reposting my own comment from webmd:


The final discrediting of "The China Study" book by Dr. TC Campbell of Cornell University:

Chris Masterjohn's blog on WAPF

Remember our discussions on that topic, here and on before they kicked me out, on the role of powdered casein in rats, promoting cancer when induced with aflatoxins?

We were speculating that perhaps, yes, casein may be bad on its own, or only for rats, or perhaps only at high doses, etc. We were naive!

Turns out the truth was much simpler: it was scientific misinformation and the withholding of vital additional information about the trials that was known to the authors! There is and there was nothing wrong with casein!

This paper co-authored by Dr. TC Campbell himself:

- showed that casein-based feed helped cancer grow because it was "good quality protein", in Campbell's own words!

Powdered casein-based feed was complete, while plant protein such as wheat gluten, which Campbell's paper calls "low-quality protein", would stunt cancer growth just as it would probably stunt the growth of any other growing tissue in rats or humans, because it is deficient and incomplete.

The same man who has glorified plant-based food is using the term "low quality protein" to describe wheat gluten, and is choosing the words "high quality protein" to describe dairy-based rat feed!

Think about his choice of words! Think about the omission of his own paper on the cancer-promoting gluten+lysine combination from his accusations against casein (had he included it, it would have totally invalidated his casein=cancer thesis!)

Take all that together with Denise Minger's analysis - and think about it as the whole story:

- In this new light, does "The China Study" book still look like an honest mistake of interpretation? Really?

Stan (Heretic)


O Primitivo said...

Dear Stan, these correlation values don't match the published values for M059-ALLVASCc. Please see this document, page 215. I've also calculated the CHINA PROJECT correlations, and mine are equal to the official ones. See here. Also, here is my Excel database.

Stan (Heretic) said...

Hi O Primitivo,

As far as I could figure out from Rich Kroeker's document, he binned the independent variables into sextiles and calculated the "risk ratios" rather than straight correlation coefficients as in the Oxford PDF document. He also "warned" me that he is a hard-core Bayesian, so be prepared to read about conditional probabilities... 8-:)

His methods are very advanced (including numerical tools) and you should perhaps contact him directly if you too are seriously involved in statistical data processing.

I sent Rich an email asking him to post some more definitions of the terms he used and to add more comments. I hope it will get through his email server's anti-spam barrier.


richard said...

Say Hey-

Yes, these are not simple correlation coefficients for the raw datasets. As Stan references, they are actually the ratio of the separation of sextiles chosen to be similar in magnitude to the raw correlation coefficients.

At work I have been analyzing the "mortality of hard drives" for decades. We build drives with various modifications, much as people with different diets. We systematically track down the sources of variance that correlate to our problem of interest. We have deadlines.

Over the last 30 years I have lived through such a series of problems that I have gotten a bit hard core about how to use statistics, and how not to use them. One lesson in particular stands out - people are awfully poor at understanding a problem with more than one source of variance.

More to the point, all datasets have intrinsic noise, meaning that if you did exactly the same thing again, the results would not be identical. Sometimes at work we do the same test 5 times and get five different answers.
The obvious meaning is that we do not yet have a measurement on the root source of the variance.

Problem closure comes when the sum of all the correlations between variables and results equals one (there are of course rules on forming the proper summation). For heart disease, this dataset based on diet has missing variables. I substitute the various items in the diet for the missing variables by overfitting the dataset.

I discard the overfit and use the reformulated distribution of results to form scatter plots against the diet item of interest.
This allows me to analyze the effect of an input multiple times against the same dataset. Each one is different to the degree that the other diet items are able to add to and subtract from the effect being investigated.

Stan has a link to this example in detail. I expect you will find the correlation coefficient you seek inside the distribution of values that I calculated. As long as your value is near the mean value of my distribution, there is no functional difference.

This is a very robust method of detecting spurious correlations. It is similar to throwing out the highs and lows in a correlation and demanding that the correlation of the core items align with the extremes. I use it to avoid being misdirected by noise and to pick up effects of interactions.

I am concerned about my own personal health. I do not give a damn about convincing anybody about anything. These are the numbers I use for myself; and I only get one shot at life. I do this type of data analysis for a living, and I have been very successful at solving problems.

Why did I post it? I blame my daughter :) She keeps kicking me in da'ass and pushes me to post my personal datasets so they might be able to help other people. My personal attitude is do it yourself and use your own numbers; don't trust your life to some stranger.


PS But no matter how you crunch the numbers, you just can't miss the observation that the primary dietary item in the listing that relates to heart disease is wheat.
Oh excuse me, I'm wrong about that, aren't I? Dr. Campbell somehow missed it completely.

O Primitivo said...

Dear Stan and Richard, thanks for your explanations. Can we consider risk factors any more informative than just correlations? Also, an idea just occurred to me a few days ago. In order to find better correlations of diet factors with certain diseases, does it make sense to search for higher correlations of those diseases with linear combinations of diet components (real food, not PUFAs or SAFAs), that can be added? Is this a common procedure in Epidemiology? I'm just finishing some code to explore this subject, if I get to some relevant result I'll post them. Best regards.
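
The linear-combination idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `pearson` and `combine` helpers, the two food columns and the disease rates are all made up for the example and have nothing to do with the actual CHINA PROJECT database.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def combine(columns, weights):
    """Weighted sum of several food columns, county by county."""
    return [sum(w * v for w, v in zip(weights, row)) for row in zip(*columns)]

# Made-up intakes of two foods and a made-up disease rate in six counties.
food_a = [1, 2, 3, 4, 5, 6]
food_b = [3, 1, 2, 6, 4, 5]
disease = [4, 3, 6, 10, 9, 11]

combo = combine([food_a, food_b], [1.0, 1.0])
print(round(pearson(food_a, disease), 3))
print(round(pearson(food_b, disease), 3))
print(round(pearson(combo, disease), 3))
```

In this toy data the combined column correlates with the disease rate more strongly than either food alone. One caution: if you search over many weightings, some combination will always show a high correlation by chance, so a result found this way needs checking against data that was not used in the search.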

Stan (Heretic) said...

Hi O Primitivo,

Re: Can we consider risk factors any more informative than just correlations?

I will try to answer. I do not use statistics a lot for a living, only occasionally in my design work to prove (or disprove!) to my clients that my sensors work; Rich may be able to explain it better.

The answer is "No" if the original data obeys a normal distribution. By binning your data (the independent variable) you are losing some information.

The answer is "Yes" if your original data contains outliers or errors: in that case binning may give you more stable and more meaningful statistics. This is for the same reason that, for example, when you want to know just the average, in the first case the arithmetic average would produce more accurate results, whereas in the second case the median should be used.
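
A tiny Python illustration of the average-versus-median point, using made-up readings in which one value has been mistyped (the numbers are purely hypothetical):

```python
from statistics import mean, median

# Five made-up readings, and the same readings with one data-entry error.
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]  # last value mistyped

# On clean data the mean and the median agree.
print(mean(clean), median(clean))

# A single bad value drags the mean far away; the median does not move.
print(mean(with_outlier), median(with_outlier))
```

Binning behaves in a similar way: a wild value still lands in the top bin, but it cannot pull a whole fitted line towards itself.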


O Primitivo said...

Dear Stan, thanks for your answer. Here is an attempt to correlate groups of diet items with vascular disease using the CHINA PROJECT data.

Principal Quattrano said...

The correct link to wordpress comment #101 is [permalink] + '/#comment-101'
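
As a small sketch of that pattern in Python: the helper name and the example.com address below are placeholders, and the '/#comment-N' form is taken from the comment above rather than checked against any particular blog.

```python
def comment_link(permalink, comment_number):
    """Build a link to one comment from a post permalink and a comment number."""
    # Strip a trailing slash so the result does not contain '//#comment-'.
    return permalink.rstrip("/") + "/#comment-" + str(comment_number)

# Placeholder permalink, not a real post.
print(comment_link("https://example.com/2010/07/some-post/", 101))
```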

Stan (Heretic) said...

Thanks Principal Quattrano, Stan