Abstract: Many conceptual frameworks exist for categorizing nations by development status. The Human Development Index (HDI), published by the United Nations Development Programme, is widely accepted and used by academics, politicians, and donor organizations. Although the HDI has gone through many revisions since its formulation in 1990, even the current version of the index, published in 2016, needs further research to fill gaps in the knowledge base and to improve the formulation so that it better directs attention, such as the release of funds. In this paper, data on the life expectancy index (LEI), education index (EI), and income index (II) are therefore analyzed with principal component analysis and the K-means clustering algorithm to categorize and rank the UN member states, using R, an open-source, extensible programming language for statistical computing and graphics. The results show that the first principal component (PCA-1) accounts for more than 85% of the total variation (i.e., of the sum of the eigenvalues). The proportion explained by PCA-1 increases from year to year, although the increase is not significant, while the proportions explained by PCA-2 and PCA-3 decrease with time. The loss of information from choosing PCA-1 alone to represent the explanatory variables (LEI, EI, and II) may therefore diminish if this trend continues. The correlation between EI and PCA-1 also increases with time, although the increase is not large, and the same trend is observed for II. However, in contrast to these observations, the correlation bet...
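The pipeline this abstract describes (principal component analysis on three indices, then K-means on the result) can be illustrated with a minimal sketch. The paper itself uses R; the Python below is a stand-in, and everything in it is an assumption made for illustration: the data are synthetic rather than UN figures, the variables are standardised (so the PCA works on the correlation matrix), the first component is found by power iteration, and the choice of four clusters is arbitrary.

```python
import math
import random


def standardise(rows):
    """Centre each column and scale it to unit sample standard deviation."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardised rows."""
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_component(corr, iters=200):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j]
                     for i in range(k) for j in range(k))
    return eigenvalue, v


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars; centres start at evenly spaced quantiles."""
    ordered = sorted(values)
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]

    def assign(cs):
        return [min(range(k), key=lambda c: abs(x - cs[c])) for x in values]

    for _ in range(iters):
        labels = assign(centers)
        updated = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return assign(centers), centers


# Synthetic stand-in data (hypothetical): 60 "countries" whose three indices
# share one latent development level plus independent noise.
random.seed(0)
rows = []
for _ in range(60):
    level = random.uniform(0.3, 0.95)
    rows.append([level + random.gauss(0, 0.05) for _ in range(3)])

z = standardise(rows)
eigenvalue, weights = leading_component(correlation_matrix(z))
# The trace of a correlation matrix equals the number of variables, so the
# share of total variance is the eigenvalue divided by that count.
share = eigenvalue / len(weights)
scores = [sum(w * x for w, x in zip(weights, row)) for row in z]
labels, centers = kmeans_1d(scores, k=4)  # k=4 is an assumed cluster count
print(f"share of variance on first component: {share:.3f}")
```

Clustering on the first-component score alone is only reasonable when that component carries most of the variance, which is the situation the abstract reports.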
Abstract: In this work, we show that when there is insufficient equipment for detecting a disease whose prevalence is t% in a sub-population of size N, it is optimal to divide the N samples into n groups of size r each; the value <img src="Edit_ce149849-3742-48fe-820b-02ccc0c92d83.bmp" alt="" /> then allows systematic screening of all N individuals with fewer than N tests (in this expression, ⌊x⌋ denotes the floor function of x ∈ ℝ). Based on this result and on certain functions of the R software, we then propose a probabilistic strategy capable of optimizing the screening tests under certain conditions.
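To make the idea of screening N individuals with fewer than N tests concrete, here is a minimal sketch. It does not reproduce the paper's closed-form group size, which is not legible in this listing. It assumes instead the classical two-stage (Dorfman) pooling scheme: each group of r samples is tested once as a pool, and every member of a positive pool is then retested individually. It further assumes independent infections with probability p = t/100, a perfectly accurate test, and N divisible by r. The function names are hypothetical.

```python
def expected_tests_per_person(p, r):
    """Expected number of tests per individual with pools of size r.

    One pooled test is shared by r people (1/r each); with probability
    1 - (1 - p)**r the pool is positive and each member is retested once.
    """
    if r == 1:
        return 1.0  # no pooling: exactly one test per person
    return 1.0 / r + 1.0 - (1.0 - p) ** r


def best_group_size(p, max_r=100):
    """Group size in 1..max_r that minimises the expected tests per person."""
    return min(range(1, max_r + 1),
               key=lambda r: expected_tests_per_person(p, r))


def expected_total_tests(n_people, p, r):
    """Expected total tests for n_people, assuming r divides n_people."""
    return n_people * expected_tests_per_person(p, r)


if __name__ == "__main__":
    prevalence = 0.01  # hypothetical prevalence of 1%
    r = best_group_size(prevalence)
    print(r, round(expected_total_tests(1000, prevalence, r), 1))
```

Under these assumptions, a prevalence of 1% gives a best group size of 11 and roughly 196 expected tests per 1000 people. At high prevalence the search returns r = 1, meaning pooling no longer saves tests.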
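Abstract: Objective: Stratified analysis is a method in which data are divided into strata by one or more variables that need to be controlled, and the strength of the association between exposure and outcome is then estimated. It is one of the most commonly used methods for controlling confounding in cancer epidemiology. This study applies the epiR package for R to the stratified analysis of cancer epidemiology data, providing a new statistical tool for identifying and controlling confounding factors. Methods: Using data from a 2009 case-control study of breast cancer, the relationship between serum resistin level and breast cancer incidence was analyzed with body mass index (BMI) as the stratification variable. The calculations were carried out with the epiR package. Results: Without adjustment for BMI, the association between serum resistin level and breast cancer was ORc = 3.431 (95% CI: 1.590-7.406), P = 0.001. After stratified adjustment for BMI, the association was ORmh = 3.809 (95% CI: 1.703-8.518), P = 0.001. Conclusion: After adjusting for the confounding effect of BMI, the association between serum resistin level and breast cancer persisted, suggesting that serum resistin level may be a risk factor for breast cancer. The epiR package requires little code, produces rich output, and makes stratified analysis convenient, so it can serve as a reference for cancer epidemiology researchers carrying out stratified analyses.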
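The two quantities reported above, a crude odds ratio (ORc) and a Mantel-Haenszel stratum-adjusted odds ratio (ORmh), can be illustrated with a short sketch. The study used the epiR package in R; the Python below is only a stand-in showing the standard point estimates, without confidence intervals or P values. The counts are hypothetical and are not the study's data. Each stratum is assumed to be a tuple (a, b, c, d) of exposed cases, exposed controls, unexposed cases, and unexposed controls.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into one 2x2 table."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata.

    OR_MH = sum(a_i * d_i / n_i) / sum(b_i * c_i / n_i),
    where n_i is the total count in stratum i.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


if __name__ == "__main__":
    # Hypothetical counts for two strata of the stratification variable.
    strata = [(10, 20, 5, 40), (30, 10, 20, 20)]
    print(round(crude_odds_ratio(strata), 3))
    print(round(mantel_haenszel_or(strata), 3))
```

Comparing the crude and pooled estimates is the usual informal check for confounding: a clear difference between them suggests that the stratification variable distorts the crude association.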