| Title: | Marginal Hazard Ratio Estimation in Clustered Failure Time Data |
|---|---|
| Description: | Estimation of marginal hazard ratios in clustered failure time data. The package implements a novel weighted estimating equation approach based on a semiparametric marginal proportional hazards model that accounts for within-cluster correlations (see Niu, Y. and Peng, Y. (2015), "A new estimating equation approach for marginal hazard ratio estimation"). It is designed for researchers in biostatistics and epidemiology who need accurate and efficient estimation methods for survival analysis of clustered data. It also provides simulation functions and two real-world datasets, a kidney infection dataset and a diabetes dataset, for demonstration purposes. |
| Authors: | Junyi Chen [aut, cre], Siqi Zhou [aut], Shida Li [aut], Yi Niu [aut] |
| Maintainer: | Junyi Chen <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-05-25 09:57:05 UTC |
| Source: | https://github.com/actinium-oxide/marginal-cox-model-for-correlation-survival-data------marcox |
A dataset containing clinical information from a diabetes study.
data(diabetes)
A data frame with 166 rows and 6 variables:
risk - Numeric: Risk score of the patient.
cens - Binary (0/1): Censoring indicator (1 = event occurred, 0 = censored).
time - Numeric: Time to event or censoring (in months).
id - Integer: Patient ID.
trt - Binary (0/1): Treatment indicator (1 = treated, 0 = control).
age - Binary (0/1): Age group indicator (1 = older, 0 = younger).
Hypothetical clinical study data.
data(diabetes)
summary(diabetes)
This function generates multiple datasets for survival analysis based on a Cox proportional hazards model.
The baseline hazard function follows either a Weibull or an exponential distribution, depending on the values of lambda.
The function checks whether the maximum observed time in each of the control and treatment groups is censored.
If it is not, that observation is forced to be censored to maintain the desired censoring rate.
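The rule for the largest observed time can be sketched in base R. This is only an illustration under assumptions: the data frame `toy`, its column names (`time`, `cens`, `trt`, borrowed from the layout of the `diabetes` data), and the helper `force_last_censored()` are hypothetical and do not reproduce the internals of `gendat()`.

```r
# Hypothetical helper: within each treatment group, make sure the subject
# with the largest observed time is recorded as censored (cens = 0).
force_last_censored <- function(dat) {
  for (g in unique(dat$trt)) {
    rows <- which(dat$trt == g)
    last <- rows[which.max(dat$time[rows])]  # row with the largest time in group g
    if (dat$cens[last] == 1) {
      dat$cens[last] <- 0                    # recode the event as censored
    }
  }
  dat
}

toy <- data.frame(
  time = c(2.1, 5.4, 3.3, 1.2, 4.8, 0.9),
  cens = c(1,   1,   0,   1,   0,   1),
  trt  = c(0,   0,   0,   1,   1,   1)
)

out <- force_last_censored(toy)
out$cens  # the control-group maximum (time 5.4) is now censored
```

In this toy input the treatment-group maximum (time 4.8) is already censored, so only one row changes.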
gendat(type = "bin", dimension = 10, K = 30, n = 2, lambda = c(1, 2), b1 = c(log(2), -0.1), theta = 8, censrate = 0.3)
type |
Character. If |
dimension |
Integer. The number of datasets to be generated. |
K |
Integer. The number of clusters (groups) within each dataset. |
n |
Integer. The number of samples within each cluster. |
lambda |
Numeric vector. A two-element vector specifying the parameters for the baseline distribution:
|
b1 |
Vector. The regression coefficient for the covariates, affecting the hazard function. We suggest that the maximum of |
theta |
Numeric. A parameter controlling the dependency structure between survival times within clusters. Higher values indicate stronger within-cluster correlation. |
censrate |
Numeric. The target censoring rate for the dataset. |
A list containing:
data - A list of data frames, each containing a generated dataset.
censoringrates - A numeric vector representing the censoring rate for each dataset.
mean(censoringrates) - The mean censoring rate across all datasets.
# Generate one dataset with a binary covariate, 6 clusters, and 10 samples per cluster
print(gendat(type = 'bin', dimension = 1, K = 6, n = 10, lambda = c(1, 2), b1 = c(log(2), -log(2)), theta = 8, censrate = 0.5))
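For intuition about how a parameter like `theta` can induce within-cluster correlation, here is a minimal base R sketch. It assumes a Clayton copula with exponential marginals under a Cox model, which is one common construction; this documentation does not confirm that `gendat()` works this way. The function `sim_clustered()` and its arguments (`beta`, `rate`) are hypothetical, and censoring is left out to keep the example short.

```r
# Hypothetical sketch: clustered survival times with exponential marginals,
# made dependent within clusters through a Clayton copula
# (Marshall-Olkin construction with a gamma-distributed cluster variable).
sim_clustered <- function(K, n, beta, theta, rate = 1) {
  id <- rep(seq_len(K), each = n)
  x  <- rbinom(K * n, size = 1, prob = 0.5)            # binary covariate
  v  <- rep(rgamma(K, shape = 1 / theta), each = n)    # one draw per cluster
  e  <- rexp(K * n)
  u  <- (1 + e / v)^(-1 / theta)                       # Clayton-dependent uniforms
  # Invert the marginal survival function S(t | x) = exp(-rate * t * exp(beta * x))
  time <- -log(u) / (rate * exp(beta * x))
  data.frame(id = id, x = x, u = u, time = time)
}

set.seed(1)
dat <- sim_clustered(K = 200, n = 2, beta = log(2), theta = 8)

# Within-pair Kendall's tau of the uniforms; for a Clayton copula the
# population value is theta / (theta + 2), so larger theta means stronger dependence.
pairs_u <- matrix(dat$u, ncol = 2, byrow = TRUE)
tau <- cor(pairs_u[, 1], pairs_u[, 2], method = "kendall")
tau
```

Under this assumed construction, `theta = 8` corresponds to a population Kendall's tau of 0.8, and the sample estimate should land near that value.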
A dataset containing survival analysis information related to kidney disease patients.
data(kidney_data)
A data frame with 76 rows and 5 variables:
time - Numeric: Time to event or censoring (in days).
cens - Binary (0/1): Censoring indicator (1 = event occurred, 0 = censored).
age - Numeric: Age of the patient in years.
sex - Binary (0/1): Sex of the patient (1 = male, 0 = female).
type - Categorical (0, 1, 2, 3): Kidney disease type classification.
Hypothetical survival study data.
data(kidney_data)
summary(kidney_data)
This function fits a marginal Cox proportional hazards model to clustered data and supports time-dependent covariates. It estimates coefficients, standard errors, and p-values based on the specified formula and dataset.
marcox(formula, data, sep = NULL, method = "exchangeable", col_id = "id", div = NULL, k_value = 1, plot_x = NULL, x_axis = "Time", y_axis = "Survival Rates", size = 0.5, diagnose = FALSE)
formula |
A model formula that uses the |
data |
The file path or the dataset (matrix) to be analyzed. If a file path is provided, the file is loaded into a matrix. The file should be in a tabular format (e.g., .csv, .txt). |
sep |
Character. The |
method |
The method employed to solve the correlation coefficient:
|
col_id |
Character. The name of the column that identifies the clusters. |
div |
Integer. The number of observation points per sample. If provided, the data will be divided accordingly. If the data has complex observational situations, please preprocess the data before using this function. |
k_value |
The k value, used only for the k-dependent structure. The default value is 1. |
diagnose |
Diagnose option. |
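To make the `method` and `k_value` arguments more concrete, the sketch below builds two working correlation matrices in base R as they are conventionally defined in the estimating-equation literature: an exchangeable matrix and a k-dependent (banded) one. The helper names and the correlation values are hypothetical, and `marcox()` may parameterize these structures differently.

```r
# Exchangeable: every pair of observations in a cluster shares one correlation rho.
exchangeable_corr <- function(n, rho) {
  R <- matrix(rho, nrow = n, ncol = n)
  diag(R) <- 1
  R
}

# k-dependent: observations at lag j <= k have correlation rho[j];
# observations further apart are treated as uncorrelated.
k_dependent_corr <- function(n, rho) {
  k <- length(rho)
  lag <- abs(outer(seq_len(n), seq_len(n), "-"))
  R <- matrix(0, nrow = n, ncol = n)
  for (j in seq_len(k)) R[lag == j] <- rho[j]
  diag(R) <- 1
  R
}

E <- exchangeable_corr(3, rho = 0.4)
B <- k_dependent_corr(4, rho = 0.3)   # k = 1: only adjacent observations are correlated
E
B
```

The exchangeable form needs a single parameter regardless of cluster size, while the k-dependent form needs one parameter per lag up to k.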
The marcox() function is designed for survival data analysis using Cox proportional hazards models. It handles both clustered data and time-dependent covariates.
The survival outcome must be defined using the Surv() function in the model formula, and covariates can be included directly or by converting categorical variables with the factormar() function.
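As an illustration of what dividing data by `div` could look like, the base R sketch below assigns consecutive rows to clusters of a fixed size. It assumes that rows are already ordered so that each cluster's observations are adjacent; the helper `assign_clusters()` and the toy values are hypothetical and are not taken from the package source.

```r
# Hypothetical helper: give every block of `div` consecutive rows the same cluster id.
assign_clusters <- function(dat, div, col_id = "id") {
  stopifnot(nrow(dat) %% div == 0)   # rows must split evenly into clusters
  dat[[col_id]] <- rep(seq_len(nrow(dat) / div), each = div)
  dat
}

toy <- data.frame(time = c(8, 16, 23, 13, 22, 28),
                  cens = c(1, 1, 1, 0, 1, 1))
ids <- assign_clusters(toy, div = 2)$id
ids
```

Data with unequal cluster sizes would not fit this pattern, which is consistent with the advice to preprocess such data before calling the function.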
A list containing the following components:
coef - The estimated regression coefficients.
exp(coef) - The exponentiated coefficients (hazard ratios).
se(coef) - The standard errors of the estimated coefficients.
z - The z-statistics for testing the significance of the coefficients.
p - The p-values associated with the coefficients.
(hidden).correlation - Correlation coefficients of the data.
formula <- Surv(time, cens) ~ sex + factormar('type', d_v=c(1,2,3))
marcox(formula, data = kidney_data, div = 2, method = 'exchangeable', plot = TRUE, plot_x = 'sex')
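The relationship between the returned components can be checked by hand. The sketch below assumes the usual two-sided Wald test (z = coef / se, with p-values from the standard normal distribution), which this documentation implies but does not state. The estimates and standard errors are made up for illustration and are not output from `marcox()`.

```r
est <- c(sex = -0.80, type1 = 0.35)   # made-up coefficient estimates
se  <- c(sex =  0.40, type1 = 0.50)   # made-up standard errors

hr <- exp(est)               # hazard ratios, i.e. the exp(coef) component
z  <- est / se               # Wald z-statistics
p  <- 2 * pnorm(-abs(z))     # two-sided p-values

# 95% confidence interval for each hazard ratio
ci <- exp(cbind(lower = est - qnorm(0.975) * se,
                upper = est + qnorm(0.975) * se))

round(cbind(hr, z, p, ci), 3)
```

A hazard ratio below 1 indicates a lower hazard for the group coded 1 relative to the group coded 0, and its confidence interval excludes 1 exactly when the two-sided p-value is below 0.05.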