Package 'marcox'

Title: Marginal Hazard Ratio Estimation in Clustered Failure Time Data
Description: Estimation of marginal hazard ratios in clustered failure time data. It implements a novel weighted estimating equation approach based on a semiparametric marginal proportional hazards model (see Niu, Y. and Peng, Y. (2015), "A new estimating equation approach for marginal hazard ratio estimation"), accounting for within-cluster correlations. The package is designed for researchers in biostatistics and epidemiology who require accurate and efficient estimation methods for survival analysis in clustered data settings. It also provides simulation functions and two real-world datasets, a kidney infection dataset and a diabetes dataset, for demonstration purposes.
Authors: Junyi Chen [aut, cre], Siqi Zhou [aut], Shida Li [aut], Yi Niu [aut]
Maintainer: Junyi Chen <[email protected]>
License: GPL-3
Version: 1.0.0
Built: 2026-05-25 09:57:05 UTC
Source: https://github.com/actinium-oxide/marginal-cox-model-for-correlation-survival-data------marcox

Help Index


Diabetes Study Data

Description

A dataset containing clinical information from a diabetes study.

Usage

data(diabetes)

Format

A data frame with 166 rows and 6 variables:

risk

Numeric: Risk score of the patient.

cens

Binary (0/1): Censoring indicator (1 = event occurred, 0 = censored).

time

Numeric: Time to event or censoring (in months).

id

Integer: Patient ID.

trt

Binary (0/1): Treatment indicator (1 = treated, 0 = control).

age

Binary (0/1): Age group indicator (1 = older, 0 = younger).

Source

Hypothetical clinical study data.

Examples

data(diabetes)
summary(diabetes)

Generate Simulated Datasets for Cox Proportional Hazards Model

Description

This function generates multiple datasets for survival analysis based on a Cox proportional hazards model. The baseline distribution is either Weibull or exponential, depending on the values of lambda. In both the control and treatment groups, the function checks whether the largest observed time is censored; if it is not, that observation is forced to be censored to maintain the desired censoring rate.
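The forced-censoring rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's internal code: the helper name force_max_censored and the column names time, cens, and trt are hypothetical, and the toy data frame is made up.

```r
# Hypothetical helper (not part of marcox): within each treatment group,
# mark the observation with the largest time as censored (cens = 0).
force_max_censored <- function(df) {
  for (g in unique(df$trt)) {
    idx <- which(df$trt == g)                 # rows belonging to group g
    last <- idx[which.max(df$time[idx])]      # row with the largest time
    df$cens[last] <- 0L                       # force it to be censored
  }
  df
}

# Made-up toy data: two observations per group
toy <- data.frame(
  time = c(2.0, 5.5, 1.2, 4.1),
  cens = c(1L, 1L, 0L, 1L),
  trt  = c(0L, 0L, 1L, 1L)
)
out <- force_max_censored(toy)
```

In this toy input the largest times (5.5 in the control group, 4.1 in the treatment group) were both events, so both are switched to censored.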

Usage

gendat(
  type = "bin",
  dimension = 10,
  K = 30,
  n = 2,
  lambda = c(1, 2),
  b1 = c(log(2), -0.1),
  theta = 8,
  censrate = 0.3
)

Arguments

type

Character. If type = 'bin', the covariates are generated as binary variables; if type = 'cont', continuous covariates are generated.

dimension

Integer. The number of datasets to be generated.

K

Integer. The number of clusters (groups) within each dataset.

n

Integer. The number of samples within each cluster.

lambda

Numeric vector. A two-element vector specifying the parameters for the baseline distribution:

  • If lambda = c(a, b), where a > 1, the baseline follows a Weibull distribution.

  • If lambda = c(1, b), the baseline follows an exponential distribution.

b1

Numeric vector. The regression coefficients for the covariates, which affect the hazard function. We suggest keeping the largest element of b1 below 2.

theta

Numeric. A parameter controlling the dependency structure between survival times within clusters. Higher values indicate stronger within-cluster correlation.

censrate

Numeric. The target censoring rate for the dataset.
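As a rough illustration of how lambda and b1 could enter the simulation, the sketch below draws survival times from a Cox model by inverse-transform sampling. It assumes a cumulative baseline hazard of the form H0(t) = rate * t^shape with lambda = c(shape, rate); this parametrization is an assumption, since it is not spelled out here. The helper sim_cox_times is hypothetical, and the sketch ignores both the within-cluster dependence controlled by theta and the censoring step.

```r
# Hypothetical helper (not part of marcox): independent survival times
# from a Cox model with an assumed baseline H0(t) = rate * t^shape.
sim_cox_times <- function(x, beta, shape = 1, rate = 2) {
  x <- as.matrix(x)
  lin_pred <- drop(x %*% beta)   # linear predictor x'beta
  u <- runif(nrow(x))            # U ~ Uniform(0, 1)
  # Solve S(t | x) = exp(-rate * t^shape * exp(x'beta)) = u for t
  (-log(u) / (rate * exp(lin_pred)))^(1 / shape)
}

set.seed(1)
# 30 clusters of size 2 would give 60 rows; clustering is ignored here
x <- cbind(x1 = rbinom(60, 1, 0.5), x2 = rbinom(60, 1, 0.5))
times <- sim_cox_times(x, beta = c(log(2), -0.1), shape = 1, rate = 2)
```

With shape = 1 the baseline reduces to an exponential distribution, matching the lambda = c(1, b) case; a shape above 1 gives a Weibull baseline with increasing hazard.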

Value

A list containing:

  • data - A list of data frames, each containing a generated dataset.

  • censoringrates - A numeric vector representing the censoring rate for each dataset.

  • mean(censoringrates) - The mean censoring rate across all datasets.

Examples

# Generate 1 dataset with binary covariates, 6 clusters, and 10 samples per cluster
print(gendat(type = 'bin', dimension = 1, K = 6, n = 10, lambda = c(1, 2),
      b1 = c(log(2),-log(2)), theta = 8, censrate = 0.5))

Kidney Disease Study Data

Description

A dataset containing survival analysis information related to kidney disease patients.

Usage

data(kidney_data)

Format

A data frame with 76 rows and 5 variables:

time

Numeric: Time to event or censoring (in days).

cens

Binary (0/1): Censoring indicator (1 = event occurred, 0 = censored).

age

Numeric: Age of the patient in years.

sex

Binary (0/1): Sex of the patient (1 = male, 0 = female).

type

Categorical (0,1,2,3): Kidney disease type classification.

Source

Hypothetical survival study data.

Examples

data(kidney_data)
summary(kidney_data)

Analysis for Cox Proportional Hazards Models

Description

This function performs marcox analysis for Cox proportional hazards models, incorporating clustered data and handling time-dependent covariates. It estimates coefficients, standard errors, and p-values based on the specified formula and dataset.

Usage

marcox(
  formula,
  data,
  sep = NULL,
  method = "exchangeable",
  col_id = "id",
  div = NULL,
  k_value = 1,
  plot_x = NULL,
  x_axis = "Time",
  y_axis = "Survival Rates",
  size = 0.5,
  diagnose = FALSE
)

Arguments

formula

A model formula that uses the Surv() function to define the survival outcome. It may include both continuous and categorical covariates; categorical variables must be specified using the factormar() function.

data

The file path or the dataset (matrix) to be analyzed. If a file path is provided, the file is loaded into a matrix. The file should be in a tabular format (e.g., .csv, .txt).

sep

Character. The sep parameter specifies the character that separates the fields in each line of the file. For instance, for a comma-separated file, set sep = ",", and for a tab-separated file, set sep = "\t".

method

The correlation structure used to estimate the within-cluster correlation coefficient:

  • Exchangeable correlation structure: method = 'exchangeable'

  • Autoregressive (AR-1): method = 'ar1'

  • k-dependent: method = 'kdependent'

  • Toeplitz: method = 'toeplitz'

  • Independent: method = 'independent'

  • Unstructured: method = 'unstructured'

col_id

Character. The name of the column that identifies the clusters.

div

Integer. The number of observation points per sample. If provided, the data will be divided accordingly. If the data has complex observational situations, please preprocess the data before using this function.

k_value

The value of k, used only for the k-dependent structure. The default value is 1.

diagnose

Logical. Whether to enable the diagnose option. The default is FALSE.
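To make the structures listed under method more concrete, the sketch below builds three of them as correlation matrices for a cluster of size n. These are the usual textbook forms, not necessarily the exact matrices marcox constructs internally; the helper working_corr and the value rho = 0.5 are hypothetical.

```r
# Hypothetical helper (not part of marcox): textbook correlation
# matrices for a cluster of size n with correlation parameter rho.
working_corr <- function(n, rho,
                         structure = c("independent", "exchangeable", "ar1")) {
  structure <- match.arg(structure)
  lag <- abs(outer(seq_len(n), seq_len(n), "-"))  # |i - j| for each pair
  switch(structure,
    independent  = diag(n),                       # no within-cluster correlation
    exchangeable = ifelse(lag == 0, 1, rho),      # same rho for every pair
    ar1          = rho^lag                        # rho decays with the lag
  )
}

R_ex  <- working_corr(3, 0.5, "exchangeable")
R_ar1 <- working_corr(3, 0.5, "ar1")
```

Under the exchangeable form every pair of cluster members shares the same correlation, whereas under AR-1 the correlation between members i and j shrinks as |i - j| grows.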

Details

The marcox() function is specifically designed for survival data analysis using Cox proportional hazards models. It handles both clustered data and time-dependent covariates. The survival outcome must be defined using the Surv() function in the model formula, and covariates can be included directly or, for categorical variables, through the factormar() function.

Value

A list containing the following components:

  • coef - The estimated regression coefficients.

  • exp(coef) - The exponentiated coefficients (hazard ratios).

  • se(coef) - The standard errors of the estimated coefficients.

  • z - The z-statistics for testing the significance of the coefficients.

  • p - The p-values associated with the coefficients.

  • (hidden).correlation - Correlation coefficients of the data.
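The relationship between the reported quantities can be sketched as follows, assuming the z-statistics are Wald statistics and the p-values are two-sided normal p-values. That is a common convention but an assumption here; the helper wald_table and the numbers passed to it are made up for illustration.

```r
# Hypothetical helper (not part of marcox): derive hazard ratios,
# Wald z-statistics, and two-sided p-values from coefficients and SEs.
wald_table <- function(coef, se) {
  z <- coef / se
  cbind(
    coef        = coef,
    "exp(coef)" = exp(coef),            # hazard ratio
    "se(coef)"  = se,
    z           = z,
    p           = 2 * pnorm(-abs(z))    # two-sided normal p-value
  )
}

# Made-up coefficients and standard errors, for illustration only
tab <- wald_table(coef = c(x1 = 0.5, x2 = -0.2), se = c(0.25, 0.40))
```

A coefficient of 0.5 with a standard error of 0.25 gives z = 2 and a hazard ratio of exp(0.5), about 1.65.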

Examples

formula <- Surv(time, cens) ~ sex + factormar('type', d_v = c(1, 2, 3))
marcox(formula, data = kidney_data, div = 2, method = 'exchangeable', plot_x = 'sex')