Survival analysis is concerned with the time that elapses until a certain event occurs. This event could be the death (or relapse) of a patient with cancer, or the graduation of a student from high school. Thus, the key event can be viewed as a success (graduating) or a failure (death), although the terminology generally used is better suited to the second type of situation.
Failure (i.e. the key event) can correspond to a component breaking in an engineering context (reliability analysis), an organism dying in a biological context (survival analysis), or the end of an economic downturn in an economic context (duration analysis).
Censoring
A key complication in survival analysis is censoring. For example, suppose that there are 100 patients in a 10-year clinical trial for a cancer drug and that two of the patients leave the program after 5 years. While we know that they survived for 5 years, we don’t know what happened to them after that.
Another example of censoring relates to the patients who are still alive when the clinical trial ends. We know that they survived for 10 years, but we cannot tell when they will die. Both of these cases are called right censoring: the patient survives for more than the recorded time. There is also the case where the patient dies in a car accident after 7 years. Since we don’t know how much longer they would have survived the cancer had the accident not occurred, this too can be considered to be right censoring.
There is also left censoring, where the patient survives for less than the recorded time. A typical case is where the event is a relapse of cancer symptoms after treatment and patients are screened every 3 months. If a patient is found to have cancer symptoms at the second such screening, we would know that they had a relapse prior to 6 months after treatment. Since the first screening was clear, we actually know that the relapse occurred between 3 and 6 months. When the event is known only to lie within such an interval, this is called interval censoring.
In the rest of this website when we refer to censored data we mean right censored data.
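Right-censored data of the kind described above is commonly recorded as a pair for each subject: the observed time and a flag showing whether the event was actually seen. The following is a minimal Python sketch of that bookkeeping for the 10-year trial example. The function name `observe`, its parameters, and the sample subjects are hypothetical, and for simplicity it assumes that every subject enters at the start of the study.

```python
from typing import Optional, Tuple

STUDY_LENGTH = 10.0  # years, as in the clinical trial example


def observe(event_time: Optional[float],
            dropout_time: Optional[float] = None,
            study_length: float = STUDY_LENGTH) -> Tuple[float, bool]:
    """Return (observed_time, event_observed) for one subject.

    event_time   -- time of the key event, or None if it never occurs
    dropout_time -- time the subject left the study, or None if they stayed
    Assumes (for illustration) that the subject enters at time 0.
    """
    # Follow-up ends at dropout or at the end of the study, whichever is first.
    if dropout_time is None:
        follow_up_end = study_length
    else:
        follow_up_end = min(dropout_time, study_length)

    if event_time is not None and event_time <= follow_up_end:
        return event_time, True    # event seen: not censored
    return follow_up_end, False    # right censored at end of follow-up


# Hypothetical subjects
died_at_4 = observe(event_time=4.0)                     # event observed
left_at_5 = observe(event_time=8.0, dropout_time=5.0)   # censored at 5 years
alive_at_end = observe(event_time=None)                 # censored at 10 years
print(died_at_4, left_at_5, alive_at_end)
```

For a censored subject the recorded time is only a lower bound on the true survival time, which is why the flag must be kept alongside it.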
Survival Function
Definition 1: When a random variable takes time values and the pdf f(t) gives the probability density of the first failure occurring at time t, then the function S(t) = 1 − F(t), where F(t) is the cumulative distribution function corresponding to f(t), is called the survival function (aka survivor function or reliability function). It indicates the probability of surviving beyond time t (i.e. the probability that no failure has occurred by time t).
For our purposes, t is the time elapsed from a starting point t0, such as the time from diagnosis of Alzheimer’s until death from the disease, or from an electronic component being put into service until it burns out. We consider the time of the event (the patient’s death or the electronic component’s failure) relative to the start time for that subject. Thus in a 10-year clinical trial, if a patient enters the study after 2 years and dies 3 years later, we view the death as occurring at time t = 3 years.
Hazard Function
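Definition 2: The hazard function (aka hazard rate, instantaneous death rate, force of mortality, or failure rate) is the function h(t) = f(t)/S(t), i.e. the instantaneous rate at which the first failure occurs at time t given that no failure has occurred before time t. Strictly speaking, h(t) is a rate rather than a probability: for a small interval Δt, h(t)Δt approximates the conditional probability of failure between t and t + Δt.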
The cumulative hazard function H(t) is the area under the curve y = h(x) from x = 0 to x = t. For those with a calculus background, H(t) = ∫₀ᵗ h(x) dx.
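To make the two definitions concrete, here is a short Python sketch that computes h(t) = f(t)/S(t) and approximates the area H(t) numerically. It assumes, purely for illustration, that survival times follow a Weibull distribution. The shape and scale values and the function names are hypothetical and do not come from this website's software.

```python
import math

SHAPE = 2.0  # hypothetical Weibull shape parameter
SCALE = 5.0  # hypothetical Weibull scale parameter (years)


def survival(t: float) -> float:
    """S(t) = exp(-(t/SCALE)^SHAPE) for a Weibull distribution."""
    return math.exp(-((t / SCALE) ** SHAPE))


def density(t: float) -> float:
    """f(t), the derivative of F(t) = 1 - S(t)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survival(t)


def hazard(t: float) -> float:
    """h(t) = f(t) / S(t), as in Definition 2."""
    return density(t) / survival(t)


def cumulative_hazard(t: float, steps: int = 1000) -> float:
    """Approximate H(t), the area under h from 0 to t (trapezoidal rule)."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width


print(hazard(3.0))             # close to 2 * 3 / 25 = 0.24
print(cumulative_hazard(3.0))  # close to (3 / 5) ** 2 = 0.36
```

With these parameters the hazard rises linearly with time, so the risk of failure grows as the subject ages. Other distributions give constant or decreasing hazards.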
Properties
The relationship between the survival function and the cumulative hazard function is given in Property 1.
Property 1:
S(t) = e^(−H(t))
or equivalently
H(t) = −ln S(t)
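Property 1 follows from Definitions 1 and 2 in a few steps. The sketch below assumes that S is differentiable and that S(t) > 0 on the interval considered:

```latex
\begin{align*}
f(t) &= F'(t) = -S'(t) && \text{since } S(t) = 1 - F(t) \\
h(t) &= \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt}\ln S(t) \\
H(t) &= \int_0^t h(x)\,dx = -\ln S(t) + \ln S(0) = -\ln S(t) && \text{since } S(0) = 1 \\
S(t) &= e^{-H(t)}
\end{align*}
```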
Some properties of the survival function are:
S(0) = 1 | no one starts off dead |
lim_{t→∞} S(t) = 0 | everyone dies sometime |
if t < u then S(t) ≥ S(u) | once you are dead you stay dead |
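The three properties listed above can be checked numerically for any particular survival function. The Python sketch below does this for an exponential distribution, where the hazard is constant, so H(t) = RATE · t and S(t) = e^(−RATE · t). The rate and the time grid are hypothetical values chosen only for illustration.

```python
import math

RATE = 0.2  # hypothetical constant hazard rate per year


def survival(t: float) -> float:
    """S(t) = exp(-RATE * t) for an exponential distribution."""
    return math.exp(-RATE * t)


times = [0.5 * i for i in range(41)]  # 0, 0.5, ..., 20 years
values = [survival(t) for t in times]

# S(0) = 1: no one starts off dead
starts_at_one = values[0] == 1.0
# if t < u then S(t) >= S(u): once you are dead you stay dead
non_increasing = all(a >= b for a, b in zip(values, values[1:]))
# S(t) approaches 0 for large t: everyone dies sometime
tail = survival(1000.0)

print(starts_at_one, non_increasing, tail)
```

A check on a finite grid is only a sanity test, not a proof, but it is a quick way to catch a mis-specified survival function.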
Doc, good afternoon, how can I calculate the sample size for a survival analysis?
Hello Gerardo,
I have not yet included this in the Real Statistics website or software, but here is an online calculator:
https://sample-size.net/sample-size-survival-analysis/
Charles
Thank you so much, Sir.
Based on your feedback, the limit in the second equation does not display properly.
It shows s(t) –> 1 as t –> infinity.
AB,
Yes, this was not stated correctly. I have just updated the equation.
Thanks for finding this mistake.
Charles
Hi
just wondering if the second equation should be lim s(t)=0 when s->infinity?
thanks
The limit is stated correctly, namely s(t) –> 0 as t –> infinity.
Charles