  1. Chapter 4: Results
    1. Data Collection
    2. Study Results
    3. Summary

Chapter 4: Results

The purpose of this quantitative study was to determine if there is a significant difference between digital and nondigital breaches of individual patient records for each of the three types of U.S. health care entities. The examination of digital and nondigital breaches amongst the three health care entities is essential to both reduce the number of data breaches and ensure proper allocation of resources to achieve that end. The independent variables were the types of information security breaches and health care entities, while the dependent variable was the number of breached individual patient records. To examine the difference between variables, I used statistical analysis of group means to estimate the differences in individual patient records affected between digital and nondigital breaches of health data in the three types of health care entities.

I developed the following research questions and corresponding hypotheses to aid in the examination of the impact of digital and nondigital security breaches on individual patient records for each of the three types of U.S. health care entities:

  • RQ1: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers?

    • H01: There is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.

    • Ha1: There is a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.

  • RQ2: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers?

    • H02: There is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.

    • Ha2: There is a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.

  • RQ3: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses?

    • H03: There is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses.

    • Ha3: There is a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses.

This chapter contains a restatement of the data collection procedures. In this chapter, I describe the samples and the results of the statistical tests conducted in this study. This chapter also includes the results of the hypothesis testing to address the research questions posed in the study. This chapter ends with a summary of the key results of the study.

Data Collection

I collected data for this study from the historical data available in the public database of HIPAA breach and violations (breach portal) maintained by the HHS. HHS’s breach portal is a publicly available database that can be accessed and downloaded online. No permission was necessary to access the data because it is freely available to anyone who wants to use it. HHS investigates the reports submitted by individual entities and posts the results of their findings in the data breach portal. The portal includes the following information: name of the breached entity, state the entity resides in, entity type, individuals affected, breach submission date, type of breach, location of breached information, if a business associate is present, and a text description of the incident. For this study, I obtained data from the HHS breach portal in the form of a Microsoft Excel document and truncated the unnecessary data (i.e., name of the entity, date of the breach, state the entity resides in, and if a business associate is present).

This study included reports beginning in 2010 to the most recent report available in the archival section of the breach portal, a total of 2,601 reports. The data did not include newer data because the breaches are still under investigation and the reports are thus subject to change. The data did not include data older than 2010 because that data may not be relevant to the modern day due to improvements in technology and data security. I used 100% of the total population within the stated timeframes.

I imported the data from the database to Microsoft Excel and recoded it into three separate sheets for health care providers, health plan providers, and health care clearing houses. I imported the resulting data into SPSS Version 25.0 and recoded them to numerically represent categorical variables. The type of breach was recoded as 2 for nondigital and 1 for digital breach. The individuals affected variable was considered a continuous variable.
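The recoding step described above can be sketched as follows. This is a minimal illustration, not the procedure actually run in Excel and SPSS: the field names (`entity_type`, `breach_type`, `individuals_affected`) and the entity labels are hypothetical, and only the numeric codes (1 for digital, 2 for nondigital) come from the text.

```python
# Numeric codes stated in the text: 1 = digital, 2 = nondigital.
BREACH_CODES = {"digital": 1, "nondigital": 2}

# Hypothetical labels for the three entity types (one sheet per type).
ENTITY_TYPES = (
    "health care provider",
    "health plan provider",
    "health care clearinghouse",
)


def recode_and_split(records):
    """Group records by entity type and attach the numeric breach code."""
    groups = {entity: [] for entity in ENTITY_TYPES}
    for rec in records:
        coded = {
            "breach_code": BREACH_CODES[rec["breach_type"]],
            # Individuals affected is treated as a continuous variable.
            "individuals_affected": int(rec["individuals_affected"]),
        }
        groups[rec["entity_type"]].append(coded)
    return groups


# Made-up example rows, for illustration only.
sample = [
    {"entity_type": "health care provider", "breach_type": "digital",
     "individuals_affected": "1200"},
    {"entity_type": "health plan provider", "breach_type": "nondigital",
     "individuals_affected": "500"},
]
groups = recode_and_split(sample)
```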

Study Results

I gathered a total of 2,601 cases from 2010 to 2020 from the database and included them in the study. The covered entity types fall into three categories: health care providers, health plan providers, and health care clearinghouses. The majority of the cases (n = 1,876, 72.13%) involve health care providers, followed by health care clearinghouses (n = 376, 14.45%) and health plan providers (n = 349, 13.42%).

Data on the type of breach were also collected. Breaches were categorized as either digital or nondigital. Digital breaches include hacking/IT incidents, while nondigital breaches include loss, theft, and improper disposal. Unauthorized disclosure breaches were reviewed manually to determine whether each breach was digital or nondigital. Unauthorized disclosures involving paper/film locations were considered nondigital, while unauthorized disclosures involving email, electronic medical records, network servers, and desktop computers were considered digital.
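The categorization rule above can be expressed as a small lookup. This is a sketch under stated assumptions: the exact label strings in the breach portal export are assumed rather than taken from the study, and any record the rule cannot place is returned for manual review, mirroring the manual step described in the text.

```python
# Label strings are assumptions about how the portal export spells each value.
DIGITAL_TYPES = {"hacking/it incident"}
NONDIGITAL_TYPES = {"loss", "theft", "improper disposal"}
DIGITAL_LOCATIONS = {
    "email",
    "electronic medical record",
    "network server",
    "desktop computer",
}
NONDIGITAL_LOCATIONS = {"paper/films"}


def classify_breach(breach_type, location):
    """Return 'digital', 'nondigital', or 'needs manual review'."""
    kind = breach_type.strip().lower()
    place = location.strip().lower()
    if kind in DIGITAL_TYPES:
        return "digital"
    if kind in NONDIGITAL_TYPES:
        return "nondigital"
    if kind == "unauthorized disclosure":
        # Unauthorized disclosures are decided by where the data resided.
        if place in NONDIGITAL_LOCATIONS:
            return "nondigital"
        if place in DIGITAL_LOCATIONS:
            return "digital"
    return "needs manual review"
```

For example, `classify_breach("Unauthorized Disclosure", "Email")` is classified as digital, while an unauthorized disclosure at an unlisted location falls through to manual review.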

The frequencies and percentages showed that 69.59% of the cases were digital (n = 1,810), while 30.41% of the cases were nondigital (n = 791).
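The percentage shares follow directly from the two counts and the total of 2,601 cases. A minimal check, using only the counts reported above (the function name is illustrative):

```python
def percent_share(counts):
    """Convert raw category counts to percentages of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


shares = percent_share({"digital": 1810, "nondigital": 791})
print(shares)  # {'digital': 69.59, 'nondigital': 30.41}
```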

I computed descriptive statistics for the number of individuals affected by each breach. Because the individuals affected variable was continuous, I used the mean, standard deviation, and range to present the data. The minimum number affected was 500 individuals, while the maximum number affected was 78,800,000 individuals. The mean number affected was 74,323 individuals (SD = 1,593,587).

For the first research question, I performed two-sample (i.e., independent samples) t-test analyses. The first set of hypotheses considered the type of breach as the independent variable, while the number of individuals affected among health care providers was the dependent variable. The type of breach was recoded into dummy variables for digital and nondigital.

Levene’s test for equality of variances had a significance of 0.002, based on which the hypothesis of equal variances was rejected in favor of the alternative hypothesis of unequal variances.
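For readers unfamiliar with Levene's test, the F statistic can be sketched as a one-way ANOVA on absolute deviations from each group mean. This is an illustration of the mean-centered variant only; which variant the statistical software applied is an assumption here, and the significance value additionally requires an F distribution, which this sketch does not compute.

```python
from statistics import fmean


def levene_f(*groups):
    """Levene's F statistic using absolute deviations from each group mean."""
    # Absolute deviation of every observation from its own group mean.
    devs = [[abs(x - fmean(g)) for x in g] for g in groups]
    n_total = sum(len(d) for d in devs)
    k = len(devs)
    grand = fmean([z for d in devs for z in d])
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(d) * (fmean(d) - grand) ** 2 for d in devs)
    within = sum((z - fmean(d)) ** 2 for d in devs for z in d)
    return (between / (k - 1)) / (within / (n_total - k))


# Made-up groups: the second is far more spread out than the first.
f_stat = levene_f([1, 2, 3], [0, 10, 20])
```

Two groups with identical spread give F = 0 regardless of how far apart their means are; a larger F indicates that the group variances differ.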

Table 1 shows the descriptive statistics of the original number of individuals affected in the health care provider entity. The results showed that the number of individuals affected is higher for digital types of the breach (M = 27,893.63, SD = 208,563.29) as compared to nondigital types of the breach (M = 8,917.79, SD = 57,863.41).

Table 1 Descriptive Statistics of the Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 1,398 | 27,893.63 | 208,563.29 | 18,712 | 37,075 |
| Nondigital | 478 | 8,917.79 | 57,863.41 | 4,556 | 13,280 |
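The confidence limits in Table 1 can be approximated from the reported mean, standard deviation, and sample size. The sketch below assumes a normal approximation (M ± z · SD / √N); the statistical software may have used a t critical value instead, so the results differ from the table by a few units.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.90):
    """Two-sided confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Digital breaches, health care providers (values from Table 1).
lower, upper = mean_ci(27893.63, 208563.29, 1398)
print(round(lower), round(upper))  # close to the reported 18,712 and 37,075
```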

Based on this, I conducted an independent samples t test with equal variances not assumed. The results in Table 2 show that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 3.073, p = .002). Therefore, there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.

Table 2 Independent Samples t-Test Results of the Number of Individuals Affected Based on the Type of Breach for Raw Data

| | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 3.073 | 1,825.79 | 0.002 | 18,974.84 | 6,174.10 | 8,815.20 | 29,136.48 |
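The t statistic, degrees of freedom, and standard error in Table 2 can be reproduced from the summary statistics in Table 1 with the unequal-variances (Welch) form of the t test. This is a sketch from the reported summaries, not the original software output; the function name is illustrative.

```python
from math import sqrt


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic, Welch-Satterthwaite df, and standard error."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors per group
    se = sqrt(v1 + v2)
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Digital vs. nondigital breaches for health care providers (Table 1).
t, df, se = welch_t(27893.63, 208563.29, 1398, 8917.79, 57863.41, 478)
print(round(t, 3), round(df, 1), round(se, 1))
```

The fractional degrees of freedom are characteristic of the Welch correction, which is why Table 2 reports df = 1,825.79 rather than N − 2.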

A histogram of the data helped to identify the skewness of the data. As shown in Figure 1, it is evident that the data are very highly skewed.

Figure 1 Histogram of Raw Data of Health Care Providers


Because the raw data are highly skewed with a very long tail, I decided to trim the upper tail by removing the top 10% of the individual values. I then ran Levene’s test for the equality of variances on the 10% trimmed raw data. The significance of Levene’s test was 0.000 (p < .001), based on which the hypothesis of equal variances was rejected. Table 3 shows the descriptive statistics of the 10% trimmed number of individuals affected in the health care provider entity. The results showed that the number of individuals affected is higher for digital types of the breach (M = 4,094.74, SD = 4,590.11) as compared to nondigital types of the breach (M = 2,435.91, SD = 3,053.12).
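Trimming the upper tail can be sketched as sorting the values and dropping the largest 10%. Two details are assumptions rather than statements from the text: the trim is applied to the pooled values for an entity type (not to each breach type separately), and the number of removed cases is rounded to the nearest whole case. Under those assumptions the retained count for health care providers matches the 1,230 + 458 = 1,688 cases in Table 3.

```python
def trim_top(values, fraction=0.10):
    """Return the values sorted ascending with the top `fraction` removed."""
    ordered = sorted(values)
    n_removed = round(len(ordered) * fraction)
    return ordered[: len(ordered) - n_removed]


# Placeholder data with the same count as the health care provider sample.
kept = trim_top(list(range(1876)))
print(len(kept))  # 1688
```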

Table 3 Descriptive Statistics of the 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 1,230 | 4,094.74 | 4,590.11 | 3,879 | 4,310 |
| Nondigital | 458 | 2,435.91 | 3,053.12 | 2,201 | 2,671 |

Based on the results, as shown in Table 4, I conducted an independent samples t test with equal variances not assumed. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 8.568, p < .001). This also shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers. In Table 3, the 90% confidence intervals for digital and nondigital breaches are shown.

Table 4 Independent Samples t-Test Results of the Number of Individuals Affected Based on the Type of Breach for 10% Trimmed Raw Data With 90% Confidence Interval

| | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 8.568 | 1,226.82 | 0.000 | 1,658.83 | 193.60 | 1,279 | 2,038.66 |

Compared with the raw data, the 10% trimmed data are less skewed. I analyzed a histogram of the data to assess the skewness. As shown in Figure 2, the skewness is much smaller with the top 10% excluded than in the original raw data.

Figure 2 Histogram When Top 10% of the Values are Excluded


Because the trimmed data are still highly skewed, I decided to reduce the skewness further by applying a log_e (natural logarithm) transformation to the raw data excluding the top 10% of the values.
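The effect of a natural-log transformation on right-skewed counts can be illustrated with a moment-based skewness coefficient. The sample values below are made up for illustration and are not drawn from the breach portal; the skewness formula (third central moment divided by the 1.5 power of the second) is one common definition and may differ from the one a given statistics package reports.

```python
from math import log
from statistics import fmean


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population moments)."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])
    m3 = fmean([(x - m) ** 3 for x in values])
    return m3 / m2 ** 1.5


raw = [500, 600, 800, 1200, 5000, 40000]  # hypothetical breach sizes
logged = [log(x) for x in raw]            # natural log of each value

print(skewness(raw) > skewness(logged))   # True: the log scale is less skewed
```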

As shown in Table 5, the log_e of the raw data excluding the top 10% was analyzed for the equality of the variances. Levene’s test for equality of variances had a significance of 0.002, based on which the null hypothesis of equal variances was rejected.

Table 5 Independent Samples t-Test Results of the Number of Individuals Affected Based on the Type of Breach for Log_e of 10% Trimmed Raw Data With 90% Confidence Interval

| | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 8.568 | .000 | 8.204 | 984.114 | 0.000 | 0.41310 | 0.05035 | 0.31429 | 0.51191 |

Based on this, as shown in Table 5, I conducted the independent samples t-test with equal variances not assumed. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 8.204, p < .001). This shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.

Table 6 Descriptive Statistics of the Log_e of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 1,230 | 7.7622 | 1.05 | 7.7130 | 7.8115 |
| Nondigital | 458 | 7.3491 | 0.87 | 7.2821 | 7.4155 |

Table 6 shows the descriptive statistics of the log_e-transformed number of individuals affected in the health care provider entity. The results showed that on the log_e scale, the number of individuals affected is higher for digital types of the breach (M = 7.7622, SD = 1.05) as compared to nondigital types of the breach (M = 7.3491, SD = 0.87). To bring this back to the original scale, I took the exponential of the mean values for digital and nondigital breaches; the results are displayed in Table 7. For the original highly skewed raw data, the 90% confidence interval for digital breaches runs from 18,712 to 37,075, compared with 4,556 to 13,280 for nondigital breaches. After taking the exponential of the log_e of the 10% trimmed data, the 90% confidence interval for digital breaches runs from 2,237 to 2,469, compared with 1,454 to 1,663 for nondigital breaches.

Table 7 Descriptive Statistics of the Exponential of Log_e of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- |
| Digital | 1,230 | 2,350.07 | 2,237.06 | 2,468.79 |
| Nondigital | 458 | 1,554.80 | 1,454.03 | 1,662.54 |
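The back-transformed values in Table 7 are the exponentials of the log-scale mean and confidence limits in Table 6. Note that the exponential of a mean of logs is a geometric mean, so it is not directly comparable to the arithmetic means in Tables 1 and 3. A minimal sketch using the digital-breach row of Table 6 (the function name is illustrative):

```python
from math import exp


def back_transform(log_mean, log_lower, log_upper):
    """Map a log-scale mean and CI limits back to the original scale."""
    return exp(log_mean), exp(log_lower), exp(log_upper)


# Digital breaches, health care providers (log-scale values from Table 6).
mean, lower, upper = back_transform(7.7622, 7.7130, 7.8115)
print(round(mean), round(lower), round(upper))  # close to 2,350 / 2,237 / 2,469
```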

In Table 7, the 90% confidence intervals for digital and nondigital breaches are shown. As shown in Figure 3, there is skewness in the histogram of the log_e of the top 10% excluded data. However, the skewness is lower on the log_e scale than on the original scale.

Figure 3 Histogram of Log_e of Top 10% Excluded Data

The last set of confidence intervals, shown in Table 7, are the shortest confidence intervals, as they are based on the log_e transformation, which yields the best histograms, shown in Figure 3. This graph still shows some skewness in the two histograms, but it is much smaller than that based on the raw data. In addition, the normal probability curve shows the typical pattern for both types of breach. For the digital breaches, the average frequency is around 50, whereas the average frequency is around 25 for the nondigital breaches, as shown in Figure 3.

Table 8 summarizes the differences based on the raw data, the top 10% trimmed data, and the transformed data. These values come from Tables 1, 3, and 7. It is evident that trimming the top 10% of the data and applying the transformation helped to make the data less skewed, as shown in Figure 3.

Table 8 Summary Table for Healthcare Providers

| Data | Statistic | Digital | Nondigital |
| --- | --- | --- | --- |
| Raw Data | Mean | 27,894 | 8,918 |
| | 90% Lower Limit | 18,712 | 4,556 |
| | 90% Upper Limit | 37,075 | 13,280 |
| | Width of 90% CI | 18,363 | 8,724 |
| 10% Trimmed Data | Mean | 4,095 | 2,436 |
| | 90% Lower Limit | 3,879 | 2,201 |
| | 90% Upper Limit | 4,310 | 2,671 |
| | Width of 90% CI | 431 | 470 |
| Transformed Data | Mean | 2,350 | 1,555 |
| | 90% Lower Limit | 2,236 | 1,454 |
| | 90% Upper Limit | 2,469 | 1,663 |
| | Width of 90% CI | 233 | 209 |
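Each width in Table 8 is simply the upper limit minus the lower limit. The short check below uses the digital-breach limits from the table; the dictionary keys are illustrative labels.

```python
def ci_width(lower, upper):
    """Width of a confidence interval."""
    return upper - lower


# (lower, upper) 90% limits for digital breaches, from Table 8.
digital_limits = {
    "raw": (18712, 37075),
    "trimmed": (3879, 4310),
    "transformed": (2236, 2469),
}
widths = {name: ci_width(lo, hi) for name, (lo, hi) in digital_limits.items()}
print(widths)  # {'raw': 18363, 'trimmed': 431, 'transformed': 233}
```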

As shown in Table 8, the width of the 90% confidence interval narrowed considerably after transforming the data. Because the tests shown in Tables 2, 4, and 5 are all significant (p ≤ .002), I reject the null hypothesis that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.

For the second research question, I performed two-sample (i.e., independent samples) t-test analyses. The second set of hypotheses considered the type of breach as the independent variable, while the number of individuals affected among health plan providers was the dependent variable. The type of breach was recoded into dummy variables for digital and nondigital.

Levene’s test for equality of variances had a significance of 0.016, based on which the hypothesis of equal variances was rejected in favor of the alternative hypothesis of unequal variances.

Table 9 shows the descriptive statistics of the original number of individuals affected in the health plan provider entity. The results showed that the number of individuals affected is higher for digital types of the breach (M = 616,066.07, SD = 6,032,878.13) as compared to nondigital types of the breach (M = 24,935.24, SD = 132,296.62).

Table 9 Descriptive Statistics of the Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 176 | 616,066.07 | 6,032,878.13 | -1,135,904 | 1,368,036 |
| Nondigital | 172 | 24,935.24 | 132,296.62 | 8,252 | 41,618 |

Based on this, I conducted an independent samples t-test with equal variances not assumed. The results in Table 10 showed that there is no significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 1.30, p = .195). Therefore, for the raw data there is not sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.

Table 10 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for Raw Data

| | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 1.30 | 175.17 | 0.195 | 591,130.82 | 454,857.17 | -161,020 | 1,343,282 |

The histogram of the data helped to identify the skewness of the data. As shown in Figure 4, it is evident that the data are very highly skewed.

Figure 4 Histogram of Raw Data Health Plan Providers


Because the data are highly skewed with a very long tail, I decided to trim the upper tail by removing the top 10% of the individual values. I then ran Levene’s test for the equality of variances on the 10% trimmed raw data. The significance of Levene’s test was 0.000 (p < .001), based on which the hypothesis of equal variances was rejected.

Table 11 shows the descriptive statistics of the 10% trimmed number of individuals affected in the health plan provider entity. The results showed that the number of individuals affected is higher for digital types of the breach (M = 6,458.78, SD = 8,766.68) as compared to nondigital types of the breach (M = 4,001.19, SD = 5,386.52).

Table 11 Descriptive Statistics of the 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 152 | 6,458.78 | 8,766.68 | 5,282 | 7,636 |
| Nondigital | 161 | 4,001.19 | 5,386.52 | 3,299 | 4,704 |

Based on this, as shown in Table 12, I conducted the independent samples t-test with equal variances not assumed. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.968, p = .003). This shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers. In Table 11, the 90% confidence intervals for digital and nondigital breaches are shown.

Table 12 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for 10% Trimmed Raw Data With 90% Confidence Interval

| | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 2.968 | 248.079 | 0.003 | 2,457.60 | 828.15 | 1,090 | 3,825 |

Compared with the raw data, the 10% trimmed data are less skewed. I analyzed the histogram of the data to assess the skewness. As shown in Figure 5, the skewness is much smaller with the top 10% excluded than in the original raw data.

Figure 5 Histogram of Top 10% Excluded Data


Because the trimmed data are still highly skewed, I decided to reduce the skewness further by applying a log_e (natural logarithm) transformation to the raw data excluding the top 10% of the values.

As shown in Table 13, the log_e of the raw data excluding the top 10% was analyzed for the equality of the variances. Levene’s test for equality of variances had a significance of .264 (F = 1.252), based on which the null hypothesis of equal variances was not rejected.

Table 13 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for Log_e of 10% Trimmed Raw Data With 90% Confidence Interval

| | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 1.252 | .264 | 2.979 | 311 | 0.003 | 0.39061 | 0.13114 | 0.17425 | 0.60696 |

Based on this, as shown in Table 13, I conducted the independent samples t-test (df = 311). The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.979, p = .003). This shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.

Table 14 Descriptive Statistics of the Log_e of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 152 | 8.0242 | 1.215 | 7.8611 | 8.1873 |
| Nondigital | 161 | 7.6336 | 1.105 | 7.4895 | 7.7777 |

Table 14 shows the descriptive statistics of the log_e-transformed number of individuals affected in the health plan provider entity. The results showed that on the log_e scale, the number of individuals affected is higher for digital types of the breach (M = 8.0242, SD = 1.215) as compared to nondigital types of the breach (M = 7.6336, SD = 1.105). To bring this back to the original scale, I took the exponential of the mean values for digital and nondigital breaches in the health plan provider entity; the results are displayed in Table 15. For the original highly skewed raw data, the 90% confidence interval for digital breaches runs from -1,135,904 to 1,368,036, compared with 8,252 to 41,618 for nondigital breaches. After taking the exponential of the log_e of the 10% trimmed data, the 90% confidence interval for digital breaches runs from 2,592 to 3,605, compared with 1,790 to 2,392 for nondigital breaches.

Table 15 Descriptive Statistics of the Exponential of Log_e of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- |
| Digital | 152 | 3,053.98 | 2,592 | 3,605 |
| Nondigital | 161 | 2,066.48 | 1,790 | 2,392 |

In Table 15, the 90% confidence intervals for digital and nondigital breaches are shown. As shown in Figure 6, there is skewness in the histogram of the log_e of the top 10% excluded data. However, the skewness is lower on the log_e scale than on the original scale.

Figure 6 Histogram of Log_e of Top 10% Excluded Data

The last set of confidence intervals shown in Table 15 are the shortest confidence intervals, as they are based on the loge transformation which yields the best histograms shown in Figure 6. This graph still shows some skewness in the two histograms, but it is much smaller than those based on the raw data. In addition, the normal probability curve shows the typical pattern for both the breaches. For the digital breaches, the average frequency is around 10, whereas the average frequency is at 12 for the nondigital breaches as shown in Figure 6.

Table 16 summarizes the differences based on the raw data, the top 10% trimmed data, and the transformed data. These values come from Tables 9, 11, and 15. It is evident that trimming the top 10% of the data and applying the transformation helped to make the data less skewed, as shown in Figure 6.

Table 16 Summary Table for Health Plan Providers

| Data | Statistic | Digital | Nondigital |
| --- | --- | --- | --- |
| Raw Data | Mean | 616,066.07 | 24,935.24 |
| | 90% Lower Limit | -1,135,904 | 8,252 |
| | 90% Upper Limit | 1,368,036 | 41,618 |
| | Width of 90% CI | 2,503,940 | 33,366 |
| 10% Trimmed Data | Mean | 6,458.78 | 4,001.19 |
| | 90% Lower Limit | 5,282 | 3,299 |
| | 90% Upper Limit | 7,636 | 4,704 |
| | Width of 90% CI | 2,354 | 1,405 |
| Transformed Data | Mean | 3,053.98 | 2,066.48 |
| | 90% Lower Limit | 2,592 | 1,790 |
| | 90% Upper Limit | 3,605 | 2,392 |
| | Width of 90% CI | 1,013 | 602 |

As shown in Table 16, the width of the 90% confidence interval narrowed considerably after transforming the data. Because the significance of the tests on the trimmed and log_e-transformed data shown in Tables 12 and 13 is 0.003, I reject the null hypothesis that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.

For the third research question, I performed two-sample (i.e., independent samples) t-test analyses. The third set of hypotheses considered the type of breach as the independent variable, while the number of individuals affected among health care clearinghouses was the dependent variable. The type of breach was recoded into dummy variables for digital and nondigital.

Levene’s test for equality of variances had a significance of 0.05, based on which the hypothesis of equal variances was rejected in favor of the alternative hypothesis of unequal variances.

Table 17 shows the descriptive statistics of the original number of individuals affected in the health care clearinghouse entity. The results showed that the number of individuals affected is higher for digital types of the breach (M = 122,204.93, SD = 586,984.84) as compared to nondigital types of the breach (M = 60,611.07, SD = 441,603.22).

Table 17 Descriptive Statistics of the Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 235 | 122,204.93 | 586,984.84 | 58,972 | 185,438 |
| Nondigital | 141 | 60,611.07 | 441,603.22 | -968 | 122,190 |

Based on this, I conducted an independent samples t-test with equal variances not assumed. The results in Table 18 showed that there is no significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 1.154, p = .249). Therefore, for the raw data there is not sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses.

Table 18 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for Raw Data

| | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 1.154 | 355.29 | 0.249 | 61,593.86 | 53,378.36 | -26,435 | 149,623 |

The histogram of the data helped to identify the skewness of the data. As shown in Figure 7, it is evident that the data are very highly skewed.

Figure 7 Histogram of Raw Data of Health Care Clearing Houses


Because the raw data are highly skewed with a very long tail, I decided to trim the upper tail by removing the top 10% of the individual values. I then ran Levene’s test for the equality of variances on the 10% trimmed raw data. The significance of Levene’s test was 0.000 (p < .001), based on which the hypothesis of equal variances was rejected.

Table 19 shows the descriptive statistics of the 10% trimmed number of individuals affected in the health care clearinghouse entity. The results showed that the number of individuals affected is higher for digital types of the breach (M = 6,986.52, SD = 9,023.08) as compared to nondigital types of the breach (M = 4,592.81, SD = 5,935.37).

Table 19 Descriptive Statistics of the 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 208 | 6,986.52 | 9,023.08 | 5,953 | 8,020 |
| Nondigital | 130 | 4,592.81 | 5,935.37 | 3,730 | 5,455 |

Based on this, as shown in Table 20, I conducted the independent samples t-test with equal variances not assumed. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.941, p = .003). This also shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses. In Table 19, the 90% confidence intervals for digital and nondigital breaches are shown.

Table 20 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for 10% Trimmed Raw Data With 90% Confidence Interval

| | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 2.941 | 335.103 | 0.003 | 1,658.83 | 193.60 | 1,279 | 2,038.66 |

Compared with the raw data, the 10% trimmed data are less skewed. I analyzed the histogram of the data to assess the skewness. As shown in Figure 8, the skewness is much smaller with the top 10% excluded than in the original raw data.

Figure 8 Histogram of Top 10% Excluded Data


Because the trimmed data are still highly skewed, I decided to reduce the skewness further by applying a log_e (natural logarithm) transformation to the raw data excluding the top 10% of the values.

As shown in Table 21, the log_e of the raw data excluding the top 10% was analyzed for the equality of the variances. Levene’s test for equality of variances had a significance of 0.007, based on which the null hypothesis of equal variances was rejected.

Table 21 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for Log_e of 10% Trimmed Raw Data With 90% Confidence Interval

| | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference: Lower | 90% CI of the Difference: Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Individuals affected | 45.310 | .181 | 2.726 | 289 | 0.007 | 0.35242 | 0.12930 | 0.13906 | 0.56578 |

Based on this, as shown in Table 21, I conducted the independent samples t-test with equal variances not assumed. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.726, p = .007). This shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses.

Table 22 Descriptive Statistics of the Log_e of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- | --- |
| Digital | 208 | 8.1367 | 1.21 | 7.9980 | 8.2754 |
| Nondigital | 130 | 7.7843 | 1.12 | 7.6214 | 7.9472 |

Table 22 shows the descriptive statistics of the log_e of the 10% trimmed number of individuals affected for healthcare clearinghouses. On the log_e scale, the number of individuals affected is higher for digital breaches (M = 8.1367, SD = 1.21) than for nondigital breaches (M = 7.7843, SD = 1.12). To bring these values back to the original scale, I took the exponential of the mean values for digital and nondigital breaches; the results are displayed in Table 23. For the original, highly skewed raw data, the 90% confidence interval was 58,972 to 185,438 for digital breaches and -968 to 122,190 for nondigital breaches. After taking the exponential of the log_e of the 10% trimmed data, the 90% confidence interval was 2,981 to 3,944 for digital breaches and 2,039 to 2,836 for nondigital breaches.
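The back-transformation amounts to exponentiating the log_e-scale statistics. The sketch below applies it to the rounded values from Table 22; the variable names are hypothetical. Exponentiating a mean of logs gives the geometric mean on the original scale, and the interval endpoints obtained from these rounded inputs may differ slightly from those reported in Table 23, presumably because the reported values were computed from unrounded inputs.

```python
import math

# Log_e-scale means and 90% confidence limits as reported in Table 22.
log_scale = {
    "Digital": {"mean": 8.1367, "lower": 7.9980, "upper": 8.2754},
    "Nondigital": {"mean": 7.7843, "lower": 7.6214, "upper": 7.9472},
}

# Exponentiate every statistic to return to the original scale
# (number of individuals affected).
back_transformed = {
    breach: {name: math.exp(value) for name, value in stats.items()}
    for breach, stats in log_scale.items()
}

for breach, stats in back_transformed.items():
    print(f"{breach}: mean = {stats['mean']:.2f}, "
          f"90% CI = ({stats['lower']:.0f}, {stats['upper']:.0f})")
```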

Table 23 Descriptive Statistics of the Exponential of Log_e of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach

| Type of Breach | N | M | Lower 90% CI | Upper 90% CI |
| --- | --- | --- | --- | --- |
| Digital | 208 | 3417.62 | 2981 | 3944 |
| Nondigital | 130 | 2402.58 | 2039 | 2836 |

Table 23 shows the 90% confidence intervals for digital and nondigital breaches. As shown in Figure 9, the histogram of the log_e of the top 10% excluded data still shows some skewness. However, the skewness is lower on the log_e scale than on the original scale.

Figure 9 Histogram of Log_e of Top 10% Excluded Data

The last set of confidence intervals, shown in Table 23, are the narrowest, as they are based on the log_e transformation, which yields the best histograms (Figure 9). The two histograms in this figure still show some skewness, but it is much smaller than that of the raw data. In addition, the normal probability curve shows the typical pattern for both types of breach. As shown in Figure 9, the average frequency is around 12 for the digital breaches and around 8 for the nondigital breaches.

Table 24 summarizes the results for the raw data, the top 10% trimmed data, and the transformed data. These values come from Tables 17, 19, and 22. Trimming the top 10% of the data and applying the transformation made the data less skewed, as shown in Figure 3.

Table 24 Summary Table for Healthcare Clearinghouses

| Data | Statistic | Digital | Nondigital |
| --- | --- | --- | --- |
| Raw Data | Mean | 122,205 | 60,611 |
| | 90% Lower Limit | 58,972 | -968 |
| | 90% Upper Limit | 185,438 | 122,190 |
| | Width of 90% CI | 126,466 | 123,158 |
| 10% Trimmed Data | Mean | 6,983 | 4,593 |
| | 90% Lower Limit | 5,953 | 3,730 |
| | 90% Upper Limit | 8,020 | 5,455 |
| | Width of 90% CI | 2,067 | 1,725 |
| Transformed Data | Mean | 3,418 | 2,403 |
| | 90% Lower Limit | 2,981 | 2,039 |
| | 90% Upper Limit | 3,944 | 2,836 |
| | Width of 90% CI | 963 | 797 |

As shown in Table 24, transforming the data narrowed the width of the 90% confidence interval considerably. Because the significance of the test shown in Table 21 is 0.007, I reject the null hypothesis that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for healthcare clearinghouses.
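The interval widths follow directly from the limits listed in Table 24 (width = upper limit minus lower limit). A short sketch, with hypothetical variable names, that recomputes them:

```python
# 90% confidence limits (lower, upper) as listed in Table 24.
intervals = {
    "Raw data": {"Digital": (58972, 185438), "Nondigital": (-968, 122190)},
    "10% trimmed data": {"Digital": (5953, 8020), "Nondigital": (3730, 5455)},
    "Transformed data": {"Digital": (2981, 3944), "Nondigital": (2039, 2836)},
}

# Width of each interval = upper limit - lower limit.
widths = {
    dataset: {breach: upper - lower for breach, (lower, upper) in rows.items()}
    for dataset, rows in intervals.items()
}

for dataset, rows in widths.items():
    print(f"{dataset}: digital width = {rows['Digital']:,}, "
          f"nondigital width = {rows['Nondigital']:,}")
```

The recomputed widths agree with the "Width of 90% CI" rows of Table 24 and show the interval shrinking by more than two orders of magnitude between the raw and transformed data.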

Summary

The purpose of this quantitative study was to determine if there was a statistically significant difference between digital and nondigital breaches of individual patient records for each of the three types of healthcare entities in the United States. The examination of digital and nondigital breaches amongst the three healthcare entities is essential to both reducing the number of data breaches and ensuring proper allocation of resources to achieve that end. This study included reports from 2010 through the most recent report available in the archival section of the breach portal, for a total of 2,601 reports.

Most of the cases (n = 1,876, 72.13%) involved healthcare providers, followed by healthcare clearinghouses (n = 376, 14.45%) and health plan providers (n = 349, 13.42%). For the healthcare providers, there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 8.204, p < .001). For the health plan providers, there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.979, p = .003). For the healthcare clearinghouses, there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.726, p = .007).

