Chapter 4: Results
The purpose of this quantitative study was to determine if there is a significant difference between digital and nondigital breaches of individual patient records for each of the three types of U.S. health care entities. The examination of digital and nondigital breaches amongst the three health care entities is essential to both reduce the number of data breaches and ensure proper allocation of resources to achieve that end. The independent variables were the types of information security breaches and health care entities, while the dependent variable was the number of breached individual patient records. To examine the difference between variables, I used statistical analysis of group means to estimate the differences in individual patient records affected between digital and nondigital breaches of health data in the three types of health care entities.
I developed the following research questions and corresponding hypotheses to aid in the examination of the impact of digital and nondigital security breaches on individual patient records for each of the three types of U.S. health care entities:
RQ1: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers?
H01: There is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.
Ha1: There is a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.
RQ2: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers?
H02: There is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.
Ha2: There is a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.
RQ3: Is there a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses?
H03: There is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses.
Ha3: There is a significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses.
This chapter contains a restatement of the data collection procedures. In this chapter, I describe the samples and the results of the statistical tests conducted in this study. This chapter also includes the results of the hypothesis testing to address the research questions posed in the study. This chapter ends with a summary of the key results of the study.
Data Collection
I collected data for this study from the historical data available in the public database of HIPAA breach and violations (breach portal) maintained by the HHS. HHS’s breach portal is a publicly available database that can be accessed and downloaded online. No permission was necessary to access the data because it is freely available to anyone who wants to use it. HHS investigates the reports submitted by individual entities and posts the results of their findings in the data breach portal. The portal includes the following information: name of the breached entity, state the entity resides in, entity type, individuals affected, breach submission date, type of breach, location of breached information, if a business associate is present, and a text description of the incident. For this study, I obtained data from the HHS breach portal in the form of a Microsoft Excel document and truncated the unnecessary data (i.e., name of the entity, date of the breach, state the entity resides in, and if a business associate is present).
This study included reports beginning in 2010 to the most recent report available in the archival section of the breach portal, a total of 2,601 reports. The data did not include newer data because the breaches are still under investigation and the reports are thus subject to change. The data did not include data older than 2010 because that data may not be relevant to the modern day due to improvements in technology and data security. I used 100% of the total population within the stated timeframes.
I imported the data from the database to Microsoft Excel and recoded it into three separate sheets for health care providers, health plan providers, and health care clearinghouses. I imported the resulting data into SPSS Version 25.0 and recoded them to numerically represent categorical variables. The type of breach was recoded as 1 for digital and 2 for nondigital breaches. The individuals affected variable was treated as a continuous variable.
Study Results
I gathered a total of 2,601 cases from 2010 to 2020 in the database and included them in the study. The covered entity types have three categories: health care providers, health plan providers, and health care clearinghouses. The majority of the cases (n = 1,876, 72.13%) fall under health care providers, followed by health care clearinghouses (n = 376, 14.45%). Finally, there are 349 cases (13.42%) covered by health plan providers.
Data on the types of breach were also collected. The breach categories are nondigital and digital. Digital breaches include hacking/IT incidents, while nondigital breaches include loss, theft, and improper disposal. Unauthorized disclosure breaches were reviewed manually to determine whether the breach was digital or nondigital. Unauthorized disclosures in paper/film locations were considered nondigital, while unauthorized disclosures involving email, electronic medical records, network servers, and desktop computers were considered digital.
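The classification rules above can be sketched as a small lookup. This is an illustrative sketch only: the exact category strings used in the breach portal export are assumptions, the function name classify_breach is hypothetical, and any unauthorized disclosure whose location is not listed above is returned as None to signal that it still needs manual review. The numeric codes follow the SPSS coding described earlier in this chapter.

```python
from typing import Optional

# Numeric codes as described for the SPSS recoding (1 = digital, 2 = nondigital).
DIGITAL, NONDIGITAL = 1, 2

# Category labels below are assumed spellings, not verified portal values.
DIGITAL_TYPES = {"hacking/it incident"}
NONDIGITAL_TYPES = {"loss", "theft", "improper disposal"}
DIGITAL_LOCATIONS = {"email", "electronic medical record",
                     "network server", "desktop computer"}
NONDIGITAL_LOCATIONS = {"paper/films"}


def classify_breach(breach_type: str, location: str) -> Optional[int]:
    """Return 1 (digital), 2 (nondigital), or None when manual review is needed."""
    kind = breach_type.strip().lower()
    place = location.strip().lower()
    if kind in DIGITAL_TYPES:
        return DIGITAL
    if kind in NONDIGITAL_TYPES:
        return NONDIGITAL
    if kind == "unauthorized disclosure":
        # Unauthorized disclosures are split by where the information was located.
        if place in NONDIGITAL_LOCATIONS:
            return NONDIGITAL
        if place in DIGITAL_LOCATIONS:
            return DIGITAL
    return None  # not covered by the stated rules


if __name__ == "__main__":
    print(classify_breach("Theft", "Laptop"))
    print(classify_breach("Unauthorized Disclosure", "Email"))
```

Returning None rather than guessing keeps the manual review step explicit for cases the written rules do not cover.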
The results of frequencies and percentages showed that 69.59% of the cases were digital (n = 1,810) while 30.41% of the cases were nondigital (n = 791).
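As a quick arithmetic check, each percentage is the category count divided by the 2,601 total cases. The helper name share is hypothetical; the counts are the ones reported in this section.

```python
def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 2)


if __name__ == "__main__":
    total_cases = 2601
    for label, n in [("digital", 1810), ("nondigital", 791)]:
        print(label, share(n, total_cases))
```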
I computed descriptive statistics for the number of individuals affected by the breaches. Because the individuals affected variable was continuous in nature, I used measures of central tendency and dispersion, namely the mean, standard deviation, and range, to present the data. The minimum number affected was 500 individuals, while the maximum number affected was 78,800,000 individuals. The mean number affected was 74,323 individuals (SD = 1,593,587).
For the first research question, I performed two-sample t-test (i.e., independent samples t-test) analyses. The first set of hypotheses considered the type of breach as the independent variable, while the number of individuals affected in the health care providers was the dependent variable. The type of breach was recoded into dummy variables for digital and nondigital.
Levene’s test for equality of variances had a significance of 0.002, based on which the hypothesis of equal variances was rejected in favor of the alternative hypothesis of unequal variances.
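For readers unfamiliar with Levene's test, the sketch below computes the mean-centered form of the statistic from raw group values: each observation is replaced by its absolute deviation from its own group mean, and a one-way ANOVA F ratio is formed on those deviations. It is an illustration under stated assumptions, not the SPSS procedure itself; whether SPSS output matches this form exactly is an assumption, the function name levene_statistic is hypothetical, and the p value (from an F distribution with k - 1 and N - k degrees of freedom) is not computed here.

```python
from statistics import fmean
from typing import Sequence


def levene_statistic(*groups: Sequence[float]) -> float:
    """Mean-centered Levene statistic W for two or more groups of raw values."""
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every value from its own group mean.
    deviations = []
    for g in groups:
        center = fmean(g)
        deviations.append([abs(x - center) for x in g])

    group_means = [fmean(z) for z in deviations]
    grand_mean = fmean([v for z in deviations for v in z])

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    # Toy data for illustration only; not values from the study.
    print(levene_statistic([1, 2, 3], [2, 4, 6]))
```

A large statistic relative to the F reference distribution indicates unequal variances, which is the situation in which the unequal-variances form of the t test is used.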
Table 1 shows the descriptive statistics of the original number of individuals affected in the health care provider entity. The results showed that the number of individuals affected is higher for digital types of the breach (M = 27,893.63, SD = 208,563.29) as compared to nondigital types of the breach (M = 8,917.79, SD = 57,863.41).
Table 1 Descriptive Statistics of the Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 1,398 | 27,893.63 | 208,563.29 | 18,712 | 37,075 |
Nondigital | 478 | 8,917.79 | 57,863.41 | 4,556 | 13,280 |
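The confidence limits in a table of this kind can be approximated from the summary statistics alone as the mean plus or minus a critical value times SD divided by the square root of N. The sketch below uses the normal critical value from the Python standard library, which is an assumption on my part; SPSS uses the t distribution, so the limits differ slightly from the tabled ones. The function name mean_ci is hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def mean_ci(mean: float, sd: float, n: int, level: float = 0.90) -> Tuple[float, float]:
    """Approximate two-sided confidence interval for a mean (normal critical value)."""
    if n < 2 or not 0 < level < 1:
        raise ValueError("need n >= 2 and 0 < level < 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Summary statistics for digital breaches among health care providers (Table 1).
    lower, upper = mean_ci(27893.63, 208563.29, 1398)
    print(round(lower), round(upper))
```

With samples this large, the normal and t critical values are nearly identical, so the approximation lands within a few records of the reported limits.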
Based on this, I conducted an independent samples t test with equal variances not assumed. The results in Table 2 show that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 3.073, p = .002). Therefore, there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.
Table 2 Independent Samples t-Test Results of the Number of Individuals Affected Based on the Type of Breach for Raw Data
| t test for Equality of Means | 90% Confidence Interval of the Difference | |||||
| t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | Lower | Upper |
Individuals affected | 3.073 | 1,825.79 | 0.002 | 18,974.84 | 6,174.10 | 8,815.20 | 29,136.48 |
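The unequal-variances (Welch) form of the independent samples t test can be reproduced from group means, standard deviations, and sizes. The sketch below is a minimal illustration with a hypothetical function name welch_t; it returns the t statistic, the Welch-Satterthwaite degrees of freedom, and the standard error of the difference, and it does not compute the p value.

```python
from math import sqrt
from typing import Tuple


def welch_t(m1: float, s1: float, n1: int,
            m2: float, s2: float, n2: int) -> Tuple[float, float, float]:
    """Welch t statistic, degrees of freedom, and standard error from summary stats."""
    v1 = s1 ** 2 / n1  # squared standard error of the first group mean
    v2 = s2 ** 2 / n2  # squared standard error of the second group mean
    se = sqrt(v1 + v2)
    t = (m1 - m2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


if __name__ == "__main__":
    # Health care providers, raw data: digital versus nondigital (Table 1).
    t, df, se = welch_t(27893.63, 208563.29, 1398, 8917.79, 57863.41, 478)
    print(round(t, 3), round(df, 2), round(se, 2))
```

Applied to the Table 1 summary statistics, the sketch agrees closely with the t, df, and standard error reported in Table 2.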
A histogram of the data helped to identify the skewness of the data. As shown in Figure 1, it is evident that the data are very highly skewed.
Figure 1 Histogram of Raw Data of Health Care Providers
Because the raw data are highly skewed with a very long tail, I decided to trim the upper tail by removing the top 10% of the individual values. I then applied Levene’s test for the equality of variances; its significance was less than .001, based on which the hypothesis of equal variances was rejected. Table 3 shows the descriptive statistics of the 10% trimmed number of individuals affected in the health care provider entity. The results showed that the number of individuals affected is higher for digital types of breach (M = 4,094.74, SD = 4,590.11) than for nondigital types of breach (M = 2,435.91, SD = 3,053.12).
Table 3 Descriptive Statistics of the 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 1,230 | 4,094.74 | 4,590.11 | 3,879 | 4,310 |
Nondigital | 458 | 2,435.91 | 3,053.12 | 2,201 | 2,671 |
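The trimming step can be sketched as follows. The chapter does not state whether the top 10% was removed from the pooled values or within each group; the group sizes above (1,230 + 458 = 1,688, about 90% of 1,876) are consistent with pooled trimming, so that is what this sketch assumes. The function name trim_top is hypothetical, and integer arithmetic is used so that the number of retained cases is not affected by floating-point rounding.

```python
from typing import List, Sequence


def trim_top(values: Sequence[float], percent: int = 10) -> List[float]:
    """Return the values sorted ascending with the largest `percent`% removed."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    ordered = sorted(values)
    n_keep = len(ordered) * (100 - percent) // 100  # floor of the retained share
    return ordered[:n_keep]


if __name__ == "__main__":
    # Toy values for illustration only; not records from the study.
    sample = [500, 650, 700, 900, 1200, 1500, 2300, 4000, 9000, 250000]
    print(trim_top(sample))
```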
Based on these results, I conducted an independent samples t test with equal variances not assumed, as shown in Table 4. The results showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 8.568, p < .001). This also shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers. The 90% confidence intervals for digital and nondigital breaches are shown in Table 3.
Table 4 Independent Samples t-Test Results of the Number of Individuals Affected Based on the Type of Breach for 10% Trimmed Raw Data with 90% Confidence Interval
| t test for Equality of Means | 90% Confidence Interval of the Difference | |||||
| t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | Lower | Upper |
Individuals affected | 8.568 | 1,226.82 | 0.000 | 1,658.83 | 193.60 | 1,279 | 2,038.66 |
Compared with the raw data, the 10% trimmed data are less skewed. I analyzed a histogram of the trimmed data to assess the skewness. As shown in Figure 2, the skewness is much smaller with the top 10% excluded than in the original raw data.
Figure 2 Histogram When Top 10% of the Values are Excluded
Because the trimmed data are still highly skewed, I applied a loge (natural logarithm) transformation to the raw data excluding the top 10% of the values to improve the distribution further.
As shown in Table 5, the loge of the raw data excluding top 10% was analyzed for the equality of the variances. Levene’s test for equality of variances had a significance of 0.002, based on which the null hypothesis of equal variances was rejected.
Table 5 Independent Samples t-Test Results of the Number of Individuals Affected Based on the Type of Breach for Loge of 10% Trimmed Raw Data With 90% Confidence Interval
Levene's Test for Equality of Variances | | | t test for Equality of Means | 90% Confidence Interval of the Difference | |||||
| F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | Lower | Upper |
Individuals affected | 8.568 | .000 | 8.204 | 984.114 | 0.000 | 0.41310 | 0.05035 | 0.31429 | 0.51191 |
Based on this, as shown in Table 5, I conducted the independent samples t test with equal variances not assumed. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 8.204, p < .001). This shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.
Table 6 Descriptive Statistics of the Loge of 10% trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 1,230 | 7.7622 | 1.05 | 7.7130 | 7.8115 |
Nondigital | 458 | 7.3491 | 0.87 | 7.2821 | 7.4155 |
Table 6 shows the descriptive statistics of the loge of the 10% trimmed number of individuals affected in the health care provider entity. The results showed that, on the loge scale, the number of individuals affected is higher for digital types of breach (M = 7.7622, SD = 1.05) than for nondigital types of breach (M = 7.3491, SD = 0.87). To bring this back to the original scale, I took the exponential of the mean values of digital and nondigital breaches in the health care provider entity; the results are displayed in Table 7. For the original highly skewed raw data, the 90% confidence interval was 18,712 to 37,075 for digital breaches and 4,556 to 13,280 for nondigital breaches. After taking the exponential of the loge of the 10% trimmed data, the 90% confidence interval was 2,237 to 2,469 for digital breaches and 1,454 to 1,663 for nondigital breaches.
Table 7 Descriptive Statistics of the Exponential of Loge of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | | N | M | Lower 90% CI | Upper 90% CI |
Individuals affected | Digital | 1,230 | 2,350.07 | 2,237.06 | 2,468.79 |
| Nondigital | 458 | 1,554.80 | 1,454.03 | 1,662.54 |
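The back-transformation is a direct exponentiation of the loge-scale mean and confidence limits. Note that the exponential of a mean of logs is the geometric mean of the original values rather than their arithmetic mean, which is why the back-transformed means are smaller than the trimmed means. The sketch below is illustrative only; the function name back_transform is hypothetical.

```python
from math import exp
from typing import Tuple


def back_transform(mean_log: float, lower_log: float,
                   upper_log: float) -> Tuple[float, float, float]:
    """Exponentiate a loge-scale mean and its confidence limits."""
    if not lower_log <= mean_log <= upper_log:
        raise ValueError("expected lower <= mean <= upper on the log scale")
    return exp(mean_log), exp(lower_log), exp(upper_log)


if __name__ == "__main__":
    # Loge-scale values for digital breaches among health care providers (Table 6).
    mean, lower, upper = back_transform(7.7622, 7.7130, 7.8115)
    print(round(mean, 2), round(lower, 2), round(upper, 2))
```

Because the exponential function is monotonic, the ordering of the limits is preserved, but the back-transformed interval is no longer symmetric around its center.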
In Table 7, the 90% confidence intervals for digital and nondigital breaches are shown. As shown in Figure 3, there is skewness in the histogram of the loge of the top 10% excluded data. However, the skewness is lower on the loge scale than on the original scale.
Figure 3 Histogram of Loge of Top 10% Excluded Data
The last set of confidence intervals, shown in Table 7, are the shortest confidence intervals, as they are based on the loge transformation, which yields the best histograms, shown in Figure 3. This graph still shows some skewness in the two histograms, but it is much smaller than in those based on the raw data. In addition, the normal probability curve shows the typical pattern for both types of breaches. For the digital breaches, the average frequency is around 50, whereas the average frequency is around 25 for the nondigital breaches, as shown in Figure 3.
Table 8 summarizes the differences based on the raw data, the top 10% trimmed data, and the transformed data. These values come from Tables 1, 3, and 6. It is evident that trimming the top 10% of the data and applying the transformation helped to make the data less skewed, as shown in Figure 3.
Table 8 Summary Table for Healthcare Providers
Type of Breach | | Digital | Nondigital |
Raw Data | Mean | 27,894 | 8,918 |
| 90% Lower Limit | 18,712 | 4,556 |
| 90% Upper Limit | 37,075 | 13,280 |
| Width of 90% CI | 18,363 | 8,724 |
10% Trimmed Data | Mean | 4,095 | 2,436 |
| 90% Lower Limit | 3,879 | 2,201 |
| 90% Upper Limit | 4,310 | 2,671 |
| Width of 90% CI | 431 | 470 |
Transformed Data | Mean | 2,350 | 1,555 |
| 90% Lower Limit | 2,236 | 1,454 |
| 90% Upper Limit | 2,469 | 1,663 |
| Width of 90% CI | 233 | 209 |
As shown in Table 8, the width of the 90% confidence interval narrowed considerably after transforming the data. Because the tests shown in Tables 2, 4, and 5 are all significant (p = .002 for the raw data and p < .001 for the trimmed and transformed data), I reject the null hypothesis that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care providers.
For the second research question, I performed two-sample t-test (i.e., independent samples t-test) analyses. The second set of hypotheses considered the type of breach as the independent variable, while the number of individuals affected in the health plan providers was the dependent variable. The type of breach was recoded into dummy variables for digital and nondigital.
Levene’s test for equality of variances had a significance of 0.016, based on which the hypothesis of equal variances was rejected in favor of the alternative hypothesis of unequal variances.
Table 9 shows the descriptive statistics of the original number of individuals affected in the health plan provider entity. The results showed that the number of individuals affected is higher for digital types of breach (M = 616,066.07, SD = 6,032,878.13) than for nondigital types of breach (M = 24,935.24, SD = 132,296.62).
Table 9 Descriptive Statistics of the Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 176 | 616,066.07 | 6,032,878.13 | -1,135,904 | 1,368,036 |
Nondigital | 172 | 24,935.24 | 132,296.62 | 8,252 | 41,618 |
Based on this, I conducted an independent samples t test with equal variances not assumed. The results in Table 10 showed that there is no significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 1.30, p = .195). Therefore, there is not sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.
Table 10 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for Raw Data
| t test for Equality of Means | 90% Confidence Interval of the Difference | |||||
| t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | Lower | Upper |
Individuals affected | 1.30 | 175.17 | 0.195 | 591,130.82 | 454,857.17 | -161,020 | 1,343,282 |
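A confidence interval for a mean difference and the corresponding two-sided test give matching answers at the same level: when the interval spans zero, the test at that level does not reject the null hypothesis of no difference. The sketch below illustrates this reading with a hypothetical helper, excludes_zero; it is a reading aid, not part of the SPSS analysis.

```python
def excludes_zero(lower: float, upper: float) -> bool:
    """True when a confidence interval for a difference lies entirely on one side of zero."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    return lower > 0 or upper < 0


if __name__ == "__main__":
    # 90% interval for the raw-data difference among health plan providers (Table 10).
    print(excludes_zero(-161020, 1343282))
    # 90% interval for the raw-data difference among health care providers (Table 2).
    print(excludes_zero(8815.20, 29136.48))
```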
The histogram of the data helped to identify the skewness of the data. As shown in Figure 4, it is evident that the data are very highly skewed.
Figure 4 Histogram of Raw Data of Health Plan Providers
Because the data are highly skewed with a very long tail, I decided to trim the upper tail by removing the top 10% of the individual values. I then applied Levene’s test for the equality of variances; its significance was less than .001, based on which the hypothesis of equal variances was rejected.
Table 11 shows the descriptive statistics of the 10% trimmed number of individuals affected in the health plan provider entity. The results showed that the number of individuals affected is higher for digital types of breach (M = 6,458.78, SD = 8,766.68) than for nondigital types of breach (M = 4,001.19, SD = 5,386.52).
Table 11 Descriptive Statistics of the 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 152 | 6,458.78 | 8,766.68 | 5,282 | 7,636 |
Nondigital | 161 | 4,001.19 | 5,386.52 | 3,299 | 4,704 |
Based on this, as shown in Table 12, I conducted the independent samples t test with equal variances not assumed. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.968, p = .003). This shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers. The 90% confidence intervals for digital and nondigital breaches are shown in Table 11.
Table 12 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for 10% Trimmed Raw Data With 90% Confidence Interval
| t test for Equality of Means | 90% Confidence Interval of the Difference | |||||
| t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | Lower | Upper |
Individuals affected | 2.968 | 248.079 | 0.003 | 2,457.60 | 828.15 | 1,090 | 3,825 |
Compared with the raw data, the 10% trimmed data are less skewed. I analyzed a histogram of the trimmed data to assess the skewness. As shown in Figure 5, the skewness is much smaller with the top 10% excluded than in the original raw data.
Figure 5 Histogram of Top 10% Excluded Data
Because the trimmed data are still highly skewed, I applied a loge (natural logarithm) transformation to the raw data excluding the top 10% of the values to improve the distribution further.
As shown in Table 13, the loge of the raw data excluding the top 10% was analyzed for the equality of variances. Levene’s test for equality of variances had a significance of .264 (F = 1.252), based on which the null hypothesis of equal variances was not rejected.
Table 13 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for Loge of 10% Trimmed Raw Data With 90% Confidence Interval
Levene's Test for Equality of Variances | | t test for Equality of Means | 90% Confidence Interval of the Difference | ||||||
| F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | Lower | Upper |
Individuals affected | 1.252 | .264 | 2.979 | 311 | 0.003 | 0.39061 | 0.13114 | 0.17425 | 0.60696 |
Based on this, as shown in Table 13, I conducted the independent samples t test. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.979, p = .003). This shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.
Table 14 Descriptive Statistics of the Loge of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 152 | 8.0242 | 1.215 | 7.8611 | 8.1873 |
Nondigital | 161 | 7.6336 | 1.105 | 7.4895 | 7.7777 |
Table 14 shows the descriptive statistics of the loge of the 10% trimmed number of individuals affected in the health plan provider entity. The results showed that, on the loge scale, the number of individuals affected is higher for digital types of breach (M = 8.0242, SD = 1.215) than for nondigital types of breach (M = 7.6336, SD = 1.105). To bring this back to the original scale, I took the exponential of the mean values of digital and nondigital breaches in the health plan provider entity; the results are displayed in Table 15. For the original highly skewed raw data, the 90% confidence interval was -1,135,904 to 1,368,036 for digital breaches and 8,252 to 41,618 for nondigital breaches. After taking the exponential of the loge of the 10% trimmed data, the 90% confidence interval was 2,592 to 3,605 for digital breaches and 1,790 to 2,392 for nondigital breaches.
Table 15 Descriptive Statistics of the Exponential of Loge of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | Mean | Lower 90% CI | Upper 90% CI |
Digital | 152 | 3,053.98 | 2,592 | 3,605 |
Nondigital | 161 | 2,066.48 | 1,790 | 2,392 |
In Table 15, the 90% confidence interval for digital and nondigital are shown. As shown in Figure 6, there is skewness in the histogram with the loge of the top 10% excluded data. However, the skewness is lower on loge scale than the original scale.
Figure 6 Histogram of Loge of Top 10% Excluded Data
The last set of confidence intervals shown in Table 15 are the shortest confidence intervals, as they are based on the loge transformation which yields the best histograms shown in Figure 6. This graph still shows some skewness in the two histograms, but it is much smaller than those based on the raw data. In addition, the normal probability curve shows the typical pattern for both the breaches. For the digital breaches, the average frequency is around 10, whereas the average frequency is at 12 for the nondigital breaches as shown in Figure 6.
Table 16 summarizes the differences based on the raw data, the top 10% trimmed data, and the transformed data. These values come from Tables 9, 11, and 14. It is evident that trimming the top 10% of the data and applying the transformation helped to make the data less skewed, as shown in Figure 6.
Table 16 Summary Table for Health Plan Providers
Type of Breach | | Digital | Nondigital |
Raw Data | Mean | 616,066.07 | 24,935.24 |
| 90% Lower Limit | -1,135,904 | 8,252 |
| 90% Upper Limit | 1,368,036 | 41,618 |
| Width of 90% CI | 2,503,940 | 33,366 |
10% Trimmed Data | Mean | 6,458.78 | 4,001.19 |
| 90% Lower Limit | 5,282 | 3,299 |
| 90% Upper Limit | 7,636 | 4,704 |
| Width of 90% CI | 2,354 | 1,405 |
Transformed Data | Mean | 3,053.98 | 2,066.48 |
| 90% Lower Limit | 2,592 | 1,790 |
| 90% Upper Limit | 3,605 | 2,392 |
| Width of 90% CI | 1,013 | 602 |
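Each width row in the summary table is simply the upper limit minus the lower limit of the corresponding interval. The short sketch below recomputes the widths from the limits listed in the table; the function name ci_width is hypothetical.

```python
def ci_width(lower: float, upper: float) -> float:
    """Width of a confidence interval, defined as upper limit minus lower limit."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    return upper - lower


if __name__ == "__main__":
    # Limits for health plan providers as listed in the summary table.
    intervals = {
        "raw, digital": (-1135904, 1368036),
        "raw, nondigital": (8252, 41618),
        "trimmed, digital": (5282, 7636),
        "trimmed, nondigital": (3299, 4704),
        "transformed, digital": (2592, 3605),
        "transformed, nondigital": (1790, 2392),
    }
    for label, (lower, upper) in intervals.items():
        print(label, ci_width(lower, upper))
```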
As shown in Table 16, the width of the 90% confidence interval narrowed considerably after transforming the data. Although the raw-data test in Table 10 was not significant (p = .195), the tests on the trimmed and transformed data in Tables 12 and 13 were significant (p = .003). On that basis, I reject the null hypothesis that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health plan providers.
For the third research question, I performed two-sample t-test (i.e., independent samples t-test) analyses. The third set of hypotheses considered the type of breach as the independent variable, while the number of individuals affected in the health care clearinghouses was the dependent variable. The type of breach was recoded into dummy variables for digital and nondigital.
Levene’s test for equality of variances had a significance of 0.05, based on which the hypothesis of equal variances was rejected in favor of the alternative hypothesis of unequal variances.
Table 17 shows the descriptive statistics of the original number of individuals affected in the health care clearinghouse entity. The results showed that the number of individuals affected is higher for digital types of breach (M = 122,204.93, SD = 586,984.84) than for nondigital types of breach (M = 60,611.07, SD = 441,603.22).
Table 17 Descriptive Statistics of the Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 235 | 122,204.93 | 586,984.84 | 58,972 | 185,438 |
Nondigital | 141 | 60,611.07 | 441,603.22 | -968 | 122,190 |
Based on this, I conducted an independent samples t test with equal variances not assumed. The results in Table 18 showed that there is no significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 1.154, p = .249). Therefore, there is not sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses.
Table 18 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for Raw Data
| t test for Equality of Means | 90% Confidence Interval of the Difference | |||||
| t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | Lower | Upper |
Individuals affected | 1.154 | 355.29 | 0.249 | 61,593.86 | 53,378.36 | -26,435 | 149,623 |
The histogram of the data helped to identify the skewness of the data. As shown in Figure 7, it is evident that the data are very highly skewed.
Figure 7 Histogram of Raw Data of Health Care Clearinghouses
Because the raw data are highly skewed with a very long tail, I decided to trim the upper tail by removing the top 10% of the individual values. I then applied Levene’s test for the equality of variances; its significance was less than .001, based on which the hypothesis of equal variances was rejected.
Table 19 shows the descriptive statistics of the 10% trimmed number of individuals affected in the health care clearinghouse entity. The results showed that the number of individuals affected is higher for digital types of breach (M = 6,986.52, SD = 9,023.08) than for nondigital types of breach (M = 4,592.81, SD = 5,935.37).
Table 19 Descriptive Statistics of the 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 208 | 6,986.52 | 9,023.08 | 5,953 | 8,020 |
Nondigital | 130 | 4,592.81 | 5,935.37 | 3,730 | 5,455 |
Based on this, as shown in Table 20, I conducted the independent samples t test with equal variances not assumed. The result showed that there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.941, p = .003). This also shows that there is sufficient evidence to reject the null hypothesis, which stated that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for health care clearinghouses. The 90% confidence intervals for digital and nondigital breaches are shown in Table 19.
Table 20 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for 10% Trimmed Raw Data With 90% Confidence Interval
| t test for Equality of Means | 90% Confidence Interval of the Difference | |||||
| t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | Lower | Upper |
Individuals affected | 2.941 | 335.103 | 0.003 | 1,658.83 | 193.60 | 1,279 | 2,038.66 |
Compared with the raw data, the 10% trimmed data are less skewed. I analyzed a histogram of the trimmed data to assess the skewness. As shown in Figure 8, the skewness is much smaller with the top 10% excluded than in the original raw data.
Figure 8 Histogram of Top 10% Excluded Data
Because the trimmed data are still highly skewed, I applied a loge (natural logarithm) transformation to the raw data excluding the top 10% of the values to improve the distribution further.
As shown in Table 21, the loge of the raw data excluding the top 10% was analyzed for the equality of variances. Levene’s test for equality of variances had a significance of 0.007, based on which the null hypothesis of equal variances was rejected.
Table 21 Independent Samples t-Test Result for the Number of Individuals Affected Based on the Type of Breach for Loge of 10% Trimmed Raw Data With 90% Confidence Interval
| F (Levene's Test for Equality of Variances) | Sig. (Levene's Test) | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI of the Difference, Lower | 90% CI of the Difference, Upper |
Individuals affected | 45.310 | .181 | 2.726 | 289 | 0.007 | 0.35242 | 0.12930 | 0.13906 | 0.56578 |
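For readers unfamiliar with Levene's test, the statistic can be sketched in a few lines. The version below is the mean-centered form and is an assumption about which variant the statistical software applied; the two samples are hypothetical, and `levene_w` is a name introduced here. It computes only the W statistic, not its p value.

```python
from statistics import fmean


def levene_w(*groups):
    """Mean-centered Levene W statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(x - fmean(g)) for x in g] for g in groups]
    z_means = [fmean(zi) for zi in z]
    z_grand = fmean([v for zi in z for v in zi])
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical samples (NOT the study data).
narrow = [1, 2, 3, 4, 5]
shifted = [11, 12, 13, 14, 15]  # same spread as `narrow`, different mean
wide = [0, 10, 20, 30, 40]      # much larger spread

print(levene_w(narrow, shifted))  # equal spreads give a statistic of zero
print(levene_w(narrow, wide))     # unequal spreads give a larger statistic
```

A large W relative to the F distribution with (k - 1, N - k) degrees of freedom indicates unequal variances, which is the situation in which the t-test with equal variances not assumed is the appropriate choice.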
Based on this, as shown in Table 21, I conducted the independent samples t-test with equal variances not assumed. The result showed a significant difference between the average number of individuals affected in digital and nondigital breaches (t = 2.726, p = .007). This provides sufficient evidence to reject the null hypothesis that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for healthcare clearinghouses.
Table 22 Descriptive Statistics of the Loge of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | SD | Lower 90% CI | Upper 90% CI |
Digital | 208 | 8.1367 | 1.21 | 7.9980 | 8.2754 |
Nondigital | 130 | 7.7843 | 1.12 | 7.6214 | 7.9472 |
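As a rough cross-check, the t statistic with equal variances not assumed can be approximately recovered from the summary statistics in Table 22 alone. The sketch below uses the standard Welch formulas; `welch_from_summary` is a helper name introduced here, and because the inputs are the rounded means and standard deviations from the table, the output should only be expected to come close to the reported values.

```python
import math


def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic, degrees of freedom, and standard error from summary statistics."""
    v1 = s1 ** 2 / n1  # squared standard error of the first group mean
    v2 = s2 ** 2 / n2  # squared standard error of the second group mean
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Inputs taken from Table 22 (loge of the 10% trimmed data).
t, df, se = welch_from_summary(8.1367, 1.21, 208, 7.7843, 1.12, 130)
print(f"t = {t:.3f}, df = {df:.1f}, SE = {se:.4f}")
```

With these inputs the sketch gives a t value of about 2.73 with roughly 290 degrees of freedom, close to the reported t = 2.726 and df = 289; the small gaps are consistent with rounding in the tabulated means and standard deviations.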
Table 22 shows the descriptive statistics of the loge of the 10% trimmed number of individuals affected for healthcare clearinghouses. On the loge scale, the number of individuals affected was higher for digital breaches (M = 8.1367, SD = 1.21) than for nondigital breaches (M = 7.7843, SD = 1.12). To bring these values back to the original scale, I took the exponential of the mean values for digital and nondigital breaches; the results are displayed in Table 23. For the original, highly skewed raw data, the 90% confidence interval ran from 58,972 to 185,438 for digital breaches and from -968 to 122,190 for nondigital breaches. After exponentiating the loge of the 10% trimmed data, the 90% confidence interval ran from 2,981 to 3,944 for digital breaches and from 2,039 to 2,836 for nondigital breaches.
Table 23 Descriptive Statistics of the Exponential of Loge of 10% Trimmed Raw Number of Individuals Affected Based on the Type of Breach
Type of Breach | N | M | Lower 90% CI | Upper 90% CI |
Digital | 208 | 3417.62 | 2981 | 3944 |
Nondigital | 130 | 2402.58 | 2039 | 2836 |
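The back-transformation behind Table 23 is a direct exponentiation of the log-scale values in Table 22, as the sketch below illustrates. The dictionary layout and the name `back_transform` are choices made for this example. Note that exponentiating a mean of logs gives a geometric mean, which is typically smaller than the arithmetic mean of skewed data.

```python
import math

# Log-scale means and 90% confidence limits, copied from Table 22.
log_scale = {
    "Digital": {"mean": 8.1367, "lower": 7.9980, "upper": 8.2754},
    "Nondigital": {"mean": 7.7843, "lower": 7.6214, "upper": 7.9472},
}


def back_transform(stats):
    """Exponentiate each log-scale statistic to return to the original scale."""
    return {name: math.exp(value) for name, value in stats.items()}


original_scale = {group: back_transform(stats) for group, stats in log_scale.items()}

for group, stats in original_scale.items():
    print(f"{group}: mean = {stats['mean']:.2f}, "
          f"90% CI = ({stats['lower']:.0f}, {stats['upper']:.0f})")
```

The exponentiated means agree with Table 23 to two decimal places. The exponentiated limits land close to, but not exactly on, the tabulated limits, which may reflect rounding of the log-scale limits.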
Table 23 shows the 90% confidence intervals for digital and nondigital breaches. As shown in Figure 9, the histogram of the loge of the top 10% excluded data is still skewed. However, the skewness is lower on the loge scale than on the original scale.
Figure 9 Histogram of Loge of Top 10% Excluded Data
The last set of confidence intervals, shown in Table 23, is the narrowest because it is based on the loge transformation, which also yields the best histograms (Figure 9). The graph still shows some skewness in the two histograms, but much less than in those based on the raw data. In addition, the normal probability curve shows the typical pattern for both types of breaches. As shown in Figure 9, the average frequency is around 12 for digital breaches and 8 for nondigital breaches.
Table 24 summarizes the differences based on the raw data, the top 10% trimmed data, and the transformed data. These values come from Tables 17, 19, and 22. Trimming the top 10% of the data and applying the transformation made the data less skewed, as shown in Figure 3.
Table 24 Summary Table for Healthcare Clearinghouses
Type of Breach | | Digital | Nondigital |
Raw Data | Mean | 122,205 | 60,611 |
| 90% Lower Limit | 58,972 | -968 |
| 90% Upper Limit | 185,438 | 122,190 |
| Width of 90% CI | 126,466 | 123,158 |
10% Trimmed Data | Mean | 6,983 | 4,593 |
| 90% Lower Limit | 5,953 | 3,730 |
| 90% Upper Limit | 8,020 | 5,455 |
| Width of 90% CI | 2,067 | 1,725 |
Transformed Data | Mean | 3,418 | 2,403 |
| 90% Lower Limit | 2,981 | 2,039 |
| 90% Upper Limit | 3,944 | 2,836 |
| Width of 90% CI | 963 | 797 |
As shown in Table 24, the width of the 90% confidence interval narrowed considerably after the data were transformed. Because the significance of the t-test shown in Table 21 is .007, I reject the null hypothesis that there is no significant difference between the average number of individual patient records affected by digital breaches and nondigital breaches for healthcare clearinghouses.
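The interval widths in the summary table for healthcare clearinghouses are simply the upper limit minus the lower limit. The short sketch below recomputes them from the tabulated limits; the nested dictionary is a layout chosen for this example.

```python
# 90% confidence limits (lower, upper) from the summary table for healthcare clearinghouses.
limits = {
    "Raw Data": {"Digital": (58972, 185438), "Nondigital": (-968, 122190)},
    "10% Trimmed Data": {"Digital": (5953, 8020), "Nondigital": (3730, 5455)},
    "Transformed Data": {"Digital": (2981, 3944), "Nondigital": (2039, 2836)},
}

# Width of each interval: upper limit minus lower limit.
widths = {
    step: {group: upper - lower for group, (lower, upper) in groups.items()}
    for step, groups in limits.items()
}

for step, groups in widths.items():
    print(f"{step}: digital width = {groups['Digital']:,}, "
          f"nondigital width = {groups['Nondigital']:,}")
```

The recomputed widths match the tabulated widths at every step, and each processing step narrows the interval for both types of breach.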
Summary
The purpose of this quantitative study was to determine if there was a statistically significant difference between digital and nondigital breaches of individual patient records for each of the three types of healthcare entities in the United States. The examination of digital and nondigital breaches amongst the three healthcare entities is essential to both reducing the number of data breaches and ensuring proper allocation of resources to achieve that end. This study included reports from 2010 through the most recent report available in the archival section of the breach portal, for a total of 2,601 reports.
Most of the cases (n = 1,876, 72.13%) involved healthcare providers, followed by healthcare clearinghouses (n = 376, 14.45%) and health plan providers (n = 349, 13.42%). For healthcare providers, there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 8.204, p < .001). For health plan providers, there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.979, p = .003). For healthcare clearinghouses, there is a significant difference between the average number of individuals affected in digital and nondigital types of breaches (t = 2.726, p = .007).