
CHAPTER 3 EVALUATION OF THE PROJECT

Evaluating the Program Development

Overview

The first hypothesis was that a suitable program would be developed within the time frames of the proposal. Acceptance of the first hypothesis was based on the development described in chapter 2 and on the securing and development of the four measures described below: (1) a review by a member of the director's doctor of ministry committee; (2) a review by an expert in criminal justice ministry; (3) the documents of the program; and (4) five evaluation instruments.

Review by Committee Member

The director worked with Alan Jackson, who was a member of the director's doctor of ministry committee. With prior approval, the director sent drafts of the program lesson plans and handouts to Jackson for review prior to implementation. A copy of the letter that accompanied the drafts was placed in appendix 1, marked item #1. After receiving and reviewing the drafts, Jackson gave approval by phone on 16 July 1996.

Review by Expert in Criminal Justice Ministry

The director worked with Vance Drum, senior chaplain at the Eastham State Prison in Lovelady, Texas. With prior approval, the director sent drafts of the program lesson plans and handouts to Drum for review prior to implementation. The director asked Drum for a written response that also included Drum's qualifications as an expert in criminal justice ministry. On 17 July 1996, Drum responded with a letter containing an evaluation of the program sessions. Drum's letter of response was placed in appendix 1, marked item #2.

The Documents of the Program

The program lesson plan drafts that were sent to Jackson and Drum were finalized. With the finalizing prior to implementation, the lesson plans and overheads themselves became the third measure of the validation of the first hypothesis, as the lesson plans and overheads represented the essence of the program that was given to the men in the experimental group. The lesson plans were placed in appendix 2, and the overheads were placed in appendix 3.

Five Evaluation Instruments

The fourth measure of the first hypothesis was finding and developing the evaluation instruments used throughout the program. Five instruments were used: two validated questionnaires and three questionnaires that the director developed specifically to measure various portions of the program.

The two validated instruments were selected prior to program implementation. The first, Stokes and Lautenschlater's Counselor Response Questionnaire (CRQ),[97] was used in its entirety. The second, Carkhuff's "Responding: Knowledge and Skills Assessment," was used in part, and its title was changed to Responding Questionnaire (RQ).[98] Both were used as pretests and posttests, and both were approved by the director's committee chairman as suitable assessment instruments prior to being used.

In addition to the committee chairman's validation of the CRQ and the RQ, other validations of these instruments were considered. Professional validations of the CRQ were placed in the background information at the beginning of appendix 5 under the sub-heading "validation studies." The RQ assessment was considered validated because of the repeated publication of Carkhuff's model for training in helping skills, and this assumption was approved by the director's committee chairman prior to implementation. The RQ and background information related to the RQ were placed in appendix 6.

The director developed three instruments to aid in data collection during the various stages of the program implementation: one, a Preprogram Background Questionnaire (PBQ); two, a Postprogram Interview Questionnaire (PIQ); and three, a Postprogram Helpee Follow-up Questionnaire (PHFQ). The PBQ was used to gather some sociological data to help divide the experimental and control groups, and it was placed in appendix 4. The PIQ was used to gather data from the experimental group after the last program session in a one-on-one setting, and it was placed in appendix 9. The PHFQ was used to gather data from the men in the Christian congregation who had been the recipients of the participants' helping efforts, and it was placed in appendix 10. All three instruments were developed and submitted to the director's committee chairman prior to implementation, and all three were approved by the director's committee chairman as suitable assessment instruments prior to usage.

Evaluating the Program Enlistment

Overview

The acceptance of the second hypothesis was based upon two factors. The first was the enlistment described in chapter 2, and the second was the determination and development of three measures described below: (1) the effect of the advertisements and announcements; (2) the experimental and control group rosters and worksheets; and (3) the posttesting of the control group.

Effect of Advertisement and Announcements

From the advertisement and announcements, sixty-seven men were nominated to participate. After the nominees were screened and invitations were given to the sixty-seven men, all of them showed up for the pretesting stage of the project. When sixty-seven men had been enlisted and had arrived for the pretesting, this arrival indicated that part of the second hypothesis was fulfilled.

Experimental and Control Group Worksheets and Rosters

The director developed three data collection instruments. Two of the instruments were worksheets used to record the data from the CRQ and RQ pretesting and posttesting of both groups. The third instrument was a basic attendance roster developed to chronicle the attendance of the experimental and control groups. Copies of the CRQ and RQ data collection worksheets were placed in appendix 11, marked respectively as items #1 and #2. A copy of the attendance roster was placed in appendix 12, marked item #2.

After the pretesting, the CRQ and RQ scores of each of the sixty-seven men were placed on the data collection worksheets bearing the name of the participants. That was done for both the experimental and control groups.

As attendance was kept throughout the program sessions of the experimental group, twenty-seven of the participants stayed with the program. The basic attendance roster for the experimental group indicated who attended and who was absent. The attendance on the roster was reflected in the pastoral observations and reflections that were collected in appendix 8 and summarized above in chapter 2 under the subsection, summarization of daily lessons.

At the end of the administration of the program sessions to the experimental group, the posttests were given to the experimental group. The posttest scores from the CRQ and RQ were placed on the data collection worksheets of each individual man.[99]

A separate list was maintained of the control group. After the experimental group was given the program and the posttests, the control group was recalled on the following Saturday, 28 September 1996. The control group attendance roster indicated that six men were absent, and a follow-up indicated that the six men had moved from the prison and were no longer available to participate. The remaining twenty-eight men were given the posttests. When twenty-eight men in the original control group showed up to complete posttesting, that arrival indicated that part of the second hypothesis was fulfilled with respect to the two groups' attendance throughout the implementation of the program.

Posttesting the Control Group

When the control group was recalled on 28 September 1996, they were given the CRQ and RQ as posttests. The posttesting of the control group was the last phase of the program that involved the experimental and control group participants.

Therefore, the second hypothesis was fulfilled in three phases: when the effect of advertising drew sixty-seven men, when the experimental group and the control group attendance rosters and worksheets indicated attendance, and with the administration of the CRQ and RQ as posttests. Twenty-seven men in the experimental group and twenty-eight men in the control group stayed with the program from beginning to end. That number of men with the data collected was deemed sufficient to justify an evaluation.

Three reasons were found to accept the second hypothesis. The first reason was the enlistment described in chapter 2. The second was the general effect of advertisement. The third reason was that the attendance rosters, worksheets, and posttesting indicated that fifty-five men had remained with the entire program. Therefore, the second hypothesis was accepted in that the men remained with the program.

Evaluating the Program Implementation

Overview

The third hypothesis was that the program would increase the inmate's ability to use several helping skills. Six methods of the evaluation of the program implementation indicated the accomplishment of the third hypothesis. The six methods were divided into two parts: (1) two professional evaluations; and (2) four statistical evaluations.

The professional evaluations included (1) an evaluation of a program session by a professional chaplain, and (2) the project director's pastoral observations and reflections. The statistical evaluations, presented after a discussion of how the final scores were adjusted to compensate for absentees, included analyses of four program instruments: (1) the counselor response questionnaire statistical analysis, (2) the responding questionnaire statistical analysis, (3) the postprogram interview questionnaire analysis, and (4) the postprogram helpee follow-up questionnaire analysis.

Professional Evaluations

Professional Chaplain

Alex Taylor sat in on the seventh and last program session. His evaluation indicated that part of the third hypothesis was fulfilled in that a program had been implemented that improved the inmate's ability to use some helping skills.

Taylor was the regional chaplain for the Texas Department of Criminal Justice. He was asked to view a program session and write an evaluation based upon the session's objective and upon his experience. The director asked if Taylor would respond with a letter offering his evaluation and outlining his qualifications as an expert in criminal justice ministry.

Taylor arrived on 21 September 1996. The director gave Taylor a copy of the program lesson plans for that day and a copy of the handouts that were given to the men that day. On 10 October 1996, the director received a letter and evaluation from Taylor, and that letter and evaluation were placed in appendix 1, marked item #3.

Pastoral Observations and Reflections on Implementation

After each daily session of the program, the director took notes on his observations and reflections on various aspects of the program and about the responses of the men in the experimental group. The pastoral observations and reflections indicated that part of the third hypothesis was fulfilled in the chronicle of the men's participation and growth throughout the program. The observations and reflections were placed in appendix 8.

The observations and reflections detailed how the presentation of the lesson plans and overheads affected the participants in the program as well as the director. Some of the aspects observed and reflected upon were how the director presented various parts of the program, his feelings about the presentation, how the men in general responded to various parts of the program, and the unexpected responses or distractions that arose in the program. The sum of the director's observations and reflections indicated that the men not only learned some empathy skills but also enjoyed the whole process and wished that the program could continue so that they could continue to refine their empathic skills.

Adjusting the Pretest and Posttest CRQ Scores

As was seen in chapter 2, the Preprogram Background Questionnaire (PBQ) and the Counselor Response Questionnaire (CRQ) were used in the determination of the experimental and control groups from among the sixty-seven men. Thirty-four men were placed in the control group, and thirty-three men were placed in the experimental group.

After the posttesting of both groups was finished, the director recorded that several men from both groups did not remain to finish the posttesting. During the program, six men dropped out of the experimental group for various reasons. The men who dropped out were from a variety of sociological categories, and their CRQ scores were dropped from the experimental group's preprogram statistical calculations: 1 YBA (25), 1 NBNA (14), 1 WNA (21), 2 NWA (38, 32), and 1 YHNA (23).[100]

After seven weeks, six men had left the prison who had been in the control group. Those men were from a variety of sociological categories, and their CRQ scores were dropped from the control group's preprogram statistical calculations: 1 YBA (32), 1 YWNA (34), 1 YHA (29), 1 NBNA (24), 1 NBA (21), and 1 NWNA (28).[101]

Since twenty-seven men in the experimental group had finished the program and twenty-eight men in the control group had completed the CRQ posttesting, one other man's score was deleted from the control group to allow both groups the same number of observations. The score chosen was a midrange score from the group of black-aggravated men who had had no regular visits during the month: NBA (28). The midrange score was chosen for two reasons: (1) because of the leptokurtic distribution of the scores in both groups, and (2) because the "aggravated" time being served was represented by the largest number of men. Thereby, the removal of the "NBA" midrange score was perceived to have the least effect on the overall distribution. With the last removal, twenty-seven men remained in each group, as reported below in table 4.
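The group-size bookkeeping described above can be restated as a short Python sketch. The variable names are illustrative only and are not part of the project; the counts are the ones given in the text.

```python
# Group sizes at assignment and after attrition, using the counts in the text.
experimental_start = 33
control_start = 34

experimental_dropouts = 6   # men who left during the program sessions
control_departures = 6      # men who moved from the prison
control_trimmed = 1         # one midrange NBA score (28) removed to equalize n

experimental_n = experimental_start - experimental_dropouts
control_posttested = control_start - control_departures
control_n = control_posttested - control_trimmed

print(experimental_n, control_posttested, control_n)
```

Both groups end with the same number of observations, which is what the paired and between-group comparisons later in the chapter rely on.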

Table 4.--Adjusted Preprogram CRQ Scores

X₁: Adjusted Preprogram Experimental Group CRQ Scores Categorized

                 "Yes"              "No"
black "na"       32, 23             24
black "a"        37, 30, 17         41, 31, 27, 25, 21, 20, 9
white "na"       34                 41, 28, 25
white "a"        38, 35, 21         29, 24
Hispanic "na"    25                 30
Hispanic "a"     21                 26, 14

X₃: Adjusted Preprogram Control Group CRQ Scores Categorized

                 "Yes"              "No"
black "na"       32                 24
black "a"        37, 29             40, 33, 28, 24, 24, 20, 16
white "na"       29                 27, 26
white "a"        42, 38, 35, 24     40, 28, 24
Hispanic "na"    23                 30, 24
Hispanic "a"     11                 44, 27

Statistics on the adjusted scores were calculated. They were reported in table 5.

Table 5.--Adjusted Preprogram CRQ Statistics

                        X₁             X₃
Range                   9.0 - 41.0     11.0 - 44.0
Mode                    21.0, 25.0     24.0
Median                  26.0           27.0
Mean                    26.962963      28.851852
ΣX                      728.0          779.0
ΣX²                     21,216.0       24,097.0
Std. deviation (σ)      7.666577       7.749330
Variance (σ²)           58.776406      60.052126
g₁                      -0.087179      0.058473
g₂                      2.778344       2.73434

The statistics of X₁ and X₃ indicated a more equal distribution of scores than was indicated by the preadjusted scores during the enlistment phase.[102] Given the sociological data and the distribution of CRQ scores, the two groups were considered matched evenly enough for the purposes of the program.
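As a minimal sketch, the sums, means, variances, and standard deviations in table 5 can be recomputed from the scores listed in table 4. The function name `describe` is hypothetical, and dividing by n (the population form) is an assumption; it is used here because it appears to reproduce the tabled figures.

```python
import math

# Adjusted preprogram CRQ scores as listed in table 4.
x1 = [32, 23, 37, 30, 17, 34, 38, 35, 21, 25, 21,
      24, 41, 31, 27, 25, 21, 20, 9, 41, 28, 25, 29, 24, 30, 26, 14]
x3 = [32, 37, 29, 29, 42, 38, 35, 24, 23, 11,
      24, 40, 33, 28, 24, 24, 20, 16, 27, 26, 40, 28, 24, 30, 24, 44, 27]

def describe(scores):
    """Return basic descriptive statistics (population variance assumed)."""
    n = len(scores)
    total = sum(scores)
    total_sq = sum(x * x for x in scores)
    mean = total / n
    variance = total_sq / n - mean ** 2   # divide by n, not n - 1
    return {
        "n": n,
        "sum": total,
        "sum_sq": total_sq,
        "mean": mean,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

stats_x1 = describe(x1)
stats_x3 = describe(x3)
print(stats_x1)
print(stats_x3)
```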

Counselor Response Questionnaire Statistical Analysis

Overview

The Counselor Response Questionnaire (CRQ) was the first pretest and posttest administered to both the experimental and control groups. The CRQ was designed to measure the participants' level of empathic skill. The two groups of twenty-seven men each, determined above, were used in the following statistical analysis. The highest possible score was fifty.

The statistical analysis was divided into three parts: (1) measures of central tendency and variability, (2) measures of frequency, and (3) three t-test calculations. All measures indicated an accomplishment of the third hypothesis in that the men in the experimental group improved in their use of empathic skills.

Measures of Central Tendency and Variability

After the end of the program, the experimental and control groups were given the CRQ again as a posttest. The tabulation and statistics on the pretest and posttest scores were reported below in table 6.

Table 6.--Adjusted Pretest and Posttest CRQ Statistics

Experimental Group Control Group

X₁ X₂ X₃ X₄

1. 21 24 24 27

2. 21 30 29 32

3. 25 25 28 31

4. 20 15 28 37

5. 17 38 24 23

6. 26 44 38 42

7. 34 44 29 24

8. 9 31 27 29

9. 35 40 24 28

10. 29 47 40 23

11. 25 29 35 39

12. 31 38 30 36

13. 30 43 42 41

14. 28 28 23 26

15. 41 47 33 21

16. 24 31 26 22

17. 24 40 20 27

18. 25 25 24 26

19. 21 29 37 41

20. 30 31 40 26

21. 23 37 16 17

22. 41 47 27 24

23. 32 39 11 21

24. 38 47 44 42

25. 37 43 24 24

26. 27 39 32 25

27. 14 41 24 32

X₁ = experimental group pretest scores X₃ = control group pretest scores

X₂ = experimental group posttest scores X₄ = control group posttest scores

                        X₁             X₂             X₃             X₄
Range                   9.0 - 41.0     15.0 - 47.0    11.0 - 44.0    17.0 - 42.0
Mode                    21.0, 25.0     47.0           24.0           24.0, 26.0
Median                  26.0           38.0           27.0           28.0
Mean                    26.962963      36.0           28.851852      29.111111
ΣX                      728.0          972.0          779.0          786.0
ΣX²                     21,216.0       36,886.0       24,097.0       24,262.0
Std. deviation (σ)      7.666577       8.375449       7.749330       7.150930
Variance (σ²)           58.776406      70.148148      60.052126      51.135802
g₁                      -0.087179      -0.509484      0.058473       0.053101
g₂                      2.778344       2.479727       2.73434        2.112768

In table 6 above, the results indicated a statistically significant improvement in the CRQ scores of the experimental group over the control group. The modes, medians, and means of X₁, X₃, and X₄ indicated close similarity and contrasted enough with X₂ to indicate a significant improvement in overall skill level in the experimental group. The sums of the scores and the sums of the squares of X₁, X₃, and X₄ were similar and also contrasted enough with X₂ to indicate significant improvement. The measures of variability represented in the variance and standard deviation of X₂ were only a little higher than those of X₁, X₃, and X₄. When the measures of variability of X₂ were compared with the measures of skewness and kurtosis for all four variables, the comparison indicated that the whole distribution of X₂ scores was significantly higher than the scores of X₁, X₃, and X₄. These comparisons indicated that the third hypothesis was accomplished.
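The chapter does not state the formulas behind g₁ (skewness) and g₂ (kurtosis). The sketch below assumes the moment-based population definitions, g₁ = m₃ / m₂^1.5 and g₂ = m₄ / m₂², where mₖ is the k-th central moment. Under that assumption the X₂ scores from table 6 appear to reproduce the tabled values. The function name `moment_shape` is hypothetical.

```python
def moment_shape(scores):
    """Return (g1, g2): moment-based skewness and non-excess kurtosis."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2
    return g1, g2

# Experimental group posttest scores (X2) as listed in table 6.
x2 = [24, 30, 25, 15, 38, 44, 44, 31, 40, 47, 29, 38, 43, 28,
      47, 31, 40, 25, 29, 31, 37, 47, 39, 47, 43, 39, 41]

g1_x2, g2_x2 = moment_shape(x2)
print(round(g1_x2, 6), round(g2_x2, 6))
```

With this definition of g₂, a normal distribution has a value of 3 rather than 0.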

Measures of Frequency

The difference between the pretest and posttest scores of the experimental and control groups was made clearer through a calculation of the frequency and percentages of the ten most frequent scores in each group. The frequency and percentages were reported below in table 7.

Table 7.--Frequency Analysis of Top Ten CRQ Scores

Pretest Frequency Analysis

Experimental Group             Control Group
Score   Freq.   Percent        Score   Freq.   Percent
25      3       11.11          24      6       22.22
21      3       11.11          40      2       7.41
41      2       7.41           29      2       7.41
30      2       7.41           28      2       7.41
24      2       7.41           27      2       7.41
38      1       3.70           44      1       3.70
37      1       3.70           42      1       3.70
35      1       3.70           38      1       3.70
34      1       3.70           37      1       3.70
32      1       3.70           35      1       3.70

Posttest Frequency Analysis

Experimental Group             Control Group
Score   Freq.   Percent        Score   Freq.   Percent
47      4       14.81          26      3       11.11
31      3       11.11          24      3       11.11
44      2       7.41           42      2       7.41
43      2       7.41           41      2       7.41
40      2       7.41           32      2       7.41
39      2       7.41           27      2       7.41
38      2       7.41           23      2       7.41
29      2       7.41           21      2       7.41
25      2       7.41           39      1       3.70
41      1       3.70           37      1       3.70

From the above frequency analysis, the experimental group did significantly better than did the control group on the CRQ posttests.
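As a hedged illustration, the experimental group's posttest column of table 7 can be rebuilt from the X₂ scores in table 6 with a simple frequency count. The tie-breaking rule (higher score first among equal frequencies) is an assumption inferred from the order of the rows in the table, not something the text states.

```python
from collections import Counter

# Experimental group posttest scores (X2) as listed in table 6.
x2 = [24, 30, 25, 15, 38, 44, 44, 31, 40, 47, 29, 38, 43, 28,
      47, 31, 40, 25, 29, 31, 37, 47, 39, 47, 43, 39, 41]

n = len(x2)
counts = Counter(x2)

# Sort by frequency (descending), then by score (descending); keep ten rows.
top_ten = sorted(counts.items(), key=lambda kv: (-kv[1], -kv[0]))[:10]
rows = [(score, freq, round(100 * freq / n, 2)) for score, freq in top_ten]

for score, freq, percent in rows:
    print(f"{score:>5} {freq:>5} {percent:>8.2f}")
```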

Three t-Test Calculations

The basic statistics for the three t-tests were calculated. Those statistics were reported below in table 8.

Table 8.--Analysis of CRQ Deviations

Experimental Group Control Group

X₂ - X₁ = d₁ X₄ - X₃= d₂

1. 24 21 3 27 24 3

2. 30 21 9 32 29 3

3. 25 25 0 31 28 3

4. 15 20 -5 37 28 9

5. 38 17 21 23 24 -1

6. 44 26 16 42 38 2

7. 44 34 10 24 29 -5

8. 31 9 22 29 27 2

9. 40 35 5 28 24 4

10. 47 29 18 23 40 -17

11. 29 25 4 39 35 4

12. 38 31 7 36 30 6

13. 43 30 13 41 42 -1

14. 28 28 0 26 23 3

15. 47 41 6 21 33 -12

16. 31 24 7 22 26 -4

17. 40 24 16 27 20 7

18. 25 25 0 26 24 2

19. 29 21 8 41 37 4

20. 31 30 1 26 40 -14

21. 37 23 14 17 16 1

22. 47 41 6 24 27 -3

23. 39 32 5 21 11 10

24. 47 38 9 42 44 -2

25. 43 37 6 24 24 0

26. 39 27 12 25 32 -7

27. 41 14 27 32 24 8

                     Experimental Group (d₁)    Control Group (d₂)
Mean                 8.8888889                  0.185185
Σd                   240.0                      5.0
Σd²                  3,632.0                    1,161.0
SSd                  1,498.6667                 1,159.92
Variance             53.805211                  42.965707
Std. deviation       7.450246                   6.554823
Std. error           1.433800                   1.261476
g₁                   0.526499                   -0.982848
g₂                   2.813055                   3.548039

By comparing the means, sums, and sums of squares of d₁ and d₂ in table 8, a sharp contrast became evident even before the t-test calculations. Though the skewness and kurtosis were more contrasting than before, both distributions were still similarly leptokurtic. By comparing the measures of variance, standard deviation, and standard error with the skewness and kurtosis, once again, the comparison indicated that the whole distribution of scores was higher in the experimental group. The deviations indicated a very large and significant statistical improvement in the experimental group scores.
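The sums, sum of squares, standard deviation, and standard error for the experimental group can be sketched from the d₁ column exactly as it is tabulated in table 8. The computational form SSd = Σd² − (Σd)²/n is standard. Dividing by n rather than n − 1 for the standard deviation is an assumption, chosen because it appears to match the tabled standard deviation and standard error.

```python
import math

# d1 values exactly as tabulated in table 8 (experimental group).
d1 = [3, 9, 0, -5, 21, 16, 10, 22, 5, 18, 4, 7, 13, 0,
      6, 7, 16, 0, 8, 1, 14, 6, 5, 9, 6, 12, 27]

n = len(d1)
sum_d = sum(d1)                          # sum of the deviations
sum_d_sq = sum(d * d for d in d1)        # sum of the squared deviations
ss_d = sum_d_sq - sum_d ** 2 / n         # sum of squares about the mean
sd_d = math.sqrt(ss_d / n)               # divide-by-n form (assumed)
se_d = sd_d / math.sqrt(n)               # standard error of the mean deviation

print(sum_d, sum_d_sq, round(ss_d, 4), round(sd_d, 6), round(se_d, 6))
```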

The calculations in table 8 were used to perform three t-tests on the deviations. The three tests were: (1) a one-tailed t-test on the deviations between pretest and posttest scores of the experimental group (as denoted above, d₁ = X₂ - X₁); (2) a two-tailed t-test on the deviations between pretest and posttest scores of the control group (as denoted above, d₂ = X₄ - X₃); and (3) an independent groups t-test on the deviations between d₁ and d₂.[103] The null and alternative hypotheses for each projected t-test and the t-test results according to the standard critical values of t were reported below in table 9.

Table 9.--CRQ t-Test Analyses[104]

One-tailed or directional t-test on the deviations between pretest and posttest scores of the experimental group seen in d₁ in table 8

H_o: µ_d1 ≤ 0    alpha level .05 with 26df gave a critical value of 1.706

H_a: µ_d1 > 0    a t = 6.0343956 was found with p < .0005

H_o was rejected and H_a was accepted; therefore the experimental
group improved in posttesting

Two-tailed or nondirectional t-test on the deviations between pretest and posttest scores of the control group seen in d₂ in table 8

H_o: µ_d2 = 0 alpha level .05 with 26df gave a critical value of 2.056

H_a: µ_d2 ≠ 0    a t = .1468004 was found with p > .20

H_o was accepted; therefore the control group did not improve

Independent groups t-test on the sets of d₁ and d₂ deviations seen in table 8

H_o: µ_d1 = µ_d2    alpha level .05 with 26df gave a critical value of 2.056

H_a: µ_d1 ≠ µ_d2    a t = 4.4017786 was found with p < .001

H_o was rejected and H_a was accepted; therefore, the experimental
group significantly improved over the control group
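A minimal sketch of the t-test on a set of deviations, applied to the control group's d₂ column as tabulated in table 8, is given here. The function name `deviation_t` is hypothetical. The standard deviation divides by n, which is an assumption made only because it appears to reproduce the control-group t value in table 9; textbook versions of this test usually divide by n − 1.

```python
import math

def deviation_t(deviations):
    """t statistic for H_o: mean deviation = 0 (divide-by-n SD assumed)."""
    n = len(deviations)
    mean_d = sum(deviations) / n
    ss = sum((d - mean_d) ** 2 for d in deviations)
    sd = math.sqrt(ss / n)
    se = sd / math.sqrt(n)
    return mean_d / se

# d2 values exactly as tabulated in table 8 (control group).
d2 = [3, 3, 3, 9, -1, 2, -5, 2, 4, -17, 4, 6, -1, 3,
      -12, -4, 7, 2, 4, -14, 1, -3, 10, -2, 0, -7, 8]

t_control = deviation_t(d2)
critical_two_tailed = 2.056      # alpha .05, 26 df, as given in table 9
retain_null = abs(t_control) < critical_two_tailed

print(round(t_control, 7), retain_null)
```

Because the absolute value of t falls well below the critical value, the null hypothesis of no change in the control group is retained, which agrees with the decision reported in table 9.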

The three t-test results indicated that the control group did not improve during the implementation of the program, but the experimental group made significant improvements. Therefore, based upon the three statistical analyses, namely (1) measures of central tendency, (2) measures of frequency, and (3) three t-tests, the third project hypothesis was accepted. The men improved in their empathy skills.