# On the Effectiveness of Active-Engagement Microcomputer-Based Laboratories: Part 3

## RESULTS

In this section we describe the results obtained. We first discuss the results of the multiple choice questions, beginning with a presentation of the overall FCI results to provide a normalization of the overall effectiveness of the tutorial environment for general concept building. We then present the specific results of the VQ and of the Newton 3 cluster of the FCI. Finally, we discuss the implications of the free-response-problem results. Note that not all evaluations were used in all classes.

### Multiple Choice

#### Overall FCI

We display the results of pre- and post-FCI tests in tutorial and non-tutorial classes in Table 1. Ten of our classes gave the FCI as pre- and post-tests. Five were taught with tutorials, five with recitations. The data shown are matched, that is, only those students who took both the pre- and post-tests are included. The number of matched students is listed in Table 1 under N. Comparing the average pre-test score of all students taking the pre-test with that of the matched subset shows that there is no significant selection effect.

The results are displayed as a figure of merit (h) histogram in Fig. 4. The classes taught without tutorials are shown as red bars, while those taught with tutorials are shown as green bars.

Fig. 4: Figure of merit histogram. h = fraction of possible gain obtained on the full FCI, for tutorial (gray bars) and non-tutorial (solid bars) lecture classes.

The tutorial classes systematically produced better overall FCI gains than the non-tutorial classes. The average fractional gains of the classes are

<h> = 0.18 (5 classes, with recitations)
<h> = 0.35 (5 classes, with tutorials)

Note that this average is taken as equally weighted over lecture class, not by student. Every tutorial class had a larger h than every non-tutorial class.[20] Even the tutorial results are somewhat disappointing, achieving only about 1/3 of the possible gain. Both results, however, are consistent with Hake's survey (ref. 18). The non-tutorial scores are consistent with those of other traditional classes, and the tutorial results are about halfway between those of traditional and highly interactive non-traditional classes. Assuming that all 10 classes are drawn from the same population, the probability that the shift of the means is random is less than 2% using a 2-tailed t-test with pooled variance.[21] If class E is excluded as an outlier, the probability that the shift in the means is random is less than 1%.
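For reference, the standard textbook form of a two-sample t statistic with pooled variance is sketched below. The symbols are introduced here for illustration and are not taken from the original: the subscripts T and R denote the tutorial and recitation groups, n is the number of classes in each group, the barred quantities are the group means of the class h values, and s² is the sample variance within each group. With five classes in each group the test has eight degrees of freedom.

```latex
s_p^2 = \frac{(n_T - 1)\, s_T^2 + (n_R - 1)\, s_R^2}{n_T + n_R - 2},
\qquad
t = \frac{\bar{h}_T - \bar{h}_R}{s_p \sqrt{\dfrac{1}{n_T} + \dfrac{1}{n_R}}},
\qquad
\nu = n_T + n_R - 2 = 8 .
```

The two-tailed probability quoted in the text would then follow from comparing |t| with the t distribution for ν degrees of freedom.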

The same amount of instruction was offered to students in both environments (3 hours of lecture and 1 hour of small class section). The primary difference between the tutorial and traditional classes is that the tutorial classes spend one hour per week on explicit concept building in a small-class group-learning-style environment, while the traditional classes spend one problem solving hour per week in a small-class lecture-style environment.

#### Velocity

The VQ were given in two of our classes taught by the same professor. In class A1, the professor (an award winning teacher and a popular lecturer) did his best to teach the material explicitly in lecture, devoting nearly three full lecture hours to the topic of instantaneous velocity. Lecture demonstrations with the same MBL apparatus as in the tutorial were used in a careful demonstration with much student interaction and discussion. The professor had the students watch and plot the professor's motion as he walked a variety of paths, and a number of problems relating to students' personal experience were presented, but no worksheets were distributed. In recitation sections, graduate teaching assistants spent one hour going over textbook problems on the same material.

In class A3, the tutorial system was in place, and one hour of tutorial was given as described in section III. The professor reduced the lecture time on the topic to a single hour, which was more typical of a traditional lecture and had little student interaction. In both classes, the questions were given as part of an examination and, contrary to Thornton and Sokoloff, were not previously given to the students as homework. The results for the error rates are given in Table 2 and shown in Fig. 5.

Table 2: Percentage error on the VQ with and without MBL
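| Instruction without MBL | Q1 | Q2 | Q3 | Q4 | Q5 |
|---|---|---|---|---|---|
| U. Md. (a) | 29 | 7 | 62 | 37 | 14 |
| Tufts (b) | 18 | 7 | 41 | 18.5 | 17 |
| 6 school av. (c) | 41 | 17 | 63 | 37 | 6 |

| Instruction with MBL | Q1 | Q2 | Q3 | Q4 | Q5 |
|---|---|---|---|---|---|
| U. Md. (d) | 6 | 2 | 30 | 19 | 5 |
| Tufts (b) | 2753 [sic] | | | | |
| 6 school av. (c) | 11.5 | 2 | 13 | 11 | 7 |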

(a) UMd, prof. A, no tutorial (N = 100)
(b) Thornton and Sokoloff (N = 177 reported in ref 3)
(c) Thornton, (N = 505 reported in ref 5)
(d) UMd, prof. A, MBL tutorial, (N = 161)

Fig. 5: Error rate on T&S velocity questions (VQ).

The Maryland results with four hours of traditional instruction and no tutorial (class A1) resembled the 6-school average of traditional lecture classes reported in Thornton's lecture at the Raleigh conference (ref. 5). The Maryland result with one hour of MBL tutorial and one hour of lecture was substantially improved, but not as good as the improvement shown with four hours of MBL.
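The size of the improvement can be checked with a few lines of arithmetic. The sketch below assumes that the U. Md. rows of Table 2 read 29, 7, 62, 37, 14 (no tutorial) and 6, 2, 30, 19, 5 (MBL tutorial) for Q1 through Q5; this split of the table's digits is an assumption, and the function name `reduction_factors` is purely illustrative.

```python
# Error rates (percent) on the five velocity questions, U. Md. rows of Table 2.
# ASSUMPTION: the per-question split of the digits is a reading of the table,
# not a set of values confirmed elsewhere in the text.
QUESTIONS = ["Q1", "Q2", "Q3", "Q4", "Q5"]
umd_without_mbl = [29, 7, 62, 37, 14]  # class A1, no tutorial (N = 100)
umd_with_mbl = [6, 2, 30, 19, 5]       # class A3, MBL tutorial (N = 161)


def reduction_factors(before, after):
    """Return the ratio before/after of the error rates, one per question."""
    return [b / a for b, a in zip(before, after)]


factors = reduction_factors(umd_without_mbl, umd_with_mbl)
for question, factor in zip(QUESTIONS, factors):
    print(f"{question}: error rate reduced by a factor of {factor:.1f}")
```

Under this reading the reduction is largest on Q1 and is about a factor of 2 on Q3 and Q4, where the rounded table values give ratios of roughly 2.1 and 1.9.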

These results are consistent with those given by Thornton and Sokoloff. The fact that these results have been obtained with both the lecturer and the time of instruction controlled strongly supports the results in Thornton and Sokoloff and answers our question Q1 in favor of the MBL. The MBL activities play a significant role in the improvement of student understanding of the concept of velocity. It is not simply the extra time that is responsible. This also suggests a partial answer to our question Q3: simply enhancing lectures is not effective in producing an improvement in the learning of the velocity concept for a significant number of students.

#### Newton 3

The Newton 3 tutorial was evaluated using the four FCI questions 2, 11, 13, and 14 (N3 FCI). The results are given in Table 3 and shown as a histogram in Fig. 6. The table gives the percentage of (matched) students answering each of the N3 FCI questions correctly at the beginning (pre) and end (post) of the semester. A figure of merit, h = (class post-test average - class pre-test average)/(100 - class pre-test average), is calculated for each question in analogy with the Hake figure of merit for the full FCI. The four h-values are then averaged in the last column to give a figure of merit for the Newton 3 cluster, hN3.

Table 3: Results on the FCI Newton 3 questions.
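| Class | Q2 Pre | Q2 Post | Q2 h | Q11 Pre | Q11 Post | Q11 h | Q13 Pre | Q13 Post | Q13 h | Q14 Pre | Q14 Post | Q14 h | hN3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A2 | 26% | 66% | 0.54 | 53% | 82% | 0.61 | 37% | 34% | -0.04 | 63% | 82% | 0.50 | 0.40 |
| **A3** | 39% | 89% | 0.82 | 44% | 87% | 0.77 | 14% | 52% | 0.44 | 73% | 88% | 0.57 | 0.65 |
| B1 | 22% | 67% | 0.57 | 41% | 78% | 0.62 | 18% | 52% | 0.41 | 63% | 74% | 0.30 | 0.48 |
| B2 | 37% | 53% | 0.25 | 47% | 68% | 0.40 | 26% | 58% | 0.43 | 68% | 68% | 0.00 | 0.27 |
| C1 | 23% | 32% | 0.12 | 31% | 46% | 0.21 | 21% | 43% | 0.28 | 54% | 66% | 0.25 | 0.22 |
| C2 | 39% | 44% | 0.09 | 39% | 39% | 0.00 | 28% | 28% | 0.00 | 50% | 67% | 0.33 | 0.11 |
| **D1** | 38% | 91% | 0.86 | 45% | 90% | 0.82 | 29% | 70% | 0.57 | 59% | 93% | 0.82 | 0.77 |
| **D2** | 35% | 66% | 0.47 | 40% | 73% | 0.55 | 21% | 46% | 0.32 | 58% | 90% | 0.75 | 0.52 |
| E | 19% | 45% | 0.32 | 52% | 60% | 0.15 | 24% | 40% | 0.22 | 60% | 64% | 0.12 | 0.20 |
| **F** | 26% | 76% | 0.68 | 46% | 85% | 0.73 | 24% | 56% | 0.43 | 60% | 89% | 0.73 | 0.64 |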

Given for each class is the percentage of students answering each question correctly at the beginning (pre) and end (post) of the class. For each question, h is calculated to be (post-pre)/(100-pre). The column headed hN3 gives the average of the four h values in each row. Classes using the N3 MBL tutorial are indicated in bold. (Note class A2 used tutorials, but not the MBL ones.)
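The calculation of h and hN3 described for Table 3 can be reproduced with a short script. The sketch below is illustrative only: the names `gain`, `h_n3`, and `N3_TUTORIAL` are introduced here, and the grouping of A3, D1, D2, and F as the classes that used the Newton 3 MBL tutorial is an assumption, chosen because it reproduces the group averages of 0.64 and 0.28 reported for the tutorial and non-tutorial classes.

```python
# Pre/post percentages correct on FCI questions 2, 11, 13, 14 (Table 3).
TABLE3 = {
    "A2": [(26, 66), (53, 82), (37, 34), (63, 82)],
    "A3": [(39, 89), (44, 87), (14, 52), (73, 88)],
    "B1": [(22, 67), (41, 78), (18, 52), (63, 74)],
    "B2": [(37, 53), (47, 68), (26, 58), (68, 68)],
    "C1": [(23, 32), (31, 46), (21, 43), (54, 66)],
    "C2": [(39, 44), (39, 39), (28, 28), (50, 67)],
    "D1": [(38, 91), (45, 90), (29, 70), (59, 93)],
    "D2": [(35, 66), (40, 73), (21, 46), (58, 90)],
    "E": [(19, 45), (52, 60), (24, 40), (60, 64)],
    "F": [(26, 76), (46, 85), (24, 56), (60, 89)],
}

# ASSUMPTION: classes taken to have used the Newton 3 MBL tutorial.
N3_TUTORIAL = {"A3", "D1", "D2", "F"}


def gain(pre, post):
    """Fraction of the possible gain obtained: (post - pre) / (100 - pre)."""
    return (post - pre) / (100 - pre)


def h_n3(rows):
    """Average of the per-question gains for one class."""
    return sum(gain(pre, post) for pre, post in rows) / len(rows)


def mean(values):
    return sum(values) / len(values)


h_by_class = {name: h_n3(rows) for name, rows in TABLE3.items()}
with_mbl = [h for name, h in h_by_class.items() if name in N3_TUTORIAL]
without_mbl = [h for name, h in h_by_class.items() if name not in N3_TUTORIAL]

for name, h in h_by_class.items():
    print(f"{name}: hN3 = {h:.2f}")
print(f"mean with N3 MBL tutorial:    {mean(with_mbl):.2f}")
print(f"mean without N3 MBL tutorial: {mean(without_mbl):.2f}")
```

Because the script starts from the rounded percentages rather than the unrounded class averages, individual hN3 values can differ from the tabulated ones by about 0.01.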

Fig. 6: Histogram of average figures of merit for the Newton 3 FCI cluster. Solid bars are for classes not using the MBL tutorial, gray bars for those using the MBL tutorial.

The results are systematically better for the tutorial classes. Indeed, every tutorial class has a higher value of hN3 than every non-tutorial class (though a similar statement is not true for the h-values for every individual question). The average values of hN3 for each cluster of classes are:

<hN3> = 0.28 (6 classes, without Newton 3 MBL tutorial)
<hN3> = 0.64 (4 classes, with Newton 3 MBL tutorial)

In the first semester in which tutorials were tested, there was no tutorial specifically oriented towards Newton 3. Our tutorial was written for the subsequent semester. As a result, the first Maryland tutorial class, A2, used tutorials but not a Newton 3 MBL tutorial. This, therefore, gives us a control for individual lecturer as well as for the presence of tutorials. (No special effort was devoted to Newton 3 in lecture in either case.) The result is:

<hN3> = 0.40 (A2: without Newton 3 MBL tutorial)
<hN3> = 0.65 (A3: with Newton 3 MBL tutorial)

### Long Problem

While the multiple choice questions tell us whether students "have" the desired information, they give no information on whether students can access it in an appropriate complex problem. In order to test this, we developed an examination problem that required students both to display an understanding of a velocity graph and to use Newton's third law in a complex physical situation. The problem shown in Fig. 3 was given on the final exam in one tutorial class (D2) and one non-tutorial class (B2). Overall, performance on the problem was better for the tutorial than for the non-tutorial students. However, in this paper we will only discuss issues related to the velocity graph and Newton's third law.

#### Velocity

Part of the examination question asked the student to generate a velocity vs. time graph for a complicated situation. The critical elements of a fully correct solution show the velocity starting at 0, increasing linearly until t=3 seconds, and then decreasing linearly to some negative value.[22]

Students from both classes struggled with this question. Table 4 shows a breakdown of student responses.

Table 4: Results on student construction of velocity graph in long exam question.
| | % correct (a) | % apparently correct, but ending at v=0 (b) | % other incorrect responses |
|---|---|---|---|
| Recitation (N=50) | 12 | 10 | 78 |
| Tutorial (N=82) | 22 | 21 | 57 |

(a) These students demonstrated understanding of the critical features of the graph.
(b) While showing some of the critical features of a correct graph, these students mistakenly ended the graph at v=0, often citing the return of the cart to its initial position as the reason.

Only a small fraction of the students in either class were able to draw a graph that reflected the critical features, but the tutorial students did better than the students in the recitations. After traditional instruction, 12% of the students drew a correct graph. After MBL tutorials, 22% of the students drew a correct graph.

Analysis of the incorrect graphs along with the accompanying explanations revealed some of the students' difficulties. Many students showed in a variety of ways that they had the well-documented confusion between position and velocity. Some drew graphs that at first glance appear correct: the graph increased linearly for the first 3 seconds and then decreased linearly after. However the graph ended at v=0, and some of these students indicated that this coincided with the cart returning to its starting location. Many drew graphs that had incorrect combinations of linear segments, including discontinuities in velocity. Others drew dramatically curved features in their velocity-time graphs. Most of these graphs indicated severe conceptual difficulties even if interpreted as a position vs. time graph. It is worth noting that it is clear from many of their explanations that the students intended to draw a velocity vs. time graph.

Both the percentage of correctly drawn graphs and the nature of the incorrect graphs confirm that while student difficulties in understanding kinematics are pervasive even after instruction, the modified instruction described earlier in this paper appears to be helping address these difficulties. Although the VQ were not given in these classes, approximately 70% of the students in the comparable tutorial class A3 answered all of the multiple choice questions correctly, while only about 40% of those in the recitation class A1 answered them all correctly. The relative results on the long problem are qualitatively consistent with the results of the VQ, but the absolute number of students getting correct answers on the long problem was substantially lower (22% of the tutorial students correct vs. 12% of recitation students correct). Since no classes were evaluated with both the VQ and the long problem, we cannot completely answer Question Q2, but our indications are that the VQ may not suffice. Our results suggest that answering multiple-choice questions correctly is not sufficient to guarantee a robust and fully functional understanding of the relevant concepts for a significant number of students.

#### Newton 3

Another part of the same examination question tested student facility with dynamical concepts. The students were asked to draw a free body diagram of each cart shown in Fig. 3 and to rank the magnitudes of the horizontal forces. Note in particular that by Newton's third law, the magnitude of the force of cart A on cart B is equal to that of cart B on cart A.

The breakdown of student responses to this part of the question is shown in Table 5.

Table 5: Results on student use of Newton's third law in long exam question.
| | % correct | % used the same symbol but did not compare forces | % stated third law force pair have different magnitudes | % no identification of contact forces | % other incorrect response |
|---|---|---|---|---|---|
| Recitation (N=50) | 42 | 6 | 22 | 14 | 16 |
| Tutorial (N=82) | 55 | 0 | 40 | 1 | 4 |

In the tutorial class, 55% of the students correctly identified and compared the third law force pair. In the non-tutorial class 42% identified and correctly compared these forces.[23] (This result favoring the tutorial class is particularly notable since its pre-test N3 FCI score was lower than the recitation class's score, 38% correct versus 44% correct.) Many students identified that the two carts were exerting forces on one another, but stated explicitly that the two forces were not of equal magnitude. In addition, there were also many students who did not even recognize that the two carts exert forces on each other. This was particularly common in the non-tutorial class.

These results should be compared with the results on the post-test N3 FCI questions for the same two classes, 69% and 62% respectively. The discrepancy between the multiple-choice and long-answer problems (in this case both questions were done by both groups) also suggests that the answer to question Q2 might be: the short answer results provide an indication, but overestimate the students' knowledge.
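The size of the discrepancy can be made explicit with a small calculation. The sketch below uses only the percentages quoted in the text for the tutorial class (D2) and the recitation class (B2); the dictionary and function names are illustrative.

```python
# Percent of students correct on Newton's third law, as quoted in the text,
# for the tutorial class (D2) and the recitation class (B2).
MULTIPLE_CHOICE = {"tutorial": 69, "recitation": 62}  # post-test N3 FCI
LONG_PROBLEM = {"tutorial": 55, "recitation": 42}     # Table 5, "% correct"


def drop(group):
    """Percentage-point drop from multiple choice to the long problem."""
    return MULTIPLE_CHOICE[group] - LONG_PROBLEM[group]


def tutorial_advantage(scores):
    """Percentage-point lead of the tutorial class over the recitation class."""
    return scores["tutorial"] - scores["recitation"]


for group in ("tutorial", "recitation"):
    print(f"{group}: {drop(group)} points lower on the long problem")
print(f"tutorial lead, multiple choice: {tutorial_advantage(MULTIPLE_CHOICE)} points")
print(f"tutorial lead, long problem:    {tutorial_advantage(LONG_PROBLEM)} points")
```

Both classes score lower on the long problem than on the multiple choice questions, and the tutorial class's lead is larger on the long problem (13 points) than on the multiple choice questions (7 points).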

## SUMMARY AND CONCLUSIONS

In this paper we have discussed an experiment to test the effectiveness of replacing one hour of problem-solving recitation by one hour of active-engagement MBL addressing the issues of instantaneous velocity and Newton's third law delivered in a University of Washington style tutorial.

The velocity issue was probed by using the multiple choice velocity graph-matching questions given in ref. 3 in two classes taught by the same professor. In one class, the material was taught in lecture with additional lectures given on the subject and the professor doing his best to "teach to the test" without actually doing the test questions in class. In a second class, the professor ignored the test but a single hour of tutorial based on Tools for Scientific Thinking was given. In the non-tutorial class the results were very close to the six school average of lecture classes reported in ref. 5. In the tutorial class, the error rates fell by more than a factor of 2 for all questions. Although this result is not as dramatic as those produced by Thornton and Sokoloff after four additional hours of MBL laboratory, the results are still impressive, especially since we controlled for both the instructor and the time of instruction.

In our second experiment, we constructed a tutorial using MBL on the subject of Newton's third law. In this case, we used the four relevant questions from the FCI in pre- and post-testing as an evaluation tool. Of the ten classes tested, the tutorial was given in 4 lecture classes with three different professors, and it was not given in 6 lecture classes with three different professors. One of the professors taught a class in each group, giving us a specific control for instructor. Both the absolute gains and the final total scores favor the tutorial classes, with every tutorial class scoring a higher fraction of the possible gain (hN3) than every non-tutorial class. The professor who taught a class in each group found that his class's value of hN3 increased by about 60% (from 0.40 to 0.65) when he used the tutorial.

We therefore conclude that our answer to question Q1 is: targeted MBL tutorials can be effective in helping students build conceptual understanding, but do not provide a complete solution to the problem of building a robust and functional knowledge for many students.

A long problem requiring the application of the velocity concept, the building of a velocity graph, and the application of Newton's third law in a complex situation was also given to one tutorial and one recitation class. The tutorial students performed better than the recitation students. In the N3 case where both short and long answer data were available for the same class, the long answer results favored the tutorial students slightly more strongly than the multiple choice questions. But in all cases, the number of students able to produce the correct concept in a complex situation was significantly less than suggested by the multiple choice questions. This indicates that the answer to question Q2 is: multiple choice tests are qualitatively indicative of the direction of change, but cannot be used to determine the extent of robust and functional knowledge developed by the class.

In this experiment we did not test for "side effects."[24a] Since the MBL activities were added at the expense of problem-solving recitations, we should also test whether there was a deterioration in problem-solving for students who did tutorials instead of recitations. We do not expect a significant effect, as our personal anecdotal evidence suggests that recitations are effective for only a small fraction of students. This should, however, be tested in more detailed studies. There are strong indications from earlier work[24] that successful problem solving at the introductory level is often not associated with a growth in conceptual understanding. It may be that only a small fraction of students can successfully learn physics in the order: algorithms first, understanding second; and that it would be more effective for most students to reverse the order.

Thornton and Sokoloff conjectured that the MBL activities they had designed were unusually effective for five reasons:

1. Students focus on the physical world.
2. Immediate feedback is available.
3. Collaboration is encouraged.
4. Powerful tools reduce unnecessary drudgery.[24b]
5. Students understand the specific and familiar before moving to the more general and abstract.

These conjectures are consistent with modern theories of learning,[25] including those built on the work of Piaget and Vygotsky, and on our current understanding of the structure of short and long-term memory buffers.[26] To this list we add a sixth conjecture: 6. Students are actively engaged in exploring and constructing their own understanding.

The Thornton-Sokoloff conjectures appear to be confirmed by a variety of anecdotes describing the success of the substitution of active-engagement MBL activities for traditional labs, and by the failure of the same equipment when used as traditional labs without the engagement/discovery component.[27] These have not, unfortunately, been documented in the literature. It would be useful to have additional detailed experiments comparing different methods in order to build an understanding of exactly what components of MBL activities are proving effective.

Since we relied on all of the Thornton and Sokoloff conjectures (and one of our own) in building our units, we are unable to distinguish which of the elements are critical.[28] Note that the impact of all of the six conjectures could well be achieved without the use of MBL equipment. It would be most interesting to carry out additional large scale studies of the effectiveness of active engagement activities that do not include MBL on the learning of specific concepts.

## ACKNOWLEDGMENTS

We would like to thank the faculty at the University of Maryland who participated in this study. One of us (EFR) would like to thank Cliff Swartz for a number of thought-provoking discussions probing the Thornton and Sokoloff results. We would also like to thank Ron Thornton, David Sokoloff, and Michael Wittmann for useful discussions and comments on the paper. Work supported in part by NSF Grants RED-9355849 and DUE-9455561. Computer facilities provided by NSF Grant DUE-9550890 and by the University of Maryland Computer Science Center.

## Endnotes:

[20] In absolute final scores, one tutorial class with a low pre-test score finished below some of the non-tutorial classes, and one non-tutorial class with a high pre-test score finished above some of the tutorial classes.

[21] Howell, David C., Statistical Methods for Psychology, 3rd Ed. (Duxbury Press, Belmont, California 1992) pp. 181-185.

[22] A graph reversed with respect to the horizontal axis would also be considered correct.

[23] A few students in the non-tutorial class used the same symbol for these two forces, but did not state whether the forces were equal, so it was impossible to determine if they were identifying these two forces as having equal magnitudes. Note that many students used the same symbol to represent forces which clearly had different magnitudes.

[24] Frederick Reif, "Millikan Lecture 1994: Understanding and teaching important scientific thought processes" Am. J. Phys. 63 (1995) 17-32.

[24a] Evidence from the University of Washington group suggests that replacing a problem solving lecture with tutorials does not adversely affect students' ability to solve quantitative problems. Personal communication, Lillian C. McDermott.

[24b] Note that when a student is learning what a graph means, constructing it by hand the first few times may very well be "necessary" drudgery. However, when one is focusing on a connection between a physical phenomenon and a graph, a long time delay between seeing the phenomenon and seeing the graph may well produce a "cognitive disconnect" which prevents them from building the understandings they need.

[25] See references in ref. 15.

[26] Daniel L. Schacter, "Memory", in Foundations of Cognitive Science, M. I. Posner, ed. (MIT Press, Cambridge MA, 1989) 683-725.

[27] Ronald K. Thornton, private communication, January 1994.

[28] From our personal teaching experience, we expect that number 6, active engagement, is the most significant.

©1997, American Association of Physics Teachers.