Revista Electrónica de Investigación Educativa

Vol. 9, Num. 2, 2007

Methodological Proposal for the Formulation of a
Conceptualization of the Language Use of an English
Certification Test as a Foreign Language

Virginia Velasco Ariza

Instituto de Investigación y Desarrollo Educativo
Universidad Autónoma de Baja California

A.P. 453, C. P. 22830
Ensenada, Baja California, México

Maria Luz Anguiano López Paliza
Escuela Normal Estatal

Rosas y Eucaliptos
Fracc. Valle Verde, 22839
Ensenada, Baja California, México

Norma Larrazolo Reyna

Instituto de Investigación y Desarrollo Educativo
Universidad Autónoma de Baja California

A.P. 453
C.P. 22830
Ensenada, Baja California, México

(Received: February 14, 2007; accepted for publishing: October 5, 2007)



This article describes the procedure followed to formulate a conceptualization of the Target Language Use Domain for the Graduation Test of English (EXEDII). Language tests consider four abilities: listening, reading, speaking and writing, but a controversy exists about the conceptualization of the measured construct since the use of language in the real environment is neither fractioned in separate entities, nor used in a single situation. Thus it seems useful to make a conceptualization of the field before constructing or validating tests. Taking into account the adaptability of language use depending on the situation, four concatenated activities for collecting information were designed. A Focus Group and interviews provided the opinion of experts in teaching. Also, comparisons were made between the descriptions of the intermediate level of proficiency used by international institutions, a centre of language teaching and a couple of English language proficiency tests. The integration of this information produced the conceptualization which will be used as a criterion for the content validation of the EXEDII test.

Key words: Validity, English (Foreign Language), target language use, content validity.



The English Language Graduation Test (Spanish acronym EXEDII) is a computerized criterion-referenced test for certification in English as a foreign language. It is aligned with the intermediate level of the course On Target in the Scott Foresman English Series, developed by James Purpura and Diane Pinkley (1991). The test was created by a group of researchers at the Institute for Educational Research and Development [Instituto de Investigación y Desarrollo Educativo] (Spanish acronym IIDE) of the Universidad Autónoma de Baja California (Spanish acronym UABC), in Mexico. It is administered to UABC graduates who are unable to provide documented proof of their command of English. The EXEDII includes 100 multiple-choice items, grouped in three areas of evaluation: listening comprehension, grammar, and reading comprehension.

The EXEDII, like other standardized tests, can be considered high-stakes because of its consequences for the students who take it. For this reason, it is essential to have evidence of its validity as a measurement instrument. Although validity is a unitary concept (Messick, 1993), it is common, in the literature on test validation, to find definitions of the type of evidence sought, depending on the nature of the instrument and the interpretations made from the scores obtained by subjects. In the EXEDII’s case, evidence of validity should refer to content, because it is a criterion-referenced certification test (Popham, 1990), and to the construct, because it constitutes the most important type of evidence in any kind of test (Anastasi, 1977; Heaton, 1988; Cureton, 1951; Messick, 1993; Bachman & Palmer, 1996).

Traditionally, both English courses and language evaluations have been defined in terms of four skills: spoken discourse (speaking), writing, listening comprehension (listening), and reading comprehension (reading). The first two are considered productive skills and the other two receptive skills. Speaking and listening have been thought to involve language conveyed through hearing, whereas reading and writing involve language conveyed through sight. While it is true that in a conversation hearing may be the most important sense, part of the meaning of discourse is derived from visual codes, such as gestures and facial and bodily movements in general. Likewise, a reader is not completely passive, given that the meaning of the written message is recreated by the reader through reasoning and the reader’s socio-cultural context. Writing can be understood as an action that produces a message with more than one meaning, depending on the reader. Widdowson (1978) presents an analysis of the ambiguity of these affirmations and proposes a broader interpretation, which allows for other elements that had not been considered in approaches prior to the communicative approach, such as attention to the characteristics of tasks and those of the subject in a given situation.


I. Background and theoretical framework

Language, as an object of study, has been conceptualized from different perspectives which, in the literature on the subject, are commonly referred to as approaches. Thus, the study of linguistics in general, and the teaching and evaluation of language in particular, were influenced by the structural approach, focused on knowledge of the grammatical rules of language. In the nineteen seventies Chomsky’s approach gave a new turn to the structural approach with the concepts of competence and performance. After Chomsky (1970), Hymes (1971) developed the concept of communicative competence, which assumes knowledge of the contextual or socio-linguistic use of language. The work of Halliday (1982) contributed the concepts of notions and functions of language. The integration of the approaches of these and some other authors produced another approach, known as the communicative approach, which considers the user of language the protagonist, with communication needs that arise in different situations.

In the field of psychology, the study of language has been addressed from different points of view, from the concept of language as a behavior subject to operant conditioning (Skinner, 1957/1981) to its conception as a manifestation of thought (Piaget, 1969), or, in accordance with the position of Vygotsky, as a tool that a person uses not only to communicate with others but also to construct a notion of the surrounding world (Hernández Rojas, 2005). It is important to note the diversity of theoretical approaches regarding the origin, development, and function of language, which, in one way or another, have influenced the teaching and evaluation of languages, and by extension the construction of tests as instruments to measure skills, abilities, and competencies or learning.

The Dictionary of the Spanish Language (2001) defines “certify” (certificar) as to assure, affirm, or hold something as true. To certify the use of language is to validate the display of a person’s command of that language, comparing that person’s performance against established criteria. To make this comparison it is necessary to use a measuring instrument (test) that allows one to implement tasks in which the subject uses language in ways essentially equal to those he would encounter in situations outside the evaluation. The instrument should be designed, constructed, and validated so that the results obtained can be interpreted as evidence of whether or not the subject possesses the skills evaluated.

Most procedures used to validate the contents of a test include the comparison of the questions or objectives of the instrument against a particular program or course, which represents the content that students should know after instruction (Messick, 1993; Popham, 1990). Criterion-referenced tests, in turn, should be aligned with a criterion, which often consists of the curriculum or objectives of a given course (Nitko, 1994). When the instrument has the purpose of certification it is harder to choose a criterion because, by definition, certification tests evaluate subjects’ skills independently of the manner in which they have been acquired (Heaton, 1988). If it is necessary to validate the content of a criterion-referenced test whose purpose is to certify subjects in their command of a language, it is inadequate to use the same criteria used to construct and validate it. The communicative approach helps overcome the issue of using course objectives as criteria in the process of validating a criterion-referenced test for language certification. The solution is to appeal to the concept of language use (Bachman & Palmer, 1996).

The use an individual makes of language to communicate in the natural environment acquires specific characteristics, depending on the situation in which it occurs. Each of these circumstances has its own domain: the domain of language use (DLU). For measurement purposes, it can be broken down into specific language use tasks, based on a particular situation called domain of target language use (DTLU). DTLU is broader than the sub-domain covered by the items in a test, but if the tasks in the test are relevant and pertinent, they may represent it and the interpretations made based on the test results will be generalized to the entire domain. DTLU can be defined as the set of specific language use tasks that the subject is likely to encounter outside the test situation and on which the tester seeks to generalize inferences about the subject’s language skills (Bachman & Palmer, 1996).
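The relationship between the DTLU and the sub-domain covered by test items can be pictured as a simple set comparison. The following is a minimal sketch only, assuming that each item has been mapped by a judge to the task it is meant to sample. The task names, item labels, and the `coverage` helper are hypothetical illustrations and are not taken from the EXEDII or from Bachman and Palmer (1996).

```python
# Hypothetical DTLU: a set of language use tasks (invented for illustration).
dtlu_tasks = {
    "understand a short lecture",
    "read a journal abstract",
    "follow spoken instructions",
    "read a job advertisement",
}

# Hypothetical mapping of test items to the task each one samples.
# None means a judge found no matching task in the domain.
item_to_task = {
    "item_01": "read a journal abstract",
    "item_02": "follow spoken instructions",
    "item_03": "read a journal abstract",
    "item_04": None,
}


def coverage(tasks, mapping):
    """Return the share of tasks sampled, the unsampled tasks, and off-domain items."""
    sampled = {task for task in mapping.values() if task in tasks}
    off_domain = sorted(item for item, task in mapping.items() if task not in tasks)
    return len(sampled) / len(tasks), sorted(tasks - sampled), off_domain


share, unsampled, off_domain = coverage(dtlu_tasks, item_to_task)
print(share, unsampled, off_domain)
```

A count of this kind says nothing about whether the sampled tasks are the relevant ones; that judgment still rests with the experts who review the items against the conceptualization.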

The work presented here is a methodological proposal, as it suggests the application of a series of interrelated activities to formulate the conceptualization of language use that an instrument will measure. The procedure starts by collecting the opinions of experts in university teaching and/or English teaching, by means of a procedure that allows them to express themselves without encumbrances. This is followed by a consultation of published international standards and a comparison of the two sources of information to arrive at a conceptualization of the domain, which serves as the operational definition of the test construct, for the purposes of studies of its validity.

The task of formulating a conceptualization of the domain a test such as the EXEDII measures follows from the need to find the most relevant indicators of use of the target language in order to integrate them in an operational description of the tasks of the domain to be measured, which are consistent with the characteristics of the subjects, in the particular situations that are likely to arise.

To gain a clear idea of the type of tasks a person making use of a foreign language might face (e.g. the task of communicating with a waiter in a restaurant), and represent them in a test of the use of that language, we need to know the type of situations subjects are likely to face (e.g. the situation of ordering a breakfast consisting of fried eggs and coffee). For this purpose, it was considered necessary to collect the opinions of two groups of persons, one of which had to know the characteristics of the subjects and their English usage needs, and the other had to be fluent in the language and have experience teaching students with characteristics equal or similar to those of the test subjects.

To achieve these ends, we investigated the opinions of university and English language teachers by means of two techniques: focus group (Alvarez-Gayou, 2005) and interviews. Then the information obtained was compared with the criteria that different associations and organizations use to teach, evaluate, or certify use of English. The associations and organizations consulted were: the American Council on the Teaching of Foreign Languages (ACTFL), the Association of Language Testers in Europe (ALTE), Cambridge University, and the University of Buenos Aires, Argentina, Certification program in English as a foreign language (Spanish acronym CILE). These organizations were chosen because their standards are used extensively at language teaching and evaluation centers in the Americas, Europe, and Asia. Also, the impact of the Cambridge University testing system on admissions criteria at universities on those three continents is well known. Finally, the CILE was chosen in an attempt to match the circumstances of the EXEDII with those of a Latin American university, given that in that language teaching and certification institute English is taught as a foreign language to students whose native language is Spanish.


II. Method

In the phase of the EXEDII’s design and construction a definition of the test’s construct was developed, which established that the EXEDII ought to measure command of English as a foreign language, at the intermediate level, in UABC graduates. This definition served as a general frame of reference in the development of the test, which, as a criterion-referenced test, was aligned with the course On Target developed by Purpura and Pinkley (1991). Nevertheless, the study of validation of the EXEDII’s content required a more specific definition, in order to evaluate the relevance, representativeness, and pertinence of the test items in relation to the definition of the content and the construct, and the authenticity of the tasks evaluated by the test items.

Four activities were developed: a) focus groups; b) interviews with two experts; c) comparative analysis of descriptions of intermediate command of English as a foreign language, from four different sources; and d) conceptualization of the EXEDII’s DTLU.

First activity: Focus groups

Purpose: To collect opinions from an intentional sample of university teachers on two topics: a) the EXEDII as a measurement instrument, and b) knowledge and skills in English that UABC students should possess to graduate from the university and practice their professions or continue their studies.

This method was chosen to collect participants’ opinions because the objective was to form a group in which a pair of predefined questions would be posed and participants would be asked to answer them, allowing an exchange of opinions. According to Loera Varela (2000) and Alvarez-Gayou (2005), the answers given would be different from those that would be given individually in a personalized interview, because they are built from interaction with group members.

Participants: Sixteen active teachers were invited to participate in the focus group and 14 accepted, distributed as follows:

Procedure: Before the focus group, participants had the opportunity to take the EXEDII under the same conditions as it is administered to UABC graduates. In the focus group session the panelists were asked the following questions: “How much English should a UABC graduate know?” and “What is your opinion of the EXEDII as an instrument to measure command of English in UABC students?” The session lasted three hours and was videotaped to keep an objective record of the contributions. Following Alvarez-Gayou (2005), all the ideas expressed by the panelists were recorded and then grouped in conceptual categories and organized in a table.

Second activity: Interviews

Purpose: To obtain detailed responses on the opinions of English teaching specialists, to acquire elements to construct the conceptualization of the DTLU.

Participants: Two active English teachers: one teaches at the State Normal School and the other is the principal of an academy that teaches English in accordance with the program of “English for specific purposes”.

Procedure: Interviews were conducted with a professor who did not attend the focus group and with one of the panelists, who did not have the opportunity to express his opinions. The interview format was semi-structured, and the interviewees were asked one of the questions from the focus group: “How much English should a UABC graduate know?” No other questions were asked because one of the interviewees did not have the chance to familiarize himself with the EXEDII and the other interviewee had already expressed his opinions in the focus group. The information collected in the two interviews was analyzed in two parts: listening to the recording or reading the notes, and then writing down the general ideas expressed and dividing them into categories, exemplified with segments from the interviewees’ remarks. Finally, a table was made with each interviewee’s categories, which is included in the section on Results.

Third activity: Comparison of international standards corresponding to the level to be measured by the EXEDII

Purpose: To compare standards for the intermediate level in four international organizations to identify the skills, competencies, or learning pertinent to the level and context of UABC students. The corresponding organizations and levels were: a) ACTFL at the lower intermediate and middle intermediate levels; b) ALTE levels A1 and B1, under the headings of general skills, students, work, and tourist-social; c) CILE, with regard to minimum passing levels for the Certification program in English as a Foreign Language, in its descriptions of the pre-intermediate and intermediate stages; d) Cambridge University, in accordance with the passing criteria for the Preliminary English Test (PET) and the First Certificate in English (FCE).

Procedure: American (ACTFL) and European (ALTE) standards were chosen as a means of identifying criteria from English speaking countries. The criteria applied by a Latin American university program that certifies command of English as a foreign language (CILE) were also chosen. Finally, the tests applied by Cambridge University (UK) were included because they are widely used as criteria for competence in English language teaching programs. The PET and the FCE were chosen because they correspond to the intermediate level in accordance with the criteria of Cambridge University’s ESOL (English for Speakers of Other Languages) tests (ALTE, 2007). The information obtained was used to construct a two-way table, which is shown in the section on Results.

Fourth activity: Conceptualization of the EXEDII’s DTLU

Purpose: To express the conceptualization of Domain of Target Language Use, based on the combined results from the three previous activities.

Procedure: Tables V, VI, VII, VIII, and IX were examined carefully with the intention of using them as the basis to describe the criteria for competency at the intermediate level in terms of tasks. These tasks correspond to a sample of those the subject would face in real-life situations, and therefore should be included in a test like the EXEDII. Then Tables I and III were examined to adapt the content of the others to the specific context of the EXEDII and the characteristics of its subjects.

The result of this activity was a text which expresses the conceptualization of the DTLU and is presented in the section on Results.


III. Results

First activity. The topics discussed by the panelists were noted and organized in conceptual categories that covered several similar ideas, as shown in Table I. The purpose of this kind of grouping is to condense the information collected into a few phrases, which can be interpreted more easily. The criterion used to group the ideas in categories was their thematic similarity. The first column contains the categories and the second contains some of the ideas that exemplify them.

Table I. Conceptual categories formed based on the ideas expressed
in the opinions of panelists in the Focus Group

Second activity. The information collected in the interviews was analyzed following a procedure similar to the analysis in the Focus Group.

First, the ideas expressed in each of the interviewees’ opinions were recorded, as shown in Table II.

Table II. Sample of ideas expressed in interviews with two experts in English teaching

The content of the ideas expressed by the interviewees was organized in conceptual categories, grouping several similar ideas, as shown in Table III. Again, the purpose of this kind of grouping is to condense all the information collected and summarize it in a few phrases that can be interpreted more easily. The criterion with which the ideas were grouped in categories was their thematic similarity.

Table III. Categories derived from interviews with two experts in English teaching

Third activity. The information collected on the standards of the organizations consulted was organized in a table, for purposes of comparison. Table IV shows a fragment of the standards consulted, with phrases that express information differentiated based on two criteria: a) tasks that exceed survival level and are associated with the characteristics of the subjects, and b) tasks that imply receptive skills (reading and oral comprehension, based on the traditional classification) and productive skills. This selection was made based on the characteristics of the EXEDII, which does not measure productive skills (spoken discourse and writing).

Table IV. Description of some minimum competencies/skills required for intermediate level English according to ACTFL, ALTE, the University of Buenos Aires CILE, and the Cambridge University ESOL tests

Given that the description of the standards consulted is not organized in terms of four skills (two productive and two receptive), the descriptions in Table IV were included in Tables V, VI, VII, and VIII, regrouping them in the EXEDII’s areas of measurement: listening comprehension, reading comprehension, and grammar, when standards include the latter.

Table V. Integration of information from Table IV, based on EXEDII areas of measurement: listening and reading comprehension, according to ACTFL

Table VI. Integration of information from Table IV, based on EXEDII areas of measurement: listening and reading comprehension, according to ALTE

Table VII. Integration of information from Table IV, based on EXEDII areas of measurement: listening comprehension, reading comprehension, and grammar, according to CILE

Table VIII. Integration of information from Table IV, based on EXEDII areas of measurement: Listening and reading comprehension, according to ESOL
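The regrouping of descriptors by area of measurement that underlies Tables V to VIII can be sketched as a small data transformation. This is an illustrative sketch only: the rows, the descriptor texts, and the `regroup_by_area` helper are hypothetical placeholders, not content from the standards consulted, and the actual tables were built by hand.

```python
from collections import defaultdict

# Hypothetical rows of (source, skill, descriptor). The descriptor texts are
# placeholders, not quotations from ACTFL, ALTE, CILE, or Cambridge ESOL.
descriptors = [
    ("ACTFL", "listening", "placeholder listening descriptor"),
    ("ACTFL", "speaking", "placeholder speaking descriptor"),
    ("ALTE", "reading", "placeholder reading descriptor"),
    ("CILE", "grammar", "placeholder grammar descriptor"),
    ("ESOL", "writing", "placeholder writing descriptor"),
]

# The three areas the EXEDII measures, as stated in the article.
EXEDII_AREAS = ("listening", "reading", "grammar")


def regroup_by_area(rows, areas):
    """Group descriptors by area of measurement, then by source.

    Rows whose skill is not one of the given areas (here, the productive
    skills) are left out of the result.
    """
    table = {area: defaultdict(list) for area in areas}
    for source, skill, text in rows:
        if skill in table:
            table[skill][source].append(text)
    return {area: dict(by_source) for area, by_source in table.items()}


regrouped = regroup_by_area(descriptors, EXEDII_AREAS)
for area, by_source in regrouped.items():
    print(area, by_source)
```

Keeping the source as the inner key preserves the comparison across organizations within each area, which is the point of the two-way layout.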

Following the same criteria used to choose indicators of competencies/skills in the context of teaching and evaluation of English as a foreign language for the areas of listening and reading comprehension, and to avoid resorting to the curriculum with which the EXEDII is aligned, we accepted the suggestion of one of the experts interviewed to consult the intermediate level indicators of Barnes & Noble’s SparkCharts™.

According to the expert interviewed, SparkCharts™ are a kind of atlas or charts developed by experts at Harvard University, which include relevant and representative content on the subject treated and are widely used at institutions that teach English for specific purposes. It is noteworthy that English for Specific Purposes has become a technical term that refers to a language teaching methodology that responds to the learning needs of specific population groups. Examples include students who need to prepare for a certification test, businesspeople who need to speak English in the environment of a maquiladora, and executives who need to learn or perfect their use of a language to communicate in the context of international business relations.

To know and evaluate whether SparkCharts™ have the academic quality necessary to be used as indicators of the minimum grammatical knowledge necessary for the intermediate level, we asked the other expert interviewed for her opinion on the subject, and she considered them adequate.

The advertisement appearing on the SparkCharts™ website describes them as follows:

Imagine if the top student in your course organized the most important points from your textbook or lecture into an easy-to-read, laminated chart that could fit directly into your notebook or binder.

SparkCharts™ - created by Harvard students for students everywhere - serve as study companions and reference tools that cover a wide range of subjects, including Business, Math, Science, History, Humanities, Foreign Language, and Writing. Titles like Presentations and Public Speaking, Essays and Term Papers, Resumes and Cover Letters, and Test Prep give you what it takes to find success in college and beyond. Outlines and summaries cover key points, while diagrams and tables make difficult concepts easier to digest (SparkNotes, n.d.).

Consequently, the English Grammar content of SparkCharts™ was used as criteria to contribute to the conceptualization of the DTLU the EXEDII ought to measure. Table IX shows the content derived from English Grammar.

Table IX. Indicators of grammatical skills and learning for intermediate level,
according to SparkCharts™

Fourth activity

The last activity in the exercise to construct a conceptualization consisted of writing the DTLU.

The literature on validity of measurement instruments stresses the importance of ensuring that the construct a test seeks to measure be clearly defined (e.g. Messick, 1993; Popham, 1990). Therefore, every effort should be made to arrive at an adequate definition of the construct to be measured. Considering that the definition of the construct for the EXEDII used in its design and construction is too general to guide the process of validating content, it was necessary to develop a conceptualization of what the test ought to measure, to be judged by the experts, as it constitutes the explicit definition of the EXEDII’s construct.

The conceptualization and characterization of the Domain for Target Language Use (DTLU) of EXEDII subjects is described below, in accordance with the ideas expressed by the participants:


IV. Discussion and conclusions

The purpose of this paper was to offer a methodology that will facilitate the task of defining the construct for a language test. Validation studies of psychological and educational measurement instruments, as well as theoretical literature on their validity, stress the need to clearly define the test construct, as in the cases of Heaton (1988), Popham (1990), Messick (1993), Nitko (1994), Bachman & Palmer (1996), and Wang, Bachman, Carr, Kamei, Kim & Llosa (2000). Possibly the most important, but also the most difficult, step in constructing and validating a psychological or educational instrument is the very definition of the construct it ought to measure.

The methodological proposal presented here has the following qualities:

The proposed methodology is not original in its components. What makes it innovative is the combination of those components, to facilitate so important and complex a task, in the construction and validation of measurement instruments in the social sciences.



Alvarez-Gayou, J. L. (2005). Cómo hacer investigación cualitativa. Mexico: Paidós.

American Council on the Teaching of Foreign Languages (1985). ACTFL Proficiency Guidelines [Electronic version]. Hastings-on-Hudson, NY: ACTFL Materials Center. Retrieved January 26, 2007 from:

Anastasi, A. (1977). Tests psicológicos (3ª. ed.). Mexico: Aguilar.

Association of Language Testers in Europe (2006). Overall general ability, social & tourist typical abilities, work typical abilities, study typical abilities. Framework and Can-Do. Retrieved January 26, 2007 from:

Association of Language Testers in Europe (2007). University of Cambridge Examinations (Cambridge ESOL). Retrieved October 3, 2007 from:

Bachman, L. & Palmer, A. (1996). Language testing in practice. Oxford: Oxford University Press.

Chomsky, N. (1970). Aspectos de la teoría de la sintaxis (C. P. Otero, Trans.). Madrid: Aguilar. (Original work published 1965).

Cureton, E. E. (1951). Validity. In A. W. Ward, H. W. Stoker & M. Murray-Ward (Eds.), Educational measurement: Origins, theories and explications: Vol 1. Basic concepts and theories (pp. 133-178). New York: University Press of America.

Halliday, M. A. K. (1994). El lenguaje como semiótica social. México: Fondo de Cultura Económica (Original work published 1978).

Heaton, J. B. (1988). Writing English language tests. New York: Longman.

Hernández Rojas, G. (2005). La comprensión y la composición del discurso escrito desde el paradigma histórico-cultural. Perfiles educativos, 27 (107), 85-117. Retrieved October 5, 2007 from:

Hymes, D. (1971). Competence and performance in linguistic theory. In R. Huxley & E. Ingram (Eds.), Language acquisition: Models and methods (pp. 3-24). New York: Academic Press.

Loera Varela, A. (2000). Los grupos de enfoque en la investigación cualitativa. INDES-BID. Retrieved October 5, 2007 from:

Messick, S. (1993). Validity. In R. L. Linn (Ed.), Educational Measurement (pp. 13-103). Phoenix, AZ: American Council on Education & The Oryx Press.

Nitko, A. J. (1994, July). A model for development curriculum-driven criterion-referenced and norm-referenced examination for certification and selection of students. Paper presented at Conference of Education, Evaluation and Assessment for the Association Studies of Educational Evaluation, South Africa.

Piaget, J. (1984). El lenguaje y el pensamiento del niño pequeño (Elba Mendolia, Trans.). Barcelona: Paidós. (Original work published 1924).

Popham, W. J. (1990). Modern educational measurement: A practitioner’s perspective. Washington, DC: Allyn & Bacon.

Purpura, J. E. & Pinkley, D. (1991). On target 1 (2ª. ed.). Glenview, IL: Scott Foresman.

Real Academia Española. (2001). Diccionario de la lengua española (22ª. ed.). Retrieved October 2, 2007 from:

Skinner, B. F. (1981). Conducta verbal. Mexico: Trillas. (Original work published 1957).

SparkNotes (n.d.). English Grammar SparkCharts™. Barnes & Noble. Retrieved October 3, 2007 from:

Wang, L., Bachman, L. F., Carr, N., Kamei, G., Kim, M. & Llosa, L. (2000, March). A cognitive-psychometric approach to construct validation of Web-based language assessment. Work-in-progress. Paper presented at the 22nd Annual Language Testing Research Colloquium, Vancouver, BC, Canada.

Widdowson, H. G. (1978). Teaching language as communication. Oxford: Oxford University Press.

Please cite the source as:

Velasco Ariza, V., Anguiano López, M. L. & Larrazolo Reyna, N. (2007). Methodological proposal to formulate a conceptualization of the construct for a certification test of English as a foreign language. Revista Electrónica de Investigación Educativa, 9 (2). Retrieved month day, year from: