Division of Social and Organizational Psychology
The PROTAN software for computer-aided content analysis
Presentation
PROTAN (for PROTocol ANalyzer) is a computer-aided
content analysis system. Being aided by the computer means, in the present case,
that PROTAN performs the many tedious tasks of textual analysis that a human being
can do but generally avoids doing, such as counting words. Quite often, and without
further notice, PROTAN does its job "by default", that is, by assuming
that parameters keep the values initially given to the system. For instance,
some tasks require as little input as a semicolon; the system retrieves
the rest of the required information from its memory. PROTAN never, however,
performs automatic content analysis.
What kind of text can one analyze with the help of PROTAN?
PROTAN can handle any textual material, such
as narratives, clinical interviews, scientific publications, titles or abstracts
from scientific journals across their publication years, poetry, advertising
blurbs, and many other forms of text. The limitations of PROTAN
are those imposed by statistical constraints, by the unavailability of
the dictionaries needed to analyze a particular sort of text, or by a lack
of hypotheses on the analyst's side.
The text itself must be presented to PROTAN
in ASCII format and may not extend beyond column 70. Columns 73 to 80 hold
codes identifying the interview, unit, and speaker; the meaning of these codes
is whatever the analyst decides.
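As an illustration of this input layout, here is a minimal Python sketch that checks one line and splits it into its text and identification parts. It is not part of PROTAN: the function name split_record is hypothetical, and the sketch assumes that columns 71 and 72 are left blank and that no line exceeds 80 columns, neither of which is stated above.

```python
from typing import Tuple


def split_record(line: str) -> Tuple[str, str]:
    """Split one input line into (text, identification field).

    Layout taken from the description above: text in columns 1-70,
    identification codes in columns 73-80 (1-based, inclusive).
    Assumption: columns 71-72 are blank and lines never exceed 80 columns.
    """
    if not line.isascii():
        raise ValueError("line contains non-ASCII characters")
    if len(line) > 80:
        raise ValueError("line is longer than 80 columns")
    padded = line.ljust(80)          # pad short lines with blanks
    if padded[70:72].strip():        # anything in columns 71-72?
        raise ValueError("text spills over column 70")
    return padded[:70].rstrip(), padded[72:80]


# Hypothetical record: the 8-character code is made up for the example.
record = "To be, or not to be".ljust(72) + "01003HAM"
text, ident = split_record(record)
```

How the eight identification columns are divided among interview, unit, and speaker is left to the analyst, so the sketch returns them as a single string.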
What does PROTAN do to help one analyze a text?
The aims of the PROTAN software
PROTAN is tuned to two very different tasks,
corresponding to two different content-analytic strategies (Weber, 1983). In
the first one, PROTAN addresses the question of what the text looks like.
Is it generally abstract? Does it become more abstract as it proceeds, or less? What is
the profile of the main affective connotations (Anderson & McMaster, 1986)
of the text? For example, using Whissell's dictionary of affect (Hogenraad,
McKenzie, & Martindale, 1997; Whissell et al., 1986), one could show that the
general mood in Hamlet progresses as an inverted U, with the second branch
of the inverted U going much lower than the first. Such a finding is hardly
surprising: we always suspected as much. That is precisely why we picked this
classic text. To achieve this first task, PROTAN relies on a series of semantic
dictionaries that are part of the system.
The second task to which PROTAN is tuned is
to answer the question of what the text is talking about. What are the main
themes in it? A theme, like any interest, is never fixed, and we usually want to
know how the interests in a text come and go. The trick of PROTAN, as of Iker's
WORDS system (Iker & Klein, 1974), from which we got the idea, is to postulate
that there is enough information in the relations between words to allow
themes to emerge simply by analyzing these relations.
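To make the idea of "relations between words" concrete, the following Python sketch computes, for each pair of words, the correlation between their frequencies across segments. It is only an illustration under stated assumptions: it is not the algorithm of PROTAN or of WORDS, and the function names pearson and word_correlations, as well as the sample segments, are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:           # a constant profile has no correlation
        return 0.0
    return cov / sqrt(vx * vy)


def word_correlations(segments, vocabulary):
    """Correlate the per-segment frequencies of every pair of words."""
    counts = [Counter(segment) for segment in segments]
    profiles = {w: [c[w] for c in counts] for w in vocabulary}
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a, b in combinations(sorted(vocabulary), 2)
    }


# Made-up segments: each one is a list of (already lemmatized) words.
segments = [
    ["king", "crown", "ghost"],
    ["king", "crown"],
    ["ghost", "night"],
    ["ghost", "night", "night"],
]
relations = word_correlations(segments, ["king", "crown", "ghost", "night"])
```

Words that rise and fall together across segments (here "king" and "crown") obtain a high positive correlation and are candidates for belonging to the same theme; words that avoid each other (here "king" and "night") correlate negatively.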
The tools of the PROTAN software
To accomplish its tasks, PROTAN relies on three tools: segmentation, lemmatization, and dictionaries.
Segmentation means just what the word suggests. One divides the text into as many parts as one feels appropriate. If possible, these segments should be meaningful units, e.g. letters, chapters of a book, or acts of a play. One can also divide the text into artificial units, e.g. segments of 700 words each, or one may have reasons to divide the text into 20 equal parts.
One program takes care of the job of segmenting.
Its name is CSCUT. This program can be complex. This step deserves great
care, since all further analyses depend on it.
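The two artificial segmentation schemes mentioned above (segments of a fixed number of words, or a fixed number of equal parts) can be sketched in Python as follows. This is a minimal illustration, not the CSCUT program; the function names are hypothetical, and the sketch assumes the text is already available as a list of words.

```python
def fixed_size_segments(words, size):
    """Cut a list of words into consecutive segments of `size` words.

    The last segment may be shorter than the others.
    """
    return [words[i:i + size] for i in range(0, len(words), size)]


def equal_segments(words, parts):
    """Cut a list of words into `parts` segments of nearly equal length.

    When the length is not a multiple of `parts`, the first segments
    receive one extra word each.
    """
    base, extra = divmod(len(words), parts)
    segments, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        segments.append(words[start:end])
        start = end
    return segments


# A stand-in "text" of 1500 numbered words.
words = ["w%d" % i for i in range(1500)]
by_size = fixed_size_segments(words, 700)
by_parts = equal_segments(words, 20)
```

Segmenting by meaningful units (letters, chapters, acts) depends on markers in the text itself and is not covered by this sketch.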
Lemmatization is an awkward term for
the operation by which the various inflected forms of words (plurals, conjugations,
etc.) are reduced to a simpler form, for example, the infinitive for verbs.
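A table-driven lemmatizer is one simple way to picture this operation. The Python sketch below is an assumption made for illustration only: it does not reproduce the rules of the CRWSTRIP program, and the table LEMMAS, the function name lemmatize, and the choice to lowercase every word are all hypothetical.

```python
# Hypothetical lookup table: inflected form -> simpler form.
LEMMAS = {
    "is": "be",
    "was": "be",
    "were": "be",
    "kings": "king",
    "speaks": "speak",
    "spoke": "speak",
}


def lemmatize(words, table=LEMMAS):
    """Replace each word by its entry in the table, if it has one.

    Words are lowercased first; words absent from the table are kept
    unchanged (apart from the lowercasing).
    """
    return [table.get(word.lower(), word.lower()) for word in words]


lemmas = lemmatize(["Kings", "were", "speaking"])
```

In this sketch "speaking" is left as it is because the table has no entry for it, which shows why the coverage of the table matters for every later count.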
Dictionaries are systems of categories (broad
dimensions of the mind) that an analyst may be interested in. PROTAN comes
with several such dictionaries in different languages; it is, in this sense, moderately
polyglot.
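Comparing a text with a dictionary amounts, in its simplest form, to counting how many words of a segment fall into each category. The Python sketch below illustrates this under stated assumptions: the toy dictionary and its category names are invented for the example and are not taken from any PROTAN dictionary, and the function name category_rates is hypothetical.

```python
from collections import Counter

# Invented toy dictionary: word -> list of categories it belongs to.
TOY_DICTIONARY = {
    "grief": ["unpleasant"],
    "death": ["unpleasant"],
    "joy": ["pleasant"],
}


def category_rates(segment, dictionary):
    """Return, for each category found, its share of the segment's words."""
    counts = Counter()
    for word in segment:
        for category in dictionary.get(word, ()):
            counts[category] += 1
    total = len(segment)
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


segment = ["grief", "and", "death", "and", "joy", "and", "more", "grief"]
rates = category_rates(segment, TOY_DICTIONARY)
```

Dividing by the segment length makes segments of different sizes comparable; computing the rates segment by segment gives the profile of a category along the text.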
Standard Operating Procedures
PROTAN is composed of 30 programs. These programs
are modular, which means that each of them has a specific role in a chain. For
instance, the program CRWSTRIP, which lemmatizes words, takes its input from the program
CSCUT (the one that takes care of segmenting texts) and produces an output (a
system file) to be processed by other programs.
All programs produce at least one output,
i.e. a listing of results. Occasionally, programs produce several outputs: a
listing of results and either a system file ready to be used by the next
program, or a numeric file to be processed by a statistical package, or both.
In our analysis of Hamlet, the output from the comparison between text and dictionary
is sent to the SAS statistical package for polynomial analysis. We did not
equip PROTAN with statistical software.
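Since the polynomial analysis happens outside PROTAN, the following Python sketch only suggests what a quadratic trend fit over successive segments involves. It is not the SAS procedure used in the Hamlet study; the function name fit_quadratic and the sample scores are hypothetical, and the sketch assumes one score per segment, with segments numbered 0, 1, 2, and so on.

```python
def fit_quadratic(y):
    """Least-squares fit of y[i] ~ a + b*x + c*x**2 for x = 0..n-1.

    Solves the 3x3 normal equations by Gauss-Jordan elimination with
    partial pivoting. Needs at least three scores. Returns (a, b, c).
    """
    xs = range(len(y))
    s = [sum(x ** k for x in xs) for k in range(5)]            # sums of x^k
    t = [sum((x ** k) * v for x, v in zip(xs, y)) for k in range(3)]
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Made-up scores that rise and then fall across eight segments.
scores = [6 * x - x ** 2 for x in range(8)]
a, b, c = fit_quadratic(scores)
```

A negative quadratic coefficient c corresponds to an inverted-U shape; a statistical package additionally reports whether such a coefficient differs reliably from zero, which this sketch does not attempt.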
A list of programs
Following is a list of programs that are currently
part of PROTAN. These are the things that the system can do. Not all these programs
are necessary to have a successful run. Many of these programs are for creating
or editing dictionaries, or striplists, or for editing the text. For convenience,
the list is alphabetical.
Platforms
A distinctive feature of PROTAN is its portability
to several platforms: DOS, UNIX, and Macintosh.
There is no installation procedure; the user can install the 30 programs immediately
and organize the inputs (texts, strip dictionaries, parameter files) and outputs
(listings and punch files) as preferred. Punch files are formatted to be easily
exported to most statistical packages.
Technical specifications
There are no minimum computer requirements, but with corpora of over 100,000 words,
PROTAN will run faster on powerful platforms such as UNIX. PROTAN is written
in C. Each program has been tested in several studies that used PROTAN
as a support. PROTAN has never been submitted for review to computer software
magazines or scientific journals.
Further information
Further information or requests for assistance concerning the PROTAN software
may be obtained from Robert Hogenraad:
Office:
Dr. Robert Hogenraad
Psychology Department, Catholic University of Louvain
10 place du Cardinal Mercier
B-1348 Louvain-la-Neuve, Belgium
Ph.: ..32-(0)10-47 4411
Fax: ..32-(0)10-47 3774
E-mail: hogenraad@upso.ucl.ac.be
Private:
63 Avenue Constant Montald, B-1200 Brussels, Belgium
Ph. & Fax: ..32-(0)2-763 2012
Documentation and references
User's manual:
Hogenraad, R., Daubies, C., & Bestgen, Y. (1995). Une théorie
et une méthode générale d'analyse textuelle assistée
par ordinateur. Le système PROTAN (PROTocol ANalyzer) (Version March
2, 1995). Louvain-la-Neuve, Belgium: Psychology Department, Catholic University
of Louvain. (In French).
Bibliographic references:
Anderson, C. W., & McMaster, G. E. (1986). Modeling emotional tone in stories using tension levels and categorical states. Computers and the Humanities, 20(1), 3-9.
Bestgen, Y. (1994). Can emotional valence in stories be determined from words? Cognition and Emotion, 8(1), 21-36.
Hogenraad, R. (1991). Retratos de Fernando Pessoa. Revista de Comunicação e Linguagens, 14, 91-110.
Hogenraad, R. (1994). Über den Versuch, das Leben der Wörter zu messen. Inhaltsanalytische Verfahren und Literatur. In A. Barsch, G. Rusch, & R. Viehoff (Eds.), Empirische Literaturwissenschaft in der Diskussion (pp. 306-323). Frankfurt am Main: Suhrkamp.
Hogenraad, R., & Bestgen, Y. (1989). On the thread of discourse: Homogeneity, trends, and rhythms in texts. Empirical Studies of the Arts, 7(1), 1-22.
Hogenraad, R., Bestgen, Y., & Durieux, J. F. (1992). Psychology as literature. Genetic, Social, and General Psychology Monographs, 118(4), 455-478.
Hogenraad, R., Bestgen, Y., & Nysten, J. L. (1995). Terrorist rhetoric: Texture and architecture. In E. Nissan & K. M. Schmidt (Eds.), From information to knowledge. Conceptual and content analysis by computer (pp. 54-67). Oxford, England: Intellect.
Hogenraad, R., Boulard, R., & McKenzie, D. (1994). Les mots qui ont fait les relations industrielles. Québec: Presses de l'Université Laval.
Hogenraad, R., Boulard, R., & McKenzie, D. P. (in preparation). An assessment of the creativity of industrial relations journals: An integrative view. Journal of Organizational Behavior.
Hogenraad, R., Kaminski, D., & McKenzie, D. P. (1995). Trails of social science: The visibility of scientific change in criminological journals. Social Science Information, 34(4), 663-685.
Hogenraad, R., McKenzie, D. P., & Martindale, C. (1997). The enemy within: Autocorrelation bias in content analysis of narratives. Computers and the Humanities, 30(6), 433-439.
Hogenraad, R., McKenzie, D. P., Morval, J., & Ducharme, F. A. (1995). Paper trails of psychology: The words that made applied behavioral sciences. Journal of Social Behavior and Personality, 10(3), 491-516.
Iker, H. P., & Klein, R. H. (1974). WORDS: A computer system for the analysis of content. Behavior Research Methods & Instrumentation, 6(4), 430-438.
Weber, R. P. (1983). Measurement models for content analysis. Quality and Quantity, 17, 127-149.
Whissell, C., Fournier, M., Pelland, R., Weir, D., & Makarec, K. (1986). A dictionary of affect in language. IV. Reliability, validity, and applications. Perceptual and Motor Skills, 62, 875-888.