Methods of Lexicological Analysis


Growing interest in methods of study is one of the most symptomatic features of present-day linguistics.

The research methods used in lexicology have always been closely connected with the general trends in linguistics. The principles of comparative linguistics played an important role in the development of a scientific approach to the historical study of words. They brought an enormous increase in ordered and classified information about the English vocabulary, seen in its proper historical perspective. The methods applied consisted in the observation of speech (mostly written), the collection and classification of data, the formulation of hypotheses and the making of systematic statements. Particular stress was laid on refining the methods of collecting and classifying facts. The study of vocabulary became scientific.

Having recognized variety and change in language, nineteenth-century comparative philology regarded descriptive statements as subordinate and not worth making for their own sake. Its aim was to reconstruct the fundamental forms and meanings that have not come down to us. Using sets of phonetic correspondences, philologists explored and proved the genetic relationships between words in different languages. They rejected the prescriptive trends characteristic of the previous stage, realizing that the only basis for correctness is the usage of the native speakers of each language. They destroyed the myth of a Golden Age when all words had their primary "correct" meaning and the language was in a state of perfection from which it has since deteriorated. Intensive work on the great historical dictionaries made it clear that multiple meaning is normal for words, not an "exception". Comparative studies showed that, save for specific technical terms, no two words in two languages cover precisely the same semantic area.

The process of scientific investigation may be subdivided into several stages:




These stages have given rise to a classification of the methods of lexicological analysis.

Nowadays scientists distinguish:

Contrastive analysis

Statistical methods of analysis

Immediate Constituents analysis

Distributional analysis and co-occurrence

Transformational analysis

Componential analysis

Method of semantic differential

Contextual analysis

Each of these methods is described in detail below.


I.1. Contrastive Analysis

Contrastive analysis grew out of the practical demands of language-teaching methodology, where it was shown empirically that the errors recurrently made by foreign-language students can often be traced back to differences in structure between the target language and the learner's native language. This naturally implies the need for a detailed comparison of the structures of the native and the target language, which has been named contrastive analysis.

It should be borne in mind that although objective reality exists outside human beings and irrespective of the language they speak, every language classifies reality in its own way by means of vocabulary units. In English, the word foot denotes the extremity of the leg. In Ukrainian there is no exact equivalent for foot: the word нога denotes the whole leg, including the foot.

The classification of the real world provided by the vocabulary units of our mother tongue is learned and assimilated together with our first language. Because we are used to the way in which our own language structures experience, we are often inclined to think of it as the only natural way of handling things, whereas in fact it is highly arbitrary.

One example is provided by the words watch and clock. It seems natural to Ukrainian speakers to have a single word for all devices that tell us the time, yet in English these devices fall into two semantic classes depending on whether or not they are customarily portable. We also find it natural that kinship terms should reflect the difference between male and female: brother or sister, father or mother, uncle or aunt; yet English fails to make this distinction in the case of cousin, whereas Ukrainian distinguishes двоюрідний брат and двоюрідна сестра.

Contrastive analysis also brings to light what can be labelled problem pairs: cases in which a word denoting two entities in one language corresponds to two different words in another language.

Compare, for example, Ukrainian годинник with English clock and watch, or Ukrainian художник with English artist and painter.
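The search for problem pairs can be partly mechanised. A minimal sketch, assuming a small hand-made Ukrainian-English glossary (GLOSSARY and problem_pairs are hypothetical names; the entries are the examples discussed in this section): every Ukrainian word with more than one English counterpart is flagged.

```python
# Hypothetical bilingual glossary (illustrative only): each Ukrainian word is
# mapped to the English words that render it in different situations.
GLOSSARY = {
    "годинник": {"clock", "watch"},
    "художник": {"artist", "painter"},
    "нога": {"foot", "leg"},
    "брат": {"brother"},
}


def problem_pairs(glossary):
    """Return the source words that correspond to more than one target word."""
    return {
        word: sorted(targets)
        for word, targets in glossary.items()
        if len(targets) > 1
    }


for word, targets in problem_pairs(GLOSSARY).items():
    print(f"{word}: {' / '.join(targets)}")
```

A real contrastive glossary would of course be far larger; the point is only that a one-to-many correspondence is what makes a pair "problematic" for the learner.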

Contrastive analysis on the level of the grammatical meaning reveals that correlated words in different languages may differ in the grammatical component of their meaning.

To take a simple instance, Ukrainian learners are liable to say *the news are good, *the money are on the table, *her hair are black, because the Ukrainian equivalents of these nouns have the grammatical meaning of plurality.

Contrastive analysis also brings to light the essence of what is usually described as idiomatic English or idiomatic Ukrainian, i.e. the peculiar way in which every language combines various concepts and structures them in lexical units to denote extra-linguistic reality.

For example, a typical Ukrainian word-group used to describe the way somebody performs an action, or the state in which a person finds himself, has a structure that may be represented by the formula adverb followed by a finite form of a verb (or verb + adverb). In English we can also use structurally similar word-groups and say he smokes a lot, he learns slowly (fast). The structure of idiomatic English word-groups, however, is different. Its formula can be represented as adjective + deverbal noun: he is a heavy smoker, a poor learner, "The Englishman is a slow starter but there is no stronger finisher" (Galsworthy). Another English word-group used in similar cases has the structure verb to be + adjective + infinitive: (he) is quick to realize, (he) is slow to cool down; this structure is practically non-existent in Ukrainian. Commonly used English words of the type (he is) an early riser, a music-lover have no counterparts in Ukrainian and as a rule correspond to verb phrases of the type '(he) gets up early'.

Last but not least, contrastive analysis deals with the meaning and use of situational verbal units, i.e. words, word-groups and sentences that native speakers commonly use in certain situations.

For instance, when we answer a telephone call and hear somebody asking for a person whose name we have never heard, the usual answer of a Ukrainian speaker would be Ви помилилися (номером) ('You have made a mistake (with the number)'). An Englishman in the same situation is likely to say Wrong number.

To sum up, the importance of contrastive analysis can hardly be overestimated: it is an indispensable stage in the preparation of teaching material, in selecting the lexical items to be extensively practised and in predicting typical errors. It is also of great value to an efficient teacher, who knows that to acquire a native-like command of a foreign language, to be able to speak what we call idiomatic English, one must learn words, word-groups and whole sentences within the lexical, grammatical and situational restrictions of the English language.

I.2. Statistical Methods of Analysis

An important and promising trend in modern linguistics which has been making progress during the last few decades is the quantitative study of language phenomena and the application of statistical methods in linguistic analysis.

The first requirement for a successful statistical study is that the objects counted be representative for the problem in question, i.e. relevant from the linguistic point of view. The statistical approach has proved essential in selecting the vocabulary items of a foreign language for teaching purposes.

It is common knowledge that very few people know more than 10% of the words of their mother tongue. It follows that if we do not wish to waste time on committing to memory vocabulary items which are never likely to be useful to the learner, we have to select only lexical units that are commonly used by native speakers.

It goes without saying that, to be useful in teaching, statistics should deal with meanings as well as with sound-forms, as not all word-meanings are equally frequent.

Besides, the number of meanings by far exceeds the number of words. The total number of different meanings recorded and illustrated in the Oxford English Dictionary for the first 500 words of the Thorndike Word List is 14,070; for the first thousand words it is nearly 25,000. Naturally, not all of these meanings should be included in a list of the two thousand most commonly used words. Statistical analysis of meaning frequencies resulted in the compilation of A General Service List of English Words with Semantic Frequencies. Its semantic count records how often the various senses of the 2,000 most frequent words occur in a study of five million running words. The count is based on the differentiation of meanings in the OED, and the frequencies are expressed as percentages so that teachers and textbook writers may find the list easier to understand and use. An example will make the procedure clear.

room (space), 12%: takes less room, not enough room to turn round (in); make room for (figurative); room for improvement

room (part of a house), 83%: come to my room; bedroom, sitting room, drawing room, bathroom

room (plural = suite, lodgings), 2%: my room in college; to let rooms

It can easily be observed from the semantic count above that the meaning 'part of a house' (sitting room, drawing room) makes up 83% of all occurrences of the word room and should be included in the list of meanings to be learned by beginners, whereas the meaning 'suite, lodgings' is not essential, making up only 2% of all occurrences of this word.
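The percentages in such a count are simply each sense's share of all sense-tagged occurrences. The sketch below assumes the occurrences have already been tagged with a sense by hand; semantic_count is a hypothetical name, and the sample data are invented for illustration and are not the figures of the General Service List.

```python
from collections import Counter


def semantic_count(tagged_occurrences):
    """Percentage of all occurrences of a word that falls to each sense label.

    Rounded percentages need not add up to exactly 100 in general.
    """
    counts = Counter(sense for _, sense in tagged_occurrences)
    total = sum(counts.values())
    return {sense: round(100 * n / total) for sense, n in counts.most_common()}


# Hypothetical sample of (context, sense) pairs for 'room'; not real corpus data.
sample = (
    [("come to my room", "part of a house")] * 17
    + [("not enough room to turn round", "space")] * 2
    + [("to let rooms", "suite, lodgings")] * 1
)
print(semantic_count(sample))
```

The labour in a real count lies in the tagging, not the arithmetic: each occurrence must be assigned to one of the senses distinguished in the dictionary used as a standard.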

One more specific feature must, however, be stressed here. All modern methods aim at being impersonal and objective in the sense that they must lead to generalizations verifiable by all competent persons. In this effort to find verifiable relationships concerning typical contrastive shapes and arrangements of linguistic elements functioning in a system, the study of vocabulary has turned away from chance observation and made considerable scientific progress.

Thus, statistical analysis is applied in different branches of linguistics, including lexicology, as a means of verification and as a reliable criterion for the selection of language data, provided a qualitative description of the lexical items is available.

I.3. Immediate Constituents Analysis

The theory of Immediate Constituents (IC) was originally elaborated as an attempt to determine the ways in which lexical units are relevantly related to one another. It was discovered that combinations of such units are usually structured into hierarchically arranged sets of binary constructions. For example, in the word-group a black dress in severe style we do not relate a to black, black to dress and dress to in, but set up a structure which may be represented as a black dress / in severe style. Thus the fundamental aim of IC analysis is to segment a set of lexical units into two maximally independent sequences, or ICs, thereby revealing the hierarchical structure of this set. Successive segmentation results in Ultimate Constituents (UCs), two-facet units that cannot be segmented into smaller units having both sound-form and meaning. The Ultimate Constituents of the word-group analysed above are: a | black | dress | in | severe | style.
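The binary hierarchy can be represented as nested pairs. In this sketch the segmentation below the first cut (a | black dress, black | dress, and so on) is an assumption, since the text gives only the first cut; TREE, bracket, first_cut and ultimate_constituents are illustrative names.

```python
# An IC tree is a nested pair (left, right); a bare string is an Ultimate Constituent.
# Only the first cut comes from the text; the deeper cuts are assumed.
TREE = (("a", ("black", "dress")), ("in", ("severe", "style")))


def ultimate_constituents(node):
    """Keep segmenting until only unsegmentable units (leaves) remain."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return ultimate_constituents(left) + ultimate_constituents(right)


def first_cut(node):
    """The two immediate constituents of the whole construction, as strings."""
    left, right = node
    return " ".join(ultimate_constituents(left)), " ".join(ultimate_constituents(right))


def bracket(node):
    """Show the hierarchy with square brackets around every binary construction."""
    if isinstance(node, str):
        return node
    left, right = node
    return f"[{bracket(left)} {bracket(right)}]"


print(first_cut(TREE))
print(bracket(TREE))
print(" | ".join(ultimate_constituents(TREE)))
```

The ICs are read off the top of the tree, the UCs off its leaves; the same representation serves for the morphemes of a word.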

IC analysis is used in lexicological investigations mainly to discover the derivational structure of words. For example, the verb denationalise has both a prefix de- and a suffix -ise (-ize). To decide whether this word is a prefixal or a suffixal derivative we must apply IC analysis. The binary segmentation of the string of morphemes making up the word shows that *denation or *denational cannot be considered independent sequences, as there is no direct link between the prefix de- and nation or national; no such sound-forms function as independent units in modern English. The only possible binary segmentation is de | nationalise, so we may conclude that the word is a prefixal derivative. There are also numerous cases when the identical morphemic structure of different words is insufficient proof that their derivational structure follows an identical pattern; the latter can be revealed only by IC analysis. Thus, comparing snow-covered and blue-eyed, we observe that both words contain two root-morphemes and one derivational morpheme. IC analysis, however, shows that whereas snow-covered may be treated as a compound consisting of two stems, snow + covered, blue-eyed is a suffixal derivative, as its underlying structure is different: (blue + eye) + -ed.
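The reasoning for denationalise amounts to trying every binary cut and keeping only those whose two parts both function independently. A minimal sketch, assuming a tiny hand-made inventory of English words and affixes (WORDS, PREFIXES and SUFFIXES are hypothetical and far from complete):

```python
# Hypothetical inventory of units that function independently in modern English.
WORDS = {"nation", "national", "nationalise"}
PREFIXES = {"de"}
SUFFIXES = {"al", "ise"}


def is_independent(morphemes, side):
    """A sequence is independent if it is an attested word, or a single affix
    standing on the side of the cut where that affix belongs."""
    form = "".join(morphemes)
    if form in WORDS:
        return True
    if len(morphemes) == 1:
        return (side == "left" and form in PREFIXES) or (side == "right" and form in SUFFIXES)
    return False


def binary_cuts(morphemes):
    """All cuts of the morpheme string into two independent sequences (ICs),
    at least one of which is a word."""
    cuts = []
    for i in range(1, len(morphemes)):
        left, right = morphemes[:i], morphemes[i:]
        if not (is_independent(left, "left") and is_independent(right, "right")):
            continue
        if "".join(left) in WORDS or "".join(right) in WORDS:
            cuts.append(("".join(left), "".join(right)))
    return cuts


print(binary_cuts(["de", "nation", "al", "ise"]))  # -> [('de', 'nationalise')]
```

Because *denation and *denational are absent from the inventory, only the cut after de- survives, which is the conclusion reached above.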

It may be inferred from the examples discussed above that ICs represent the word-formation structure while the UCs show the morphemic structure of polymorphic words.

I.4. Distributional Analysis and Co-occurrence

Distributional analysis in its various forms is commonly used nowadays by lexicologists of different schools of thought. By the term distribution we understand the occurrence of a lexical unit relative to other lexical units of the same level (words relative to words / morphemes relative to morphemes). In other words, the term denotes the positions which lexical units occupy or may occupy in the text or in the flow of speech. It is readily observed that a certain component of the word-meaning is described when the word is identified distributionally. For example, in the sentence The boy ... home the missing word is easily identified as a verb: The boy went (came, ran) home. Thus the component of meaning that is identified distributionally is actually the part-of-speech meaning, not the individual lexical meaning of the word under analysis. It is assumed that sameness / difference in distribution is indicative of sameness / difference in part-of-speech meaning.

According to Z. Harris, "The distribution of an element is the total of all environments in which it occurs, the sum of all the (different) positions (or occurrences) of an element relative to the occurrence of other elements". In Soviet linguistics this definition has been improved, applied on different levels and found fruitful in semasiology. The "total" mentioned by Z. Harris is replaced by configurations, combining generalized formulas of occurrence with valency. Defining word classes for distributional analysis depends on the structural use of the word in the sentence.
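A crude approximation of Harris's definition records, for every word in a corpus, the set of frames it occurs in. This sketch assumes that a frame is just the pair of immediate neighbours, which is a strong simplification, and uses a toy corpus built from the example The boy ... home; environments is a hypothetical name.

```python
from collections import defaultdict


def environments(sentences):
    """Map each word to the set of (left neighbour, right neighbour) frames it occurs in.

    '#' marks a sentence boundary. Taking only immediate neighbours is a
    simplification of Harris's 'total of all environments'.
    """
    env = defaultdict(set)
    for sentence in sentences:
        tokens = ["#"] + sentence.lower().split() + ["#"]
        for i in range(1, len(tokens) - 1):
            env[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return env


corpus = [  # toy corpus built from the text's example
    "the boy went home",
    "the boy came home",
    "the boy ran home",
    "the girl went home",
]
env = environments(corpus)
print(sorted(env["went"]))        # the distribution of 'went' in this corpus
print(env["came"] & env["ran"])   # a shared frame suggests the same part of speech
print(env["went"] & env["boy"])   # no shared frame
```

Shared frames group went, came and ran together, while boy falls into a different class, which is exactly the part-of-speech information distribution is said to reveal.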

Observation is facilitated by coding, in which words are replaced by conventional word-class symbols. Each analyst suggests a variant suitable for his particular purpose. A possible notation uses N for nouns and for words that can occupy the same position in the sentence, such as personal pronouns. Subscripts indicate the class to which nouns belong, so that Np means a personal noun, Nm a material noun, Ncoll a collective noun, etc. V stands for verbs, A for adjectives and their equivalents, and D for adverbs and their equivalents. Prepositions and conjunctions are not coded.

Observation is further facilitated by simplifying the examples so that only words in direct syntactic connection with the head-word remain.

Thus, when the verb make is studied, the sentence The old man made Henry laugh aloud may be reduced to The man made Henry laugh.

Until recently the standard context was taken to be the sentence, now it is often reduced to a phrase, so that this last example may be rewritten as to make somebody laugh.

When everything but the head-word of the phrase is coded, we obtain the distributional formula: make + Np + V.

The examples collected are arranged according to their distributional formulas, which gives the analyst a complete idea of the environments the language provides for the word in question. The list of structures characteristic of the word's distribution is accompanied by examples:

Make + a + N - make a coat, a machine, a decision

Make + (the) + N + V - make the machine go, make somebody work

Make + A - make sure

Make + a + A + N - make a good wife.
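Coding and arranging examples by formula can be sketched as follows, assuming a hand-made word-class lexicon in the notation introduced above (WORD_CLASS, formula and arrange are hypothetical names; uncoded words such as articles are kept as they are). Note that this lexicon distinguishes N from Np, so make the machine go and make somebody work receive different formulas here.

```python
from collections import defaultdict

# Hypothetical word-class lexicon: N noun, Np personal noun (or an equivalent
# such as 'somebody'), V verb, A adjective. Words missing from it stay uncoded.
WORD_CLASS = {
    "coat": "N", "machine": "N", "decision": "N", "wife": "N",
    "somebody": "Np", "henry": "Np",
    "go": "V", "work": "V", "laugh": "V",
    "sure": "A", "good": "A",
}


def formula(phrase, head):
    """Code everything except the head-word and join the elements with '+'."""
    elements = []
    for word in phrase.lower().split():
        elements.append(word if word == head else WORD_CLASS.get(word, word))
    return " + ".join(elements)


def arrange(phrases, head):
    """Group example phrases under their distributional formulas."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[formula(phrase, head)].append(phrase)
    return dict(groups)


examples = ["make a coat", "make a decision", "make the machine go",
            "make somebody work", "make sure", "make a good wife",
            "make Henry laugh"]
for f, group in arrange(examples, "make").items():
    print(f, "->", ", ".join(group))
```

The grouping is only as good as the lexicon; as the next paragraph shows, one formula may still cover several meanings of the head-word.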

In each of these examples the meaning of make is different. Some of these patterns, however, may be used for several meanings of the word make, so that the differentiation of meanings is not complete. Compare, for instance, the following sentences, where the pattern make + N remains unchanged, although our intuition tells us that the meaning of make is not the same:

60 minutes make an hour.

60 people make a decision.

A phrase all elements of which, including the head-word, are coded is called a distributional pattern; for instance, to make somebody laugh is represented as to V1 Np V2.

Another example:

Get + N (receive): get a letter

Get + Adj (become): get angry

Get + Vinf (start): get to think

In Ukrainian, verb + N patterns separate the meanings of a verb in the same way; the English renderings of the examples are: (it) rains, the train runs, the man goes (walks), it smokes, winter approaches, moves the knight.

To conclude, distribution defined as the occurrence of a lexical unit relative to other lexical units can be interpreted as co-occurrence of lexical items and the two terms can be viewed as synonyms.

It follows that by the term distribution we understand the aptness of a word in one of its meanings to collocate or to co-occur with a certain group, or certain groups of words having some common semantic component.

I.5. Transformational Analysis

Transformational analysis in lexicological investigations may be defined as re-patterning of various distributional structures in order to discover difference or sameness of meaning of practically identical distributional patterns.

Word-groups of identical distributional structure, when re-patterned, also show that the semantic relationship between words, and consequently the meaning of the word-groups, may be different. For example, in word-groups consisting of a possessive pronoun followed by a noun, his car, his failure, his arrest, his goodness, etc., the relationship between his and the following noun is different in each instance, which can be demonstrated by means of transformational procedures.

his car (pen, table) may be re-patterned into he has a car (a pen, a table) or in a more generalised form may be represented as A possesses B.

his failure (mistake, attempt) may be represented as he failed (was mistaken, attempted) or A performs B, which is impossible in the case of his car (pen, table).

his arrest (imprisonment, embarrassment) may be re-patterned into he was arrested (imprisoned, embarrassed) or A is the goal of the action B.

his goodness (kindness, modesty) may be represented as he is good (kind, modest) or B is the quality of A.
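The re-patterning can be sketched with one template per relation. The acceptability judgements themselves are not computed: they are stored by hand in a hypothetical lexicon (LEXICON, TEMPLATES, transform and classify are illustrative names) that follows the examples above, and the sketch only shows how the single pattern his + N splits into four groups once each noun is re-patterned.

```python
from collections import defaultdict

# One re-patterning template per relation named in the text.
TEMPLATES = {
    "A possesses B": "he has {}",
    "A performs B": "he {}",
    "A is the goal of the action B": "he was {}",
    "B is the quality of A": "he is {}",
}

# Hypothetical lexicon: noun -> (relation, form used in the transform).
# These are a speaker's judgements, entered by hand.
LEXICON = {
    "car": ("A possesses B", "a car"),
    "pen": ("A possesses B", "a pen"),
    "failure": ("A performs B", "failed"),
    "mistake": ("A performs B", "was mistaken"),
    "attempt": ("A performs B", "attempted"),
    "arrest": ("A is the goal of the action B", "arrested"),
    "imprisonment": ("A is the goal of the action B", "imprisoned"),
    "goodness": ("B is the quality of A", "good"),
    "modesty": ("B is the quality of A", "modest"),
}


def transform(noun):
    """Re-pattern 'his + noun'; return (relation, transformed sentence)."""
    relation, form = LEXICON[noun]
    return relation, TEMPLATES[relation].format(form)


def classify(nouns):
    """Group nouns of the single pattern 'his + N' by the relation their transform expresses."""
    groups = defaultdict(list)
    for noun in nouns:
        relation, _ = transform(noun)
        groups[relation].append(noun)
    return dict(groups)


for noun in LEXICON:
    relation, sentence = transform(noun)
    print(f"his {noun:<13} -> {sentence:<16} ({relation})")
```

One distributional pattern thus yields four classes, which is the difference in meaning that transformational analysis is designed to expose.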

Types of transformation differ according to the purposes for which transformations are used.

There are:



addition (or expansion)


Transformational procedures are also used, as will be shown below, in the componential analysis of lexical units.

I.6. Componential Analysis

Componential analysis is an attempt to describe the meaning of words in terms of a universal inventory of semantic components and their possible combinations.

Componential approach to meaning has a long history in linguistics.

L. Hjelmslev's commutation deals with similar relationships and may be illustrated by proportions from which the distinctive features d1, d2, d3 are obtained by means of the following procedure:

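$$d_1 = \frac{\text{boy}}{\text{girl}} = \frac{\text{man}}{\text{woman}} = \frac{\text{bull}}{\text{cow}}$$

$$d_2 = \frac{\text{boy}}{\text{man}} = \frac{\text{girl}}{\text{woman}}$$

$$d_3 = \frac{\text{boy}}{\text{bull}} = \frac{\text{girl}}{\text{cow}}$$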

As the first relationship is that of male to female, the second, of young to adult, and the third, human to animal, the meaning 'boy' may be characterized with respect to the distinctive features d1, d2, d3 as containing the semantic elements 'male', 'young' and 'human'. The existence of correlated oppositions proves that these elements are recognized by the vocabulary.
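The commutation test can be checked mechanically once feature sets are assumed. In the sketch below the feature sets in FEATURES are an assumption (they are the components the text arrives at), contrast and proportion_holds are illustrative names, and a proportion counts as valid when every pair in it shows the same opposition.

```python
# Assumed feature sets for the six words of the proportions (illustrative).
FEATURES = {
    "boy": {"male", "young", "human"},
    "girl": {"female", "young", "human"},
    "man": {"male", "adult", "human"},
    "woman": {"female", "adult", "human"},
    "bull": {"male", "adult", "animal"},
    "cow": {"female", "adult", "animal"},
}


def contrast(a, b):
    """The opposition between a and b: (features only in a, features only in b)."""
    return frozenset(FEATURES[a] - FEATURES[b]), frozenset(FEATURES[b] - FEATURES[a])


def proportion_holds(pairs):
    """a/b = c/d = ... holds when every pair shows the same opposition."""
    return len({contrast(a, b) for a, b in pairs}) == 1


D1 = [("boy", "girl"), ("man", "woman"), ("bull", "cow")]
D2 = [("boy", "man"), ("girl", "woman")]
D3 = [("boy", "bull"), ("girl", "cow")]

print([proportion_holds(d) for d in (D1, D2, D3)])

# Components of 'boy': the side of each opposition on which 'boy' stands.
boy_components = set().union(*(contrast(*d[0])[0] for d in (D1, D2, D3)))
print(sorted(boy_components))
```

With these features, boy / bull also differs in age, a contrast that d2 has already isolated; the union of the three oppositions still yields 'male', 'young' and 'human' for boy.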

In criticizing this approach, the English linguist Prof. W. Haas argues that the commutation test looks very plausible if one has carefully selected examples from words entering into clear-cut semantic groups, such as terms of kinship or words denoting colours. It is less satisfactory in other cases, as there is no linguistic framework by which the semantic contrasts can be limited. The commutation test borrows its restrictions from philosophy.

Componential analysis closely resembles the method of logical definition, which divides a genus into species and species into subspecies and is indispensable to dictionary definitions. It is therefore only natural that lexicographic definitions lend themselves as suitable material for the analysis of lexical groups in terms of a finite set of semantic components. Consider the following definitions given in Hornby's dictionary:


Cow: a full grown female of any animal of the ox family.

Calf: the young of the cow.

The first definition contains all the elements we previously obtained from proportional oppositions. The second is incomplete, but we can supply the missing elements from the previous definition. Parts of the vocabulary can be described by formalising such definitions and reducing them to a standard form according to a set of rules.
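Formalising definitions can be sketched as a set of rewrite rules from phrases to components, plus a rule for filling in what a referring definition leaves out. The rules (RULES, INHERITABLE) and the function components are hypothetical and cover only the two definitions quoted above; in particular, the sketch assumes that calf borrows only the genus components (animal, ox family) from cow, not its age or sex.

```python
import re

# Hypothetical rewrite rules: phrase in a definition -> semantic component.
RULES = {
    "full grown": "adult",
    "young": "young",
    "female": "female",
    "animal": "animal",
    "ox family": "bovine",
}
# Components that a referring definition may borrow from the entry it refers to.
INHERITABLE = {"animal", "bovine"}


def components(definition, entries):
    """Reduce a definition to a sorted tuple of components (a 'standard form')."""
    found = {feature for phrase, feature in RULES.items()
             if re.search(rf"\b{re.escape(phrase)}\b", definition)}
    # 'the young of the cow': fill in the missing elements from the entry for 'cow'
    for word, features in entries.items():
        if re.search(rf"\bof the {re.escape(word)}\b", definition):
            found |= features & INHERITABLE
    return tuple(sorted(found))


cow = components("a full grown female of any animal of the ox family", {})
calf = components("the young of the cow", {"cow": set(cow)})
print("cow: ", cow)
print("calf:", calf)
```

Word boundaries in the patterns keep, for example, a rule for male from firing inside female, should such a rule be added.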

Componential analysis may also be arrived at through transformational procedures. It is assumed that sameness / difference of transforms is indicative of sameness / difference in the componential structure of the lexical unit. The example commonly analysed is the difference in the transforms of the structurally identical lexical units puppydog, bulldog and lapdog. The difference in the semantic relationship between the stems of the compounds, and hence in the components of the word-meaning, is demonstrated by the impossibility of the same type of transform for all these words. Thus, a puppydog may be transformed into a dog which is a puppy; a bulldog, however, is not a dog which is a bull, and neither is a lapdog a dog which is a lap. A bulldog may be transformed into a bull-like dog, or a dog which looks like a bull, but a lapdog is not a dog like a lap.
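The transformational test for these compounds can be tabulated as follows. The judgements are supplied by a speaker, not computed; those for bulldog and lapdog follow the text, while the judgement that puppydog rejects the 'looks like' frame is an assumption. FRAMES, JUDGEMENTS and accepted_transforms are illustrative names.

```python
# Transform frames discussed in the text.
FRAMES = ("a dog which is a {}", "a dog which looks like a {}")

# Acceptability of each frame for '<stem>dog' (True = acceptable).
# bulldog and lapdog follow the text; puppy + second frame is an assumption.
JUDGEMENTS = {
    "puppy": (True, False),
    "bull": (False, True),
    "lap": (False, False),
}


def accepted_transforms(stem):
    """Transforms of '<stem>dog' that a speaker accepts."""
    return [frame.format(stem) for frame, ok in zip(FRAMES, JUDGEMENTS[stem]) if ok]


for stem in JUDGEMENTS:
    print(f"{stem}dog:", accepted_transforms(stem) or "no transform of these types")

# Three different sets of accepted transforms point to three different
# relationships between the stems of the compounds.
print(len(set(JUDGEMENTS.values())) == len(JUDGEMENTS))
```

The compounds share one structure (stem + dog), yet no two of them accept the same set of transforms.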

I.7. Method of Semantic Differential

All the methods of semantic analysis discussed above are aimed mainly or exclusively at the investigation of the denotational component of the lexical meaning.

The analysis of differences in connotational meaning is very difficult, since the nuances are often slight, hard to grasp and do not lend themselves to objective investigation and verification.

An attempt to establish and display these differences was made by a group of American psycholinguists. They set up a technique known as the semantic differential, by means of which, as they claim, meaning can be measured. It is perfectly clear, however, that what the semantic differential measures is not word-meaning in any of the accepted senses of the term but the connotational component of meaning or, to be more exact, the emotive charge.

Their technique requires the subjects to judge a series of concepts with respect to a set of bipolar (antonymic) adjective scales. For example, a concept like horse is to be rated as to the degree to which it is good or bad, fast or slow, strong or weak, etc.











Taking the first scale (good/bad) as an example, the seven divisions mean, from left to right: extremely good, quite good, slightly good, neither good nor bad (or equally good and bad), slightly bad, quite bad, extremely bad.

In the example, horse is described as neither good nor bad, extremely fast, quite strong, slightly hard, and equally happy and sad. The responses of the subjects produce a semantic profile representing the emotive charge of the word.
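A semantic profile of this kind can be computed and drawn as in the sketch below. The scale list follows the horse example, but the pole order of the last two scales, the three subjects and their responses are assumptions; averaging the responses and rounding to the nearest division is one simple way of pooling subjects, not necessarily the procedure the psycholinguists used.

```python
from statistics import mean

# Bipolar scales as (left pole, right pole); divisions are numbered 1..7 from
# left to right. The pole order of the last two scales is an assumption.
SCALES = [("good", "bad"), ("fast", "slow"), ("strong", "weak"),
          ("hard", "soft"), ("happy", "sad")]
DEGREE = {1: "extremely", 2: "quite", 3: "slightly"}


def describe(scale, division):
    """Verbal reading of one of the seven divisions of a scale."""
    left, right = scale
    if division == 4:
        return f"neither {left} nor {right}"
    if division < 4:
        return f"{DEGREE[division]} {left}"
    return f"{DEGREE[8 - division]} {right}"


def profile(responses):
    """Average each scale over all subjects, rounded to the nearest division."""
    return [round(mean(column)) for column in zip(*responses)]


def draw(divisions):
    """Text rendering of a semantic profile: one row per scale, 'x' marks the division."""
    rows = []
    for (left, right), chosen in zip(SCALES, divisions):
        cells = " ".join("x" if d == chosen else "_" for d in range(1, 8))
        rows.append(f"{left:>6} {cells} {right}")
    return "\n".join(rows)


# Hypothetical responses of three subjects to 'horse' (one list per subject).
responses = [[4, 1, 2, 3, 4], [4, 1, 2, 3, 4], [3, 2, 2, 3, 5]]
horse = profile(responses)
print(draw(horse))
print(", ".join(describe(s, d) for s, d in zip(SCALES, horse)))
```

The middle division is read as "neither ... nor", which the text also glosses as "equally ... and".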

I.8. Contextual Analysis

Contextual analysis concentrates its attention on determining the minimal stretch of speech and the conditions necessary to reveal in which of its individual meanings the word in question is used. In studying this interaction of the polysemantic word with the syntactic configuration and lexical environment contextual analysis is more concerned with specific features of every particular language than with language universals.

Roughly, context may be subdivided into lexical, syntactic and mixed. Lexical context, for instance, determines the meaning of the word black in the following examples. Black denotes colour when used with a key word naming some material or thing: black velvet, black gloves. When used with key words denoting feeling or thought, it means 'sad', 'dismal': black thoughts, black despair. With nouns denoting time, the meaning is 'unhappy', 'full of hardships': black days, black period.
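The lexical context for black can be sketched as a lookup from the semantic class of the key word to a meaning. The class assignments in KEY_WORD_CLASS are hypothetical, meaning_of_black is an illustrative name, and a real analysis would need a far larger classified vocabulary.

```python
# Hypothetical semantic classes of key words; the class selects the meaning
# of 'black', following the three cases described in the text.
KEY_WORD_CLASS = {
    "velvet": "material or thing", "gloves": "material or thing",
    "thoughts": "feeling or thought", "despair": "feeling or thought",
    "days": "time", "period": "time",
}
MEANING_BY_CLASS = {
    "material or thing": "colour",
    "feeling or thought": "sad, dismal",
    "time": "unhappy, full of hardships",
}


def meaning_of_black(phrase):
    """Meaning of 'black' in 'black <key word>', or None if the key word is unclassified."""
    first, key_word = phrase.lower().split()
    if first != "black":
        raise ValueError("expected a phrase of the form 'black <noun>'")
    return MEANING_BY_CLASS.get(KEY_WORD_CLASS.get(key_word))


for phrase in ["black velvet", "black despair", "black days"]:
    print(phrase, "->", meaning_of_black(phrase))
```

Returning None for an unclassified key word reflects the fact that the lexical context is indicative only when the key word's class is known.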

If, on the other hand, the indicative power belongs to the syntactic pattern and not to the words which make it up, the context is called syntactic. Make means 'to cause' when followed by a complex object: I couldn't make him understand a word I said.

A purely syntactic context is rare. As a rule, the indication comes from syntactic, lexical and sometimes morphological factors combined. Thus late, when used predicatively, means 'after the right, expected or fixed time', as in to be late for school. When used attributively with words denoting periods of time, it means 'towards the end of the period', as in late summer. Used attributively with proper names of persons and preceded by the definite article, late means 'recently dead'.
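The combined context for late can be written as ordered rules over cues that an analyst has marked by hand. The Occurrence fields are hypothetical annotations, not the output of any parser, and the rule order is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    """Hypothetical hand-annotated occurrence of 'late'."""
    predicative: bool                      # syntactic cue: 'to be late for school'
    noun_is_time_period: bool = False      # lexical cue: 'late summer'
    noun_is_proper_personal: bool = False  # lexical cue: a person's name
    definite_article: bool = False         # cue: 'the late ...'


def meaning_of_late(occ):
    """Combine syntactic and lexical cues, checked in a fixed order."""
    if occ.predicative:
        return "after the right, expected or fixed time"
    if occ.noun_is_time_period:
        return "towards the end of the period"
    if occ.noun_is_proper_personal and occ.definite_article:
        return "recently dead"
    return None  # the context is not decisive


print(meaning_of_late(Occurrence(predicative=True)))
print(meaning_of_late(Occurrence(predicative=False, noun_is_time_period=True)))
print(meaning_of_late(Occurrence(predicative=False, noun_is_proper_personal=True,
                                 definite_article=True)))
```

No single cue decides every case: the predicative rule is purely syntactic, while the other two need lexical information about the noun as well.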

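To sum up, the contextual method makes possible a more exact study of details, namely of the conditions under which a polysemantic word is used in each of its individual meanings.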


Acquaintance with the currently used procedures of linguistic investigation shows that contrastive analysis and statistical analysis are widely used in the preparation of teaching material and are of primary importance for teachers of English.

The special interest of contemporary science in methods of linguistic research extends over a period of about twenty-five years. The present status of principles and techniques in lexicology, although still far from satisfactory, shows considerable progress. The structural synchronic approach may be said to have grown into a whole system of procedures which can be used either successively or alternately.

The main procedures belonging to this system are the analysis into immediate constituents, distributional analysis with the substitution test as part of it, transformational analysis, componential analysis and statistical analysis.

Each of these techniques viewed separately has its limitations, but taken together they complement one another, so that each successive procedure may prove helpful where the previous one has failed. We have considered these devices time and again in discussing separate aspects of the vocabulary system. All of them are formalized methods in the sense that they replace the original words in the linguistic material sampled for analysis by symbols that can be discussed without reference to the particular elements they stand for, and then state precise rules for the combination and transformation of the formulas thus obtained.


