
The meaning of Meaning

 

Abstract

 

The study of knowledge, especially its use by humans to support cognition and organise behaviour, is arguably an immature, flawed science, in that it is based upon an incomplete and/or inconsistent theoretical foundation (set of fundamental physical principles).

A new, completely physicalist definition of meaning is developed which relies upon the idea of the perceptual feature, as described by Treisman & Gelade (1980). This revised model of meaning significantly improves the ability of science to model cognition. Therefore, it promises to remedy some of Cognitive Science's inherited failings.

This new conceptualisation of meaning unites separately developed yet functionally overlapping viewpoints from the semiotic, linguistic and computational sciences within a single consistent, but much simpler paradigm.

The new physical model of meaning should lead to an increased rate of progress in those research topics which are by nature interdisciplinary (and therefore more vulnerable), such as Cognitive Science, Artificial Intelligence, Autonomous Robotics, Machine Learning, Computational Linguistics and Vision Systems.

 

Introduction

 

Meaning, knowledge, thought, memory and language are all defined in terms of the others, forming a referentially circular system. Changing the way that one of these concepts is defined will impact upon all of the others.

By 'meaning', the first thing that comes to mind is the experientially referential interpretation possessed by all forms of language, typically a spoken or written communication or description. That is, language is used between people to share salient aspects of certain real or imagined worldly situations. The concept of meaning is used to discuss and evaluate the value and content of images, drawings and pictures as well.

In a way, a spoken sentence has a similar power to a drawing. The voice's modulation of the vibrations in the air describes various parts of an experience or situation in a similar manner to the way a pencil draws shapes and lines upon a sheet of paper, thereby depicting the thing that the artist sees. Indeed, we often speak of an artist or designer's clever (or clumsy) use of 'visual' language. Hence meaning, as a concept, can be profitably applied to both serial and parallel information structures.

The smallest familiar building block of spoken meaning is the word. For a word's meaning, we customarily turn to the word's dictionary entry. The word 'table', for example, refers to an imaginary exemplar which could be any table at all. The word appears in the singular form, but it actually applies to a great plurality: the entire class of table-like things.

Using words in sentences has the effect of making this generic dictionary meaning apply in a much more specific way. Each of the words in a sentence, when considered individually, has a very wide 'semantic field', ie the list of possible referents (objects or actions) the word (noun or verb) can refer to. As soon as the words are brought together into a sentence, they mutually reduce the span of each other's semantic fields.
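As a toy illustration, this mutual narrowing can be sketched as set intersection over hypothetical semantic fields (the words and fields below are invented for illustration, not drawn from the theory itself):

```python
# Sketch: mutual narrowing of semantic fields, modelled as set intersection.
# The fields below are invented for illustration; real fields are vastly larger.

semantic_fields = {
    "bank":    {"river_edge", "money_institution", "blood_bank"},
    "teller":  {"money_institution", "storyteller"},
    "deposit": {"money_institution", "river_edge", "mineral_deposit"},
}

def narrow(words, fields):
    """Intersect the semantic fields of all words used together."""
    result = None
    for w in words:
        result = fields[w] if result is None else result & fields[w]
    return result

print(narrow(["bank", "teller", "deposit"], semantic_fields))
# {'money_institution'} -- sentence context has collapsed each word's field
```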

In this sense, words behave just like people. An unemployed but qualified engineer is ready and willing to do any manner of engineering tasks. After the engineer is employed by one specific company which makes, say, microwidgets, their potential range of tasks is reduced down to an actual task, the job of designing microwidgets. Indeed, the same thing could be said of all the other people the engineer works with. The specific task of making microwidgets has narrowed and focussed the collective functions of the group.

 

Semantics is dependency-based

 

Whether considering people who have come together to cooperate on making one product or service, or words which have been used in a sentence to produce meaning, the mechanism is one of context-dependent functional inter-relationships. Meaning (semantics) is therefore a dependency-based concept. It is global, referring to the entire semantic unit, and combinational, ie a function of the choice, or selection, of the words used, not of their order, or sequence.

How do we know what word combinations mean? We are taught by our parents in infancy. The 'game' of language acquisition by the infant involves noticing what situational difference occurs when the parent (typically, mum) introduces each new word. First, the infant must consistently recognise each word as a unique token, a repeated collection of sounds with no meaning yet. The analysis of Treisman & Gelade (1980) demonstrates clearly that language acquisition consists of two quite separate stages: symbol creation (token identification), and symbol binding (token implementation, connecting the symbol to a semantic referent). Note that this is not the only theory of language learning - Chang, Dell & Bock (2006) use a connectionist mechanism based on Jeff Elman's Simple Recurrent Networks (SRNs) to posit a 'one symbol at a time' learning process, with a non-discrete context consisting of a recurrent (not recursive) series of analog network weights. Treisman's system requires that many or all of the lower-level symbols (the learning context) must first be remembered. This is not mere detail, but goes to the heart of the true nature of all cognition: is it sub-symbolic, as the connectionists claim, or is it a discrete symbolic computation, as believers in the Computational Theory of Mind (CTM) advocate? The CHI theory is a discrete-symbol theory.
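A minimal sketch of the two-stage view described above, under the simplifying assumptions that a token counts as 'created' after a few recurrences and that binding is a lookup into co-occurring situations (both assumptions are illustrative, not part of Treisman's account):

```python
# Sketch of the two-stage acquisition scheme (illustrative only):
# Stage 1: symbol creation -- notice repeated tokens in the sound stream.
# Stage 2: symbol binding -- attach each established token to a referent.

from collections import Counter

heard = ["ba", "mama", "da", "mama", "mama", "ba", "mama"]
scenes = {"mama": "caregiver_present"}   # hypothetical co-occurring situation

# Stage 1: a token is 'created' once it has recurred often enough.
tokens = {w for w, n in Counter(heard).items() if n >= 3}

# Stage 2: bind each created token to a situational referent, if one recurs.
lexicon = {t: scenes[t] for t in tokens if t in scenes}
print(lexicon)   # {'mama': 'caregiver_present'}
```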

 

Language use is a deeply two-sided business. The semantic producer's task and the semantic consumer's task are distinct, but must be seamlessly combined and smoothly integrated within each person, since conducting a conversation or exchanging text messages clearly involves a continual swapping of the two roles within each correspondent.

 

It can be demonstrated that, within the one person, both roles are always simultaneously active. Within the physical listener is a virtual speaker, which uses feedback to compare the incoming words against the lexical expectations of an internal mimic. Within the physical speaker is a virtual listener, giving the speaker continual feedback about how easily what is being said can be understood by the actual listener.

Even when we consider more one-sided semantic activities, like reading a book, listening to the radio, delivering a lecture, or writing an essay, these arguments apply equally. In all cases, a predictive monitoring system is at work, functionally equivalent to the mirror-neuron circuits observed in laboratory macaques. Whether an action is performed in the first person, directed in the second person, or described in the third person, the neural circuits involved are substantially the same.

For an explanation, we rely upon cybernetics, the science of systems organisation and control. When we perform physical activities, like driving a car, we operate in predictive mode, exploiting to the fullest extent both feedback and feedforward cybernetics modes and techniques to maximise motion control and minimise action effort. Speaking and listening are no different in this respect from any other equally time-critical behaviour.

 

The producer's task

 

Let's consider the semantic producer's task first. The principle of minimisation of effort applies, in that the author/communicator of the semantic representation would usually like to convey the required semantic value with existing resources. For example, we wish to construct the sentences of a particular discourse with our existing lexicon, that is, without having to learn new words.

Roughly speaking, words have lexical value in a similar manner to the way that coins have monetary value. Each time we make a purchase, we really don't want to have to mint new coins to the exact value of the good or service purchased. Rather, we use a custom combination of tokens (the money we own), drawn from a finite set of symbols (the denominations the government issues), each with a mandated nominal 'face' value. The parallel with the way words are constructed from a finite alphabet is too obvious to pass over without mention.
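As an illustration of this change-making view of lexical selection, here is a minimal sketch (the denominations, and the greedy strategy itself, are illustrative assumptions, not part of the original argument):

```python
# Sketch of the coin analogy: compose a target value from a fixed set of
# denominations rather than minting a new coin (greedy change-making).

def compose(target, denominations):
    """Greedily express `target` as a combination of existing denominations."""
    used = []
    for d in sorted(denominations, reverse=True):
        while target >= d:
            used.append(d)
            target -= d
    return used  # like choosing existing words to approximate a meaning

print(compose(187, [100, 50, 20, 10, 5, 2, 1]))
# [100, 50, 20, 10, 5, 2] -- no new 187-unit coin needed
```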

 

Syntax is constituency-based

 

The consumer's task is the easiest to define: they must 'understand' the representation they have received, ie they must work out its semantic value. This is done in two steps. First, they must make an educated guess about constituency; then they must decode the core dependency relationships. Words change their function when placed in sentences, sometimes radically, but this context-dependent puzzle must wait until the customary constituency sub-structures (packaging techniques) have been determined.

Syntax, or constituency, refers to the way that the producer has gone about building the message. The consumer must then (typically) break the sentences down into primary and subordinate clauses, then break the clauses down into noun and verb phrases, and so on. There is no real mystery here, as long as a suitably accessible analogy, eg that of automobile manufacture and use, is at hand.

Automobile engines from different brands are all built in a fairly standardised manner within a given period; 1970s American V8 engines, for example, were made in much the same way. Linguistically, a given type of speaker, making a given type of message, will use similar methods of packaging words into sentences, and of assembling sentences into narrative depictions. When the time comes for the consumer (listener, reader) to operate the communicative 'vehicle', the driving 'controls' will all fall readily to hand. As the listener attempts to understand the sentence (drive the 'vehicle' over a semantic distance), the individual variations in proprietary design are quickly factored away, allowing the core functions of the 'vehicle' (ignition, steering, brakes, gas pedal, gears, lights, turn signals, etc) to be accessed, and the vehicle can be driven.


The producer's task is constructing the syntax of the communicative representation. For example, the speaker needs to work out how to 'build' each sentence. During the plastic infant stage of first language acquisition, learning new words is as easy as using the ones we already know. As adults, new words are much more difficult to acquire, and the easiest method of constructing new semantic values (ie composing new sentences) is to select the words which, in combination, come closest to the required semantic target. Then we construct semantic constituents, or combinations of known words in which a subordinate word (typically an adjective, adverb or preposition) modifies the dependency value of the core word by virtue of its close positional association.

Syntax is a process of semantic interpolation. It involves finding a combined value that is intermediate between the individual constituent values. By the same token, semantics (ie ascertaining the core semantic values of words) is a process of semantic extrapolation. Before we can show why this is true, we need to define semantic value in terms of building blocks called 'features' and 'measures'.
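A toy numerical sketch of this contrast, reducing word meanings to two-dimensional feature vectors (the vectors and the averaging rule are illustrative assumptions):

```python
# Sketch: syntax as interpolation, semantics as extrapolation, with word
# meanings reduced to toy 2-D feature vectors (values are invented).

orange = (1.0, 0.0)   # (colour-ness, furniture-ness)
table  = (0.0, 1.0)

# Syntax/interpolation: the phrase value lies between its constituents.
phrase = tuple((a + b) / 2 for a, b in zip(orange, table))
print(phrase)          # (0.5, 0.5)

# Semantics/extrapolation: recovering a core value means projecting beyond
# the observed combination, e.g. solving for `table` given phrase and orange.
recovered = tuple(2 * p - o for p, o in zip(phrase, orange))
print(recovered)       # (0.0, 1.0)
```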


[Figure: dependency versus constituency analyses of a sample sentence, from 'Dependency Grammar and Dependency Parsing' by Joakim Nivre]

 

How does the meaning conveyed by a sentence depend on the choice and arrangement of words? This is currently an open problem in structural linguistics. A possible solution to this problem is presented below.

 

There are two competing theories of grammar. The first is Dependency Theory, which gets its name from the idea that the meaning of a given word in a sentence depends on its context, ie semantic values are a function of relative position (context dependency). The second is Constituency Theory, which maintains that the semantic value of a word is a function of its role in constituent structures, typically phrases and clauses. This is demonstrated in the diagram above, taken from 'Dependency Grammar and Dependency Parsing' by Joakim Nivre.

The semantic connection between a word and its referent (eg, the word 'table' and the furniture item with a flat surface and four legs) is only straightforward in a small percentage of cases, eg simple concrete nouns such as the example given. Ultimately, a word can take on any meaning whatsoever, provided that word has been successfully applied in a given situation. Confusion about the meaning of such a word arises when the sentence it appears in is examined in isolation. When the sentence is examined within its enclosing paragraph, however, these problems usually, though not always, disappear.

The fundamentally indirect nature of the relationship was first described in its modern form by C. S. Peirce, who criticised the most popular model of the time, that of F. de Saussure. Saussure's model was typical for its time: a bivalent model, consisting only of symbol and referent. Peirce argued that all bivalent models are inadequate, and identified the faults inherent in all direct symbol-referent meaning systems. He proposed a trivalent model, which interposed another function, that of representation, between symbol and referent.




The issue of how to analyse the meaning of words in a sentence has remained unresolved for so long (since Humboldt, at least) that one could easily be forgiven for assuming that this question of analysis (given the sentence, find the meaning) is all there is to structural linguistics.

Going from the sentence to the meaning is sometimes called the 'consumer's problem': the terminology arises from viewing the recipient of the message (the reader or listener) as the consumer of its semantic content, or 'payload'. This paradigm therefore admits the existence of an opposite, or 'dual', task.

The 'producer's problem' describes the opposite predicament, that of going from the meaning to the sentence. This is the task performed by the sender of the message, the writer or speaker, who is therefore the producer of the message's semantic value. The producer doesn't need to analyse the message for meaning, because they are its author! They already know the meaning they wish to communicate. Their problem is not semantics, but syntax, that is, how to build a sentence from words whose semantic value changes radically whenever they are joined in a sentence.

Consider the difference in meaning between the dictionary entry for a word, say 'table', and the same word used in a real sentence, for example, 'The student put the apple on the teacher's table'. The word by itself has a very general meaning- essentially, it refers to every table everywhere and anywhere. When it is used in the example sentence, it now refers to a specific table, the (real or imaginary) one in the story.

The sentence that the producer constructs depends as much on the size and selection of their vocabulary as on the semantic content of the message. Linguistic data structures have an almost infinite diversity, yet rely on a large but finite lexicon. The way that language yields infinite output from finite resources has been described as miraculous. There is a very large number of ways to communicate any given message; that is, there is no unique function which gives a unique (expression) output corresponding to each semantic input. This is in stark contrast to the inverse function (comprehension), which reliably gives a unique meaning for each sentence across a diverse range of consumers. On the relatively rare occasions that genuine confusion as to meaning arises, it is often cause for humour, or leads to the producer voluntarily rephrasing the sentence or replacing a contentious word with a synonym, so as to side-step the glitch.

Therefore, to minimise effort over the long term, the producer needs a systematic method of choosing words from the existing lexicon. To produce timely results, a reliable method (computational technique, or algorithm) should be used, one that is not vulnerable to exponential complexity issues. The type of algorithm needed is known in AI circles as a 'heuristic', a quasi-technical term meaning 'rule of thumb', 'kludge' or 'informed guesswork'. The producer's task is similar in principle to that faced by a chess player deciding which move to make next: there are too many legal chess moves (analogous to meaningful word choices) to make the right choice easily.
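The contrast can be sketched numerically, reducing 'semantic value' to a single number so that exhaustive and heuristic selection can be compared (the lexicon, target and scoring rule are all invented for illustration):

```python
# Sketch: why the producer needs a heuristic. Exhaustive search over word
# combinations explodes combinatorially; a greedy heuristic stays cheap.

from itertools import combinations

lexicon = {"big": 3, "huge": 5, "red": 2, "old": 1, "table": 4}
target = 9   # the required 'semantic value', reduced to a single number

def exhaustive(lexicon, target, k=3):
    """Try every k-word combination -- O(C(n, k)), quickly infeasible."""
    return min(combinations(lexicon, k),
               key=lambda ws: abs(sum(lexicon[w] for w in ws) - target))

def greedy(lexicon, target, k=3):
    """Heuristic: repeatedly pick the word that best closes the gap."""
    chosen, total = [], 0
    pool = dict(lexicon)
    for _ in range(k):
        best = min(pool, key=lambda w: abs(total + pool[w] - target))
        chosen.append(best)
        total += pool.pop(best)
    return chosen

print(exhaustive(lexicon, target))   # optimal, but exponential in general
print(greedy(lexicon, target))       # near-optimal, and cheap
```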

One technique that makes the producer's job possible is to use habit (autonomic learning of disambiguation rules) to choose the words. This technique can be used by itself when the situation semantics are highly regular, eg greetings, pleasantries, 'handshaking' during phone calls ('hello/hello' in English, 'moshi moshi' in Japanese). Such habitual utterances, used in a reflexive, stimulus-response fashion, are best learned from one's own family group and historical culture by means of informal, unstructured conversations.

Where the required range of expression is greater than that used in such pre-packaged human situations, a more structured, collectively organised system which engages conscious as well as autonomic cognitive resources must be used. This is the educational framework known as formal 'grammar'.

Learning grammar rules as a child turns an apparent deficit, the arbitrary nature of word meanings learned by context dependency, into an advantage. If any word-to-meaning mapping is possible as long as everyone knows it, then it makes sense to teach grammatical rules that are highly ordered and tightly controlled. This long-term, mass education strategy makes life easier for both consumer and producer.

Teaching grammar is a laborious, costly, society-wide enterprise that succeeds only if enough teaching institutions exist (well-equipped classrooms with common syllabi, competent teachers and motivated, orderly pupils), since the training process relies on the critical dependence of semantics on embedded, or situational, learning of context dependency.

Of course, not everyone learns to write. Indeed, many human cultures possess neither writing nor organised education. In 'civilized' cultures, the grammatical rules for writing (orthography) are used to judge which speech constructs are acceptable. In less sophisticated ('primitive') cultures, unique languages evolve via historical routes and, at the very least, inherit a variant of the basic tripartite predication scheme (for example subject-object-verb, or SOV) in an order that is fixed for each language group.

Summarising the discussion so far, we have analysed the functional inversion that lies at the heart of the difference between consumer and producer processes. We have arrived at the conclusion that an organised method of constructing new meanings without learning new words is highly advantageous, especially to help the producer choose from too many semantic possibilities.

We will now discuss how a grammar helps the producer. Basically, sentences are divided into smaller pieces which can be treated with techniques already in use for computing the semantics of words and sentences.

The producer's task is made much worse by increases in the number of words in each sentence. The mathematics of context sensitivity arises from set theory: the possible semantic values of a 'bag of words' grow combinatorially, on the order of 2^n subsets for n words. Breaking a sentence of n words into two parts of n/2 words therefore reduces the total effort needed not by a factor of 2, but by a factor that is itself exponential in n: two halves cost about 2 x 2^(n/2), against 2^n for the whole sentence.
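A quick numerical check of this saving, assuming effort scales as 2^n, ie one unit per subset of the n words (a deliberately crude model):

```python
# Sketch of the combinatorial saving from splitting a sentence, assuming the
# effort of resolving n mutually context-dependent words scales as 2**n.

def effort(n):
    return 2 ** n          # one unit per subset of the n words

for n in (8, 16, 24):
    whole = effort(n)
    halves = 2 * effort(n // 2)
    print(f"n={n:2d}: whole={whole:>10,} two halves={halves:>8,} "
          f"saving x{whole / halves:,.0f}")
# The saving factor grows as 2**(n/2 - 1): exponential, not a mere halving.
```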

These parts are called constituents, in the general case. Common types of constituents are clauses (micro-sentences, which use the same rules for computing the embedded semantics of sentences) and phrases (macro-words, which use the same rules for computing the isolated semantics of words). To apply rules in a systematic way, phrases and clauses must be classified in a 'natural' way. There are two ways to do this. The first is to use the 'native' predicate tripartition, eg SVO for English: the subject, verb or object is used to create semantic classes. The second is to use each word's usual or most common function to classify it, eg verb, noun, descriptor, designator, delimiter.

We can now see how the producer and consumer roles interact. The producer uses words whose core semantics is a historico-cultural function of Dependency relations. Generally speaking, these words will be used in situations far removed from the situational contexts in which they were originally and laboriously learned. Hence some system is needed to 'adjust' the core semantics of the closest words that fit the purpose at hand. The method chosen is to construct small groups of words, with a subordinate word being used to modify the core semantics of the dominant one. For example, in the phrase 'the orange table', 'table' is the dominant word, a noun, and supplies the core semantic context, while 'orange', an adjective, is the subordinate word and modifies one aspect of the table's core semantics (ie adds the colour).
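This modification step can be sketched as one feature bundle overriding another (the feature names and values are illustrative assumptions):

```python
# Sketch of dependency modification: the dominant word supplies a core
# feature bundle, and the subordinate word overrides one aspect of it.

table = {"category": "furniture", "surface": "flat", "legs": 4,
         "colour": "unspecified"}

def modify(core, **overrides):
    """Apply a subordinate word's contribution to the dominant word's core."""
    return {**core, **overrides}

orange_table = modify(table, colour="orange")
print(orange_table["colour"])   # 'orange' -- one aspect adjusted, rest intact
```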

Bock et al. present the following example of in-line term substitution: an editor of a celebrity gossip Web site created a verb to refer to the ravenous way that Catherine Zeta-Jones eats vegetarian food. The editor had written, “I had zetajonesed one too many carb-loaded dinners at Babbo to fit into my size zero skirt”. It is interesting to deconstruct what is happening, linguistically speaking. The novel term 'zetajones' is inserted in the verb's 'slot' in the morphemic frame 'I had ___ed'. The listener is implicitly forced to compare the actual utterance with the one they expect. The resultant semantics is an interpolation, in which the surname (a proper noun) 'Zeta-Jones' is somehow converted into a verb morpheme, bringing with it much of the famous actress's semantic 'baggage'.

Generalising, such syntax rules ('grammar') do not treat each word as a member of a set of semantic equals, contextually speaking. Rather, they provide another functional dimension, that of word types, from which to construct and evaluate semantic values.

The producer uses the Constituency function to construct an approximate plan for each part of the sentence's 'surface' structure, then uses Dependency functions to choose the dominant and subordinate semantic units that comprise each constituent part. Heuristic algorithms (iterative guessing, 'try and better') are still used to choose the 'parts of speech', but since their type is known first, and the size of each semantic micro-context (the number of simultaneous word uses) is small, complexity issues are avoided.
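A minimal sketch of this two-level scheme, assuming a fixed constituency plan of typed slots and small per-type word pools (the plan, pools and resulting sentence are invented for illustration):

```python
# Sketch of two-level production: a Constituency plan fixes typed slots
# first, then Dependency choices fill each slot from a small pool.

plan = ["determiner", "adjective", "noun", "verb", "determiner", "noun"]

pools = {
    "determiner": ["the", "the"],
    "adjective":  ["orange"],
    "noun":       ["table", "student"],
    "verb":       ["pleased"],
}

def produce(plan, pools):
    """Fill each typed slot in order; each pool is tiny, so no explosion."""
    used = {k: list(v) for k, v in pools.items()}
    return [used[slot].pop(0) for slot in plan]

print(" ".join(produce(plan, pools)))
# 'the orange table pleased the student' -- slot types are known first,
# so each choice ranges over a handful of candidates, not the whole lexicon
```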

The consumer must also navigate this two-level architecture, by consciously guessing the most likely surface Constituency relations, then navigating deep structure using autonomic training, thereby quickly resolving the Dependency relations for each constituent object and so decoding the sentence.

The computation of SEMANTICS is the CONSUMER'S PROBLEM

 

The computation of SYNTAX is the PRODUCER'S PROBLEM



Both Dependency and Constituency relationships arise from procedural learning, that is, what used to be called behaviourism. These relationships are learned at the automatic level, that is, like autonomic somatic functions (reflexes and postures). In fact, since one of the basic CHI principles is that language is a type of behaviour, symbol semantics are types of reflexes (S->R) and postures (R->S).

The main difference is that while learning Dependency-based semantic relationships there is an external teacher (typically, a parent), while in the learning of Constituency-based semantic relationships there is no external teacher: rather, the language users themselves are responsible for constructing the associative learning frameworks.

These two options correspond exactly to metaphor and analogy. In metaphor, someone else 'coins' the metaphor, ie the new word is placed where another word or term is expected. The reader/listener is forced to either (a) include the new word within the previous dependency framework, or (b) admit to nonsense. The least costly path is usually (a), that is, to admit a new semantic class into the previous word 'slot'.

In constituency, the same mechanism as metaphor, ie context-dependent learning, is used. However, the reader/listener is not faced with such a forced choice: this time they can either accept the new word meaning, or abandon just the new constituent rather than the entire sentence. If I say "hand me the fyunipodixor tool" to my colleague, he will usually reply "which tool is that?". He knows I am talking about a tool he has not yet seen or heard about, and that I am not talking nonsense.

Instead of metaphor, the appropriate mechanism here is analogy. We are not introducing a completely new semantic class (exemplar) name, as in the case of metaphor; we are introducing a new instance, a variant of an existing class.

We are now in a position to introduce a critical factor to the analysis, without which a solution would be more difficult, perhaps impossible. This is to use perceptual feature analysis to allocate precise computational substructure to semantic vehicles like words and sentences.

Feature theory and Semantics

Treisman's Feature Theory



The idea of a feature is not an easy one to master, because it is (a) subjective, (b) recursive, (c) embodied, (d) holistic, and (e) fractalinear, or semi-finite.

Firstly, features are subjective. This is a huge conceptual leap in itself, because we are taught to think about our surroundings in objective terms: this WRONG approach is summarised by the epithet, "what you see is what I see". In fact, each animal species is equipped with specific feature detectors, environmental filters which allow only relevant signals to 'register'. Luckily, most animals have similar requirements and behaviour types, so they share a lot of basic-level features.

Secondly, features are recursive. The real advantage of subjective features is that they admit of semantic grounding, a recursive decomposition process in which objective features (those detected by distal sensory channels like vision and hearing) are converted into their subjective (embodied, body-centred, embedded) root-level components. Recursive feature grounding is what gives us the ability to see what a complex image, or a part of it, really 'means', whereas a robot with equivalent visual powers sees a detailed data representation which it does not implicitly understand.
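A sketch of recursive grounding as tree expansion, terminating at embodied components (the feature tree is a hypothetical example, not CHI's actual decomposition):

```python
# Sketch of recursive feature grounding: distal features decompose until
# only embodied root-level components (reflexes and postures) remain.

grounding = {
    "cup_on_table": ["cup", "table", "supports"],
    "cup":          ["graspable", "concave"],
    "table":        ["flat_surface", "waist_height"],
    "supports":     ["push_resistance"],
}
# Anything absent from the table is taken to be a root-level component.

def ground(feature):
    """Recursively expand a feature into its embodied components."""
    if feature not in grounding:
        return [feature]
    roots = []
    for part in grounding[feature]:
        roots.extend(ground(part))
    return roots

print(ground("cup_on_table"))
# ['graspable', 'concave', 'flat_surface', 'waist_height', 'push_resistance']
```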

Thirdly, features are 'embodied', that is, they are ultimately reducible to combinations of one's own reflexes and postures. This is actually an old idea, famously explored by the 18th-century philosopher Immanuel Kant in his classic treatise, the 'Critique of Pure Reason'. Many of the current crop of cognitive scientists believe that human-level intelligence is embodied, or explainable only in terms of deeply embedded biocomputation processes.

Fourthly, features are holistic, that is, the recursive feature hierarchy is used to encode the organism's entire global state, or gestalt. This mechanism is necessary in order to adequately explain how the frame problem is to be avoided. In the early 20th century, the Estonian semiotician Jakob von Uexküll coined the German word 'umwelt' for the complex recursive feature hierarchy that fills each animal's entire sensorium. William T. Powers (who called it 'the method of levels') in the U.S. and Charles Dyer (who called it the 'Situation Image') in Australia both independently rediscovered this vital idea in recent times.

Finally, features are fractalinear. They consist of a boundary in dimensional space, which is called a 'measure' in CHI theory: a line defining the edge of a binary, present-or-absent region in the sensor map. The kitten-vision research tradition begun by Hubel & Wiesel in the 1960s (notably the stripe-rearing experiments of Blakemore & Cooper, 1970) demonstrates that these lines are not abstract concepts: kittens raised in visual environments which lack horizontal features (such as a room painted with vertical zebra stripes) are entirely naive to 'visual cliff edges', eg the series of descending steps in a staircase.

Consider a simple square. This example illustrates several of these five properties of features. It can be recursively decomposed into four corners (angular structures, or postures) and four sequential 90-degree rotations (rotational processes, or reflexes).
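The decomposition can be sketched directly (the data structure is an illustrative assumption):

```python
# Sketch of the square example: a feature decomposed into four corners
# (postures) and four 90-degree rotations (reflexes).

square = {
    "postures": ["corner"] * 4,      # four identical angular structures
    "reflexes": ["rotate_90"] * 4,   # four rotations linking the corners
}

def trace(feature):
    """Walk the square: at each corner, apply the connecting rotation."""
    heading = 0
    for posture, reflex in zip(feature["postures"], feature["reflexes"]):
        heading = (heading + 90) % 360
        print(f"{posture}: turn to {heading} degrees")
    assert heading == 0   # four rotations return the figure to its start

trace(square)
```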



Applying Feature Theory to the problem of meaning



Once we have understood the full significance of features to the problems of AI, a lot of the 'heavy lifting' needed to define the meaning of Meaning has been done. We can proceed almost immediately to the definition itself, which is as follows:

The meaning of a representation is the (physical or virtual) object or process that it points to within its parent (current, or putative) Situation Image, or SI, which is a hierarchical, recursive feature map representing the global state of the entire organism-in-the-world. The SI was derived in 2011 from Dyer's CHI theory, but an essentially identical concept had been described a century earlier by the Estonian semiotician Jakob von Uexküll, who called it the 'umwelt'. One of the things that makes the SI interesting is that it offers an escape from Descartes' paradox, the infinite homuncular regression that forms the central feature of the so-called 'Cartesian Theater'.

Developing the Situation Image (SI) concept



The idea behind the SI is precisely the same as that underpinning the principles of cybernetic control, namely, using feedback to adjust system behaviour. The simplest example of cybernetic control is the room-temperature thermostat. It belongs to the general class of feedback devices called homeostats, a name derived from the classical Greek for 'staying the same'. Thermostats control only one variable, room temperature, and perform only one computation: subtracting that measurement from an internally stored datum called a 'setpoint'. If the difference is positive, the thermostat turns on the heating plant; if negative, it turns the heater off.
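A minimal sketch of this single-variable loop (the heater power and room dynamics are invented constants):

```python
# Minimal homeostat sketch: the thermostat loop described above.

def thermostat_step(temperature, setpoint):
    """One control cycle: the only computation is setpoint minus measurement."""
    error = setpoint - temperature
    heater_on = error > 0          # positive difference: turn the heating on
    return heater_on

temperature, setpoint = 15.0, 20.0
for _ in range(8):
    heater_on = thermostat_step(temperature, setpoint)
    temperature += 1.5 if heater_on else -0.5   # toy room dynamics
    print(f"temp={temperature:5.1f} heater={'on' if heater_on else 'off'}")
# The temperature oscillates around the setpoint: classic negative feedback.
```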

Now imagine how a complex system containing thousands of interdependent features might be controlled. Which features' feedback levels would be measured? What combination of measurements would best characterise the global system state? It turns out that the SI is the best way of controlling the entire system with a single feedback loop. The Situation Image is a data representation that truly describes the combined effect of the Self-in-World. This is where the subjective nature of features (item (a) above) comes into its own. If features were defined objectively (that is, with global knowledge, or omniscience), defining self-in-world would prove impossible: how could everything in the world be included? With subjective theories, however, the limits to perception correspond to the boundary of the sensory window. Like the features within the self, the perceptual features are those within this window, hence finite in number and type.
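One way to picture a single loop spanning many features is to collapse the per-feature errors into one global mismatch signal. A sketch under that assumption (the features, weights and readings are hypothetical, not drawn from CHI theory itself):

```python
# Sketch: collapsing many feature errors into the single feedback signal the
# SI control loop needs. Features, weights and readings are all hypothetical.

setpoints = {"warmth": 0.7, "light": 0.5, "hunger": 0.0}
weights   = {"warmth": 1.0, "light": 0.3, "hunger": 2.0}

def global_error(readings):
    """Weighted sum of per-feature errors: one number for one loop."""
    return sum(weights[f] * abs(setpoints[f] - readings[f]) for f in setpoints)

now = {"warmth": 0.4, "light": 0.5, "hunger": 0.6}
print(round(global_error(now), 3))   # 1.5 -- one measure of Self-in-World mismatch
```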

Not that the computation of the SI is a trivial matter. Far from it. Every feature within the limits of perception (the window's frame), across all modalities, must first be related to every other feature. Then the structural (historical) and causal (contemporaneous) links between the distal (world) features and the proximal (self) measures must be established, so that memory representations can be semantically 'grounded'.
