Strong and weak typing

This is an old revision of this page, as edited by 71.232.14.147 (talk) at 04:08, 26 April 2014 (→Implicit type conversions and "type punning"). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer programming, programming languages are often colloquially referred to as strongly typed or weakly typed. In general, these terms do not have a precise definition. Rather, they tend to be used by advocates or critics of a given programming language, as a means of explaining why a given language is better or worse than alternatives.

History

In 1974, Liskov and Zilles described a strong-typed language as one in which "whenever an object is passed from a calling function to a called function, its type must be compatible with the type declared in the called function." Jackson wrote, "In a strongly typed language each data area will have a distinct type and each process will state its communication requirements in terms of these types."

Definitions of "strong" or "weak"

A number of different language design decisions have been referred to as evidence of "strong" or "weak" typing. In fact, many of these are more accurately understood as the presence or absence of type safety, memory safety, static type-checking, or dynamic type-checking.

Implicit type conversions and "type punning"

Some programming languages make it easy to use a value of one type as if it were a value of another type. This is sometimes described as "weak typing".

For example, Aahz Maruch writes that "Coercion occurs when you have a statically typed language and you use the syntactic features of the language to force the usage of one type as if it were a different type (consider the common use of void* in C). Coercion is usually a symptom of weak typing. Conversion, OTOH, creates a brand-new object of the appropriate type."

As another example, GCC describes reinterpreting a value of one type as if it were a value of another type as type-punning, and warns that it will break strict aliasing. Thiago Macieira discusses several problems that can arise when type-punning causes the compiler to make inappropriate optimizations.

It is easy to focus on the syntax, but Macieira's argument is really about semantics. There are many examples of languages which allow implicit conversions, but in a type-safe manner. For example, both C++ and C# allow programs to define operators to convert a value from one type to another in a semantically meaningful way. When a C++ compiler encounters such a conversion, it treats the operation just like a function call. In contrast, converting a value to the C type "void*" is an unsafe operation which is invisible to the compiler.
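The difference between converting a value and reinterpreting its bits can be illustrated with a minimal Python sketch. It uses the standard struct module as a stand-in for a C-style pointer cast; the variable names are illustrative and are not taken from any of the sources cited above.

```python
import struct

n = 1

# Conversion: float() builds a brand-new object with the same numeric value.
converted = float(n)

# Punning: keep the same four bytes, but read them back as an IEEE 754
# single-precision float instead of a 32-bit signed integer.
raw = struct.pack("<i", n)
punned = struct.unpack("<f", raw)[0]

print(converted)  # 1.0
print(punned)     # a tiny subnormal number, not 1.0
```

The conversion preserves the meaning of the value, whereas the reinterpretation preserves only its bit pattern.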

Pointers

Some programming languages expose pointers as if they were numeric values, and allow users to perform arithmetic on them. These languages are sometimes referred to as "weakly typed", since pointer arithmetic can be used to bypass the language's type system.
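As a rough sketch of how address arithmetic sidesteps a type system, the following Python example uses the standard ctypes module to treat an address as a plain integer. The array contents and variable names are assumptions chosen for illustration, and the float result assumes a platform with IEEE 754 single-precision floats.

```python
import ctypes

values = (ctypes.c_uint32 * 2)(0x3F800000, 7)
base = ctypes.addressof(values)  # the address is just an integer

# Address arithmetic: step one element forward from the start of the array.
second = ctypes.c_uint32.from_address(base + ctypes.sizeof(ctypes.c_uint32))

# Nothing stops the same memory from being read as a different type.
first_as_float = ctypes.c_float.from_address(base)

print(second.value)          # 7
print(first_as_float.value)  # 1.0
```

Computing an address outside the array would be accepted just as readily, which is why this style of access is considered unsafe.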

Untagged unions

Some programming languages support untagged unions, which allow a value of one type to be viewed as if it were a value of another type. In the article titled A hacked Boolean, Bill McCarthy demonstrates how a Boolean value in .NET programming may become internally corrupted so that two values may both be "true" and yet still be considered unequal to each other.
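A minimal sketch of an untagged union, written with Python's ctypes module, is shown below. It is not McCarthy's .NET example; the class and field names are hypothetical and serve only to show two types sharing the same storage.

```python
import ctypes

class Byte(ctypes.Union):
    # Both fields occupy the same single byte of storage; nothing records
    # which one was written last.
    _fields_ = [("unsigned", ctypes.c_uint8), ("signed", ctypes.c_int8)]

b = Byte()
b.unsigned = 255
print(b.signed)  # -1: the same bits read as a two's-complement value
```

Because the union carries no tag, the program itself must remember which interpretation of the byte is currently valid.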

Dynamic type-checking

Some programming languages do not have static type-checking. In many such languages, it is easy to write programs which would be rejected by most static type-checkers. For example, a variable might store either a number or the Boolean value "false". Some programmers refer to these languages as "weakly typed", since they do not seem to enforce the "strong" type discipline found in a language with a static type-checker.
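The case of a variable that holds either a number or the Boolean value "false" can be sketched in Python as follows. The function find is a hypothetical helper written for this illustration.

```python
def find(items, target):
    """Return the index of target in items, or False if it is absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return False

hit = find(["a", "b"], "a")   # the integer 0
miss = find(["a", "b"], "z")  # the Boolean False

# Both results are "falsy", so a plain truth test cannot tell them apart;
# an identity check against False can.
print(hit is False, miss is False)  # False True
```

A typical static type-checker would either reject such a function or require its mixed return type to be stated explicitly.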

Static type-checking

In Luca Cardelli's article Typeful Programming, a "strong type system" is described as one in which there is no possibility of an unchecked runtime type error. In other writing, the absence of unchecked run-time errors is referred to as safety or type safety; Tony Hoare's early papers call this property security.

Predictability

Some programmers refer to a language as "weakly typed" if simple operations do not behave in a way that they would expect. For example, consider the following program:

x = "5" + 6

Different languages will assign a different value to 'x':

One language might convert 6 to a string, and concatenate the two arguments to produce the string "56" (e.g. JavaScript, Java)
Another language might convert "5" to a number, and add the two arguments to produce the number 11 (e.g. Perl, PHP)
Yet another language might convert the string "5" to a pointer representing where the string is stored within memory, and add 6 to that value to produce a semi-random address (e.g. C)
And yet another language might simply fail to compile this program or run the code, saying that the two operands have incompatible types (e.g. Python, BASIC)

Languages that work like the first three examples have all been called "weakly typed" at various times, even though only one of them (the third) represents a safety violation.
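The last of the behaviours listed above can be observed directly in Python, which is used here only as one convenient example. The sketch also shows how stating the intended conversion explicitly selects either of the first two results.

```python
try:
    x = "5" + 6
except TypeError as exc:
    # Python refuses to guess which conversion was intended.
    x = None
    print("rejected:", exc)

# Stating the intended conversion removes the ambiguity.
as_text = "5" + str(6)    # "56"
as_number = int("5") + 6  # 11
print(as_text, as_number)
```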

Type inference

Languages with static type systems differ in the extent to which users are required to state explicitly the types used in their programs. Some languages, such as C, require that every variable be declared with a type. Other languages, such as Haskell, use the Hindley-Milner method to infer all types based on a global analysis. Other languages, such as C# and C++, lie somewhere in between; some types can be inferred based on local information, while others must be specified. Some programmers use the term weakly typed to refer to languages with type inference, often without realizing that the type information is present but implicit.

Variation across programming languages

Note that some of these definitions are contradictory, others are merely orthogonal, and still others are special cases (with additional constraints) of other, more "liberal" (less strong) definitions. Because of the wide divergence among these definitions, it is possible to defend claims about most programming languages that they are either strongly or weakly typed. For instance:

Java, Pascal, Ada and C require all variables to have a declared type, and support the use of explicit casts of arithmetic values to other arithmetic types. Java, C#, Ada and Pascal are sometimes said to be more strongly typed than C, a claim that is probably based on the fact that C supports more kinds of implicit conversions, and C also allows pointer values to be explicitly cast while Java and Pascal do not. Java itself may be considered more strongly typed than Pascal, because the ways of evading the static type system in Java are controlled by the Java Virtual Machine's type system. C# is similar to Java in that respect, though it allows disabling dynamic type checking by explicitly putting code segments in an "unsafe context". Pascal's type system has been described as "too strong", because the size of an array or string is part of its type, making some programming tasks very difficult.
The object-oriented programming languages Smalltalk, Ruby, Python, and Self are all "strongly typed" in the sense that typing errors are prevented at runtime and they do little implicit type conversion, but these languages make no use of static type checking: the compiler does not check or enforce type constraint rules. The term duck typing is now used to describe the dynamic typing paradigm used by the languages in this group.
The Lisp family of languages are all "strongly typed" in the sense that typing errors are prevented at runtime. Some Lisp dialects like Common Lisp or Clojure do support various forms of type declarations and some compilers (CMUCL and related) use these declarations together with type inference to enable various optimizations and also limited forms of compile time type checks.
Standard ML, F#, OCaml and Haskell are statically type-checked, but the compiler automatically infers a precise type for all values. These languages (along with most functional languages) are considered to have stronger type systems than Java, as they permit no implicit type conversions. While OCaml's libraries allow one form of evasion (Obj.magic), this feature remains unused in most applications.
Visual Basic is a hybrid language. In addition to variables with declared types, it is also possible to declare a variable of "Variant" data type that can store data of any type. Its implicit conversions are fairly liberal: for example, one can sum string variants and pass the result into an integer variable.
Assembly language and Forth have been said to be untyped. There is no type checking; it is up to the programmer to ensure that data given to functions is of the appropriate type. Any type conversion required is explicit.

For this reason, writers who wish to write unambiguously about type systems often eschew the term "strong typing" in favor of specific expressions such as "type safety".
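The duck typing attributed above to Smalltalk, Ruby, Python, and Self can be sketched in Python. The classes Duck and Robot and the function announce are hypothetical names invented for this illustration.

```python
class Duck:
    def speak(self):
        return "Quack"

class Robot:
    def speak(self):
        return "Beep"

def announce(thing):
    # No declared type: any object with a speak() method is accepted.
    return thing.speak()

print(announce(Duck()), announce(Robot()))  # Quack Beep

try:
    announce(42)  # int has no speak(); the error is caught at run time
except AttributeError as exc:
    print("rejected:", exc)
```

Nothing is checked before the program runs, yet the mismatched call is still detected rather than silently misinterpreting the integer, which is the sense in which such languages are called "strongly typed".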

References

Liskov, B; Zilles, S (1974). "Programming with abstract data types". ACM Sigplan Notices. CiteSeer: 10.1.1.136.3043.
Jackson, K. (1977). "Parallel processing and modular software construction". Lecture Notes in Computer Science. 54: 436–443. doi:10.1007/BFb0021435. ISBN 3-540-08360-X.
Typing: Strong vs. Weak, Static vs. Dynamic
Type-punning and strict-aliasing, Thiago Macieira
A hacked Boolean
Cardelli, Luca. Typeful Programming. ftp://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/SRC-045.pdf page 3
Infoworld April 25, 1983

Common Lisp HyperSpec, Types and Classes
CMUCL User's Manual: The Compiler, Types in Python
