Datatype
In computer science, a datatype (often simply a type) is a name or label for a set of values and some operations which one can perform on that set of values. Programming languages implicitly or explicitly support one or more datatypes; these types may act as a statically or dynamically checked constraint, ensuring valid programs for a given language.
Basis
Assigning datatypes ("typing") has the basic purpose of giving some semantic meaning to otherwise meaningless collections of bits. Types usually have associations either with values in memory or with objects such as variables. Because any value simply consists of a set of bits in a computer, hardware makes no distinction even between memory addresses, instruction code, characters, integers and floating-point numbers. Types inform programs and programmers how they should treat those mere bits.
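As a minimal sketch of this point, the same bit pattern can be read under several types. The pattern 0x40490FDB and the variable names are chosen purely for illustration; Python's standard struct module does the reinterpretation.

```python
import struct

# One 32-bit pattern; the value 0x40490FDB is an arbitrary illustrative choice.
raw = bytes.fromhex("40490fdb")

# Read the four bytes as a big-endian unsigned integer.
as_unsigned_int = struct.unpack(">I", raw)[0]

# Read the very same four bytes as an IEEE 754 single-precision float.
as_float = struct.unpack(">f", raw)[0]

# Read the first two bytes as Latin-1 characters.
as_text = raw[:2].decode("latin-1")

print(as_unsigned_int)
print(as_float)
print(repr(as_text))
```

The bytes never change; only the type used to interpret them does.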
Major functions that type systems provide include:
- Safety - Use of types may allow a compiler to detect meaningless or provably invalid code. For example, we can identify an expression "Hello, World" / 3 as invalid because one cannot divide (in the usual sense) a string literal by an integer. As discussed below, strong typing offers more safety, but it does not necessarily guarantee complete safety (see type-safety for more information).
- Optimization - Static type-checking may provide useful information to a compiler. For example, if a type says a value must align at a multiple of 4, the compiler may be able to use more efficient machine instructions.
- Documentation - In more expressive type systems, types can serve as a form of documentation, since they can illustrate the intent of the programmer. For instance, timestamps may be a subtype of integers -- but if a programmer declares a function as returning a timestamp rather than merely an integer, this documents part of the meaning of the function.
- Abstraction (or modularity) - Types allow programmers to think about programs at a higher level, not bothering with low-level implementation. For example, programmers can think of strings as values instead of as a mere array of bytes. Or types can allow programmers to express the interface between two subsystems. This localizes the definitions required for interoperability of the subsystems and prevents inconsistencies when those subsystems communicate.
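The documentation role described in the list above can be sketched with Python's optional type hints. Timestamp, last_modified and seconds_between are hypothetical names invented for this example, and the fixed return value is a stub.

```python
from typing import NewType

# Hypothetical type: a timestamp represented as an int (seconds since some epoch).
Timestamp = NewType("Timestamp", int)


def last_modified(path: str) -> Timestamp:
    """Return when `path` last changed (stubbed with a fixed value for this sketch)."""
    return Timestamp(1_700_000_000)


def seconds_between(earlier: Timestamp, later: Timestamp) -> int:
    """The signature documents that both arguments are timestamps, not arbitrary ints."""
    return later - earlier
```

At run time a Timestamp is an ordinary int; the distinction exists for readers and for static checkers that understand the annotations.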
Typically a program associates each value with one particular type (although a type may have more than one subtype). Other entities, such as objects, modules, communication channels, dependencies, or even types themselves, can become associated with a type. For example, a class is the type of an object, and a kind is the type of a type.
A type system, specified in each programming language, stipulates the ways typed programs may behave and makes behavior outside these rules illegal. An effect system typically provides more fine-grained control than a type system.
More formally, type theory studies type systems.
Type checking
The process of verifying and enforcing the constraints of types - type checking - may occur either at compile-time (a static check) or run-time (a dynamic check). Static type-checking becomes a primary task of the semantic analysis carried out by a compiler. If a language enforces type rules strongly (that is, generally allowing only those automatic type conversions which do not lose information), one can refer to the language as strongly typed; if not, as weakly typed.
Static and dynamic typing
In dynamic typing, type checking often takes place at runtime because variables can acquire different types depending on the execution path. Static type systems for dynamic types usually need to explicitly represent the concept of an execution path, and allow types to depend on it. This seems to require either a trivial or a cumbersome type system in order to work well.
C, C++, Java, ML, and Haskell are statically typed, whereas Objective-C, Scheme, Lisp, Smalltalk, Perl, PHP, Visual Basic, Ruby, and Python are dynamically typed. Dynamic typing is often associated with so-called "scripting languages" and other rapid application development environments. Dynamic types tend to be used more often in interpreted languages, and static types in compiled languages. See typed and untyped languages for the complete list of typed and untyped languages.
Duck typing is a humorous name for the (dynamic) typing typical of many scripting languages, in which a value's suitability is judged by what it can do rather than by a declared type. Initially coined by Dave Thomas in the Ruby community, its premise is that "(referring to a value) if it walks like a duck, and quacks like a duck, then it is a duck".
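A minimal sketch of duck typing in Python follows; the classes Duck and Robot and the function make_it_quack are hypothetical names invented for the example.

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    # Unrelated to Duck by inheritance, but it offers the same operation.
    def quack(self) -> str:
        return "Beep-quack."


def make_it_quack(thing) -> str:
    # No declared type is consulted; the call succeeds if `thing` has quack().
    return thing.quack()


print(make_it_quack(Duck()))
print(make_it_quack(Robot()))
```

Passing a value with no quack method, such as an integer, fails only when the call is attempted, with an AttributeError.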
To see how type checking works, consider the following pseudocode example:
    var x;        // (1)
    x := 5;       // (2)
    x := "hi";    // (3)
In this example, (1) declares the name x; (2) associates the integer value 5 to the name x; and (3) associates the string value "hi" to the name x. In most statically typed systems, this code fragment would be illegal, because (2) and (3) bind x to values of inconsistent type.
By contrast, a purely dynamically typed system would permit the above program to execute, because the name x would not have to have a consistent type. The implementation of a dynamically typed language will catch errors related to the misuse of values - "type errors" - at the time of the computation of the erroneous statement or expression. In other words, dynamic typing catches errors during program execution. A typical implementation of dynamic typing will keep all program values "tagged" with a type, and check the type tag before using any value in an operation. For example:
    var x = 5;        // (1)
    var y = "hi";     // (2)
    var z = x + y;    // (3)
In this code fragment, (1) binds the value 5 to x; (2) binds the value "hi" to y; and (3) attempts to add x to y. In a dynamically typed language, the value bound to x might be a pair (integer, 5), and the value bound to y might be a pair (string, "hi"). When the program attempts to execute (3), the language implementation would check the type tags integer and string, discover that the operation + (addition) is not defined over these two types, and signal an error.
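The tagging scheme just described can be sketched as follows. This is a toy model under stated assumptions, not how any particular language implementation works: the names tag, add and DynamicTypeError are hypothetical, and only integers and strings are modelled.

```python
class DynamicTypeError(Exception):
    """Raised when an operation is applied to values with unsuitable tags."""


# Only two tags are modelled in this sketch.
TAGS = {int: "integer", str: "string"}


def tag(value):
    """Pair a raw value with its type tag, e.g. ("integer", 5)."""
    return (TAGS[type(value)], value)


def add(left, right):
    """Check both type tags before performing the addition."""
    left_tag, left_value = left
    right_tag, right_value = right
    if left_tag == "integer" and right_tag == "integer":
        return ("integer", left_value + right_value)
    raise DynamicTypeError(f"+ is not defined for {left_tag} and {right_tag}")


x = tag(5)       # (1)
y = tag("hi")    # (2)
try:
    z = add(x, y)    # (3) fails: the tags are integer and string
except DynamicTypeError as error:
    message = str(error)
    print(message)
```

The error surfaces only when statement (3) runs, which is the defining property of a dynamic check.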
Some statically typed languages have a "back door" in the language that enables programmers to write code that does not statically type check. For example, C and Java have "casts".
The presence of static typing in a programming language does not necessarily imply the absence of dynamic typing mechanisms. For example, Java uses static typing, but certain operations require the support of runtime type tests, which are a form of dynamic typing. See programming language for more discussion of the interactions between static and dynamic typing.
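Python is dynamically typed, but its optional type hints, when checked by an external static checker, give a rough analogue of both ideas. In the sketch below, typing.cast plays the part of the "back door" and isinstance is a runtime type test. The function shout and the variable unchecked are hypothetical names.

```python
from typing import cast


def shout(value: object) -> str:
    # Runtime type test: the check happens while the program runs.
    if isinstance(value, str):
        return value.upper()
    raise TypeError("expected a string")


# typing.cast returns its argument unchanged and performs no check at all.
# A static checker would now treat `unchecked` as a str, although it holds 42.
unchecked = cast(str, 42)

print(shout("hi"))
print(type(unchecked).__name__)
```

The cast silences the static check without changing the value, so any resulting mistake can only be caught by a test at run time.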
Static and dynamic type checking in practice
The choice between static and dynamic typing requires some trade-offs. Many programmers strongly favor one over the other; some to the point of considering languages following the disfavored system to be unusable or crippled.
Static typing finds type errors reliably and at compile time. This should increase the reliability of the delivered program. However, programmers disagree over how commonly type errors occur, and thus what proportion of those bugs which are written would be caught by static typing. Static typing advocates believe programs are more reliable when they have been type-checked, while dynamic typing advocates point to distributed code that has proven reliable and to small bug databases. The value of static typing, then, presumably increases as the strength of the type system is increased. Advocates of strongly typed languages such as ML and Haskell have suggested that almost all bugs can be considered type errors, if the types used in a program are sufficiently well declared by the programmer or inferred by the compiler.
Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it can produce machine code that just does the right thing. Further, compilers in statically typed languages can find shortcuts more easily. Some dynamically-typed languages such as Common Lisp allow optional type declarations for optimization for this very reason. Static typing makes this pervasive. See optimization.
Statically-typed languages which lack type inference – such as Java – require that programmers declare the types they intend a method or function to use. This can serve as additional documentation for the program, which the compiler will not permit the programmer to ignore or drift out of synchronization. However, a language can be statically typed without requiring type declarations, so this is not a consequence of static typing.
Static typing allows construction of libraries which are less likely to be accidentally misused by their users. This can be used as an additional mechanism for communicating the intentions of the library developer.
A static type system constrains the use of powerful language constructs more than it constrains less powerful ones. This makes powerful constructs harder to use, and thus places the burden of choosing the "right tool for the problem" on the shoulders of the programmer, who might otherwise prefer to use the most powerful tool available. Choosing overly powerful tools may cause additional performance, reliability or correctness problems, because there are theoretical limits on the properties that one can expect from powerful language constructs. For example, indiscriminate use of recursion or global variables may cause well-documented adverse effects.
Dynamic typing allows constructs that some static type systems would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible (however, the typing within that evaluated code might remain static). Furthermore, dynamic typing accommodates transitional code and prototyping, such as allowing a string to be used in place of a data structure.
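As a small illustration of executing data as code, Python's built-in eval evaluates a string as an expression. The expression and the names price and quantity are made up for the example.

```python
# A hypothetical expression that arrives as plain data (for instance, from a file).
source = "price * quantity"

# Evaluate it with an explicit set of names. Emptying __builtins__ narrows what
# the expression can reach, but it is not a real sandbox for untrusted input.
result = eval(source, {"__builtins__": {}}, {"price": 3, "quantity": 4})

print(result)
```

The type of the result is not known until the string has been evaluated, which is why a construct like this is hard to fit into a static type system.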
Dynamic typing allows debuggers to be more functional; in particular, the debugger can modify the code arbitrarily and let the program continue to run. Programmers in dynamic languages sometimes "program in the debugger" and thus have a shorter edit-compile-test-debug cycle. However, the need to use debuggers is considered by some to be a sign of design or development process problems.
Dynamic typing may allow compilers and interpreters to run more quickly, since there may be less checking to perform and less code to revisit when the source code changes. This, too, may reduce the edit-compile-test-debug cycle.
Strong and weak typing
Main article: strongly-typed programming language
The term strongly typed language has several meanings; for a clearer discussion, see the main article on the topic. Under one definition, such a language does not allow an operation to succeed on arguments of the wrong type. An example of the absence of strong typing is a C cast gone wrong; if you cast a value in C, not only is the compiler required to allow the code, but the runtime is expected to allow it as well. This allows C code to be compact and fast, but it can make debugging more difficult.
Sometimes the term memory-safe language (or just safe language) is used to describe languages that do not allow undefined operations to occur. For example, a memory-safe language will also check array bounds.
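Python can serve as a small illustration of bounds checking: an out-of-range index raises an error instead of reading whatever memory lies beyond the list. The list readings and the function read are hypothetical names.

```python
readings = [10, 20, 30]


def read(index):
    """Return readings[index], or None when the index is out of range."""
    try:
        return readings[index]
    except IndexError:
        # The language detected the out-of-range access and raised an error.
        return None


print(read(1))
print(read(3))
```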
Weak typing means that types are implicitly converted (or cast) when they are used. If we were to revisit the previous example:
    var x = 5;       // (1)
    var y = "hi";    // (2)
    x + y;           // (3)
Writing the code above in a weakly-typed language, such as older versions of Visual Basic (pre .Net), would produce runnable code which would yield the result "5hi". The system would convert the number 5 into the string "5" to make sense of the operation (the language overloads the + operator to mean both addition and concatenation). However, problems can ensue with such conversions and operators overloaded in this way. For example, would the following code produce a result of 9 or "54"?
    var x = 5;
    var y = "4";
    x + y;
In contrast, the REXX language (a weakly-typed environment because it only has one type) does not overload the + operator, and hence + always denotes addition. The equivalent of the first example would fail (one operand not a number), and the second would yield "9", unambiguously. Careful language design has also allowed other languages to appear weakly-typed (through type inference and other techniques) for usability while preserving the type checking and protection offered by languages such as VB.Net, C# and Java.
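For comparison, here is a sketch of how the same ambiguity plays out in Python, which does overload + but does not convert between numbers and strings implicitly. The programmer has to state which reading is meant.

```python
x = 5
y = "4"

# Mixing the two types directly is rejected rather than guessed at.
try:
    x + y
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

# Each explicit conversion selects one of the two possible meanings.
as_number = x + int(y)    # arithmetic addition
as_text = str(x) + y      # string concatenation

print(mixed_allowed, as_number, as_text)
```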
Polymorphism and types
A type system allows the operation that is actually performed to depend on the types of its operands. For example, in an arithmetic expression a + b, if a and b are typed as integer, the underlying operation can be integer addition; if they are typed as real, floating-point addition is probably performed. With generics, too, the types of the values determine which code will be executed. See also: type polymorphism
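One way to sketch type-directed selection of code in Python is functools.singledispatch, which chooses an implementation from the type of the first argument only. The function name add and the descriptive labels in the results are invented for the example.

```python
from functools import singledispatch


@singledispatch
def add(a, b):
    # Fallback for types with no registered implementation.
    raise TypeError(f"no addition defined for {type(a).__name__}")


@add.register(int)
def _(a, b):
    return ("integer addition", a + b)


@add.register(float)
def _(a, b):
    return ("floating-point addition", a + b)


print(add(1, 2))
print(add(1.5, 2.0))
```

The caller writes add(a, b) in both cases; the type of a decides which body runs.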
Explicit or implicit declaration and inference
Many static type systems, such as C's and Java's, require type declarations: the programmer must explicitly associate each variable with a particular type. Others, such as Haskell's, perform type inference: the compiler draws conclusions about the types of variables based on how the variables are used. For example, given a function f(x,y) in which x and y are added together, the compiler can infer that x and y must be numbers -- since addition is only defined for numbers. Therefore, any call to f elsewhere in the program that specifies a non-numeric type (such as a string or list) as an argument would be erroneous.
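The inference step for f(x, y) can be imitated with a toy checker. This is a deliberately tiny sketch, far simpler than the inference performed by a compiler for a language such as Haskell: expressions are nested tuples, the only rule is that operands of an addition must be numbers, and every name in it (infer, f_body, call_is_well_typed) is hypothetical.

```python
NUMBER = "number"


def infer(expr, inferred):
    """Record, for each variable in `expr`, the type that its usage requires."""
    kind = expr[0]
    if kind == "add":
        # Addition is only defined for numbers, so both operands must be numbers.
        for operand in (expr[1], expr[2]):
            if operand[0] == "var":
                inferred[operand[1]] = NUMBER
            else:
                infer(operand, inferred)
    return inferred


# Body of the hypothetical function f(x, y): x + y
f_body = ("add", ("var", "x"), ("var", "y"))
f_types = infer(f_body, {})


def call_is_well_typed(param_names, args, types):
    """Check the arguments of a call against the inferred parameter types."""
    for name, arg in zip(param_names, args):
        if types.get(name) == NUMBER and not isinstance(arg, (int, float)):
            return False
    return True


print(f_types)
print(call_is_well_typed(["x", "y"], [1, 2.5], f_types))
print(call_is_well_typed(["x", "y"], [1, "two"], f_types))
```

No types were declared for x or y; they were derived from the use of + and then used to reject a call with a string argument.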
Numerical and string constants and expressions in code can and often do imply a type in a particular context. For example, the expression 3.14 might imply a floating-point type, while [1, 2, 3] might imply a list of integers, typically an array.
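In Python, for instance, the type implied by a literal can be inspected directly. The variable names below are illustrative only.

```python
literals = [3.14, 42, "hi", [1, 2, 3]]

# The type each literal implies, by name.
literal_types = [type(value).__name__ for value in literals]

# The element types found inside the list literal.
element_types = {type(element).__name__ for element in [1, 2, 3]}

print(literal_types)
print(element_types)
```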
Collections of types
Types form natural collections that can often be indexed or listed to find specific types of the kind described.
- primitive types — the simplest kind of type, e.g. integer and floating-point number
- integral types — types of whole numbers, e.g. integers and natural numbers
- rational types — types of exact ratios: e.g. 1/3 or 3/5
- floating point types — types of numbers in floating-point representation
- composite types — types composed of basic types, e.g. records. Abstract data types have attributes of both composite types and interfaces, depending on who you talk to.
- subtype
- derived type
- object types, e.g. type variable
- partial type
- recursive type
- function types, e.g. binary functions
- universally quantified types, e.g. parametrized types
- existentially quantified types, e.g. modules
Specialized types
There are many different special kinds of types, which are associated with particular kinds of instances.
- class — type of an object in object-oriented programming
- interface — type of a dependency
- protocol — type of a communications channel
- kind — type of a type
- use case — type of interaction of a system and its environment
- layer — type of an execution context of a method of a class in object-oriented programming
Compatibility, equivalence and substitutability
The question of compatibility and equivalence is a complicated and controversial topic, and it relates to the problem of substitutability: that is, given type A and type B, are they equal types, or compatible? Can a value of type B be used where a value of type A is expected?
If type A is compatible with type B, then A is a subtype of B, though the converse does not always hold. This definition is known as the Liskov substitution principle.
Type conversion may take place in order to make a type compatible or substitutable in context.
Two different type compatibility methods exist: name (or nominal) compatibility and structure (or structural) compatibility. The terms "equivalence" and "compatibility" mean the same thing.
- Name type compatibility means that two variables have compatible types only if they appear in either the same declaration or in declarations that use the same type name.
- Structure type compatibility means that two variables have compatible types if their types have identical structure.
Some variations of these two methods exist, and most languages use combinations of the different techniques.
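The two approaches can be contrasted in Python, where class inheritance gives a name-based relationship and typing.Protocol gives a structure-based one. All class names here (HasArea, Shape, Square, Plot) are hypothetical. Note that a runtime isinstance check against a protocol only verifies that the method exists, not its signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasArea(Protocol):
    """Structural type: anything that provides an area() method."""

    def area(self) -> float: ...


class Shape:
    """Nominal base type: subtypes must name it explicitly."""

    def area(self) -> float:
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Plot:
    # Not declared as a Shape, but it has the same structure.
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


plot = Plot(2, 3)
print(isinstance(plot, Shape))      # name-based test
print(isinstance(plot, HasArea))    # structure-based test
```

Plot fails the name-based test because it never mentions Shape, yet passes the structure-based one because it provides area().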
See also
- Programming language
- Operator overloading
- Polymorphism in object-oriented programming
- Type signature
- Signedness
