Misplaced Pages

Data: Difference between revisions

Revision as of 15:41, 25 April 2007 by Daniel.Cardenas (edit summary: "See discussion page")
Latest revision as of 03:59, 13 January 2025 by Augmented Seventh (edit summary: "Reverted 1 edit by 37.39.169.64 (talk)")
(1,000 intermediate revisions by more than 100 users not shown)
{{short description|Units of information}}
----
{{redirect|Scientific data|the journal|Scientific Data (journal){{!}}Scientific Data ''(journal)''}}
{{otheruses}}
{{pp-move-indef}}
'''Data''' is a ] for ]<ref>http://www.dict.org/bin/Dict?Form=Dict1&Query=data&Strategy=*&Database=*</ref>. Data is a measurement that can be disorganized and when the data becomes organized it becomes information. Data can be about reality or fiction such as a fictional ]. Data about reality consist of ]s. A large class of practically important propositions are ]s or ]s of a ].
{{for multi|data in computer science|Data (computer science)|other uses|Data (disambiguation)|and|Datum (disambiguation)}}
Such propositions may comprise ]s, ]s, or ]s.
{{Epistemology sidebar}}


'''Data''' ({{IPAc-en|ˈ|d|eɪ|t|ə}} {{respell|DAY|tə}}, {{IPAc-en|USalso|ˈ|d|æ|t|ə}} {{respell|DAT|ə}}) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A '''datum''' is an individual value in a collection of data. Data are usually organized into structures such as tables that provide additional context and meaning, and may themselves be used as data in larger structures. Data may be used as variables in a computational process.<ref name=":0">{{cite book |title=OECD Glossary of Statistical Terms |page=119 |date=2008 |publisher=OECD |isbn=978-92-64-025561}}</ref><ref>{{cite web |url=https://abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+data |title=Statistical Language - What are Data? |date=2013-07-13 |website=Australian Bureau of Statistics |access-date=2020-03-09 |url-status=live |archive-url=https://web.archive.org/web/20190419010315/http://abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+data |archive-date=2019-04-19}}</ref> Data may represent abstract ideas or concrete measurements.<ref>{{Cite web|url=https://diffen.com/difference/Data_vs_Information|title=Data vs Information - Difference and Comparison {{!}} Diffen|website=www.diffen.com|language=en|access-date=2018-12-11}}</ref>
==Etymology==
Data are commonly used in scientific research, economics, and virtually every other form of human organizational activity. Examples of data sets include price indices (such as the consumer price index), unemployment rates, literacy rates, and census data. In this context, data represent the raw facts and figures from which useful information can be extracted.
The word ''data'' is the plural of ] '']'', ] past ] of ''dare'', "to give", hence "something given". The ] of "to give" has been used for millennia, in the sense of a statement accepted at face value; one of the works of ], circa 300 BC, was the ''Dedomena'' (in Latin, ''Data''). In discussions of problems in ], ], ], and so on, the terms ''givens'' and ''data'' are used interchangeably. Such usage is the origin of ''data'' as a concept in ]: ''data'' are numbers, words, images, etc., accepted as they stand. Pronounced dey-tuh, dat-uh, or dah-tuh.


Data are collected using techniques such as measurement, observation, query, or analysis, and are typically ''represented'' as numbers or characters that may be further processed. Field data are data that are collected in an uncontrolled, in-situ environment. Experimental data are data that are generated in the course of a controlled scientific experiment. Data are analyzed using techniques such as calculation, reasoning, discussion, presentation, visualization, or other forms of post-analysis. Prior to analysis, raw data (or unprocessed data) is typically cleaned: outliers are removed, and obvious instrument or data entry errors are corrected.
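The cleaning step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not say how outliers are identified, so the sketch uses the common interquartile-range (Tukey) rule, and the function name `remove_outliers_iqr`, the multiplier `k`, and the sample readings are all hypothetical.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical raw readings; 57.3 stands in for an instrument or entry error.
raw = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 57.3]
cleaned = remove_outliers_iqr(raw)
print(cleaned)
```

Whether a flagged value is truly an error is a judgment call that depends on the data source; a rule like this only marks candidates for removal.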
==Usage in English==
In ], the word ''datum'' is still used in the general sense of "something given", and more specifically in ], ], ], ] and ] to mean a reference point, reference line, or reference surface. The Latin plural ''data'' is also used as a plural in English, but it is perhaps more commonly treated as a ] and used in the ], at least in day-to-day usage. For example, "This is all the data from the experiment". This usage is inconsistent with the rules of Latin grammar, which would suggest, "These are all the data from the experiment" instead; each measurement or result is a single ''datum''. Many (perhaps most) academic, scientific, and professional ] (e.g., see page 43 of the ) request that authors treat data as a plural noun.


Data can be seen as the smallest units of factual information that can be used as a basis for calculation, reasoning, or discussion. Data can range from abstract ideas to concrete measurements, including, but not limited to, statistics. Thematically connected data presented in some relevant context can be viewed as ''information''. Contextually connected pieces of information can then be described as ''data insights'' or ''intelligence''. The stock of insights and intelligence that accumulates over time from the synthesis of data into information can then be described as ''knowledge''. Data has been described as "the new oil of the digital economy".<ref>{{Cite magazine|url=https://wired.com/insights/2014/07/data-new-oil-digital-economy/|title=Data Is the New Oil of the Digital Economy|first=Joris Toonders|last=Yonego|magazine=Wired|date=July 23, 2014|via=www.wired.com}}</ref><ref>{{Cite web|url=https://spotlessdata.com/blog/data-new-oil|title=Data is the new oil|date=July 16, 2018|archive-url=https://web.archive.org/web/20180716224058/https://spotlessdata.com/blog/data-new-oil|archive-date=2018-07-16}}</ref> Data, as a general concept, refers to the fact that some existing information or knowledge is ''represented'' or ''coded'' in some form suitable for better usage or processing.
==Uses of ''data'' in computing==
{{main|Data (computing)}}


Advances in computing technologies have led to the advent of big data, which usually refers to very large quantities of data, typically at the petabyte scale. Using traditional data analysis methods and computing, working with such large (and growing) datasets is difficult, even impossible. (Theoretically speaking, infinite data would yield infinite information, which would render extracting insights or intelligence impossible.) In response, the relatively new field of data science uses machine learning (and other artificial intelligence (AI)) methods that allow for efficient applications of analytic methods to big data.
''Raw data'' are ]s, ], ]s or other outputs from devices to convert physical quantities into symbols, in a very broad sense. Such data are typically further ] by a human or ] into a ], ] and processed there, or transmitted (]) to another human or computer. ''Raw data'' is a relative term; data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next.


== Etymology and terminology ==
Mechanical computing devices are classified according to the means by which they represent data. An ] represents a datum as a voltage, distance, position, or other physical quantity. A ] represents a datum as a sequence of symbols drawn from a fixed ]. The most common digital computers use a binary alphabet, that is, an alphabet of two characters, typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.
{{further|Data (word)|l1=''Data'' (word)}}
The ] word {{lang|la|data}} is the plural of {{lang|la|datum}}, "(thing) given," and the neuter past participle of {{lang|la|dare}}, "to give".<ref name="EOL"/>
The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954.<ref name="EOL">{{Cite web|url=https://etymonline.com/word/data|title=data &#124; Origin and meaning of data by Online Etymology Dictionary|website=www.etymonline.com}}</ref>


When "data" is used more generally as a synonym for "information", it is treated as a mass noun in singular form. This usage is common in everyday language and in technical and scientific fields such as software development and computer science. One example of this usage is the term "big data".
Some special forms of data are distinguished. A ] is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably ] and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish ], that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
When used more specifically to refer to the processing and analysis of sets of data, the term retains its plural form.
This usage is common in the natural sciences, life sciences, social sciences, software development and computer science, and grew in popularity in the 20th and 21st centuries. Some style guides do not recognize the different meanings of the term and simply recommend the form that best suits the target audience of the guide. For example, APA style as of the 7th edition requires "data" to be treated as a plural form.<ref>{{Cite book|author=American Psychological Association |section=6.11 |title=Publication Manual of the American Psychological Association: the official guide to APA style |publisher=American Psychological Association |year=2020 |isbn=9781433832161 }}</ref>


==Meaning of data, information and knowledge==
== Meaning ==
Adrien Auzout's "A TABLE of the Apertures of Object-Glasses" from a 1665 article in ''Philosophical Transactions''
{{See also|DIKW pyramid}}
The terms ] and ] are frequently used for overlapping concepts. These three concepts are ill- or ambiguously defined in the subject matter literature <!--Anyone know what subject matter this is referring to? It may need clarifying. User:Joeblakesley-->. However, In recent interdisciplinary research a few independent specializations of these terms have been proposed.
Data, information, knowledge, and wisdom are closely related concepts, but each has its own role in relation to the others, and each term has its own meaning. According to a common view, data is collected and analyzed; data only becomes information suitable for making decisions once it has been analyzed in some fashion.<ref>{{cite web|title=Joint Publication 2-0, Joint Intelligence|url=https://jcs.mil/Portals/36/Documents/Doctrine/pubs/jp2_0.pdf|work=Joint Chiefs of Staff, Joint Doctrine Publications|publisher=Department of Defense|access-date=July 17, 2018|pages=I-1|date=23 October 2013|archive-date=18 July 2018|archive-url=https://web.archive.org/web/20180718055308/http://www.jcs.mil/Portals/36/Documents/Doctrine/pubs/jp2_0.pdf|url-status=dead}}</ref> One can say that the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information contained in a data stream may be characterized by its Shannon entropy.
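As a rough illustration of the last point, the Shannon entropy of a symbol stream is H = -Σ p(x) log2 p(x), where p(x) is the relative frequency of symbol x. The sketch below estimates it from observed symbol counts; the function name `shannon_entropy` and the sample strings are illustrative assumptions, not part of the article.

```python
import math
from collections import Counter

def shannon_entropy(stream):
    """Estimate entropy in bits per symbol from observed symbol frequencies."""
    total = len(stream)
    if total == 0:
        return 0.0
    counts = Counter(stream)
    # p * log2(1/p) is the same as -p * log2(p), written to avoid a negative zero.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(shannon_entropy("AAAAAAAA"))  # 0.0: a fully predictable stream carries no surprise
print(shannon_entropy("ABABABAB"))  # 1.0: two equally likely symbols
print(shannon_entropy("ABCDABCD"))  # 2.0: four equally likely symbols
```

This matches the intuition in the paragraph: the less expected the next symbol is, the more information each symbol conveys. Note that the estimate looks only at symbol frequencies, not at ordering, so it treats "ABABABAB" and "AABBABBA" alike.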


Knowledge is the awareness of its environment that some entity possesses, whereas data merely communicates that knowledge. For example, the entry in a database specifying the height of Mount Everest is a datum that communicates a precisely measured value. This measurement may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to decide on the best method to climb it. Awareness of the characteristics represented by this data is knowledge.
==See also==

{{wiktionary|Data}}
Data are often assumed to be the least abstract concept, information the next least, and knowledge the most abstract.<ref>{{cite web|author=Akash Mitra|year=2011|title=Classifying data for successful modeling|url=https://dwbi.org/data-modelling/dimensional-model/16-classifying-data-for-successful-modeling|access-date=2017-11-05|archive-date=2017-11-07|archive-url=https://web.archive.org/web/20171107030817/https://dwbi.org/data-modelling/dimensional-model/16-classifying-data-for-successful-modeling|url-status=dead}}</ref> In this view, data becomes information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest's geological characteristics may be considered "information", and a climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". "Information" bears a diversity of meanings that range from everyday usage to technical use. This view, however, has also been argued to reverse how data emerges from information, and information from knowledge.<ref>{{cite journal|last=Tuomi|first=Ilkka|date=2000|title=Data is more than knowledge|journal=Journal of Management Information Systems|volume=6|issue=3|pages=103–117|doi=10.1080/07421222.1999.11518258}}</ref> Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.<!--given by nupur set--> Beynon-Davies uses the concept of a sign to differentiate between data and information; data is a series of symbols, while information occurs when the symbols are used to refer to something.<ref>{{cite book|author=P. Beynon-Davies|year=2002|title=Information Systems: An introduction to informatics in organisations|publisher=] |location=Basingstoke, UK|isbn=0-333-96390-3}}</ref><ref>{{cite book|author=P. Beynon-Davies|year=2009|title=Business information systems|publisher=Palgrave |location=Basingstoke, UK|isbn=978-0-230-20368-6}}</ref>
*]

*]
Before the development of computing devices and machines, people had to manually collect data and impose patterns on it. Since the development of computing devices and machines, these devices can also collect data. In the 2010s, computers were widely used in many fields to collect data and sort or process it, in disciplines ranging from marketing and the analysis of social service usage by citizens to scientific research. These patterns in the data are seen as information that can be used to enhance knowledge. These patterns may be interpreted as "truth" (though "truth" can be a subjective concept) and may be authorized as aesthetic and ethical criteria in some disciplines or cultures. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.<ref>{{cite book|author=Sharon Daniel|title=The Database: An Aesthetics of Dignity}}</ref>
*]

*]
Mechanical computing devices are classified according to how they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet. Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
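To make the step from the binary alphabet to "more familiar representations" concrete, here is a minimal sketch that writes text out as sequences of "0" and "1" and reads it back. The helper names `to_bits` and `from_bits` are hypothetical, and UTF-8 is assumed as the character encoding; other encodings would produce different bit patterns.

```python
def to_bits(text, encoding="utf-8"):
    """Render each byte of the encoded text as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))

def from_bits(bits, encoding="utf-8"):
    """Rebuild the text from space-separated groups of binary digits."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode(encoding)

bits = to_bits("Hi")
print(bits)             # 01001000 01101001
print(from_bits(bits))  # Hi
print(f"{42:b}")        # 101010, the number 42 written in the binary alphabet
```

The same bit pattern can stand for a letter, a number, or part of an instruction; which one it is depends on the interpretation applied to it.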
*]

*]
*]
== Data documents ==
{{LibraryandInformation-TopicSidebar}}
*]
Whenever data needs to be registered, data exists in the form of a data document. Kinds of data documents include:
*]

*]
*] *]
*data study
*]
*]
*] and data destruction techniques
*] *]
*] *]
*] *]
*data handbook
*]
*] *]

*]
Some of these data documents (data repositories, data studies, data sets, and software) are indexed in Data Citation Indexes, while data papers are indexed in traditional bibliographic databases, e.g., Science Citation Index.
*]

===Data collection===

Gathering data can be accomplished through a primary source (the researcher is the first person to obtain the data) or a secondary source (the researcher obtains data that has already been collected by other sources, such as data disseminated in a scientific journal). Data analysis methodologies vary and include data triangulation and data percolation.<ref>Mesly, Olivier (2015), ''Creating Models in Psychological Research'', Springer Psychology : 126 pages. {{ISBN|978-3-319-15752-8}}</ref> The latter offers an articulate method of collecting, classifying, and analyzing data using five possible angles of analysis (of which at least three are used): qualitative and quantitative methods, literature reviews (including scholarly articles), interviews with experts, and computer simulation. This is intended to maximize the research's objectivity and to permit as complete an understanding as possible of the phenomena under investigation. The data is thereafter "percolated" using a series of pre-determined steps so as to extract the most relevant information.

== Data longevity and accessibility ==
An important field in computer science, technology, and library science is the longevity of data. Scientific research generates huge amounts of data, especially in genomics and astronomy, but also in the medical sciences, e.g. in medical imaging. In the past, scientific data has been published in papers and books, stored in libraries, but more recently practically all data is stored on hard drives or optical discs. However, in contrast to paper, these storage devices may become unreadable after a few decades. Scientific publishers and libraries have been struggling with this problem for a few decades, and there is still no satisfactory solution for the long-term storage of data over centuries or even for eternity.

'''Data accessibility'''. Another problem is that much scientific data is never published or deposited in data repositories such as ]s. In a recent survey, data was requested from 516 studies that were published between 2 and 22 years earlier, but less than one out of five of these studies were able or willing to provide the requested data. Overall, the likelihood of retrieving data dropped by 17% each year after publication.<ref>{{Cite journal |last1=Vines |first1=Timothy H. |last2=Albert |first2=Arianne Y. K. |last3=Andrew |first3=Rose L. |last4=Débarre |first4=Florence |last5=Bock |first5=Dan G. |last6=Franklin |first6=Michelle T. |last7=Gilbert |first7=Kimberly J. |last8=Moore |first8=Jean-Sébastien |last9=Renaut |first9=Sébastien |last10=Rennison |first10=Diana J. |date=2014-01-06 |title=The availability of research data declines rapidly with article age |journal=Current Biology |volume=24 |issue=1 |pages=94–97 |doi=10.1016/j.cub.2013.11.014 |issn=1879-0445 |pmid=24361065|s2cid=7799662 |doi-access=free |arxiv=1312.5670 }}</ref> Similarly, a survey of 100 datasets in ] found that more than half lacked the details to reproduce the research results from these studies.<ref>{{Cite journal |last1=Roche |first1=Dominique G. |last2=Kruuk |first2=Loeske E. B. |last3=Lanfear |first3=Robert |last4=Binning |first4=Sandra A. |date=2015 |title=Public Data Archiving in Ecology and Evolution: How Well Are We Doing? |journal=PLOS Biology |volume=13 |issue=11 |pages=e1002295 |doi=10.1371/journal.pbio.1002295 |issn=1545-7885 |pmc=4640582 |pmid=26556502 |doi-access=free }}</ref> This shows the dire situation of access to scientific data that is not published or does not have enough details to be reproduced.
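To get a feel for what a 17% yearly drop implies, the following sketch compounds it over several years. It assumes, purely for illustration, that the decline is a constant relative decrease applied each year to the chance of retrieving the data; the cited survey may define the quantity differently, and the function names are hypothetical.

```python
import math

ANNUAL_DECLINE = 0.17  # yearly drop quoted in the paragraph above

def relative_availability(years, annual_decline=ANNUAL_DECLINE):
    """Fraction of the initial retrieval likelihood left after `years` years."""
    return (1 - annual_decline) ** years

def half_life(annual_decline=ANNUAL_DECLINE):
    """Years until the retrieval likelihood halves under constant decline."""
    return math.log(0.5) / math.log(1 - annual_decline)

for years in (1, 5, 10, 20):
    print(years, round(relative_availability(years), 3))
print(round(half_life(), 1))
```

Under this assumption the likelihood halves in a little under four years, which is consistent with the low response rate reported for studies up to 22 years old.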

One proposed solution to the problem of reproducibility is to require FAIR data, that is, data that is Findable, Accessible, Interoperable, and Reusable. Data that fulfills these requirements can be used in subsequent research and thus advances science and technology.<ref>{{Cite journal |last=Eisenstein |first=Michael |date=April 2022 |title=In pursuit of data immortality |journal=Nature |volume=604 |issue=7904 |pages=207–208 |doi=10.1038/d41586-022-00929-3 |issn=1476-4687 |pmid=35379989|bibcode=2022Natur.604..207E |s2cid=247954952 |doi-access=free }}</ref>
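A very rough sketch of how such requirements might be checked against a dataset's metadata record follows. Everything in it is an assumption made for illustration: the mapping from each principle to concrete fields, the field names, and the sample record are hypothetical, and real FAIR assessments rely on much richer criteria than the presence of a few fields.

```python
# Hypothetical mapping from each FAIR principle to the metadata fields
# this sketch expects to be filled in.
FAIR_FIELDS = {
    "findable": ["identifier", "title"],
    "accessible": ["access_url"],
    "interoperable": ["format"],
    "reusable": ["license", "provenance"],
}

def missing_fair_fields(record):
    """Return, per principle, the expected fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[principle] = absent
    return gaps

# Hypothetical metadata record with an empty license and no provenance.
dataset = {
    "identifier": "doi:10.0000/hypothetical-example",
    "title": "Example field measurements",
    "access_url": "https://example.org/datasets/42",
    "format": "text/csv",
    "license": "",
}
gaps = missing_fair_fields(dataset)
print(gaps)
```

A record that passes such a check is not thereby FAIR; the check only catches the most obvious omissions, such as a dataset published without a license.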

== In other fields ==
Although data is also increasingly used in other fields, it has been suggested that their highly interpretive nature might be at odds with the ethos of data as "given". Peter Checkland introduced the term ''capta'' (from the Latin ''capere'', "to take") to distinguish between an immense number of possible data and the subset of them to which attention is oriented.<ref>{{cite book | author = P. Checkland and S. Holwell | title = Information, Systems, and Information Systems: Making Sense of the Field. | year = 1998 | publisher = John Wiley & Sons | location = Chichester, West Sussex | isbn = 0-471-95820-4 | pages = 86–89 }}</ref> Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using ''data'' may introduce assumptions that are counterproductive, for example that phenomena are discrete or are observer-independent.<ref>{{cite journal
|author=Johanna Drucker
|year=2011
|title=Humanities Approaches to Graphical Display
|journal=Digital Humanities Quarterly
|volume=005
|issue=1
|url=https://digitalhumanities.org/dhq/vol/5/1/000091/000091.html
}}</ref> The term ''capta'', which emphasizes the act of observation as constitutive, is offered as an alternative to ''data'' for visual representations in the humanities.

The term '''data-driven''' is a neologism applied to an activity which is primarily compelled by data over all other factors.{{cn|date=February 2024}} Data-driven applications include ] and ].



== References ==
{{Reflist}}
<references/>
{{FOLDOC}}
* http://www.answers.com/topic/data - discussion of the correctness of using data as a singular or plural ("data is" or "data are")


== External links ==
{{Wiktionary}}
{{Commons category}}
{{Scholia|topic}}


{{Data}}
{{Statistics|state=collapsed}}
{{Authority control}}

Latest revision as of 03:59, 13 January 2025

Units of information
"Scientific data" redirects here. For the journal, see Scientific Data (journal). For data in computer science, see Data (computer science). For other uses, see Data (disambiguation) and Datum (disambiguation).
These are some of the different types of data: Geographical, Cultural, Scientific, Financial, Statistical, Meteorological, Natural, Transport
Data (/ˈdeɪtə/ DAY-tə, US also /ˈdætə/ DAT-ə) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data are usually organized into structures such as tables that provide additional context and meaning, and may themselves be used as data in larger structures. Data may be used as variables in a computational process. Data may represent abstract ideas or concrete measurements. Data are commonly used in scientific research, economics, and virtually every other form of human organizational activity. Examples of data sets include price indices (such as the consumer price index), unemployment rates, literacy rates, and census data. In this context, data represent the raw facts and figures from which useful information can be extracted.

Data are collected using techniques such as measurement, observation, query, or analysis, and are typically represented as numbers or characters that may be further processed. Field data are data that are collected in an uncontrolled, in-situ environment. Experimental data are data that are generated in the course of a controlled scientific experiment. Data are analyzed using techniques such as calculation, reasoning, discussion, presentation, visualization, or other forms of post-analysis. Prior to analysis, raw data (or unprocessed data) is typically cleaned: outliers are removed, and obvious instrument or data entry errors are corrected.

Data can be seen as the smallest units of factual information that can be used as a basis for calculation, reasoning, or discussion. Data can range from abstract ideas to concrete measurements, including, but not limited to, statistics. Thematically connected data presented in some relevant context can be viewed as information. Contextually connected pieces of information can then be described as data insights or intelligence. The stock of insights and intelligence that accumulates over time from the synthesis of data into information can then be described as knowledge. Data has been described as "the new oil of the digital economy". Data, as a general concept, refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.

Advances in computing technologies have led to the advent of big data, which usually refers to very large quantities of data, typically at the petabyte scale. Using traditional data analysis methods and computing, working with such large (and growing) datasets is difficult, even impossible. (Theoretically speaking, infinite data would yield infinite information, which would render extracting insights or intelligence impossible.) In response, the relatively new field of data science uses machine learning (and other artificial intelligence (AI)) methods that allow for efficient applications of analytic methods to big data.

Etymology and terminology

Further information: Data (word)

The Latin word data is the plural of datum, "(thing) given," and the neuter past participle of dare, "to give". The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954.

When "data" is used more generally as a synonym for "information", it is treated as a mass noun in singular form. This usage is common in everyday language and in technical and scientific fields such as software development and computer science. One example of this usage is the term "big data". When used more specifically to refer to the processing and analysis of sets of data, the term retains its plural form. This usage is common in the natural sciences, life sciences, social sciences, software development and computer science, and grew in popularity in the 20th and 21st centuries. Some style guides do not recognize the different meanings of the term and simply recommend the form that best suits the target audience of the guide. For example, APA style as of the 7th edition requires "data" to be treated as a plural form.

Meaning

Adrien Auzout's "A TABLE of the Apertures of Object-Glasses" from a 1665 article in Philosophical Transactions
See also: DIKW pyramid

Data, information, knowledge, and wisdom are closely related concepts, but each has its own role in relation to the others, and each term has its own meaning. According to a common view, data is collected and analyzed; data only becomes information suitable for making decisions once it has been analyzed in some fashion. One can say that the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information contained in a data stream may be characterized by its Shannon entropy.

Knowledge is the awareness of its environment that some entity possesses, whereas data merely communicates that knowledge. For example, the entry in a database specifying the height of Mount Everest is a datum that communicates a precisely measured value. This measurement may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to decide on the best method to climb it. Awareness of the characteristics represented by this data is knowledge.

Data are often assumed to be the least abstract concept, information the next least, and knowledge the most abstract. In this view, data becomes information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest's geological characteristics may be considered "information", and a climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". "Information" bears a diversity of meanings that range from everyday usage to technical use. This view, however, has also been argued to reverse how data emerges from information, and information from knowledge. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. Beynon-Davies uses the concept of a sign to differentiate between data and information; data is a series of symbols, while information occurs when the symbols are used to refer to something.

Before the development of computing devices and machines, people had to manually collect data and impose patterns on it. Since the development of computing devices and machines, these devices can also collect data. In the 2010s, computers were widely used in many fields to collect data and sort or process it, in disciplines ranging from marketing and the analysis of social service usage by citizens to scientific research. These patterns in the data are seen as information that can be used to enhance knowledge. These patterns may be interpreted as "truth" (though "truth" can be a subjective concept) and may be authorized as aesthetic and ethical criteria in some disciplines or cultures. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.

Mechanical computing devices are classified according to how they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet. Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
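The idea that a program is itself data that can be interpreted as instructions can be sketched as follows. Python is used here only for illustration and is not one of the Lisp-family languages the paragraph mentions; the sketch relies on Python's standard `ast` module to hold a small program as a tree, rewrite that tree like any other data, and then run it. The class name `AddToSub` and the sample expression are hypothetical.

```python
import ast

source = "2 * (3 + 4)"                  # a tiny program, held as plain text
tree = ast.parse(source, mode="eval")   # the same program as a tree of objects

# Interpreting the data as instructions: compile and evaluate the tree.
original = eval(compile(tree, "<example>", "eval"))

class AddToSub(ast.NodeTransformer):
    """Rewrite the program as data: turn every addition into a subtraction."""
    def visit_BinOp(self, node):
        self.generic_visit(node)        # rewrite nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

rewritten = ast.fix_missing_locations(AddToSub().visit(tree))
transformed = eval(compile(rewritten, "<example>", "eval"))

print(original)     # 14, from 2 * (3 + 4)
print(transformed)  # -2, from 2 * (3 - 4)
```

In Lisp the program text and the data structure share one notation, so no separate parsing step is needed; here the `ast.parse` call plays that bridging role.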

Data documents


Whenever data needs to be registered, data exists in the form of a data document. Kinds of data documents include data repositories, data studies, data sets, software, and data papers.

Some of these data documents (data repositories, data studies, data sets, and software) are indexed in Data Citation Indexes, while data papers are indexed in traditional bibliographic databases, e.g., Science Citation Index.

Data collection

Data can be gathered from a primary source (the researcher is the first person to obtain the data) or a secondary source (the researcher obtains data that has already been collected by others, such as data disseminated in a scientific journal). Data analysis methodologies vary and include data triangulation and data percolation. The latter offers an articulate method of collecting, classifying, and analyzing data using five possible angles of analysis, of which at least three are applied: qualitative methods, quantitative methods, literature reviews (including scholarly articles), interviews with experts, and computer simulation. The aim is to maximize the research's objectivity and to permit as complete an understanding as possible of the phenomena under investigation. The data is thereafter "percolated" using a series of pre-determined steps so as to extract the most relevant information.

Data longevity and accessibility

An important field in computer science, technology, and library science is the longevity of data. Scientific research generates huge amounts of data, especially in genomics and astronomy, but also in the medical sciences, e.g. in medical imaging. In the past, scientific data was published in papers and books and stored in libraries, but more recently practically all data is stored on hard drives or optical discs. In contrast to paper, however, these storage devices may become unreadable after a few decades. Scientific publishers and libraries have been struggling with this problem for a few decades, and there is still no satisfactory solution for the long-term storage of data over centuries or even for eternity.

Data accessibility. Another problem is that much scientific data is never published or deposited in data repositories such as databases. In one survey, data was requested from 516 studies that had been published between 2 and 22 years earlier, but the authors of fewer than one in five of these studies were able or willing to provide the requested data. Overall, the likelihood of retrieving data dropped by 17% each year after publication. Similarly, a survey of 100 datasets in Dryad found that more than half lacked the details needed to reproduce the research results of these studies. These findings illustrate how difficult it is to access scientific data that is unpublished or too poorly documented for results to be reproduced.
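
To give a sense of scale for the 17% figure, the following Python sketch computes how the likelihood of retrieving data would shrink relative to the year of publication. It assumes, purely for illustration, that the decline compounds at a constant rate every year. The survey's own statistical model may define the decline differently, so the printed values are not results from the study. The function name relative_likelihood is hypothetical.

```python
ANNUAL_DECLINE = 0.17  # yearly drop quoted in the text


def relative_likelihood(years_after_publication: float,
                        annual_decline: float = ANNUAL_DECLINE) -> float:
    """Likelihood of retrieving data relative to the publication year,
    ASSUMING a constant decline that compounds annually (an illustration only)."""
    if years_after_publication < 0:
        raise ValueError("years_after_publication must be non-negative")
    return (1.0 - annual_decline) ** years_after_publication


# The surveyed studies were between 2 and 22 years old.
for years in (2, 10, 22):
    share = relative_likelihood(years)
    print(f"{years:2d} years after publication: {share:.1%} of the initial likelihood")
```

Under this assumption the decline is geometric rather than linear, so most of the loss occurs in the first decade after publication.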

One proposed solution to the problem of reproducibility is to require FAIR data, that is, data that is Findable, Accessible, Interoperable, and Reusable. Data that fulfills these requirements can be used in subsequent research and thus advances science and technology.
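
A minimal Python sketch of how a dataset description might be screened against these four properties is shown below. The record fields, their mapping to the four letters, and the placeholder identifier are assumptions made for this example. They are not the official FAIR criteria, which are considerably more detailed.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; each field loosely stands for one FAIR property."""
    identifier: Optional[str] = None   # Findable: a persistent identifier
    access_url: Optional[str] = None   # Accessible: where the data can be retrieved
    file_format: Optional[str] = None  # Interoperable: a widely readable format
    license: Optional[str] = None      # Reusable: terms under which reuse is allowed


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of fields that are empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Placeholder values for illustration only; the identifier is not a real DOI.
record = DatasetRecord(identifier="doi:10.0000/example", file_format="text/csv")
missing = missing_fields(record)
print("Missing metadata:", missing)
```

A check of this kind only confirms that descriptive fields are present. It says nothing about whether the data itself is complete or documented well enough to be reproduced.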

In other fields

Although data is also increasingly used in other fields, it has been suggested that their highly interpretive nature might be at odds with the ethos of data as "given". Peter Checkland introduced the term capta (from the Latin capere, "to take") to distinguish between the immense number of possible data and the subset of them to which attention is oriented. Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using data may introduce assumptions that are counterproductive, for example that phenomena are discrete or are observer-independent. The term capta, which emphasizes the act of observation as constitutive, is offered as an alternative to data for visual representations in the humanities.

The term data-driven is a neologism applied to an activity which is primarily compelled by data over all other factors. Data-driven applications include data-driven programming and data-driven journalism.


References

  1. OECD Glossary of Statistical Terms. OECD. 2008. p. 119. ISBN 978-92-64-025561.
  2. "Statistical Language - What are Data?". Australian Bureau of Statistics. 2013-07-13. Archived from the original on 2019-04-19. Retrieved 2020-03-09.
  3. "Data vs Information - Difference and Comparison | Diffen". www.diffen.com. Retrieved 2018-12-11.
  4. Yonego, Joris Toonders (July 23, 2014). "Data Is the New Oil of the Digital Economy". Wired – via www.wired.com.
  5. "Data is the new oil". July 16, 2018. Archived from the original on 2018-07-16.
  6. "data | Origin and meaning of data by Online Etymology Dictionary". www.etymonline.com.
  7. American Psychological Association (2020). "6.11". Publication Manual of the American Psychological Association: the official guide to APA style. American Psychological Association. ISBN 9781433832161.
  8. "Joint Publication 2-0, Joint Intelligence" (PDF). Joint Chiefs of Staff, Joint Doctrine Publications. Department of Defense. 23 October 2013. pp. I-1. Archived from the original (PDF) on 18 July 2018. Retrieved July 17, 2018.
  9. Akash Mitra (2011). "Classifying data for successful modeling". Archived from the original on 2017-11-07. Retrieved 2017-11-05.
  10. Tuomi, Ilkka (2000). "Data is more than knowledge". Journal of Management Information Systems. 6 (3): 103–117. doi:10.1080/07421222.1999.11518258.
  11. P. Beynon-Davies (2002). Information Systems: An introduction to informatics in organisations. Basingstoke, UK: Palgrave Macmillan. ISBN 0-333-96390-3.
  12. P. Beynon-Davies (2009). Business information systems. Basingstoke, UK: Palgrave. ISBN 978-0-230-20368-6.
  13. Sharon Daniel. The Database: An Aesthetics of Dignity.
  14. Mesly, Olivier (2015), Creating Models in Psychological Research, Springer Psychology : 126 pages. ISBN 978-3-319-15752-8
  15. Vines, Timothy H.; Albert, Arianne Y. K.; Andrew, Rose L.; Débarre, Florence; Bock, Dan G.; Franklin, Michelle T.; Gilbert, Kimberly J.; Moore, Jean-Sébastien; Renaut, Sébastien; Rennison, Diana J. (2014-01-06). "The availability of research data declines rapidly with article age". Current Biology. 24 (1): 94–97. arXiv:1312.5670. doi:10.1016/j.cub.2013.11.014. ISSN 1879-0445. PMID 24361065. S2CID 7799662.
  16. Roche, Dominique G.; Kruuk, Loeske E. B.; Lanfear, Robert; Binning, Sandra A. (2015). "Public Data Archiving in Ecology and Evolution: How Well Are We Doing?". PLOS Biology. 13 (11): e1002295. doi:10.1371/journal.pbio.1002295. ISSN 1545-7885. PMC 4640582. PMID 26556502.
  17. Eisenstein, Michael (April 2022). "In pursuit of data immortality". Nature. 604 (7904): 207–208. Bibcode:2022Natur.604..207E. doi:10.1038/d41586-022-00929-3. ISSN 1476-4687. PMID 35379989. S2CID 247954952.
  18. P. Checkland and S. Holwell (1998). Information, Systems, and Information Systems: Making Sense of the Field. Chichester, West Sussex: John Wiley & Sons. pp. 86–89. ISBN 0-471-95820-4.
  19. Johanna Drucker (2011). "Humanities Approaches to Graphical Display". Digital Humanities Quarterly. 005 (1).

External links

Scholia has a topic profile for Data.