by Nasrullah Memon
Statistics and the review record for this page can be found at the end of the article
[1.] Nm/Fragment 215 01 - Discussion Last edited: 2012-04-23 11:18:44 Hindemith | Dugan etal 2006, Fragment, Reviewed, Nm, SMWFragment, Protection level sysop, Verschleierung (disguised plagiarism) |
|
|
Examined work: Page: 215, Lines: 1-4 |
Source: Dugan_etal_2006 Page(s): 409, Lines: 29-32 |
---|---|
[The weakness of the] knowledge base includes potential media bias and misinformation, lack of information beyond incident specific details alone, and missing data which was not available in the media. We review some of these strength and weaknesses in the next section of this article. | Weaknesses of the database include potential media bias and misinformation, lack of information beyond incident specific details alone, and missing data from a set of cards that were lost during an office move of PGIS. We review some of these strengths and weaknesses in the next section of this report. |
The source is not mentioned anywhere in the thesis |
|
[2.] Nm/Fragment 215 11 - Discussion Last edited: 2012-04-21 22:30:48 WiseWoman | Fragment, Reviewed, Malin etal 2005, Nm, SMWFragment, Protection level sysop, Verschleierung (disguised plagiarism) |
|
|
Examined work: Page: 215, Lines: 11-31 |
Source: Malin_etal_2005 Page(s): 119, 120, Lines: 30ff; 1ff |
---|---|
Investigative data mining is increasingly performed on networks constructed from personal name relationships extracted from text-based documents. In such networks, a node corresponds to a particular name and an edge specifies the relationship between two names. Before such a network can be analyzed for centrality, grouping, or intelligence gathering purposes, the correctness of the network must be maximized. Specifically, it must be decided when two pieces of data correspond to the same entity or not. Failure to ensure correctness can result in the inability to discover certain relationships or the cause of learning false knowledge.
Names are not unique identifiers for specific entities and, as a result, there exists many confounders to the construction of correct networks. Firstly, the data may consist of typographical error. In this case, the name “Nasrullah” may be accidentally represented as “Nasarullah” or “Nasurullah”. There exists a number of string comparator metrics to account for typographical errors, many of which are in practice. However, even when names are free of typographical errors, there are additional confounders to data correctness. For example, there may occur name variation, where multiple names correctly reference the same entity or same name correctly references multiple entities i.e., there can exist name ambiguity. |
Link analysis is increasingly performed on networks constructed from personal name relationships extracted from text-based documents
[...] [P. 120] [...] In such networks, a vertex corresponds to a particular name and an edge specifies the relationship between two names. Before such a network can be analyzed for centrality, grouping, or intelligence gathering purposes, the correctness of the network must be maximized. Specifically, it must be decided when two pieces of data correspond to the same entity or not. Failure to ensure correctness can result in the inability to discover certain relationships or cause the learning of false knowledge. Names are not unique identifiers for specific entities and, as a result, there exist many confounders to the construction of correct networks. Firstly, the data may consist of typographical error. In this case, the name “John” may be accidentally represented as “Jon” or “Jhon”. There exist a number of string comparator metrics (Winkler, 1995; Cohen et al., 2003;Wei, 2004) to account for typographical errors, many of which are in practice by various federal statistical agencies, such as the U.S. Census Bureau. However, even when names are devoid of typographical errors, there are additional confounders to data correctness. For instance, there can exist name variation, where multiple names correctly reference the same entity. Or, more pertinent to our research, there can exist name ambiguity, such that the same name correctly references multiple entities. |
The source is not mentioned anywhere in the thesis. |
|
Last edit of this page: by Benutzer:Hindemith, timestamp: 20120423112520