Engineering and computer science research: A glimpse of the archives of the future.
By Admin (22/05/2011 @ 11:00:40, in en - Global Observatory)

How does an archivist understand the relationship among billions of documents or search for a single record in a sea of data? With the proliferation of digital records, the task of the archivist has grown more complex. This problem is especially acute for the National Archives and Records Administration (NARA), the government agency responsible for managing and preserving the nation's historical records.

At the end of President George W. Bush's administration in 2009, NARA received roughly 35 times as much data as it had received from the administration of President Bill Clinton, which was itself many times the volume received from the administration before it. With the federal government increasingly using social media, cloud computing and other technologies to contribute to open government, this trend is unlikely to reverse. By 2014, NARA expects to have accumulated more than 35 petabytes (quadrillions of bytes) of data in the form of electronic records.

"The National Archives is a unique national institution that responds to requirements for preservation, access and the continued use of government records," said Robert Chadduck, acting director for the National Archives Center for Advanced Systems and Technologies.

To find innovative and scalable solutions for managing large-scale electronic records collections, Chadduck turned to the Texas Advanced Computing Center (TACC), a center for advanced computing research funded by the National Science Foundation (NSF), to draw on the expertise of TACC's digital archivist, Maria Esteva, and data analysis expert, Weijia Xu.

"For the government and the nation to effectively respond to all of the requirements that are associated with very large digital record collections, some candidate approaches and tools are needed, which are embodied in the class of cyberinfrastructure that is currently under development at TACC," Chadduck said.

After consulting with NARA about its needs, members of TACC's Data and Information Analysis group developed a multi-pronged approach that combines different data analysis methods into a visualization framework. The visualizations act as a bridge between the archivist and the data by interactively rendering information as shapes and colors to facilitate an understanding of the archive's structure and content.

Archivists spend a significant amount of time determining the organization, contents and characteristics of collections so they can describe them for public access purposes. "This process involves a set of standard practices and years of experience from the archivist side," said Xu. "To accomplish this task in large-scale digital collections, we are developing technologies that combine computing power with domain expertise."

This snapshot corresponds to a regularly organized website containing a total of 2,000 files in different file formats. Shades of yellow highlight differing numbers of Portable Document Format (PDF) files. Purple shows patterns in file-naming conventions across directories. Credit: Visualizations courtesy of Maria Esteva, Weijia Xu, Suyog Dutt Jain, and Varun Jain.

Knowing that human visual perception is a powerful information processing system, TACC researchers expanded on methods that take advantage of this innate skill. In particular, they adapted the well-known treemap visualization, which is traditionally used to represent file structures, to render additional dimensions of information, such as technical metadata, file format correlations and preservation risk levels. This information is produced by data-driven analysis methods on the visualization's back end. The renderings are tailored to the archivist's need to compare and contrast different groups of electronic records on the fly. In this way, the archivist can assess, validate or question the results and run other analyses.
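The treemap idea can be sketched in a few lines of Python. This is a minimal illustration, not the TACC team's implementation: it assumes the classic slice-and-dice layout, and the `Node` class, the `risk` label and the sample files are hypothetical names chosen for the example. Each file receives a rectangle whose area is proportional to its size, and an extra attribute (here a preservation-risk label) travels with the rectangle so that a renderer could map it to a color.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file (leaf) or directory (has children). All fields are illustrative."""
    name: str
    size: float = 0.0      # size of a file; ignored for directories
    risk: str = "low"      # hypothetical preservation-risk label
    children: list = field(default_factory=list)


def total_size(node):
    """Size of a file, or the summed size of everything under a directory."""
    if not node.children:
        return node.size
    return sum(total_size(child) for child in node.children)


def layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice treemap: divide the rectangle among the children in
    proportion to their size, alternating the split direction by depth."""
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, node.risk, x, y, w, h))
        return out
    total = total_size(node)
    offset = 0.0
    for child in node.children:
        share = total_size(child) / total if total else 0.0
        if depth % 2 == 0:   # even depth: split along the x axis
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:                # odd depth: split along the y axis
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical collection: two directories, three files.
root = Node("site", children=[
    Node("docs", children=[Node("a.pdf", 30, "high"), Node("b.doc", 10, "low")]),
    Node("img", children=[Node("c.tif", 60, "medium")]),
])

rects = layout(root, 0.0, 0.0, 100.0, 100.0)
for name, risk, rx, ry, rw, rh in rects:
    print(f"{name:6s} risk={risk:6s} x={rx:.1f} y={ry:.1f} w={rw:.1f} h={rh:.1f}")
```

A drawing layer would then fill each rectangle with a color chosen from the `risk` value or from any other metadata dimension, which is how a single view can carry both the directory structure and the results of a back-end analysis.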

One of the back-end analysis methods developed by the team combines string alignment algorithms, a technique widely used in bioinformatics to compare biological sequences, with natural language processing methods. Applied to directory labels and file-naming conventions, the method helps archivists infer whether a group of records is organized by similar names, by date, by geographical location, in sequential order, or by a combination of those categories.
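To make the string alignment step concrete, here is a minimal sketch that aligns two file names and keeps only the characters they share. It assumes the textbook Needleman-Wunsch global alignment with simple scores; the article does not say which alignment algorithm or scoring scheme the team used, and the function names and sample file names are hypothetical.

```python
def align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment (Needleman-Wunsch) of two strings.
    Returns both strings padded with '-' where a gap was inserted."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Walk back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def shared_pattern(a, b):
    """Keep aligned characters that agree; mark every other column with '*'."""
    gapped_a, gapped_b = align(a, b)
    return "".join(x if x == y else "*" for x, y in zip(gapped_a, gapped_b))


# Hypothetical file names that follow a date-based naming convention.
pattern = shared_pattern("report_2009_01.pdf", "report_2010_02.pdf")
print(pattern)  # report_20**_0*.pdf
```

The stable parts of the pattern (`report_`, `.pdf`) suggest a shared naming convention, while the starred positions mark the fields that vary, such as a year or a sequence number. Deciding what those fields mean is the kind of inference the article attributes to the natural language processing step.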

Another analysis method under development computes paragraph-to-paragraph similarity and uses clustering methods to automatically discover "stories" in large collections of email messages. These stories, made up of messages that refer to the same activity or transaction, may then become the points of access to large collections that cannot be explored manually.
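A rough sketch of this similarity-then-clustering idea follows. The article does not name the similarity measure or the clustering method, so the choices here are assumptions: bag-of-words cosine similarity, and a simple threshold rule that places two paragraphs in the same group whenever their similarity reaches `threshold` (single-link clustering via union-find). The sample messages are invented.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def cluster(paragraphs, threshold=0.5):
    """Group paragraph indices; any pair at or above the threshold is linked."""
    vecs = [vectorize(p) for p in paragraphs]
    parent = list(range(len(vecs)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vecs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented messages: two about a budget request, two about a lunch.
messages = [
    "Please approve the budget request for the archive scanner",
    "The budget request for the archive scanner is approved",
    "Lunch on Friday at noon?",
    "Friday lunch at noon works for me",
]
stories = cluster(messages)
print(stories)  # [[0, 1], [2, 3]]
```

Comparing every pair of paragraphs grows quadratically with the size of the collection, which is one reason this kind of analysis benefits from being spread across many machines.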

To analyze terabyte-level data, the researchers distribute data and computational tasks across multiple computing nodes on TACC's high performance computing resource, Longhorn, a data analysis and visualization cluster funded by NSF. This accelerates computing tasks that would otherwise take a much longer time on standard workstations.
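The split-and-merge pattern behind this kind of distribution can be illustrated on a single machine. The sketch below is a stand-in only: it uses Python threads in place of cluster nodes, and the function names, the file-format counting task and the sample paths are hypothetical. The article does not describe the scheduler or framework actually used on Longhorn.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_formats(paths):
    """Work done by one worker: tally file extensions in its share of paths."""
    return Counter(os.path.splitext(p)[1].lower() or "(none)" for p in paths)


def chunked(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def distributed_count(paths, workers=4):
    """Hand one chunk to each worker, then merge the partial tallies."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_formats, chunked(paths, workers)):
            total.update(partial)
    return total


# Hypothetical file listing.
paths = ["a.pdf", "b.PDF", "c.doc", "README", "d/e.tif"]
format_counts = distributed_count(paths, workers=2)
print(dict(format_counts))
```

Because each chunk is processed independently and the partial counts are simply added together, the same structure carries over to many nodes: only the small per-chunk tallies need to be sent back and merged.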

"TACC's nationally recognized, HPC supercomputers constitute wonderful national investments," said Chadduck. "The understanding of how such systems can be effective is at the core of our collaboration with TACC."

The question remains as to whether archivists and the public will adapt to the abstract data representations proposed by TACC.

"A fundamental aspect of our research involves determining if the representation and the data abstractions are meaningful to archivists conducting analysis, if they allow them to have a clear and thorough understanding of the collection," said Esteva.

Throughout the research process, the TACC team has sought feedback from archivists and information specialists on the University of Texas at Austin campus and in the wider Austin community.

"The research addresses many of the problems associated with comprehending the preservation complexities of large and varied digital collections," said Jennifer Lee, a librarian at the University of Texas at Austin. "The ability to assess varied characteristics and to compare selected file attributes across a vast collection is a breakthrough."

The NARA/TACC project was highlighted by the White House in its report to Congress as a national priority for the federal 2011 technology budget. The researchers presented their findings at the 6th International Digital Curation Conference and at the 2010 Joint Conference on Digital Libraries.

As data collections grow bigger, new ways to display and interact with the data are necessary. Currently, TACC is building a transformable multi-touch display to enhance interactivity and the collaborative aspects of archival analysis. The new system will enable multiple users to explore data concurrently while discussing its meaning.

"What constitutes research today at TACC will eventually be integrated into the cyberinfrastructure of the country, at which point it will become commonplace," said Chadduck. "In that way, TACC is providing what I believe is a window on the archives of the future."

Source: PhysOrg

Provided by National Science Foundation
