September 14, 2006; 06:36 AM
Snowtide Informatics Systems, Inc., the leading provider of
enterprise-class PDF text and metadata extraction APIs (Application
Programming Interfaces) for Java applications and Web services,
announced that the National Institutes of Health (NIH) is successfully
using its flagship product, PDFTextStream, to automate the text
extraction process for thousands of archived research grant
applications in PDF (Portable Document Format). The process enables
the NIH to feed these applications into its content analysis systems
and, in turn, to generate the more accurate funding reports required
by Congress.
With an annual budget of over $28 billion, and almost 200,000 grant
applications reviewed every year, the NIH has embarked on various
projects to convert archived paper documents to electronic files in
order to enable efficient and accurate report generation. The
deployment of PDFTextStream is part of this enormous undertaking.
According to the NIH, the Java API has already helped the agency save
thousands of man-hours by automating the tedious and complex task of
PDF text extraction.
“We needed to be able to categorize NIH grant applications based on
their specific content so we could ultimately generate accurate funding
reports for Congress,” said LiMing Shen, Lead Integration Engineer at
the NIH, and the project lead for the text extraction project.
“This was more challenging than it sounds. Our conversion process to
PDF was producing documents that were readable to the human eye, but
difficult for our text extraction engines to process accurately. Not
only did PDFTextStream do an outstanding job extracting text, but it
also accurately identified specific sections within each PDF
document, a demanding function that other PDF extraction tools we
evaluated could not provide. This feature alone has already saved us hundreds of
hours of costly manual work, and will add up to thousands of hours
saved by the time the project is complete.”
“Our focus has always been to provide customers with the highest
performing PDF text and metadata extraction API for Java, .NET, and
Python,” said Chas Emerick, President, Snowtide Informatics Systems,
Inc. “The NIH development team put PDFTextStream through some very
rigorous tests. We’re pleased to see that our API performed very well
and is having a significant impact on productivity and cost savings for
the agency.”
About the National Institutes of Health
The National Institutes of Health (NIH), a part of the U.S. Department
of Health and Human Services, is the primary Federal agency tasked with
conducting and supporting medical research. Helping to lead the way
toward important medical discoveries that improve people's health and
save lives, NIH scientists investigate ways to prevent disease as well
as the causes, treatments, and even cures for common and rare diseases.
With an annual budget of over $28 billion, the NIH awards almost 50,000
competitive grants every year to more than 212,000 researchers at over
2,800 universities, medical schools, and other research institutions in
every state and around the world.
About Snowtide Informatics Systems, Inc.
Snowtide Informatics Systems, Inc. is a privately held software company
headquartered in Holyoke, Massachusetts. Its high-performance software
and custom development services enable large enterprises and government
agencies to automate the extraction, conversion, and cataloging of
content held in PDF documents. PDFTextStream, Snowtide Informatics'
flagship product, is a software component library for Java, Python, and
.NET environments that has been built from the ground up to rapidly and
accurately extract text and metadata held in PDF documents. For more
information about Snowtide Informatics Systems visit snowtide.com.