PDF to text
By aggregator | September 22, 2011
This service extracts the text content from a PDF file using the pdftotext executable from Xpdf (http://www.foolabs.com/xpdf/). The extracted text often contains characters that are invalid in XML, so the service returns it as binary or Base64-encoded data. The text output should be cleaned before it is sent to another web service.
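The cleaning step can be sketched as follows. This is a minimal, hypothetical client-side helper, not part of the service itself. It assumes you already have the Base64 string from the SOAP response, and that the underlying text is UTF-8; the listing does not state the encoding, so `encoding` is an assumption you may need to change. It also assumes that page breaks arrive as form-feed characters (U+000C), which pdftotext commonly emits between pages and which XML 1.0 does not allow. Any remaining characters outside the XML 1.0 `Char` production are then removed. The function names `clean_for_xml` and `decode_service_output` are illustrative only.

```python
import base64
import re

# Characters allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything else (e.g. NUL, BEL, form feed, lone surrogates).
_XML_INVALID = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(text: str, replacement: str = "") -> str:
    """Remove (or replace) every character that XML 1.0 does not permit."""
    return _XML_INVALID.sub(replacement, text)


def decode_service_output(b64_text: str, encoding: str = "utf-8") -> str:
    """Decode a Base64 payload from the service and return XML-safe text.

    Assumption: the text was encoded with `encoding` before Base64 encoding.
    Undecodable bytes become U+FFFD, which is itself valid in XML.
    """
    raw = base64.b64decode(b64_text)
    text = raw.decode(encoding, errors="replace")
    # Assumed page separators: turn form feeds into newlines instead of dropping them.
    text = text.replace("\f", "\n")
    return clean_for_xml(text)


if __name__ == "__main__":
    # Simulated payload: two pages separated by a form feed, plus stray control bytes.
    payload = base64.b64encode("Page one\fPage two\x00\x07".encode("utf-8")).decode("ascii")
    print(decode_service_output(payload))
```

Mapping form feeds to newlines before the general filter keeps page boundaries visible in the cleaned text; every other invalid character is simply dropped, though passing a `replacement` such as `" "` to `clean_for_xml` would preserve word spacing if that matters downstream.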
- Name: PDF to text
- Documentation: http://gnode1.mib.man.ac.uk:8080/ArticleSectionClassifierWebApp/
- Protocol: SOAP
- WSDL
- Endpoint: http://gnode1.mib.man.ac.uk:8080/FullTextWebServices/PdfToTextService
- Topic: General
- Type: Text Mining
- Tags: e-lico, pdf, pdftotext, text mining, text preprocessing
- Original source: BioCatalogue