Description: A web interface for Lucene that uses XML as a lightweight protocol. Developers can convert a data source (text, database, MS Word, PDF, etc.) into XML, index it with the Lucene engine, and retrieve full-text search results over HTTP as XML. The output integrates easily with a JSP, ASP, or PHP front end, or can be transformed on the server side with XSLT.
Platform: |
Size: 2891116 |
Author: 张和 |
Hits:
Description: A web interface for Lucene that uses XML as a lightweight protocol. Developers can convert a data source (text, database, MS Word, PDF, etc.) into XML, index it with the Lucene engine, and retrieve full-text search results over HTTP as XML. The output integrates easily with a JSP, ASP, or PHP front end, or can be transformed on the server side with XSLT.
Platform: |
Size: 2890752 |
Author: 张和 |
Hits:
Description: A collection of introductory Linux study materials that I gathered, all in PDF format and ready to use after unpacking. Contents include: (1) Linux操作系统文件系统学习教程.pdf (Linux operating system file system tutorial), (2) Linux基础复习题.pdf (Linux basics review questions), (3) Linux命令学习加Linux标准文本处理命令.pdf (Linux commands plus the standard Linux text-processing commands), (4) Linux扫描式教程.pdf (Linux scan-style tutorial), (5) Linux系统常用命令快速入门.pdf (quick start to common Linux system commands). The rest are not listed here; download the archive if you are interested.
Platform: |
Size: 3548160 |
Author: 朱明来 |
Hits:
Description: Lucene.Net is a high-performance Information Retrieval (IR) library, also known as a search engine library. Lucene.Net contains powerful APIs for creating full-text indexes and implementing advanced and precise search technologies in your programs. Some people may confuse Lucene.Net with a ready-to-use application such as a web search engine/crawler or a file search application, but Lucene.Net is not such an application; it's a framework library. Lucene.Net provides a framework for implementing these difficult technologies yourself. Lucene.Net places no restrictions on what you can index and search, which gives you a lot more power than other full-text indexing/searching implementations: you can index anything that can be represented as text. There are also ways to get Lucene.Net to index HTML, Office documents, PDF files, and much more.
Platform: |
Size: 320512 |
Author: Yu-Chieh Wu |
Hits:
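Description: Extracts text from PDF files. This is open source, written in C++. If you don't want to work with the source code, you can use the bundled pdftotext.exe to write the text in a PDF directly to a text file.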
Platform: |
Size: 787456 |
Author: wangxiaojuan |
Hits:
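The pdftotext.exe route mentioned in the entry above can be scripted. The following is a minimal Python sketch, assuming a `pdftotext` executable is on the PATH and accepts the usual `pdftotext input.pdf output.txt` form; the function names and the default `.txt` output naming are illustrative, not part of the package.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, exe="pdftotext"):
    """Return the argument list for: pdftotext <input.pdf> <output.txt>."""
    return [exe, str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path, txt_path=None, exe="pdftotext"):
    """Run pdftotext and return the path of the text file it wrote."""
    pdf_path = Path(pdf_path)
    # Default output name: same file name with a .txt suffix (an assumption).
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, exe), check=True)
    return txt_path
```

Calling `pdf_to_text("report.pdf")` would then be expected to leave the extracted text in `report.txt`, provided the executable is installed.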
Description: Programming Python, 4th edition, published by O'Reilly. If you've mastered Python's fundamentals, you're ready to start using it to get real work done. Programming Python will show you how, with in-depth tutorials on the language's primary application domains: system administration, GUIs, and the Web. You'll also explore how Python is used in databases, networking, front-end scripting layers, text processing, and more. This book focuses on commonly used tools and libraries to give you a comprehensive understanding of Python's many roles in practical, real-world programming.
Platform: |
Size: 25789440 |
Author: haha |
Hits:
Description: The quickest way to solve your problem with Chebfun may be to find a similar problem someone else has solved and use it as a template. This page connects you to dozens of such templates, called Chebfun Examples. Each example is an M-file producing text and/or graphical output, which executes, in most cases, in less than 5 seconds. You can also execute the example with Matlab's PUBLISH command to get a more informative story. Type open(publish('filename')) to see the quickest version on your screen, or publish('filename','latex') for a better formatted LaTeX version, which will appear in a directory called html. The published output is also available for direct download as a PDF file.
Platform: |
Size: 4598784 |
Author: jie |
Hits:
Description: tm is an R package that provides comprehensive processing for text mining. Load the tm package before use; the vignette command brings up the accompanying documentation.
> vignette("tm")  # opens tm.pdf, an English document describing how to use the tm package and its functions
Platform: |
Size: 709632 |
Author: 梦召 |
Hits:
Description: Apache Tika parses files in a variety of rich-text formats and returns their text content as a string. For example, Tika can parse Office 97/2003/2007, PDF, and HTML files. Using tika-app-1.5.jar as a reference for functionality, implement a desktop program with a GUI that opens files in these formats, calls Tika to parse them, displays the extracted text in the interface, and saves the result as a text file. The program can also open a group of files at once and process them in parallel using multiple threads. Note: import tika-core-1.5.jar and tika-parsers-1.5.jar in the program to call Tika (importing only tika-app-1.5.jar also works).
Platform: |
Size: 2048 |
Author: danny |
Hits:
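The parallel part of the task described in the entry above (parse a group of files on several threads and save each result as a text file) can be sketched as follows. This is a Python illustration of the threading pattern only, not the Java program the entry asks for: `read_as_text` is a hypothetical placeholder for a real Tika call, and the `<name>.txt` output naming is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_as_text(path):
    """Placeholder parser that stands in for a real Tika call (assumption)."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def convert_one(path, out_dir, parse=read_as_text):
    """Parse one file and save the extracted text as <name>.txt in out_dir."""
    path = Path(path)
    out_path = Path(out_dir) / (path.name + ".txt")
    out_path.write_text(parse(path), encoding="utf-8")
    return out_path


def convert_many(paths, out_dir, parse=read_as_text, workers=4):
    """Convert a group of files in parallel; results keep the input order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs convert_one on worker threads, one task per file.
        return list(pool.map(lambda p: convert_one(p, out_dir, parse), paths))
```

Passing the parser in as a parameter keeps the threading logic separate from the extraction step, so a real parser can be swapped in without touching the pool code.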
Description: This PDF contains highlights of Sublime Text keyboard shortcuts and code templates.
If you invest some time in learning your IDE, you will benefit greatly in the long run.
Don't let the IDE get in the way of your thinking process.
Platform: |
Size: 315392 |
Author: user0983 |
Hits:
Description: Takes HTML text and generates a PDF file to make it printer-friendly. This PHP script is based on the FPDF PHP script.
Platform: |
Size: 99328 |
Author: jrizo |
Hits:
Description: This package contains an OCR engine - libtesseract and a command line program - tesseract.
The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors see AUTHORS and GitHub's log of contributors.
Tesseract has Unicode (UTF-8) support, and can recognize more than 100 languages "out of the box".
Tesseract supports various output formats: plain text, hOCR (HTML), PDF, TSV, and invisible-text-only PDF.
You should note that in many cases, in order to get better OCR results, you'll need to improve the quality of the image you are giving Tesseract.
This project does not include a GUI application. If you need one, please see the 3rdParty wiki page.
Tesseract can be trained to recognize other languages. See Tesseract Training for more information.
Platform: |
Size: 42390528 |
Author: 独孤闻林 |
Hits:
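The output formats listed in the entry above are selected on the tesseract command line by naming a config after the output base. The following is a minimal Python sketch, assuming the `tesseract` executable is on the PATH and follows the usual `tesseract <image> <outputbase> -l <lang> <config>` form; the function names and the format-to-config mapping keys are illustrative, and the invisible-text-only PDF variant is left out.

```python
import subprocess

# Output formats from the description mapped to tesseract config names.
OUTPUT_CONFIGS = {"plain-text": "txt", "hocr": "hocr", "pdf": "pdf", "tsv": "tsv"}


def build_tesseract_command(image_path, output_base, fmt="plain-text", lang="eng"):
    """Return the argument list: tesseract <image> <outputbase> -l <lang> <config>."""
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {fmt}")
    return ["tesseract", str(image_path), str(output_base),
            "-l", lang, OUTPUT_CONFIGS[fmt]]


def run_ocr(image_path, output_base, fmt="plain-text", lang="eng"):
    """Run tesseract; it appends the format's extension to output_base itself."""
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, fmt, lang),
                   check=True)
```

For example, `run_ocr("scan.png", "scan", fmt="pdf")` would be expected to produce a searchable `scan.pdf`, given an installed tesseract with the English language data.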
Description: Generate PDFs in C#, read the text content of a PDF, and extract the images inside a PDF. Well worth using as a reference.
Platform: |
Size: 8058880 |
Author: net资源共享 |
Hits: