Journey of our Project

Inspiration

Terminal Apps for Visually Impaired People takes its inspiration from various other projects here at IIT Delhi that aim to assist visually impaired people. The idea comes from a very basic human habit: reading books. Everybody wants to read, but visually impaired readers have to handle bulky books written in Braille. We set out to address this problem by providing an efficient way to read digital text formats with the help of a refreshable Braille display.

Our Approach

Our first task was to unpack the various documents into a readable structure. Each document format has its own archive type or text markup, so to make the documents easier to manipulate we converted all of them into one common format, XML.

After studying the various file structures, we converted all of them into a common markup format so that they can be handled uniformly.
For the common markup, we chose XML because it is easy to access for further modifications. For a detailed description of XML, click here.
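
As an illustration of the idea, here is a minimal sketch of writing labelled text blocks into one common XML document using Python's standard `xml.etree.ElementTree` module. The element names `document` and `block`, the `type` attribute, and the helper `blocks_to_xml` are assumptions made for this example; they are not the schema our application actually uses.

```python
import xml.etree.ElementTree as ET


def blocks_to_xml(blocks):
    """Build one XML string from (block_type, text) pairs.

    The tag and attribute names are hypothetical, chosen for illustration.
    """
    root = ET.Element("document")
    for block_type, text in blocks:
        node = ET.SubElement(root, "block", {"type": block_type})
        node.text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical content extracted from any source format.
sample = [("heading", "Chapter 1"), ("paragraph", "It was a bright cold day.")]
print(blocks_to_xml(sample))
```

Whatever the source format is, later stages then only need to understand this one structure.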

At this point, we had a structured common document format, which we then had to manipulate using a common set of rules.

Next, we used Python to control the flow of text on the terminal (bash) interface of a Linux/Unix operating system, which was a prerequisite for the rest of the project.
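
One simple way to picture controlling text flow is to wrap the text to a fixed line width and hand it out a page at a time. The sketch below uses the standard `textwrap` module. The function name `paginate`, the width of 40 characters (standing in for the cell count of a Braille display), and the one-line page size are assumptions for illustration, not values from our application.

```python
import textwrap


def paginate(text, width=40, lines_per_page=1):
    """Wrap text to `width` characters and group the lines into pages.

    Both defaults are hypothetical; a real display may differ.
    """
    lines = textwrap.wrap(text, width=width)
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


# Each page would be sent to the terminal when the reader asks for more.
for page in paginate("Everybody wants to read books, and so do we.", width=20):
    print("\n".join(page))
```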

Once we had raw output on the terminal interface, we had to find all possible data blocks and label each one with its block type, such as heading, subheading, table or image.

By this stage, the output on our terminal interface resembled the data in the original documents and could be handled through a class structure for further manipulation. After converting the documents into the common markup format, we built a class structure that mirrors the text in exactly the same layout as the original documents.

To gain full control over the text flow, we designed the class structure to return data on specific method calls, for example to get the next paragraph or to navigate to the next heading.
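
A class structure of this kind could look roughly like the sketch below. The names `Block`, `Document`, `next_paragraph` and `next_heading`, and the set of block kinds, are hypothetical and chosen for this example; our real classes may differ.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # e.g. "heading", "subheading", "paragraph", "table", "image"
    text: str


class Document:
    """Holds labelled blocks in reading order and a cursor into them."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.position = -1  # cursor starts before the first block

    def _next_of(self, kinds):
        # Scan forward from the cursor for the next block of a wanted kind.
        for i in range(self.position + 1, len(self.blocks)):
            if self.blocks[i].kind in kinds:
                self.position = i
                return self.blocks[i]
        return None  # nothing found; the cursor stays where it was

    def next_paragraph(self):
        return self._next_of({"paragraph"})

    def next_heading(self):
        return self._next_of({"heading", "subheading"})


doc = Document([Block("heading", "Intro"), Block("paragraph", "First text."),
                Block("heading", "Methods"), Block("paragraph", "More text.")])
print(doc.next_heading().text)    # moves the cursor to the first heading
print(doc.next_paragraph().text)  # then to the paragraph that follows it
```

Keeping a single cursor means every navigation request is answered relative to where the reader currently is.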

The above text is the output of our program. It contains all the information about the original document and can be accessed using simple method calls on our document object model.

Problems and Challenges

Misaligned text boxes and discontinuous object placement in various PDFs made PDF documents difficult to parse.
Many properties of the text were lost.
Larger files cannot be parsed as a whole in one go because of memory constraints (50 MB available).
Identifying tables and making their data navigable was difficult, as the parser we used had no provision for this.

Our Solution

To tackle the memory constraint, we decided to process only the current chunk of data. To manage this, we had to go back to the original file and remember our position in it, which we did successfully for ePub. For PDF and DAISY, we still depend on the file size, which is one of the limitations of our application.
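
The general idea of reading one chunk at a time while remembering the position can be sketched as follows. This works on a plain byte stream and is not our ePub-specific code. The class name `ChunkReader` and the chunk size are assumptions for illustration.

```python
import io


class ChunkReader:
    """Reads a stream one chunk at a time and remembers the byte offset."""

    def __init__(self, stream, chunk_size=4096):
        self.stream = stream
        self.chunk_size = chunk_size  # hypothetical size, tune to the memory budget
        self.offset = 0               # position to resume from

    def next_chunk(self):
        # Return to the remembered position, read one chunk, record the new position.
        self.stream.seek(self.offset)
        data = self.stream.read(self.chunk_size)
        self.offset = self.stream.tell()
        return data  # empty bytes once the end of the stream is reached


reader = ChunkReader(io.BytesIO(b"abcdefghij"), chunk_size=4)
print(reader.next_chunk(), reader.offset)
```

Only one chunk is held in memory at a time, so the size of the whole file no longer matters for this step.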

Tables are now identified with the help of layout line tags and a simple condition that a polygon must be a rectangle. The text streams also carry position coordinates, which we match against the table cells to access the data in the table. Navigation uses the standard up/down/left/right keys.
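
The three steps above can be sketched as follows: a rectangle test on a polygon's corner points, a lookup of the cell that contains a text position, and a cursor move for the arrow keys. All function names and the data layout are hypothetical. The rectangle test assumes axis-aligned shapes with exact coordinates, whereas coordinates taken from a real PDF would probably need a small tolerance.

```python
def is_rectangle(points):
    """True if four (x, y) points are the corners of an axis-aligned rectangle."""
    if len(set(points)) != 4:
        return False
    xs = {x for x, _ in points}
    ys = {y for _, y in points}
    # Four distinct points drawn from two x values and two y values
    # can only be the four corners.
    return len(xs) == 2 and len(ys) == 2


def cell_for(text_pos, cells):
    """Return the (row, col) of the cell whose box (x0, y0, x1, y1) contains text_pos."""
    x, y = text_pos
    for key, (x0, y0, x1, y1) in cells.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return key
    return None


MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(current, key, n_rows, n_cols):
    """Move the (row, col) cursor one cell, staying inside the table."""
    d_row, d_col = MOVES[key]
    row = min(max(current[0] + d_row, 0), n_rows - 1)
    col = min(max(current[1] + d_col, 0), n_cols - 1)
    return (row, col)


# Hypothetical two-cell table: each cell is a bounding box keyed by (row, col).
cells = {(0, 0): (0, 0, 10, 5), (0, 1): (10, 0, 20, 5)}
print(is_rectangle([(0, 0), (0, 5), (10, 5), (10, 0)]))
print(cell_for((12, 3), cells))
print(move((0, 0), "right", n_rows=1, n_cols=2))
```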

Proposed Workflow

General Workflow

Our application takes a file as input, converts it to XML, and processes it to represent the data on the terminal interface.

EPub

For ePub, we used the ePub Python library to get the content from the ePub file, after which our application builds the class structure to handle and manipulate various aspects of the document. At that point, we have information about the various content files present in the ePub document. These files have their meta-structure in the form of XML and their content in the form of HTML. To post-process this HTML, we used the Beautiful Soup library and converted the text into navigable blocks. For navigation, we made another class structure that handles all navigation requests and responds accordingly. The output then goes straight to the terminal interface.

PDF

For PDF, we used the PDFMINER library to get the data from the PDF file and conv...
