#theREALITT

The Real ITT

Project on Predatory Student Lending

import pybitly

# pybitly is assumed from the usage below; 'USERNAME' and 'PASSCODE' are
# placeholder bitly credentials.
def makeshortlink(link):
    api = pybitly.api.Api('USERNAME', 'PASSCODE')
    return api.shorten(link)

from TwitterAPI import TwitterAPI

# The TwitterAPI library is assumed from the request() signature below;
# fill in real app credentials.
api = TwitterAPI('CONSUMER_KEY', 'CONSUMER_SECRET',
                 'ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

def makeTweet(textoftweet):
    with open('/urltofile/generatedtweetimage.png', 'rb') as f:
        data = f.read()  # data is the image we are going to send
    r = api.request('statuses/update_with_media',
                    {'status': textoftweet}, {'media[]': data})

def makeTextOfTweet(handle, link):
    # handle is the congressperson's @-handle, e.g. "@CONGRESSPERSON"
    textoftweet = ("Testimony abt #TheRealITT frm your constituents "
                   "RT @USEdGov if you support ITT students #4Profit " + link)
    return handle + " " + textoftweet

states = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        # ... the remaining states continue in the same pattern
}
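
Tying the pieces above together, the flow is: look up a congressperson's handle, shorten the testimony link, build the tweet text, and post it with the image. A minimal hypothetical driver follows; the handles_by_state mapping and the testimony URLs are placeholders, not part of the original code.

# Hypothetical driver; the handle mapping and URLs are placeholders.
handles_by_state = {'AK': '@AK_CONGRESSPERSON', 'AL': '@AL_CONGRESSPERSON'}

for abbr, handle in handles_by_state.items():
    link = makeshortlink('https://example.com/testimony/' + abbr)
    makeTweet(makeTextOfTweet(handle, link))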

The Project fights for low-income borrowers who have debt from for-profit colleges. It represents students and families who have experienced unfair, deceptive, and illegal conduct at the hands of for-profit colleges, and in addition to litigation it has advocated for policy reforms to increase accountability in the for-profit sector. Read more about their work here.

They came to us with two problems.

The Twitter App + Congress was actually three distinct projects that merged into one overarching effort. The three phases amounted to: (1) mass redaction, (2) letter compilation, and (3) tweeting.

Design & Purpose

Phase (1): Mass redaction

The first part of this project began with 300 pages of documents containing personally identifiable information about clients, which needed to be redacted before filing. The PPSL was filing on behalf of ITT Tech students, but unfortunately faced an estimated 40 hours of redaction work before it could file.

Instead of sending the job out to a service, we started up our trusted Python IDE and built a tool. The tool needed a way to parse the .docx files they had generated, and after a bit of trial and error it could extract only the particular areas of each page where names and addresses were found.

The beauty of legal documents is that they are highly delineated and structured, which lets relatively simple code locate the target information. Once the correct parser was created, it was simple to traverse the structure, extract the relevant text, and replace it with our trusty u"\u2588" (the Unicode FULL BLOCK character, █).

Calculating the number of characters that needed to be replaced and inserting the same number of blocks kept the document in the same layout, which is often a plus. You can read more about the Python module I used, python-docx, here.
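
To make that concrete, here is a minimal sketch of such a redaction pass with python-docx. It is an illustration under simplifying assumptions, not the PPSL tool itself; the redact_docx name and the sample sensitive strings are hypothetical.

from docx import Document

BLOCK = u"\u2588"  # the FULL BLOCK character used as the redaction mark

def redact_docx(in_path, out_path, sensitive_strings):
    # Walk every run of every paragraph and blank out known strings,
    # keeping the character count (and thus the layout) unchanged.
    doc = Document(in_path)
    for paragraph in doc.paragraphs:
        for run in paragraph.runs:
            for s in sensitive_strings:
                if s in run.text:
                    run.text = run.text.replace(s, BLOCK * len(s))
    doc.save(out_path)

redact_docx('testimony.docx', 'redacted.docx', ['Jane Doe', '123 Main St'])

One caveat with this naive version: Word frequently splits a visually contiguous string across several runs, so matching run-by-run misses some occurrences; that is exactly the kind of trial and error the parsing step involved.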

After this was complete, the process was simply to convert to PDF and apply all metadata and hidden-information redactions and sanitizations using good old Adobe Acrobat Pro. Technically, I don't think any of the deleted content was carried over when the .docx files were rebuilt, but better safe than sorry.
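
The conversion and sanitization here were done by hand in Acrobat Pro; if you wanted to script the .docx-to-PDF step instead, one common approach is a headless LibreOffice call, sketched below on the assumption that the libreoffice binary is installed and on your PATH.

import subprocess

# Convert the redacted .docx to PDF in the current directory using
# LibreOffice's headless mode.
subprocess.run(['libreoffice', '--headless', '--convert-to', 'pdf',
                '--outdir', '.', 'redacted.docx'], check=True)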

Technically the second part was not mandatory, but .docx files use XML and can retain information you deleted, in part so that editors can provide the undo and revision functionality we all appreciate. By converting the documents to PDFs we strip out that information, and then we can go the extra mile of removing it again. This project could have been accomplished in a number of ways, but this one seemed the easiest and fastest at the time.
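
You can see this structure for yourself: a .docx is just a ZIP archive of XML parts, and leftover content lives inside those parts. A quick stdlib inspection, with a hypothetical file name:

import zipfile

# List the XML parts inside a .docx; deleted or tracked content can
# linger in parts such as word/document.xml.
with zipfile.ZipFile('testimony.docx') as z:
    for name in z.namelist():
        print(name)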

Availability

The tool is not currently open to the public, but we would be happy to discuss in more detail how it was accomplished. The guts of the project will be added to our GitHub page.
