
Python PyPDF Displaying “If this message is not eventually..” etc. message when attempting to read multiple PDFs

I’m new to programming, so I’m hoping to get some assistance. I’m trying to pull data from several PDFs and output a CSV file. When I run the Python script, it pulls data correctly from the first PDF it ingests, but for every PDF after that the extracted text is this message: “If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document.”, etc. As I understand it from my troubleshooting, that message is the actual PDF page content, and the real form data is XML that isn’t being loaded.
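To illustrate what I mean by the XML not being loaded, here is a minimal sketch of how the affected files might be detected and their XML read directly. It rests on assumptions I have not confirmed: that the problem PDFs are dynamic XFA forms, that the installed pypdf version exposes the raw packets through `PdfReader.xfa`, and that the form values sit in the "datasets" packet. The names `is_placeholder`, `xfa_leaf_values`, and `read_form_data` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Opening words of the fallback page quoted above.
PLACEHOLDER = "If this message is not eventually replaced"


def is_placeholder(text):
    """True when extracted page text is the viewer fallback message."""
    # Collapse whitespace first, since extraction may break the line anywhere.
    return PLACEHOLDER in " ".join((text or "").split())


def xfa_leaf_values(datasets_xml):
    """Map leaf element names to their text in an XML packet (str or bytes)."""
    root = ET.fromstring(datasets_xml)
    values = {}
    for elem in root.iter():
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            # Drop the "{namespace}" prefix ElementTree adds to tags.
            # Repeated tag names overwrite earlier ones in this sketch.
            values[elem.tag.rsplit("}", 1)[-1]] = text
    return values


def read_form_data(path):
    """Return ("text", page_text) or ("xfa", values) for one PDF."""
    from pypdf import PdfReader  # third-party; imported only when called

    reader = PdfReader(path)
    text = reader.pages[0].extract_text()
    if not is_placeholder(text):
        return "text", text
    # Assumption: reader.xfa is a dict of packet name -> XML bytes, or None.
    packets = reader.xfa or {}
    datasets = packets.get("datasets")
    values = xfa_leaf_values(datasets) if datasets else {}
    return "xfa", values
```

Calling `read_form_data` on each file separately would also show whether the message depends on the file itself rather than on its position in the loop.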

Here’s the method that is reading the PDFs with the PdfReader:

import os

from pypdf import PdfReader  # or PyPDF2, depending on the installed package

def parse_pdf(file_list):
    # Collect one assembled record per PDF, in input order.
    user_list = []
    for name in file_list:
        filename = os.path.join(INTAKE_FILEPATH, name)
        reader = PdfReader(filename)
        page = reader.pages[0]  # only the first page is parsed
        fields = reader.get_fields()
        text = page.extract_text()
        name_l = get_name_l(text)
        serial_num = get_serial_num_s(text)
        signed_l = check_if_signed_l(name_l, fields)
        loc_l = get_loc_deros_l(text)
        contact_l = get_contact_info_l(text)
        user_list.append(assemble_dict(name_l, signed_l, loc_l, contact_l, serial_num))
        ##ISSUES WITH READING MULTIPLE TIMES
    return user_list

The first PDF read in outputs exactly what I’m looking for, but any subsequent PDFs have the message at the top when read in by the PdfReader. Any help is appreciated, thank you.

I attempted to use other libraries, but several have trouble reading the PDFs. I tried changing the loop around a bit, in case it needed to exit the parse_pdf method and call it again for each file. I also tried using sleep and giving it 5 minutes. I’m not sure what else I can try.