Use the -x, -y, -W and -H parameters to pdftotext to crop the PDF column-wise, then append the columns with a combination of utilities like paste and column. Here -x and -y set the top-left corner of the crop area and -W and -H its width and height, all in pixels at pdftotext's default resolution of 72 DPI (which equals PDF points).

The following command extracts the first column:

    pdftotext -layout -x 38 -y 77 -W 176 -H 500 \
      AC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 1st-columns.txt

The second, third and fourth columns use the same command with a different -x offset, for example:

    pdftotext -layout -x 214 -y 77 -W 176 -H 500 \
      AC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 2nd-columns.txt

    pdftotext -layout -x 567 -y 77 -W 176 -H 500 \
      AC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 4th-columns.txt

BTW, I cheated a bit: to get a clue about which values to use for -x, -y, -W and -H, I first ran this command to find the exact coordinates of the column header words on page 1:

    pdftotext -f 1 -l 1 -layout -bbox \
      AC06E7D1302B790429AF6E84696FCFAB20B.pdf - | head -n 10
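The whole workflow (measure the coordinates, crop each column, glue the columns back together) can be wrapped in a few shell functions. This is a minimal sketch under stated assumptions, not the script used in the post: the function names bbox_words, extract_columns and combine_columns and the col-N.txt output names are hypothetical; -y 77, -W 176 and -H 500 are copied from the commands in the post; bbox_words assumes poppler's -bbox output puts each word on its own line as <word xMin="..." yMin="..." xMax="..." yMax="...">text</word>; and combine_columns relies on paste, so rows only line up if every column file has exactly one line per table row.

```shell
#!/usr/bin/env bash
# Minimal sketch of the column-wise workflow; all function names are hypothetical.

# Turn pdftotext -bbox output (read on stdin) into "xMin yMin word" lines.
# Assumes poppler's format: one <word xMin=".." yMin=".." ...>text</word> per line.
bbox_words() {
  sed -n 's/.*<word xMin="\([0-9.]*\)" yMin="\([0-9.]*\)"[^>]*>\(.*\)<\/word>.*/\1 \2 \3/p'
}

# Crop each column into its own file: col-1.txt, col-2.txt, ...
# Usage: extract_columns FILE.pdf X1 X2 ...  (one -x value per column, left to right)
# -y 77, -W 176 and -H 500 are the values from the post; adjust them for other layouts.
extract_columns() {
  local pdf=$1
  shift
  local n=1 x
  for x in "$@"; do
    pdftotext -layout -x "$x" -y 77 -W 176 -H 500 "$pdf" "col-$n.txt"
    n=$((n + 1))
  done
}

# Glue the column files side by side, tab-separated, one output row per input line.
# Trailing blanks that -layout leaves at the end of each cell are trimmed.
combine_columns() {
  local tab=$'\t'
  paste "$@" | sed -e "s/ *${tab}/${tab}/g" -e 's/ *$//'
}
```

A possible session, where input.pdf is a placeholder for the real file name:

    pdftotext -f 1 -l 1 -layout -bbox input.pdf - | bbox_words | head -n 10
    extract_columns input.pdf 38 214 567
    combine_columns col-*.txt | column -t -s "$(printf '\t')"

Only three offsets are passed here because the post lists -x values for the first, second and fourth columns; insert the third column's value, read from the -bbox output, between 214 and 567. Some column implementations may collapse empty cells, so check the aligned result when the columns have different lengths.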