Monday 23 November 2009
Twittering OCR
'things that go bump in the tunnel'
As I've been following the LHC restart, I've written a parser for the Vistar status feed that posts it to Twitter. The basic method is:
Grab the status image from its URL (see image), then do some ImageMagick hackery to cut out the corner holding the status comments. Resize it larger (this helps the OCR) and save it as a TIFF. Run the image through OCR software (tesseract), compare the output with the last run, and if it differs, post it to Twitter.
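For reference, the numbers in the `convert` step work out roughly as follows. This assumes a 16-bit (Q16) ImageMagick build, where pixel values run from 0 to 65535 and a bare `-threshold` value is read on that scale, and reads the crop geometry as WIDTHxHEIGHT+X+Y:

```latex
\begin{aligned}
\text{crop} &: 509 \times 205\ \text{px, top-left corner at } (x, y) = (1, 533) \\
h_{\text{resized}} &= 205 \times \tfrac{1000}{509} \approx 402.75\ \text{px, i.e. roughly } 1000 \times 403 \\
\text{threshold} &= \tfrac{39000}{65535} \approx 0.595 \approx 59.5\%\ \text{of full intensity}
\end{aligned}
```

Pixels brighter than that threshold become white and the rest black, which presumably is what gives tesseract cleaner characters to read.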
In shell:
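# Assumes SRC (the Vistar image URL), IMG, TIFF and OUT are set beforehand.
curl -o "$IMG" "$SRC"
# Cut out the status corner, enlarge it and threshold to black/white for the OCR.
convert "$IMG" +repage -crop 509x205+1+533 +repage -resize 1000x -threshold 39000 "$IMG"
convert "$IMG" -monochrome "$TIFF"
[ -f "$OUT.txt" ] && mv "$OUT.txt" "$OUT.old" # make a backup of the last run
tesseract "$TIFF" "$OUT" # writes $OUT.txt
# Strip out ready for Twitter
DATE=$(date +%d-%m-%Y)
sed -i "s/Comments $DATE /#LHC Status /" "$OUT.txt"
# diff exits 0 if identical, 1 if different, 2 on error (e.g. no $OUT.old yet)
diff -q "$OUT.txt" "$OUT.old" > /dev/null
if [ $? -eq 1 ] ; then
# Post to Twitter (--data-urlencode so characters like & and + survive).
curl --basic --user lhcstatus:password --data-urlencode status="$(cat "$OUT.txt")" http://twitter.com/statuses/update.json
fi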
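The change-detection half of the script can be pulled out into two small functions, so it can be tried without the image or Twitter steps. This is a minimal sketch, assuming the OCR text already sits in a file; the function names `prepare_status` and `status_changed` are hypothetical, and `cmp -s` stands in for the `diff -q` exit-status check.

```shell
#!/bin/sh
# Hypothetical helpers sketching the "compare with the last run" step.

# Print the OCR text with the "Comments DD-MM-YYYY " prefix swapped for a hashtag.
# $1: file holding the OCR output, $2: the date as printed on the status page.
prepare_status() {
    sed "s/Comments $2 /#LHC Status /" "$1"
}

# Succeed (exit 0) when $1 differs from the previous run's copy in $2.
# cmp -s exits 0 if identical, 1 if different and 2 on error, so a missing
# previous file also counts as "changed".
status_changed() {
    ! cmp -s "$1" "$2" 2>/dev/null
}
```

Called as `status_changed "$OUT.txt" "$OUT.old" && ...`, it replaces the `diff -q` plus `$?` test. Unlike that test, it also treats a missing `$OUT.old` as a change, so the very first run would post too.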
And lo: http://twitter.com/lhcstatus