Thursday, 3 April 2014

A simple Python script to make a literature table

Geeky post again - no math this time, but computer code.

I'm sure people have done this before, but I thought it would be a nice opportunity to practice my Python skills by writing a small script for the following problem. When I read a scientific article I usually look out for the following elements:
  • Innovation: what does the study do that others haven't done before?
  • Method: what method did they use?
  • Data: where did they get their data from?
  • Results: what are the main results?
  • Relevance: who benefits from this research, and how?
I also like to place the research in one of the four quadrants in this post. I find it helpful to make an overview of these questions in a table:

1st author | Year | Journal | Quadrant | Innovation | Method | Data | Results | Relevance
Kompas | 2005 | Journal of Productivity Analysis | 4 | Estimates efficiency gains quota trade for Southeast Trawl Fishery, AU | Stochastic frontier analysis | AFMA and ABARE survey data | ITQs gave efficiency gains | Policy debate on ITQs
Kompas | 2006 | Pacific Economic Bulletin | 3 | Estimates optimal effort levels and allocation across species | Multifleet, multispecies, multiregion bioeconomic model | SPC data | Effort reduction needed; optimal stocks larger than BMSY | Policy debate on MEY

But here's the problem: I usually keep my notes in a BibTeX file (as a good geek should), which looks like this:

@ARTICLE{Kompas2006PacEconBull,
  author = {Kompas, T. and Che, T.N.},
  title = {Economic profit and optimal effort in the Western and Central Pacific tuna fisheries},
  journal = {Pacific Economic Bulletin},
  year = {2006},
  volume = {21},
  pages = {46-62},
  number = {3},
  data = {SPC data},
  innovation = {Estimates optimal effort levels and allocation across species},
  quadrant = {3},
  keywords = {tuna; bioeconomic model; optimisation; Pacific},
  method = {Multifleet, multispecies, multiregion bioeconomic model},
  results = {Effort reduction needed; optimal stocks larger than BMSY},
  relevance = {Policy debate on MEY}
}

@ARTICLE{Kompas2005JProdAnalysis,
  author = {Kompas, Tom and Che, Tuong Nhu},
  title = {Efficiency gains and cost reductions from individual transferable quotas: A stochastic cost frontier for the Australian South East fishery},
  journal = {Journal of Productivity Analysis},
  year = {2005},
  volume = {23},
  pages = {285-307},
  number = {3},
  quadrant = {3},
  data = {AFMA and ABARE survey data},
  innovation = {Estimates efficiency gains quota trade for Southeast Trawl Fishery, AU},
  keywords = {individual transferable quotas; stochastic cost frontier; fishery efficiency; ITQs},
  method = {Stochastic frontier analysis},
  relevance = {Policy debate on ITQs.},
  results = {ITQs gave efficiency gains}
}

I don't want to copy it all by hand, so I wrote this little Python script to convert all entries in the BibTeX file to a CSV file:

import csv
from bibtexparser.bparser import BibTexParser
from operator import itemgetter

def readFirstAuthor(inpList,num):
    # First author's surname: everything before the first comma
    # (assumes names are written as "Surname, Given")
    return inpList[num]['author'].split(',')[0]

def selectDict(inpList,name):
    # Keep only journal articles whose author field contains `name`
    return [entry for entry in inpList
            if name in entry['author'] and entry['type']=='article']

def selectFieldsDict(inpList,fieldNames):
    # Build one row per entry containing only the requested fields;
    # fields missing from an entry are filled with 'blank'
    outObj = []
    for i, entry in enumerate(inpList):
        temp = {}
        for n in fieldNames:
            if n == 'author':
                temp['author'] = readFirstAuthor(inpList,i)
            else:
                temp[n] = entry.get(n, 'blank')
        outObj.append(temp)
    return outObj

fieldnames = ['author','year','journal','quadrant',\
    'innovation','method','data','results','relevance']

# Parse the BibTeX file into a list of entry dictionaries
with open('BibTexFile.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)

record_list = bp.get_entry_list()

dictSelection = selectDict(record_list,'Kompas')

fieldSelection = selectFieldsDict(dictSelection,fieldnames)


test = sorted(fieldSelection, key=itemgetter('year'))


# Python 3: text mode with newline='' (on Python 2, open with 'wb' instead)
with open('output.csv','w',newline='') as test_file:
    csvwriter = csv.DictWriter(test_file, delimiter=',',
        fieldnames=fieldnames)
    csvwriter.writeheader()
    for row in test:
        csvwriter.writerow(row)
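
As one possible simplification, `csv.DictWriter` can do part of the field selection itself: `restval` fills in fields that an entry lacks, and `extrasaction='ignore'` drops fields that are not wanted in the table. The sketch below assumes the entries are plain dictionaries like the ones the parser returns. The two sample entries are cut down by hand from the records above, and `first_author` and `write_table` are hypothetical helper names, not part of the script. It writes to an in-memory buffer, so it runs without a BibTeX file or the parser installed.

```python
import csv
import io
from operator import itemgetter

FIELDNAMES = ['author', 'year', 'journal', 'quadrant',
              'innovation', 'method', 'data', 'results', 'relevance']

def first_author(author_field):
    # Hypothetical helper: surname of the first author.
    # Handles "Surname, Given and ..." as well as "Given Surname and ...".
    first = author_field.split(' and ')[0].strip()
    if ',' in first:
        return first.split(',')[0].strip()
    return first.split()[-1]

def write_table(entries, out, fieldnames=FIELDNAMES):
    # restval fills missing fields; extrasaction='ignore' skips
    # fields such as volume that are not listed in fieldnames.
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            restval='blank', extrasaction='ignore')
    writer.writeheader()
    for entry in sorted(entries, key=itemgetter('year')):
        # Copy the entry, replacing the full author list by the first surname
        row = dict(entry, author=first_author(entry['author']))
        writer.writerow(row)

# Hand-reduced sample entries (assumed shape of the parsed records)
entries = [
    {'author': 'Kompas, T. and Che, T.N.', 'year': '2006',
     'journal': 'Pacific Economic Bulletin', 'quadrant': '3',
     'volume': '21'},
    {'author': 'Kompas, Tom and Che, Tuong Nhu', 'year': '2005',
     'journal': 'Journal of Productivity Analysis',
     'method': 'Stochastic frontier analysis'},
]

buffer = io.StringIO()
write_table(entries, buffer)
print(buffer.getvalue())
```

With this approach the separate field-selection function is no longer needed; the trade-off is that the first-author handling has to happen just before each row is written.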

If you are a Python developer: any comments on this are welcome. I'm sure it's not perfect.
