Monday, May 28, 2012

Authorship recognition of multi-authored documents

There is very little work on this topic.
The most recent is by Moshe Koppel and his team: "Unsupervised Decomposition of a Document into Authorial Components." In this paper they treated the Bible as a multi-authored document and decomposed its chapters into two sets: chapters written by Jeremiah and chapters written by Ezekiel. They used synonym usage to distinguish the two authors.
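
To make the synonym idea concrete, here is a rough sketch of how chapters might be split into two groups by which synonym each one prefers. This is only an illustration, not the pipeline from the paper: the English synonym sets, the function names, and the simple "nearest seed" grouping are all my own assumptions, swapped in for whatever features and clustering method the authors actually used.

```python
# Hypothetical synonym sets; each tuple lists interchangeable words.
# A real experiment would need a proper synonym resource for the language.
SYNONYM_SETS = [("big", "large"), ("start", "begin"), ("quick", "fast")]


def synonym_vector(text):
    """Return a 0/1 vector with one entry per synonym: 1 if the word occurs."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for group in SYNONYM_SETS for w in group]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0


def split_two_ways(chapters):
    """Label each chapter 0 or 1 by its similarity to two seed chapters.

    The seeds are the two chapters whose synonym vectors are least similar.
    This is a simplified stand-in for a real clustering algorithm.
    """
    vectors = [synonym_vector(c) for c in chapters]
    n = len(vectors)
    if n < 2:
        return [0] * n
    seed_a, seed_b = min(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: cosine(vectors[pair[0]], vectors[pair[1]]),
    )
    return [
        0 if cosine(v, vectors[seed_a]) >= cosine(v, vectors[seed_b]) else 1
        for v in vectors
    ]


# Toy "chapters": two invented authors with opposite synonym preferences.
chapters = [
    "the big dog will start a quick run",
    "a large cat will begin a fast walk",
    "big storms start with quick winds",
    "large rivers begin as fast streams",
]
print(split_two_ways(chapters))  # [0, 1, 0, 1]
```

The toy chapters are built so that one invented author always writes "big/start/quick" and the other "large/begin/fast", so the split is trivially clean. Real text would be much noisier, and many chapters would contain too few of the synonyms to be assigned reliably.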

The authors say this approach could be applied iteratively to handle more than two authors, but the number of authors must be known in advance.

This work has a few limitations: they experimented only with the Hebrew version of the Bible, there were only two authors, and they decomposed the text chapter by chapter.

