interface to distinguish rich and poor document data. This also reverts some changes from commit 796770e070, because the firstSeen database is the wrong mechanism for distinguishing these types of data.