71 Commits (f8f1959ebb3f96b66e75d7d83cd70ae9714e85bd)

Author SHA1 Message Date
luccioman 8da3174867 Ensure lower case conversion consistency with any default locale.
7 years ago
luccioman 8399275142 Properly close file output streams even in exception scenarios.
8 years ago
luccioman 79fdf14b0a Fixed regression introduced by commit 9ad4d16
8 years ago
reger 9ad4d16829 Add a responseHeader to the solr index export with a format identifier
8 years ago
reger 18c7563dbe Extend DCEntry.getLanguage convert to ISO639-1 codes for more languages
8 years ago
reger df80c57842 add ukr and pol to DCEntry.getLanguage ISO639-2 3-char language code
8 years ago
luccioman 6a4d51d8f9 Cleaned up some Javadoc warnings.
8 years ago
reger 4c9be29a55 fix concurrency issue with htmlParser using not current scraper data
8 years ago
reger 4c7a77662a eliminate dependency on file-extension in storeDocument but use supported mime-type
8 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
reger 6b7c10cef8 fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago
luc 8ebefa4233 Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
9 years ago
luc 27d11f8671 Fixed isSolrDump function : PushBackInputStream was not unread when
9 years ago
reger d5330391de remove some unused var allocation in parser
9 years ago
Michael Peter Christen 593de05922 enhanced surrogate import process speed (dramatically!)
10 years ago
Michael Peter Christen d0aff91f23 fix for index import
10 years ago
Michael Peter Christen b43811d38c added surrogate import process for exported solr dumps.
10 years ago
Michael Peter Christen ff29b0e503 added option to re-index exported xml snapshot dumps to
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen 1f5047b15f using precompiled pattern CommonPattern.SEMICOLON for splits
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
reger 03a7a29db3 limit OAI import urn resolver try for Deutsche National Library
10 years ago
Michael Peter Christen 8b44fcf0f4 added missing @Override annotation
11 years ago
reger 651d057e93 surrogate import translate dc:language 3-char codes
11 years ago
Michael Peter Christen 453bfd0f17 removed unused variables and warnings
11 years ago
reger 1d01672bd3 fix DCEntry.getIdentifier
11 years ago
reger 6306d28a6a OAI import get multivalued keywords (dc:subject)
11 years ago
reger 5c9dcc269d improve OAI-PMH import identifier recognition
11 years ago
orbiter 937273d4e3 added parsing of metadata to surrogate reading:
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
Michael Peter Christen 35ab2cef7b added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in
11 years ago
Michael Peter Christen cf12835f20 replaced the single-text description solr field with a multi-value
11 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
11 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 584663ae8c - redesign of solr query construction
12 years ago
Michael Peter Christen 24d2ee3c52 - better date ranking
12 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
12 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Michael Peter Christen b0c408788b made class methods static where possible
13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
13 years ago
Michael Peter Christen 0c345d1559 giving threads names so it's easier to see what's happening during
13 years ago
Michael Peter Christen 461a0ce052 removed warnings
13 years ago
Michael Peter Christen 964406ad17 added concurrency enhancement to xml parser
13 years ago
Michael Peter Christen 6e83b02b83 - bugfix for surrogate file reader
13 years ago
Michael Peter Christen 9b4c699526 enhanced location search:
13 years ago
Roland 'Quix0r' Haeder a093ccf5eb Now used synchronization in all close() methods to make sure all objects
13 years ago
Michael Peter Christen e101c2e0e2 added changes from copperdust (submitted by email):
13 years ago
orbiter 205cc75157 abstraction of surrogate main element (xmlns:geo was missing for wiki extracts)
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago