406 Commits (f7e887bf4960dd7713047faee8a1f9273ae81be8)

Author SHA1 Message Date
orbiter 63762d8f89 removed kelondro dependencies from cora 13 years ago
Michael Peter Christen e54ac38095 - some corrections in usage of getFile() and getFileName() 13 years ago
Michael Peter Christen 528d6763fa - added new solr fields: 13 years ago
Michael Peter Christen e8acd542b5 - added faceted drill-down for host and geolocation to solr queries 13 years ago
orbiter 67f2866cd0 small fixes 13 years ago
orbiter 67edfd991c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 13 years ago
orbiter d9173ba7ed added more solr fields to integrate values from URIMetadataRow. All 13 years ago
Michael Peter Christen 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time 13 years ago
Michael Peter Christen 1687737771 Abstraction of HandleMap and HandleSet 13 years ago
orbiter 482afed07c reduced logging overhead (a bit) 13 years ago
orbiter bbfa497a3c replaced more size() > 0 by !isEmpty() 13 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty() 13 years ago
Michael Peter Christen 801972fe6f fix for url camel case parser and sentence reader 13 years ago
Michael Peter Christen fbc1a2030d fix for sitemap importer: can now also import very large sitemaps within 13 years ago
Michael Peter Christen 92731e5287 fix for sevenzip parser 13 years ago
Michael Peter Christen 8efc1c1078 - fixed a memory leak (or bad usage) during parsing/snippet fetch 13 years ago
Michael Peter Christen b1e7c11fba fix for pattern matcher in html parser 13 years ago
Michael Peter Christen b0c408788b made class methods static where possible 13 years ago
Michael Peter Christen 7c1ba99755 removed more unused method parameters 13 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters 13 years ago
Michael Peter Christen d3964253ae - added @SuppressWarnings to unused servlet method parameters 13 years ago
Michael Peter Christen ea10766bfd cleaned unnecessary nested code 13 years ago
orbiter fc0f9543fe More SentenceReader cleanup 13 years ago
orbiter 586bb0eb6a Simplified SentenceReader (no more Reader inside..) 13 years ago
orbiter 7f851d62a7 replaced HashARC with SizeLimited Objects which are less costly 13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one 13 years ago
orbiter bb8dcb4911 automatically adopt size of word cache to available memory 13 years ago
Michael Peter Christen ad09b786bf clean up parser data 13 years ago
Michael Peter Christen 276a66a793 Adding a limit of 1000 links that a parser shall store during indexing. 13 years ago
Michael Peter Christen de903a53a0 parser refactoring & hacks 13 years ago
Michael Peter Christen 1825f165b8 better integration of blacklist according to use case 13 years ago
Michael Peter Christen ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'. 13 years ago
Michael Peter Christen 0c345d1559 giving threads name so it's easier to see what's happening during 13 years ago
Michael Peter Christen 508a81b86c added solr field 'refresh_s' which stores the refresh url contained in 13 years ago
Michael Peter Christen f3167def64 do not fill the keywords with title content if keywords do not exist. 13 years ago
Michael Peter Christen 77f795756c fixing redirects and status codes: storing of status code in 13 years ago
Michael Peter Christen dbdd697f4d moved RDFaParser.xsl configuration file to defaults 13 years ago
Michael Peter Christen 786be7d175 better integration of RDFaParser 13 years ago
Michael Peter Christen de3ef8ad73 removed unimportant warnings 13 years ago
Michael Peter Christen 24bbe359ca integrate also geonames library files for less cities. these are more 13 years ago
Michael Peter Christen 223a5440ab preventing that an empty pnd is inserted into the vocabularies 13 years ago
Michael Peter Christen 963f92ed9a - merged files 13 years ago
Michael Peter Christen dd88d0ace2 more logging 13 years ago
Michael Peter Christen 94d54e2d91 added recognition of multi-word terms in vocabulary matching 13 years ago
Michael Peter Christen 64c0268b2b show triplestore metadata in yacydoc and viewfile 13 years ago
Michael Peter Christen c2f0d16d2c fixed vocabulary initialization 13 years ago
Michael Peter Christen df3531f8d5 added the generation of virtual vocabularies using the pnd 13 years ago
Michael Peter Christen a0f1decd82 - added loading of the dbpedia pnd triplestore in the dictionary loader 13 years ago
Michael Peter Christen 16d8f33795 added objectlink generation to vocabulary generation and editor 13 years ago
Michael Peter Christen d45718251e refactoring (Localization -> Location) 13 years ago
Michael Peter Christen b8b3c87ba7 - renamed localization to location (that was confusing) 13 years ago
Michael Peter Christen e89747bb67 - added automated generation of vocabularies from url stubs 13 years ago
Michael Peter Christen 79464189a4 The 'Locale' vocabulary, which is generated by geo data, has now the 13 years ago
Michael Peter Christen 61bb52d55c - using http://purl.org/dc/terms/references to refer from an 13 years ago
Michael Peter Christen 50c576599b allow multiple parser options instead of printing an error 13 years ago
Michael Peter Christen 8b53771db2 changed behavior of navigation processing: 13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there 13 years ago
cominch bbfc53b663 bugfix 13 years ago
cominch 65c5826d93 bugfix 13 years ago
cominch 5f8ba7f4f2 small changes 13 years ago
cominch 90512640bf Added config switches for custom parser 13 years ago
cominch bcbd8eee33 Add several parsers, for RDFa and rdf files. 13 years ago
cominch 9cbfc1a1c0 augmentedProxy, which forwards every proxy request to a 13 years ago
Michael Peter Christen cde20911bb saved a bit more ram using UTF8 String compression for OpenGeoDB and 13 years ago
Michael Peter Christen 225ee42879 made the GeoLocation into an interface with the current 13 years ago
Michael Peter Christen 96e9d77270 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 13 years ago
Michael Peter Christen 96c8119b50 added GeoLocation / GeoPoint classes which uses less memory than 13 years ago
Michael Peter Christen 461a0ce052 removed warnings 13 years ago
Michael Peter Christen 2fe207f813 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 13 years ago
Michael Peter Christen 514700291a moved Vocabulary to cora package (added in git 13 years ago
Michael Peter Christen 0284a4d88f more fixes for double precision of coordinates 13 years ago
Michael Peter Christen 964406ad17 added concurrency enhancement to xml parser 13 years ago
Michael Peter Christen e0d8643226 - performance hacks 13 years ago
Michael Peter Christen 6e83b02b83 - bugfix for surrogate file reader 13 years ago
Michael Peter Christen 9b4c699526 enhanced location search: 13 years ago
Michael Peter Christen 4d3cc02168 replaced old bzip2 library against better documented commons-compress 13 years ago
Michael Peter Christen c15fcde1c8 add-on to latest commit 13 years ago
Michael Peter Christen 81737dcb18 removed stack trace from swf parser since we can't do anything there 13 years ago
Michael Peter Christen acf8d521a2 fix for http://bugs.yacy.net/view.php?id=126 13 years ago
Michael Peter Christen 89142d1e8d removed (not all) warnings 13 years ago
Roland 'Quix0r' Haeder a093ccf5eb Now used synchronization in all close() methods to make sure all objects 13 years ago
Michael Peter Christen ba6aaabc51 refactoring + parser bugfixes 13 years ago
Michael Peter Christen 09484955dc added new entry class for embed tags 13 years ago
Michael Peter Christen 453010bd68 - solved problems with backpath normalization 13 years ago
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD 13 years ago
Michael Peter Christen f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not 13 years ago
Michael Peter Christen a1a5b015d8 refactoring: moved document Classification to cora package 13 years ago
Michael Peter Christen 4d5da75814 fix for parser problem if a <a>-tag is 'within' html tags with unclosed 13 years ago
Michael Peter Christen 046f3a7e8d check if httpc has decompressed the release file and rename the file 13 years ago
Michael Peter Christen e101c2e0e2 added changes from copperdust (submitted by email): 13 years ago
Michael Peter Christen 8d63a5887c bugfixes 13 years ago
Michael Peter Christen 9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a 13 years ago
Michael Peter Christen 7e4e3fe5b6 free some memory after parsing html 13 years ago
Michael Peter Christen 4540174fe0 memory hacks 13 years ago
Michael Peter Christen 2e5cd6a1b2 fixed parser extension deny list generation and usage 13 years ago
Michael Peter Christen 8bee1472c9 there is no noindex, only nofollow in links 13 years ago
Michael Peter Christen c560a582ac fix for single-word vocabulary lines 13 years ago
Michael Peter Christen ef78f22ee1 performance hack 13 years ago
Michael Peter Christen 1f4f60654a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 13 years ago
reger 32104360ce PDFParser - return at least first 3 pages of PDF 13 years ago