Commit Graph

193 Commits (8eb6fba59c665f5ece3151f1c4688fda5a5ba2b9)

Author SHA1 Message Date
orbiter 039126cfaf better handling of on/off switched solr indexing
14 years ago
orbiter 3d5104d357 - fixed a bug in crawl start with file name (npe in new url)
14 years ago
low012 2861d0888a *) simplified code\n*) fixed potential NumberFormatExceptions
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
f1ori fafab7a8fe * provide option to delete cached snippet fetching failures
14 years ago
orbiter 10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
14 years ago
low012 38fdf43587 *) renamed classes according to standard Java coding conventions
14 years ago
orbiter 790e0b1894 - enhanced index deletion in IndexControlRWIs_p: delete also robots.txt database and cache if demanded
14 years ago
orbiter 863065abc4 added user agent logging to access tracker
14 years ago
orbiter aacf572a26 - enhancements for search speed
14 years ago
orbiter d5dc88a351 shop cleanup button only if servlet was called without post/put arguments.
14 years ago
orbiter 97ee278931 enhanced search speed:
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
15 years ago
orbiter b18a7606a0 some performance hacks and fixed after reading dump in
15 years ago
orbiter 1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
orbiter 727dd9b193 - fixed a bug in robots.txt parser
15 years ago
orbiter 564927ce72 redesign of CrawlResult data structures because of OOM occurrences during URL deletion processes.
15 years ago
orbiter 362b7a929b added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
15 years ago
orbiter 4782d2c438 fix for search bug that appeared when looking at page 3 of results or further
15 years ago
orbiter 4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where an size() operation becomes a large iteration.
15 years ago
orbiter 491ba6a1ba - some refactoring in workflow
15 years ago
orbiter 5399d1e2bc refactoring (reason: get more abstraction to use the blacklist class; for integration in other servlets)
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter 52470d0de4 - fix for xls parser
15 years ago
orbiter 5e8038ac4d - refactoring of blacklists
15 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
15 years ago
orbiter 5841ee83d3 refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter 1e4f8b56ed accumulated classes from different packages into the new rwi package
15 years ago
orbiter 4446acc8cd moved kelondro order
15 years ago
orbiter 735e2737e3 * added index segments
15 years ago
low012 5e4f267a36 *) added subversion properties and edited a few comments
15 years ago
orbiter af3a696fc4 added a fast-fail concept in search processes. The search now has better control if all the remote searches may bring any result. If all processes are finished, then all search tasks fail fast.
15 years ago
orbiter 61748285c3 more refactoring of search
15 years ago
orbiter 72ac5bd80f refactoring of search process.
15 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter 0e8647d62f refactoring of search classes
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
orbiter bc6dd8194b refactoring: moved search query class to new search package
16 years ago
orbiter 945777aa80 replaced rwi term counting method by one that computes the maximum of the blobs that contibute to the RWI. An addition of the blob sizes is wrong/incorrect and does not reflect the real size. Truncation the size operation to the maximum of all blobs is also incorrect, but not as wrong as the sum of all blob sizes wich double-counts many rwi entries.
16 years ago
orbiter 88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter 09987e93fd fixed some more bad handling of byte[]
16 years ago
orbiter c8624903c6 full redesign of index access data model:
16 years ago
orbiter 89ec3acb3e - full abstraction of index content type: the kelondro full text index may now also contain indexes about other content than text, i.e. navigation indexes or reverse linking indexes.
16 years ago
orbiter 44e01afa5b - refactoring
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago
orbiter 83792d9233 more refactoring
16 years ago
orbiter 209f25f5f5 refactoring to integrate indexCell data structures
16 years ago
orbiter 7f67238f8b refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection
16 years ago
orbiter 14a1c33823 refactoring of wordIndex class
16 years ago
orbiter aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
16 years ago
orbiter 6ffc6e3389 more refactoring of indexer and kelondro classes;
16 years ago
orbiter 76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
16 years ago
orbiter 94c42691d8 - reject less transmissions as transmission receiver
16 years ago
orbiter c25c334b75 replaced old DHT transmission method with new method. Many things have changed! some of them:
16 years ago
orbiter 7ee494fde5 more refactoring of kelondro:
16 years ago
orbiter bf93767ec6 refactoring of kelondro database classes
16 years ago
orbiter 7535fd7447 - refactoring of CrawlEntry and CrawlStacker
16 years ago
lotus 1545e5440a * index deletion: checkbox-confirmation
16 years ago
orbiter 3f746be5d4 - consolidation and refactoring of many DHT target - computing methods
16 years ago
lotus 029e16b653 replaced some put(String, String) by putHTML(String, String) on serverObjects respond
16 years ago
orbiter 536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
16 years ago
danielr 3bb870bfcd added final where possible
17 years ago
orbiter a6719dfd2b - refactoring of robots parser
17 years ago
danielr 7feae906aa - organize imports
17 years ago
orbiter 2f381b8d7a - fixed at least two causes for a NPE after a use case switch.
17 years ago
orbiter 25192e0d36 added a deletion button to indexControlRWIs that deletes the complete web index
17 years ago
orbiter cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
17 years ago
orbiter d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
17 years ago
orbiter 7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
17 years ago
orbiter d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
17 years ago
orbiter 541b817502 refactoring of switchboard queueing
17 years ago
orbiter bfed9c2da6 - some refactoring in search process
17 years ago
orbiter a8a5df4a51 - more dublin core naming of page metadata
17 years ago
orbiter 15397298dc - refactoring of indexControlRWIs: moved statics to own class; better Dublin Core naming
17 years ago
orbiter 85dc62c16f refactoring: more dublin core - compliant naming
17 years ago
orbiter ecd7f8ba4e - added NEAR operator (must be written in UPPERCASE in search query)
17 years ago
fuchsi 33ee6745f6 more cleanup in serverDate
17 years ago
orbiter 9e23acf2d6 introduced new 'authority' ranking property
17 years ago
orbiter b46bcaa5d8 changed method of profiling
17 years ago
orbiter c48b73cda2 redesign of ranking data structure
17 years ago
orbiter bf9a9e4e5e fix for NPE in IndexControlRWIs_p.java
17 years ago
orbiter c527969185 - enhanced monitoring of ranking parameters
17 years ago
orbiter bd5673efbe added cleaning of search event before opening the index administration
17 years ago
orbiter 55da871211 preparations for better ranking: better debugging of index properties
17 years ago