209 Commits (04ca24d7dc45994dae9aa90334b18059456c3b30)

Author SHA1 Message Date
Michael Peter Christen b51df6c7e8 - added coordinate storage in solr schema
12 years ago
Michael Peter Christen 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
12 years ago
Michael Peter Christen cba4ab862e fix for http://bugs.yacy.net/view.php?id=202
12 years ago
orbiter 69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
12 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Michael Peter Christen ea10766bfd cleaned unnecessary nested code
13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
13 years ago
Michael Peter Christen 1825f165b8 better integration of blacklist according to use case
13 years ago
Michael Peter Christen 03280fb161 removed segments-concept and the Segments class:
13 years ago
Michael Peter Christen 9116013c64 - allow lazy initialization of solr value (if using 'lazy', then no
13 years ago
Michael Peter Christen 52f5d40043 better abstraction of document model generation
13 years ago
Michael Peter Christen 64c0268b2b show triplestore metadata in yacydoc and viewfile
13 years ago
Michael Peter Christen 8b974905ee changed log-in text for all servlets with authentication:
13 years ago
Michael Peter Christen a3badd3205 changed search process for images: no more media snippet load process,
13 years ago
Michael Peter Christen 33d1062c79 refactoring: the cache belongs to the crawler
13 years ago
Michael Christen 9e5894c784 Removed handling of components objects for URIMetadataRows.
13 years ago
Michael Christen 204c29f010 small bugfixes for search result display and cache display
13 years ago
orbiter e22f8497c9 - tested the ARC methods
13 years ago
orbiter 5a55397f99 some last-minute performance hacks
13 years ago
orbiter 0d858d48ec replaced String with StringBuilder in suggestion process
13 years ago
orbiter 37e35f2741 normalization of url using urlencoding/decoding
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter b00e69c5df removed test output
13 years ago
orbiter 5dd2efc9a2 - bugfixes in html parser
13 years ago
sixcooler 59b767eebd stop loading via http at defined maximum of bytes - even if size is unknown before loading
13 years ago
orbiter 115abc8917 - more attributes for search progress bar
14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 5b579e21a3 code cleanup
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
low012 2861d0888a *) simplified code; *) fixed potential NumberFormatExceptions
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
low012 3d95981f7d *) cleaning up the code a little bit
14 years ago
f1ori 9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
14 years ago
orbiter 7bb4b001ed - view image files from cache
14 years ago
f1ori 7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter 58e74282af added a word counter statistic in condenser which is used by the did-you-mean to calculate best matches for given search words.
14 years ago
mikeworks 61e87c0b14 IndexControlRWIs_p.html, IndexControlURLs_p.html, ViewFile.html/.java: changes to HTML output and &nbsp; in case of empty values for XHTML strict / transitional validation
14 years ago
orbiter 10a9cb1971 simplified snippet computation process and separated the algorithm into two classes
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter b6fb239e74 redesign of parser interface:
15 years ago
orbiter 777195e8d1 more abstraction for access of LoaderDispatcher and cache
15 years ago
orbiter 7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter 9842fab6e4 - fixes to query parameter
15 years ago
orbiter 2a8f70f0ca - fix for caching of OSM tiles. if you want that this fix applies to your peer, please delete the crawl profiles
15 years ago
orbiter 2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
15 years ago
orbiter c45117f81f fixed dates in metadata
15 years ago
orbiter 06ff0c5b06 fixes for metadata retrieval and presentation
15 years ago
orbiter 7ab207d93a better presentation of search result metadata and fixes to htcache loading
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 0f8004f9da enhanced html parser to recognize a href tags inside header tags
15 years ago
orbiter 3300930fc5 - (almost) fixed FTP crawler
15 years ago
orbiter 61493a9a9f added more information about metadata in ViewFile.html
15 years ago
orbiter c4bdb1e7f2 added one more option in ViewFile to show an iframe like for the original web page content but using the cache rather than the direct link to the content in the web. Upgraded the very old and previously not any more used CacheResource_p servlet to a new and working version.
15 years ago
orbiter 270fb38674 - fixed some bugs in Table viewer
15 years ago
orbiter 38d7a28cd2 fix in viewfile needed when ViewFile is called only with 'url' parameter
15 years ago
orbiter fe41a84330 some enhancements in web caching: avoid double loading of response metadata and/or content
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter 5e8038ac4d - refactoring of blacklists
15 years ago
orbiter 26fafd85a5 - more refactoring
15 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
15 years ago
orbiter 5841ee83d3 refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter 735e2737e3 * added index segments
15 years ago
orbiter 6aa474f529 - better logging for web cache access and fail reasons
15 years ago
orbiter 72e5407115 refactoring of snippet cache
15 years ago
orbiter 161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago
orbiter 0e8647d62f refactoring of search classes
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
orbiter 409538e17a code cleanup and code simplification
16 years ago
orbiter 222850414e simplification of the code: removed unused classes, methods and variables
16 years ago
orbiter 88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter a642d6a7b5 - added navigation icons for search result pages
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago
orbiter 14a1c33823 refactoring of wordIndex class
16 years ago
orbiter 396a4451be increased timeout in ViewFile
16 years ago
orbiter aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
16 years ago
orbiter 76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
16 years ago
orbiter c12bb8a6d0 - refactoring of the http client
16 years ago
orbiter b57c9da1f8 - fixes to doc, ppt, xls parser: better title
16 years ago
orbiter 94110df85a moved logging partially to kelondro
16 years ago
orbiter c4c4c223b9 fixed a problem with attribute flags on RWI entries that prevented proper selection of index-of constraint
16 years ago
orbiter 47292e696a more performance hacks
16 years ago
orbiter 0edec2b760 FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html.
16 years ago
orbiter 47f0c3b002 replaced the cacheAdmin with the ViewFile servlet, because the cacheAdmin was an interface to the old HTCACHE data structure which does not exist any more. Changed links to point to the ViewFile servlets.
16 years ago
low012 77e41da7d2 *) further propagation of display value (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1536)
16 years ago
orbiter 536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
16 years ago
orbiter 7989335ed6 Preparations to replace the HTCache with a new storage data structure:
16 years ago
danielr 3bb870bfcd added final where possible
17 years ago
orbiter c3d461d191 - removed superfluous copyright statement
17 years ago
orbiter 3ca98fee42 removed superfluous copyright statement
17 years ago
danielr 7feae906aa - organize imports
17 years ago
orbiter cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
17 years ago