Commit Graph

263 Commits (32ca669bfbbf4f881fa98d097c01f5fa90f4078b)

Author SHA1 Message Date
Michael Peter Christen 43f3345c90 - removed dependencies from URIMetadataRow and made direct access to
12 years ago
Michael Peter Christen 21fe8339b4 - enhanced generation of url objects
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 2f536cb54d code cleanup: removed unused methods and made more methods and objects
12 years ago
Michael Peter Christen 24d2ee3c52 - better date ranking
12 years ago
Michael Peter Christen ca313e404f - if a "/date" modifier is used, the solr remote query applies an
12 years ago
Michael Peter Christen 1533bfd63b refactoring
12 years ago
Michael Peter Christen 872f83ebe0 refactoring
12 years ago
Michael Peter Christen 8219a445f3 refactoring
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
orbiter 563d584420 removed more dependencies in cora from kelondro
12 years ago
Michael Peter Christen d8425e6809 added collections to crawl monitor
12 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
12 years ago
Michael Peter Christen 316b5fe116 - added a solr type definition verifier
12 years ago
Michael Peter Christen e8acd542b5 - added faceted drill-down for host and geolocation to solr queries
12 years ago
Michael Peter Christen 4716546ef5 - reduced memory usage in index transmission using a transformation of
12 years ago
Michael Peter Christen 06b0081fdc fix for NPE during host navigation computation
12 years ago
orbiter acb9f04e80 removed unused classes
12 years ago
Michael Peter Christen 755f5e76cf removed strange assert statements and simplified code in metadata
12 years ago
orbiter ee01c12e56 fixes for putDocument and putMetadata
12 years ago
Michael Peter Christen f9fc5cfaba better check for bad urls in url transmission
12 years ago
Michael Peter Christen 40c0856489 refactoring
12 years ago
Michael Peter Christen 9bece5ac5f enhanced snippet fetch - removed a bug that caused documents to be
12 years ago
Michael Peter Christen 395b78a0d8 using the solr search index to concurrently search within solr and the
12 years ago
Michael Peter Christen e5ef840f40 - renamed DoubleSolrConnector to MirrorSolrConnector and added a
12 years ago
Michael Peter Christen 94a334f128 another fix to the Solr metadata reading process and to the shutdown
12 years ago
Michael Peter Christen b51df6c7e8 - added coordinate storage in solr schema
12 years ago
Michael Peter Christen f9c0e6e950 - Implemented and integrated the URIMetadataNode object which is a
12 years ago
Michael Peter Christen dcc72799c4 better abstraction for result writers using controlled vocabularies and
12 years ago
Michael Peter Christen a12f693ec9 added two response writer for embedded solr interface:
12 years ago
Michael Peter Christen 1687737771 Abstraction of HandleMap and HandleSet
12 years ago
orbiter 69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
12 years ago
Michael Peter Christen f78ce93a80 collection of speed and memory saving hacks
13 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Michael Peter Christen b0c408788b made class methods static where possible
13 years ago
Michael Peter Christen 5bd3c90907 - removed unnecessary semicolons
13 years ago
Michael Peter Christen 7c1ba99755 removed more unused method parameters
13 years ago
Michael Peter Christen 83701a1b4c removed unused ImageReference package
13 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters
13 years ago
Michael Peter Christen ea10766bfd cleaned unnecessary nested code
13 years ago
Michael Peter Christen 613b45f604 - better data structures in secondary search
13 years ago
Michael Peter Christen 8a82609360 - smaller caches to save memory
13 years ago
Michael Peter Christen ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'.
13 years ago
Michael Peter Christen 0c345d1559 giving threads names so it's easier to see what's happening during
13 years ago
Michael Peter Christen 9264d8b4af removed old navigation practice using subject tags in favor of
13 years ago
Michael Peter Christen 61bb52d55c - using http://purl.org/dc/terms/references to refer from an
13 years ago
Michael Peter Christen 8b53771db2 changed behavior of navigation processing:
13 years ago
Michael Peter Christen 407fdf6968 more bug fixes and performance hacks for search process
13 years ago
Michael Peter Christen e0d8643226 - performance hacks
13 years ago
Michael Peter Christen 9b4c699526 enhanced location search:
13 years ago
Michael Peter Christen 10da7335ea performance hack: use a hash cache for all hashes that are computed by a
13 years ago
Michael Peter Christen 7c1feefb28 introduced a default 10 second time-out in rwi normalization time
13 years ago
Michael Peter Christen 7e0ddbd275 added a "fromCache" flag in Response object to omit one cache.has()
13 years ago
Michael Peter Christen f294f2e295 bugfix to http://bugs.yacy.net/view.php?id=181
13 years ago
Michael Peter Christen acf8d521a2 fix for http://bugs.yacy.net/view.php?id=126
13 years ago
Michael Peter Christen 15db703808 added missing serialization to remove all warnings
13 years ago
Michael Christen e32055aa15 added stub classes for
13 years ago
Michael Peter Christen 2fc8ecee36 ConcurrentLinkedQueue has a VERY long return time on the .size() method.
13 years ago
Michael Peter Christen 1cd711d005 added classes for citation references (for new citation ranking)
13 years ago
Michael Peter Christen e0f1e7d904 added new citation reference data structure that shall be used for a
13 years ago
Michael Peter Christen 4540174fe0 memory hacks
13 years ago
Michael Peter Christen e2f8f263e8 changed storage of search words: keep order
13 years ago
Michael Peter Christen 2ea585d616 fix for host navigator
13 years ago
Michael Peter Christen 4901cee3cc suppress auto-tagged subject entries when sending out or receiving
13 years ago
Michael Peter Christen b7bb84c0bb set a limit to CharBuffer object size to fight against bad/too large
13 years ago
Michael Christen 20962a4ed7 added metadata node stub for metadata from blobs
13 years ago
Michael Christen 9e5894c784 Removed handling of components objects for URIMetadataRows.
13 years ago
Michael Christen 1f4afb4dc0 performance hacks
13 years ago
Michael Christen e9dc99fe15 added rules to set specific RWIs as private RWIs which are not
13 years ago
Michael Peter Christen 0bcef2d156 added feature as requested in
13 years ago
Michael Christen 204c29f010 small bugfixes for search result display and cache display
13 years ago
Michael Christen 86b3385847 fixed a deadlock during secondary remote search
13 years ago
orbiter 0cf9ebc3b0 speed enhancements when parsing RWI rows (makes search slightly faster)
13 years ago
orbiter 709013385a fix for language fix
13 years ago
orbiter c0c6e9e7a5 fix for bad language encoding
13 years ago
orbiter 0d858d48ec replaced String with StringBuilder in suggestion process
13 years ago
orbiter 813f297a95 another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
13 years ago
orbiter 035ebfbf3b - performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter 734059d33e performance hacks
13 years ago
orbiter 52230a6864 replaced catching of Exception with Throwable, which catches also Errors
13 years ago
orbiter 1912d0cccc changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations.
14 years ago
orbiter 31283ecd07 - added a search option to filter only specific network protocols. i.e. get only results from ftp servers. Just add '/ftp' to your search.
14 years ago
orbiter 0c1b29f3c9 - applied many small performance hacks
14 years ago
orbiter fe0c08455b more concurrency (enhancement) hacks
14 years ago
orbiter 87082f407e less String object creation during search
14 years ago
orbiter a36fda991e hack to increase speed of url hash computation
14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 10e2f588f8 - enhanced ybr ranking computation
14 years ago
orbiter 3ed4a09368 small features, some bug fixes and performance hacks
14 years ago
orbiter 123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
14 years ago
orbiter 0430a94eaa the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
orbiter 61acf55da4 avoided using a synchronized(this) for the hash computation to prevent that the lock on the object is (accidentally) stolen by another thread and replaced this synchronization using the protocol object. Also made the protocol object final.
14 years ago
orbiter 078ecacf61 avoid synchronization in DigestURI hash requests
14 years ago
orbiter 1989ebc24b removed more warnings
14 years ago
orbiter dc0db3550e avoid string conversion
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
low012 3b40b98256 *) set SVN properties
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 7138f4036b less synchronization, better thread dump tool
14 years ago
orbiter 8d14916c74 more patches for a better out-of-memory management
14 years ago
orbiter 993b9bc1a8 memory/performance hacks, less synchronization, better concurrency
14 years ago
orbiter 5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
14 years ago
orbiter 19b2a50578 - enhanced date formatter cache
14 years ago
orbiter 5e45ded8e2 - removed locks from WordReference
14 years ago
orbiter cd19d0517e added dns resolve to HTTPClient POST using a dns cache to prevent that the not-thread-safe built-in dns cache inside apache http client is used
14 years ago
orbiter 431f780f41 patch for bad data in url metadata
14 years ago
orbiter 10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
14 years ago
orbiter 99a7fe87f9 - removed old intranet scanner (the generic scanner now completely subsumes the old one)
14 years ago
orbiter a563b05b60 enhanced crawler:
14 years ago
orbiter f0651e5f2f added image search to yacyinteractive.html
14 years ago
low012 9b3fae9496 *) cleaning up the code a little bit
14 years ago
f1ori 7d8de34778 * add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter 0d363a94d7 more performance hacks
14 years ago
orbiter 091dd3f6ec - enhanced intranet search speed
14 years ago
orbiter aacf572a26 - enhancements for search speed
14 years ago
orbiter e54cb7fb0c more bugfixes (also for latest commit)
14 years ago
orbiter be6b48311c misc bugfixes
14 years ago
orbiter 0cf006865e refactoring and enhanced concurrency
14 years ago
orbiter 83ac07874f - corrected return value of put() methods (not used anywhere, so it did not harm before)
14 years ago
orbiter 39f409a7bb performance hacks
14 years ago
orbiter 348dece62f redesign of the SortStack and SortStore classes:
14 years ago
orbiter 24502fe3de performance hacks
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 7e2d6fac12 patch for bad values during local search join
15 years ago
orbiter 87087f12fe - scanned remote search process and enhanced some data structure and synchronizations here and there
15 years ago
orbiter de4f30bb2e UTF-8 fix
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter 6950d8a33d fixes to SMB crawler
15 years ago
orbiter cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
15 years ago
orbiter c45117f81f fixed dates in metadata
15 years ago
orbiter 90c3e5d6f6 - cleanup, removed unused imports
15 years ago
orbiter 3a50b5aa04 enhanced object hash computation
15 years ago
orbiter 1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
15 years ago
orbiter 2e26744f4e more concurrency when normalizing RWI entries + cleanup
15 years ago
orbiter 70e6222978 more concurrency during search requests
15 years ago
low012 dc93cec3a8 *) Java 1.5 compatibility (see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=2764)
15 years ago
orbiter 67ec58d8e7 search performance enhancement
15 years ago
hermens ef467a0303 Another workaround for the second part of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2770
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
orbiter bb63c5d075 using a Pattern object with precompiled regular expressions to apply must-match constraints to search results: should speed up pre-sorting of search results and should cause richer search result sets
15 years ago
orbiter 1198b9989d bugfixes, more sorttable
15 years ago
orbiter de01fe0e6d fix for bug in url parser
15 years ago
orbiter 1bbe14d23f SVN 6716 unfortunately contained parts of the unfinished SMB integration. To fix compile errors the remaining parts of the SMB implementation stub are added with this commit.
15 years ago
orbiter 30c8185139 fix for sid check
15 years ago
orbiter ef62d017e5 integrated session id filtering for crawler
15 years ago
orbiter d8d9984913 added framework for session id filtering (not ready yet)
15 years ago
orbiter 7fdf59a77f misc NPE check
15 years ago