Commit Graph

435 Commits (ec927ea72b506f86cae977e2b95ccb0a39f19c2c)

Author SHA1 Message Date
Michael Peter Christen 6197caf698 added clear-text search words in query params
12 years ago
Michael Peter Christen 23226676c6 FOR THE BRAVE.. this is a forced migration to solr which is now ready
12 years ago
Michael Peter Christen d988ba50cf added a very rudimentary, incomplete, non-verified GSA response writer
12 years ago
Michael Peter Christen aab0b680c3 - added xslt support for solr result formats.
12 years ago
Michael Peter Christen e5ef840f40 - renamed DoubleSolrConnector to MirrorSolrConnector and added a
12 years ago
Michael Peter Christen b51df6c7e8 - added coordinate storage in solr schema
12 years ago
Michael Peter Christen da851c6071 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen bd4f03bc85 removed unused class
12 years ago
orbiter 39f8eb60c3 tried to prevent calls to bad-hack getSize() method and reduced overhead
12 years ago
orbiter e816b88b55 changed behaviour of metadata storage: in case that any solr is
12 years ago
orbiter 2571e0d47a removed unused classes
12 years ago
Michael Peter Christen f9c0e6e950 - Implemented and integrated the URIMetadataNode object which is a
12 years ago
Michael Peter Christen 136fcb1ad9 refactoring
12 years ago
Michael Peter Christen a12f693ec9 added two response writer for embedded solr interface:
12 years ago
Michael Peter Christen bca4a16603 replaced the multivalue generic string field name suffix _ss by _txt
12 years ago
orbiter 67edfd991c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter d9173ba7ed added more solr fields to integrate values from URIMetadataRow. All
12 years ago
Michael Peter Christen 3ce04cecf3 bad hack to prevent a bug appearing in solr
12 years ago
Michael Peter Christen 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
12 years ago
Michael Peter Christen ef488a15f7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen 1687737771 Abstraction of HandleMap and HandleSet
12 years ago
sixcooler 76b037a20a check content domain fix:
12 years ago
Michael Peter Christen 3bcd9d622b cleaned up classes and methods which are either superfluous at this time
12 years ago
Michael Peter Christen 6f1ddb2519 Moved solr index-add method to the same method where the YaCy index is
12 years ago
Michael Peter Christen 315d83cfa0 cleanup
12 years ago
Michael Peter Christen 76202f068e extended abstraction of local and remote solr index using one front-end
12 years ago
Michael Peter Christen 826967513b changed options in IndexFederated_p to switch on/off parts of the index
12 years ago
orbiter 69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
12 years ago
orbiter 05a3ffd03a patches to ensure that solr connectors are active only if they have a
13 years ago
orbiter 5a3c829872 embedded solr is only initiated if it is activated with
13 years ago
Michael Peter Christen 97b7bcf2a6 added a solr search index
13 years ago
orbiter c00a3cf74d less usage of generic logger to avoid logger generation overhead
13 years ago
orbiter e76159040b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
orbiter bbfa497a3c replaced more size() > 0 by !isEmpty()
13 years ago
Michael Peter Christen 58e7d1952f reduction of logging to prevent too much IO caused by logging
13 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
orbiter c7afa8bc48 using SwitchboardConstants for solr attributes
13 years ago
orbiter c6d8950651 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
orbiter 62202e2d71 refactoring of query attribute variable names for better consistency
13 years ago
Michael Peter Christen d09d9f2364 filter old peers from bootstrap (now stronger: 60 minutes instead of
13 years ago
Michael Peter Christen b0c408788b made class methods static where possible
13 years ago
Michael Peter Christen 7c1ba99755 removed more unused method parameters
13 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters
13 years ago
Michael Peter Christen 241dd8410a removed snippet pattern filter - it was not used
13 years ago
Michael Peter Christen d3964253ae - added @SuppressWarnings to unused servlet method parameters
13 years ago
Michael Peter Christen ea10766bfd cleaned unnecessary nested code
13 years ago
orbiter fc0f9543fe More SentenceReader cleanup
13 years ago
orbiter d4291ac1f3 more tolerance when creating solr document
13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
13 years ago
Michael Peter Christen 613b45f604 - better data structures in secondary search
13 years ago
Michael Peter Christen de903a53a0 parser refactoring & hacks
13 years ago
Michael Peter Christen 8a82609360 - smaller caches to save memory
13 years ago
Michael Peter Christen 7249d9c9de bugfix for concurrent seed loader
13 years ago
Michael Peter Christen c72d3b12cd concurrently initialize the seed list during p2p network bootstrap
13 years ago
Michael Peter Christen 1825f165b8 better integration of blacklist according to use case
13 years ago
Michael Peter Christen c18fa9fa75 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
13 years ago
Michael Peter Christen ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'.
13 years ago
Michael Peter Christen 0c345d1559 giving threads name so it's easier to see what's happening during
13 years ago
reger 067728bccc add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages)
13 years ago
Michael Peter Christen 03280fb161 removed segments-concept and the Segments class:
13 years ago
Michael Peter Christen 508a81b86c added solr field 'refresh_s' which stores the refresh url contained in
13 years ago
Michael Peter Christen 9116013c64 - allow lazy initialization of solr value (if using 'lazy', then no
13 years ago
Michael Peter Christen 0294a53459 - add canonical field only if requested by solr schema
13 years ago
Michael Peter Christen 3fd4a01286 added option to record urls that are forwarded to the solr index
13 years ago
Michael Peter Christen 96aeb127e3 generalized localhost naming.
13 years ago
Michael Peter Christen 77f795756c fixing redirects and status codes: storing of status code in
13 years ago
Michael Peter Christen 8dd469b9dd added option to configure the autocommit delay time of solr on-the-fly
13 years ago
Michael Peter Christen b9dfca4b0a - fixed IndexFederated Servlet / an embedded Solr can now be selected
13 years ago
Michael Peter Christen fad3b14813 added jetty libraries, needed for future use as web server and as
13 years ago
Michael Peter Christen a38b0a2c46 extended embedded solr tests to ensure that it will be usable within a
13 years ago
Michael Peter Christen b9d42fd9c8 using com.google.common.io.Files instead of homebrew methods
13 years ago
Michael Peter Christen a5eb91fa60 refactoring
13 years ago
Michael Peter Christen 1be0025a9c - added test for EmbeddedSolrConnector
13 years ago
Michael Peter Christen e12bb254b4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 3f55dc7c1e - added solr core and libraries that solr needs (lucene is missing, will
13 years ago
Michael Peter Christen 786be7d175 better integration of RDFaParser
13 years ago
Michael Peter Christen 0752983fbd - automatic periodic saving of triplestore
13 years ago
Michael Peter Christen 9264d8b4af removed old navigation practice using subject tags in favor of
13 years ago
Michael Peter Christen 64c0268b2b show triplestore metadata in yacydoc and viewfile
13 years ago
cominch a95127c9af Triplestore: initialize per-user triplestores
13 years ago
Michael Peter Christen e89747bb67 - added automated generation of vocabularies from url stubs
13 years ago
Michael Peter Christen 8b53771db2 changed behavior of navigation processing:
13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
13 years ago
Michael Peter Christen 4ee6fb1de9 added missing blacklist dht cache storage (maybe due to mistakes in
13 years ago
Roland 'Quix0r' Haeder edaa09b9b1 Rewrote all String blacklist types to enum 'BlacklistType', closes bug
13 years ago
Roland 'Quix0r' Haeder af5a597e47 Scroogle is not coming back, remove dead code
13 years ago
cominch 65c5826d93 bugfix
13 years ago
Michael Peter Christen cde20911bb saved a bit more ram using UTF8 String compression for OpenGeoDB and
13 years ago
Michael Peter Christen 2280a7b276 - changed initialization order to prefer allocation of memory for table
13 years ago
Michael Peter Christen 0746308bc2 only the metadata tables shall be able to use the tail cache
13 years ago
Michael Peter Christen 41c02cb10e - less restrictions for usage of Table RAM copy
13 years ago
Michael Peter Christen dd14b19c26 lazy initialization of block rank table ... only normal web search uses
13 years ago
Michael Peter Christen 701b9a28a0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen ab7107b34b fixed RWIProcess queue limits: now discovering hidden results for mass
13 years ago
Michael Peter Christen b0095c8d3c flush the compressor cache when a cleanup is done
13 years ago
Michael Peter Christen a61f44f9e4 lazy initialization of block rank table.
13 years ago
Michael Peter Christen 96e9d77270 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 00f2df1120 a variety of possible memory leak fixes
13 years ago
Michael Peter Christen d0ec8018f5 fixes for bad long computation
13 years ago
Michael Peter Christen 461a0ce052 removed warnings
13 years ago
Michael Peter Christen 407fdf6968 more bug fixes and performance hacks for search process
13 years ago
Michael Peter Christen a1fe65b115 performance hacks
13 years ago
Michael Peter Christen 2fe207f813 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 5e562dcdb7 adopted vocabulary usage within annotation/navigation feature of search
13 years ago
Michael Peter Christen 240045cf7c fix for bad distance computation
13 years ago
Michael Peter Christen e0d8643226 - performance hacks
13 years ago
Michael Peter Christen 9b4c699526 enhanced location search:
13 years ago
Michael Peter Christen 834dc6b263 store more data from interface access
13 years ago
Michael Peter Christen 10da7335ea performance hack: use a hash cache for all hashes that are computed by a
13 years ago
Michael Peter Christen 7c1feefb28 introduced a default 10 second time-out in rwi normalization time
13 years ago
Michael Peter Christen c846e9ca14 redesign of the crawler monitor page: show crawled pages instead of
13 years ago
Michael Peter Christen c15fcde1c8 add-on to latest commit
13 years ago
Michael Peter Christen cf47d94888 performance hack to parse numbers inside of substrings without actually
13 years ago
Michael Peter Christen 7e0ddbd275 added a "fromCache" flag in Response object to omit one cache.has()
13 years ago
Michael Peter Christen 7bf421b9dd - fixed image search page navigation
13 years ago
Michael Peter Christen fb94b47b1a changed queue sizes to have less memory occupied during indexing
13 years ago
Michael Peter Christen 76157dc2c3 bugfix for http://bugs.yacy.net/view.php?id=173
13 years ago
Michael Peter Christen c6558cba08 more classification bugs
13 years ago
Michael Peter Christen 082831b9d6 search contentdom was checked in wrong way - fixed
13 years ago
reger ee553d971e correct typo in scripts_txt comment
13 years ago
Michael Peter Christen f294f2e295 bugfix to http://bugs.yacy.net/view.php?id=181
13 years ago
Michael Peter Christen acf8d521a2 fix for http://bugs.yacy.net/view.php?id=126
13 years ago
Michael Peter Christen bb88878b4d the last commit was incomplete..
13 years ago
Michael Peter Christen d320a31ae1 bugfix for http://bugs.yacy.net/view.php?id=186
13 years ago
Michael Peter Christen 3e1bc9477f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Roland 'Quix0r' Haeder d10627d591 More sync in close() methods
13 years ago
Roland 'Quix0r' Haeder b3ae2aa41f With or without 'final'? At least please try it in other methods
13 years ago
Roland 'Quix0r' Haeder fbb946f913 Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile
13 years ago
Michael Peter Christen 52d307c735 prevent that the snippet fetch process removes catchall entries
13 years ago
Michael Peter Christen 89142d1e8d removed (not all) warnings
13 years ago
Michael Peter Christen 5deebd02ea added serialization
13 years ago
reger b2175ea4ef Add possibility to set custom Solr field names for the YaCy default Solr attributes.
13 years ago
Michael Peter Christen e7e381d110 added configuration to switch off redirection following in crawler
13 years ago
Michael Peter Christen 2717c1b749 fixed bug in solr interface
13 years ago
Michael Peter Christen f150bc218b fixed bug in solr error document
13 years ago
Michael Peter Christen cb54c1737b solrj connector bugfix
13 years ago
Roland 'Quix0r' Haeder a093ccf5eb Now used synchronization in all close() methods to make sure all objects
13 years ago
Michael Peter Christen 0d58fea210 made multiple connector default
13 years ago
Michael Peter Christen adeb33bb36 better abstraction for solr objects
13 years ago
Michael Peter Christen 8864141872 more abstraction in solr connection classes
13 years ago
Michael Peter Christen c00efc2717 made the solr connection more generic
13 years ago
Michael Peter Christen ea2bd43b28 patch for broken configurations
13 years ago
Michael Peter Christen ba6aaabc51 refactoring + parser bugfixes
13 years ago
Michael Peter Christen 453010bd68 - solved problems with backpath normalization
13 years ago
Michael Peter Christen 5f5ed33ed8 patch for media search (audio, video apps)
13 years ago
Michael Peter Christen 19efbf1b0f - apply directDocByURL to NOLOAD Queue
13 years ago
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
13 years ago
Michael Peter Christen a3badd3205 changed search process for images: no more media snippet load process,
13 years ago
Michael Peter Christen f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
13 years ago
Michael Peter Christen 14f67f217c refactoring of ContentDomain: now subclass of Classification
13 years ago
Michael Peter Christen a1a5b015d8 refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen 33d1062c79 refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen 7b5b9baee0 added citation rank to ranking profile
13 years ago
Michael Christen 02e4dedff2 fix to url citation collection
13 years ago
Michael Christen e32055aa15 added stub classes for
13 years ago
Michael Christen ac5d124ee0 experimental implementation of a citation ranking as post-ranking
13 years ago
Michael Christen 8fc86fe397 added storage of full anchor link structure:
13 years ago
Lotus 0b3f39136e allow custom ppm lower than minimum button on /Crawler_p.html
13 years ago
Michael Peter Christen 8aba045ba1 if a new pop-up page is set in config portal, then this page applies
13 years ago
Michael Peter Christen 36e4d82b27 changed ranking
13 years ago
Michael Peter Christen 096c17e7cd added test code
13 years ago
Michael Peter Christen 9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
13 years ago
Michael Peter Christen e2f8f263e8 changed storage of search words: keep order
13 years ago
Michael Peter Christen 2e5cd6a1b2 fixed parser extension deny list generation and usage
13 years ago
Michael Peter Christen 3cd6dcd352 do not add new solr fields as activated fields
13 years ago
Michael Peter Christen e3bb73c3d6 serialized some database access methods
13 years ago
Michael Peter Christen 355ecf330f reduced target file size to 64mb
13 years ago
Michael Peter Christen 2ea585d616 fix for host navigator
13 years ago
Michael Peter Christen 4c5edab1ec added option to have exception search result windows
13 years ago
Michael Peter Christen ef78f22ee1 performance hack
13 years ago
Michael Peter Christen 41536eb4a2 performance hack
13 years ago
Michael Peter Christen f91487fc50 added delete-button for host navigation
13 years ago
Michael Peter Christen e8d24fd802 author navigator can be switched off
13 years ago
Michael Peter Christen 558ab7bd4e made the protocol navigator reversible
13 years ago
Michael Peter Christen 96cb75f1d4 made the filetype navigator be able to deselect the search constraint
13 years ago
Lotus c73af39e54 refactoring of tray icon class,
13 years ago
Michael Peter Christen 4eff0e26f1 npe bugfix
13 years ago
Michael Peter Christen 1a0b6b3913 get more navigation details to search results
13 years ago
Michael Peter Christen 83009d86f7 added the vocabulary navigator. It can be very simply tested by
13 years ago
Michael Peter Christen 254adea51c small fixes
13 years ago
Michael Peter Christen c602eaaf46 enhanced search process
13 years ago
Michael Christen eff966f396 fix for search process (it was aborted too early during remote search)
13 years ago
Marek Otahal 72adbeae90 !Important: move from Hashtable to HashMap
13 years ago
Marek Otahal f40efb39af Blacklist loadList() remove duplicates by using Set
13 years ago
Michael Peter Christen 2ee8cbeb2c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 992dbdf4bb added noload statistic to servlets
13 years ago
Michael Christen 216a287a85 Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
13 years ago
stbrumm d18095dc48 Patch for Issue 0000102
13 years ago
Michael Christen 585a8f3c44 fixed a bug in search sequence (caused empty results)
13 years ago
Roland 'Quix0r' Haeder a3083d13bf Blacklist checks are now always turned on, in media searches (e.g. image search) images matching blacklist entries are no longer shown to the user
13 years ago
Michael Christen 52184a1170 fix for search process
13 years ago
Michael Christen 0797b0de99 new handling of remote search processes: looking for seeds will now not
13 years ago
Michael Christen 9e5894c784 Removed handling of components objects for URIMetadataRows.
13 years ago
Michael Christen c04bfaa51b refactoring
13 years ago
Michael Christen e9dc99fe15 added rules to set specific RWIs as private RWIs which are not
13 years ago
Michael Peter Christen 0bcef2d156 added feature as requested in
13 years ago
Michael Christen 3eccdca63c protection against too long running snippet fetch processes
13 years ago
Michael Christen 86b3385847 fixed a deadlock during secondary remote search
13 years ago
Michael Christen c715d19c09 fixes for dependency on svn
13 years ago
Michael Christen 0bc5d76bee ups
13 years ago
Michael Christen 044f83feed added some pauses into the search process which shall produce
13 years ago
Michael Christen f14faf503b better ranking because we wait a very little time during the search
13 years ago
orbiter f9216e388c - faster ping to clean up old peers faster
13 years ago
orbiter d9c066227a fix for npe
13 years ago
orbiter ebd840ebf6 - enhanced description on search front page
13 years ago
orbiter e22f8497c9 - tested the ARC methods
13 years ago
orbiter bc5df0eef5 updated ranking tables (fresh computation)
13 years ago
orbiter 5a55397f99 some last-minute performance hacks
13 years ago
orbiter c9216d5adf fixed secondary remote search (the process that finds distributed join situations)
13 years ago
orbiter 64fd20b857 new default ranking profile
13 years ago
orbiter 0cf9ebc3b0 speed enhancements when parsing RWI rows (makes search slightly faster)
13 years ago
orbiter ee8b1d4de1 fixed unresolved pattern and unwanted local/global switch when using votes on search results
13 years ago
orbiter c584db991f creating a bookmark from the search results now works again .. with new YMarks
13 years ago
orbiter 6cd27473f5 - better default values for caching and cache usage
13 years ago
orbiter 1019c36dad bug fixes and speed enhancements for search
13 years ago
orbiter 507c9d478d much better timing when search globally; less blocking; more results earlier!
13 years ago
orbiter 8e0b2c5832 fixed cluster search
13 years ago
orbiter 804e48888b smaller bug fixes for search behavior; should produce less unnecessary removals and an exact number of results as shown in counter
13 years ago
orbiter 84c3fc9d97 local/global fixes in search, better abstraction
13 years ago
orbiter 06352b8d6b more logging
13 years ago
orbiter 017a01714d - enhanced logging in robots.txt parser for remote debugging
13 years ago
orbiter 3a15e58e28 - increased stability when opening the robots table
13 years ago
orbiter 78ce3b13be typo
13 years ago
orbiter 85d6bf4ac4 fixed urls to media content during indexing
13 years ago
orbiter 0d858d48ec replaced String with StringBuilder in suggestion process
13 years ago
orbiter 3a807e10cf - added a cache for active crawl profiles to the crawl switchboard
13 years ago
orbiter e58438c01c - added a new retry connector for solr (for cases where solr responses are slow)
13 years ago
orbiter 4ad9fc2bff new snippet strategy for search hits in metadata: show beginning of text instead of hit position
13 years ago
orbiter 5af9598bd1 enhanced exported row parsing during row import
13 years ago
orbiter a7df70221e refactoring
13 years ago
orbiter cf4fd525ee added directDocByURL attribute in crawl profile
13 years ago
orbiter 035ebfbf3b - performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
13 years ago
orbiter b250e6466d implemented crawl restrictions for IP pattern and country lists
13 years ago
orbiter 2c3161b4ac refactoring:
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago