yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	36035e0a0a	- used reger's LukeRequest to generalize the index info in SolrServerConnector - used the LukeRequest in SolrServerConnector to replace the index size method by a getNumDocs request to a LukeRequest result	11 years ago
Michael Peter Christen	39fceb5ccf	fix for NPE & bug #264	11 years ago
Michael Peter Christen	735a66eff3	enhancements to crawler	11 years ago
Roland Haeder	be0ff6018f	Removed trailing spaces + some more final	11 years ago
Roland Haeder	aaedc0405d	Fixes and avoid of catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Conflicts: source/net/yacy/repository/LoaderDispatcher.java	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Felix Ableitner	03044589dd	Fixed (?i) appearing in entries, fixed multiple equal lines in file.	11 years ago
Michael Peter Christen	89c0aa0e74	added collection_sxt to error documents	11 years ago
Michael Peter Christen	0df5195cb0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	1fd006cc56	fixes using the embedded connector	11 years ago
orbiter	d0dc86cf3d	logging of deadlocks (if any) during cleanup process	11 years ago
Michael Peter Christen	c6a6f159e8	fix for crawl stack domain counter	12 years ago
Michael Peter Christen	93d1bac140	do a more frequent optimization, reduces IO after optimization	12 years ago
orbiter	b71d13a014	added load and deadlock detector in Memory util	12 years ago
orbiter	290e24564b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	5533fc8e01	fix for bug 260	12 years ago
Michael Peter Christen	b79471ee67	grr	12 years ago
Michael Peter Christen	a79f288ac1	automatically running optimize on solr if user/search is idle for some time	12 years ago
orbiter	a9c8046c87	do a light optimization at the end of a crawl postprocessing	12 years ago
orbiter	a548354c71	replaced type of solr schema object sku of text_en_splitting_tight by string	12 years ago
orbiter	2f1ec8d4a2	npe fix	12 years ago
Michael Peter Christen	bcc623a843	refactoring of load_delay: this is a matter of client identification	12 years ago
orbiter	0d0b3a30f5	activate api actions after postprocessing of crawls	12 years ago
orbiter	3978c5ca5d	fix for http://bugs.yacy.net/view.php?id=255	12 years ago
orbiter	2be456e7fb	added a postprocessing field into api/status_p.xml to show if the postprocessing task is running at that time (status: busy) or not (status:idle)	12 years ago
orbiter	dac88561ae	minimum access time has a tight connection to ClientIdentification, therefore it is defined there.	12 years ago
Michael Peter Christen	9a29ab469e	another patch to prevent CLOSE_WAIT status on solr connections	12 years ago
Michael Peter Christen	5091d627bc	fixed parsing of peer flags	12 years ago
Michael Peter Christen	87e9052081	added Connection:close to all http requests in our http client to prevent CLOSE_WAIT states (as seen in lsof)	12 years ago
Michael Peter Christen	5c6946dd5f	replaced usage of log4j by ConcurrentLog where possible	12 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based logger tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is a add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	f4f6551c66	better handling of time-out at solrj in case that a commit is done in a fail-over case during add	12 years ago
Michael Peter Christen	07261fe274	Merge remote-tracking branch 'nutomics/blacklist_structure'	12 years ago
Michael Peter Christen	dea71851d2	- better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index	12 years ago
Michael Peter Christen	a34e137e27	fix for citation index generation in case that entry.referrerhash() is null. This is especially the case if ftp sites are crawled	12 years ago
Michael Peter Christen	a2c8116a8f	accept (but ignore) a '+' sign in front of search words	12 years ago
orbiter	9f0cc9b401	enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains	12 years ago
sixcooler	308d73f855	do not use remote proxy if not switched on - regardless of the proto	12 years ago
sixcooler	69906b1d2e	Revert "do not use remote proxy if not switched on - regardless of the proto" This reverts commit `20f452d228`.	12 years ago
sixcooler	20f452d228	do not use remote proxy if not switched on - regardless of the proto	12 years ago
sixcooler	9551720d5c	re-enable saved setting for proxy-crawl-profile	12 years ago
sixcooler	d5d8936f9d	For indexes that are changing rapidly in NRT situations, fcs (stands for Field Cache per Segment) may be a better choice than the default fc. (saves memory) see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method	12 years ago
Felix Ableitner	44f8fcf62e	Changed class structure of Blacklist.	12 years ago
Michael Peter Christen	57ffdfad4c	added a crawl option to obey html-meta-robots-noindex. This is on by default.	12 years ago
Michael Peter Christen	5a5d411ec0	new robots_i attribute fields	12 years ago
Michael Peter Christen	fa08bd9d5a	hack to prevent long waiting times in crawler	12 years ago
Michael Peter Christen	f1c5338210	prepartion for greedy crawl profiles and refactoring	12 years ago
Michael Peter Christen	e6f361f474	adding the canonical tag to crawl queues	12 years ago
reger	a6bf44212e	bugfix: location (lat/lon) meta data retrival (Double.NaN check)	12 years ago
Michael Peter Christen	203921006a	redesign of citation index storage	12 years ago
reger	83763ee4a4	jpeg parser: extract GPS location from meta data	12 years ago
Michael Peter Christen	32aa1d4569	removed unused option for queries	12 years ago
Michael Peter Christen	9d291764d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
sixcooler	e5abccdfe4	added optimize-option	12 years ago
Michael Peter Christen	64140f35cd	fix for solr requests if no query part is given (prevent npe)	12 years ago
Michael Peter Christen	8caaf6203a	fixed false multiple-generation of remote facet search which caused high cpu usage on remote side.	12 years ago
Michael Peter Christen	823ae4d6a7	added url_protocol_s to error documents	12 years ago
Michael Peter Christen	660a196989	refactoring	12 years ago
Michael Peter Christen	c4538d8d91	added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib	12 years ago
reger	3760e2616b	bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments	12 years ago
Michael Peter Christen	9a6fcdf597	npe fix	12 years ago
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which has now not the file name as last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	12 years ago
reger	8d1c4c423d	make imageparser fileextension detection case insensitive (extensions are often upper case)	12 years ago
Michael Peter Christen	f9d859f5dc	now writing image alt texts and (camelcase-)parsed urls into a text search field for a better image retrieval	12 years ago
Michael Peter Christen	e441a9d4c8	to avoid confusion, the gsa api is available at /search? and /searchresult?	12 years ago
orbiter	8792e6c6e9	stub for better image indexing	12 years ago
orbiter	97f2ac9091	added hint to gsa response writer that the result comes from a yacy peer	12 years ago
Michael Peter Christen	14186e815e	npe fix	12 years ago
Michael Peter Christen	bdf306e0a7	increased time-out for loading of seed-lists	12 years ago
Michael Peter Christen	374d2e2a52	removed warning message during crawling	12 years ago
Michael Peter Christen	570511f3c8	removed fields references_internal_id_sxt and references_internal_url_sxt because they had been shown to be superfluous. The citation of referrer in the host browser is possible without them. Therefore now the host browser does not only show internal, but also external referrer to each link.	12 years ago
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	12 years ago
Michael Peter Christen	fc3ff92c69	npe fix	12 years ago
Michael Peter Christen	1762911f57	added synchronizations and timeouts in solr api; missing synchronizations in index modification methods causes deadlocks inside solr.	12 years ago
Michael Peter Christen	3e1e358fdc	calling pdf cache flush on class initialization because calling of the methods during runtime can conflict with dynamic solr class loader and cause a deadlock (seriously!)	12 years ago
Michael Peter Christen	291912ee52	removed misleading http accessGranted message (this is only for debugging)	12 years ago
Michael Peter Christen	2fd7bbb450	reduced load on solr; no seed update in Status and no exists-check in HTTPLoader in case of redirects, that can be done using the htcache.	12 years ago
Michael Peter Christen	2648b42b27	added fixed clear method as public method	12 years ago
Michael Peter Christen	ffc570f95f	removed forced soft commit since this may be the cause for a performance problem	12 years ago
Michael Peter Christen	6115bef335	added a 'greedy learning' mechanismn which will cause that a 'fresh' yacy will load linked web pages from search results until the total number of web pages reaches 15000. This shall give fresh peers a 'boost' to get faster a personalized search index.	12 years ago
Michael Peter Christen	f24574b3da	use s greeting line which does not sound so beta	12 years ago
Michael Peter Christen	b85db72a73	added another response writer which can present search result with texts, separated by sentences. Then, these sentences can be used to search again in the index for the same sentence. This can be used to provide a tool for plagiarism-search. (not finished yet). Try the following: http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml .. to search for 'flut' and show only sentences in the result documents which contain the word 'wasser'. Consider this like using a grep-tool on documents: you select the documents by a search query and you grep sentences inside the found documents with the 'grep' attribute.	12 years ago
Michael Peter Christen	8e965ffd16	fix for host compare in case that the host is null. This happens when doing a search in the intranet for file resources (they don't have a host).	12 years ago
orbiter	2b320313d9	replaced yacydoc servlet usage by a solr result output using an html output writer. This made the creation of a html result writer necessary which is included in this commit. The yacydoc servlet was used to present all metadata to a document, but the solr interface can serve for this purpose in a much better way. All usages (instead one) of yacydoc were replaced by a solr call. This affects also the 'metadata' link attached to search results.	12 years ago
Michael Peter Christen	f7a4377812	usage of the new normalized link polularity CRn as default ranking function. This replaces the previous formula, which was bad. Before you update to this version, please check if you changed the ranking function yourself before, since it will be overwritten.	12 years ago
Michael Peter Christen	f7e77a21bf	Added a citation reference computation for intra-domain link structures. While the values for the reference evaluation are computed, also a backlink-structure can be discovered and written to the index as well. The host browser has been extended to show such backlinks to each presented links. The host browser therefore can now show an information where an document is linked. The new citation reference is computed as likelyhood for a random click path with recursive usage of previously computed likelyhood. This process is repeated until the likelyhood converges to a specific number. This number is then normalized to a ranking value CRn, 0<=CRn<=1. The value CRn can therefore be used to rank popularity within intra-domain link structures.	12 years ago
Michael Peter Christen	e20450e798	patch in HTCache and CitationIndex loading in case that a file is broken: do not crash; instead ignore the file and delete it.	12 years ago
reger	d367b1f4d9	add null pointer check to stopword fix	12 years ago
reger	7480e87386	- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247 - append language setting specific stopword list - remove unused OVERHANG stack type	12 years ago
Michael Peter Christen	9fc0c4df98	fix for bad exists 'enhancement'; see bug: http://bugs.yacy.net/view.php?id=245	12 years ago
reger	9ef1fd9bac	fix: enable use of solrcore.properties for property substitution of solrconfig.xml	12 years ago
reger	8a7fcb391d	enable use of solrcore.properties for property substitution of solrconfig.xml - move setting of system property solr.directoryFactory=solr.MMapDirectoryFactory to solrcore.properties - add check of os.arch for 64bit system, if it fails use default/solrcore.x86.properties (if exists) as solrcore.properties reason: on 32bit MMapDirectoryFactory may fail with..... Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849) at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)	12 years ago
Michael Peter Christen	f7e887bf49	added missing class	12 years ago
Michael Peter Christen	5f92c68f1f	removed block rank ranking and all YBR files in /ranking	12 years ago
Michael Peter Christen	164603b946	cleanup	12 years ago
Michael Peter Christen	ba793a32c0	added timeout for remote searches of 10 seconds	12 years ago
Michael Peter Christen	1c4c1c0345	try to commit in case of failure which hopefully frees up some RAM	12 years ago
Michael Peter Christen	409d6edf53	Store node/solr search threads to be able to send them an interrupt signal in case that a cleanup process wants to remove the search process. Added also a new cleanup process which can reduce the number of stored searches to a specific number which can be higher or lower according to the remaining RAM. The cleanup process is called every time a search ist started.	12 years ago
Michael Peter Christen	2a8b99ea82	remove text_t in search result after snippet has been computed to save space in search result cache	12 years ago
Michael Peter Christen	a1644ca0fd	new workflow processor in Segment to enqueue indexing documents to solr	12 years ago
Michael Peter Christen	a8dc4346e8	default configuration of MMapDirectoryFactory for solr, increased lock timeout, less documents from remote searches (too many results had easily blocked a peer)	12 years ago
Michael Peter Christen	0c1a018bbd	removed 'later' tactic because it used too much RAM, reduced number of soft commits, reduced caching size of search events, ensured that solr results are processed before connection is closed to keep that stuff not too long in RAM	12 years ago
Michael Peter Christen	5344a1c5f7	getting the trash out	12 years ago
Michael Peter Christen	709e9b8ce7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	1eb9626cca	less logging	12 years ago
Michael Peter Christen	281959a2d7	added option to re-boot the embedded solr during run-time. Added also API recording for this method so it can be repeated automatically. The index dump generation is now also available for API recording. Added some synchronization in backend which was necessary for this.	12 years ago
orbiter	da621e827e	prevent NPE in case RWI is disabled	12 years ago
Michael Peter Christen	c2bcfd8afb	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	67757b425a	use a retry handler with retryCount=0 because we usually expect requests to fail if we access non-permanently available resources (peers, web pages) and want to fail fast without repeating the same request which is doomed to fail. The previous appearance of http client connection had a 1-2-4-8-second timeout scheme, which caused that connection attempts lasted for 16 seconds.	12 years ago
Michael Peter Christen	c2b1075dcf	activating pollImmediately in case that DHT receive is off. This will cause a much faster search result when running in public robinson mode.	12 years ago
orbiter	888a985dc6	set a higher limit for table copy usage	12 years ago
Michael Peter Christen	2b563debbf	javadoc of new multiple-exist test	12 years ago
Marc Nause	8fb1b1e290	*) simplified banner creation code	12 years ago
Michael Peter Christen	8f2d3ce2f9	reduced locking situation in crawler: shifted synchronized location and reduced time-out of robots.txt load limit	12 years ago
reger	97ab5b90e8	- odt & ooxml (office document) parser correction to add content to fulltext index - adjust Junit yacyVersionTest & ParserTest - update yacyVersion.combined2prettyVersion to the default 4-digit minor ver.	12 years ago
Michael Peter Christen	b68fbe7d21	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/migration.java	12 years ago
Michael Peter Christen	06d3063dc9	- no downcase when using collection modifier - removed warnings	12 years ago
Michael Peter Christen	8dbc80da70	redesign of index.exist-test: this shall now not be done using a single id to be tested, but with a collection of ids. This will cause only a single call to solr instead of many. The result is a much better performace when testing the existence of many urls. The effect should cause very much less IO during index transmission, both on sender and receiver side.	12 years ago
reger	7f63d3747d	more generic field selection for reindex option of documents with disabled fields using Luke request to compare config with actual fields in index	12 years ago
Michael Peter Christen	c91c67c3cd	reject bad solr requests	12 years ago
Michael Peter Christen	44e363f37f	refactoring of WorkflowProcessor, added process counter, update of process counter if an blocking thread dies. Added also a new column in PerformanceConcurrency_p servlet to show the actual number of concurrent processes.	12 years ago
Michael Peter Christen	4058369288	fixed query expressions for collection selection (added quotes)	12 years ago
Michael Peter Christen	f2e36fbd06	enhanced deletion process for very large number of documents	12 years ago
reger	79401cb938	added reindex option for documents with disabled or obsolete fields to Solr Schema Editor page (IndexSchema_p.html) this allows to remove obsolete fields from the index (according to current schema config) by selecting all documents containig disabled fields.	12 years ago
orbiter	cf36c1614f	prevent that concurrent deletion process causes wrong double-check in crawl start	12 years ago
orbiter	aeff31cd44	fix for workflow processor (cause: latest redesign for less threads)	12 years ago
Michael Peter Christen	77faeada4d	small memory leak patch	12 years ago
Michael Peter Christen	b24d1d18e4	removed synchronization and concurrency in Fulltext class, concurrent deletions are now handled in ConcurrentUpdateSolrConnector	12 years ago
Michael Peter Christen	b9b446bca6	- added ssl configuration sign (a lock) to network statistic/table - fixed a bug in bitfield	12 years ago
Michael Peter Christen	e6c8b545c2	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	a83c2fe833	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	4baa0d4a97	Added a default keystore for ssl encryption of the YaCy web interface. This will enable https-access to YaCy, but this feature is disabled by default using the new server.https=false attribute. This has two purposes: - make it easier for everyone to use https (just set server.https=true) - provide the basis for secure yacy-to-yacy communication in the future	12 years ago
Michael Peter Christen	aaddb4809c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	038f956821	fix for sitemap detection: the sitemap url was not visible if it appeared after the declaration of robots allow/deny for the crawler because the sitemap parser terminated after the allow/deny rules had been found. Now the parser reads the robots.txt until the end to discover also sitemap rules at the end of the file.	12 years ago
reger	4fc6837690	- fix monitor url of crawl job in PerformanceQueues_p.html - reduce logging of every index add (switch embeddedsolr.add from info to debug)	12 years ago
Michael Peter Christen	442ed50be0	removed some unnecessary synchronizations	12 years ago
Michael Peter Christen	ad050ec88d	- upgraded httpclient, httpcore and httpmime - removed httpclient 3.1 which has been used by solrj < 4.x.x and is now not used any more - fixed some parts in YaCy which used methods from httpclient 3.1	12 years ago
orbiter	a1c989002b	fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4652 generate dht data even if dht receive and dht transmission is switched off	12 years ago
Michael Peter Christen	e26bdd4a52	fixes to deletion methods (removed unnecessary concurrency and added removal of crawl queue entries)	12 years ago
Michael Peter Christen	f2c9b0b5f2	better robustness of Concurrent Solr Connector against update/deletion thread failure	12 years ago
Michael Peter Christen	f7f3e28c5e	prevent that the size of the index is computed too many times. Because the index size is now provided by solr, and the only way to do that is a match for [* TO *], a size computation is quite complex and time-consuming. Therefore this patch prevents that the method is called at all and if necessary puts a DOS-preventing barrier in front of it.	12 years ago
Michael Peter Christen	cca19d94d4	re-declared some fields to be of type string rather than text which makes them more efficient and less large	12 years ago
Michael Peter Christen	cc90f82dbb	increased default proxy client timeout to one minute	12 years ago
Michael Peter Christen	ed1d5bace6	draw the names of other peers which receive/send dht into the network graphic	12 years ago
Michael Peter Christen	b528448332	enlarge network graph circle according to image height and reduce the image height in the Network servlet. Overall, the image is now larger but takes less space on the web page.	12 years ago
reger	24d2b4baee	remove pre 1.0 migration statement which possibly overwrites user navigator setting	12 years ago
Michael Peter Christen	3841854c97	abstraction of catchall term	12 years ago
Michael Peter Christen	ea85674be2	added the date to error documents	12 years ago
Michael Peter Christen	6fafed2180	fix for solr cache when a delete buffer is filled and a document, which is the delete queue, is replaced with a new one.	12 years ago
Michael Peter Christen	20b767f35e	preventing score computation in solr where applicable	12 years ago
orbiter	7de5b9cfa0	fix for http://bugs.yacy.net/view.php?id=233 - check geolocation coordinates and accept only those, which are well-formed - the solr push process does not stop crawling any more if after 20 requests to Solr Solr does not accept the record. Instead, a severe log entry asks the user to create a bug request	12 years ago
Michael Peter Christen	ee217dbdee	remove sort order in all cases where not needed	12 years ago
Michael Peter Christen	70e981b333	prevent that long-running deletion tasks block a hard commit.	12 years ago
Michael Peter Christen	bb4bf3d8fd	infinity timeout bug protection patch	12 years ago
Michael Peter Christen	1b102d98d8	- added index deletion to index administration submenu - added index deletion processes to the process scheduler/recorder	12 years ago
Michael Peter Christen	d1be4127e7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	1aac722cc6	added another solr connector, the ConcurrentUpdateSolrConnector which does not block when long-running updates to solr are made. This is realized using blocking queues which process all long-running tasks in the background. Also some bugfixes to existing connectors.	12 years ago
Michael Peter Christen	0af7803367	added more features to ScoreMap (pretty toString)	12 years ago
Michael Peter Christen	f36a7da5f6	- re-introduced existById in solr connector. - intruduced raw-queries for the re-introduced byId-Queries (they are hopefully faster than full edismax queries) - removed the cached solr connector (testing this) to rely only on the solr built-in search caches. That should save some RAM (also). We will see if this is usable.	12 years ago
reger	46fa800bc7	added httpstatus_i to automatically switched on fields (used in all search queries)	12 years ago
Michael Peter Christen	3502b4c697	refactoring (renaming) of yacy-solr api	12 years ago
Michael Peter Christen	3a0fcfbeda	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	25499eead5	- added a new field for the regular expression in crawl start - added the field in crawl profile - adopted logging end error management - adopted duplicate document detection - added a new rule to the indexing process to reject non-matching content - full redesign of the expert crawl start servlet The new filter field can now be seen in /CrawlStartExpert_p.html at Section "Document Filter", subsection item "Filter on Content of Document"	12 years ago
orbiter	e1bfe9d07a	- reduction of the concurrently running processes to make YaCy more adjusted to smaller and 1-core devices. - the workflow processor now starts no process at all. these are started as soon as parser/condenser/indexing queues are filled. - better abstraction	12 years ago
Michael Peter Christen	c091000165	added collection attribute also to the rss feed reader	12 years ago
orbiter	f7571386a3	added a 'collection' property attribute in yacysearch.html which can be used to select between different collections as defined during a crawl start with the 'collection' attribute. This actually implements the ability to prepare search tenants which restrict their search results to a specific collection. The main use for this is to provide tenants to the yaml4 interface (at this time).	12 years ago
orbiter	3e79bd4b1f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	d937c55204	extended limitation of dom export size from 100000 to 100000000	12 years ago
Michael Peter Christen	fc2095ac67	some extensions to raster plotter to transform a RGB picture to an indexed color scheme. This is needed for gif animations	12 years ago
Michael Peter Christen	c1a2175fbc	added transparency to gif image animation and the integration to the YaCy httpd for on-the-fly generated gifs (including animated gifs)	12 years ago
orbiter	5d442dad82	avoid NPE in regex checker	12 years ago
Michael Peter Christen	50421171c3	added new schema fields: hreflang_url_sxt and hreflang_cc_sxt for http://support.google.com/webmasters/bin/answer.py?hl=de&answer=189077 navigation_url_sxt and navigation_type_sxt for http://googlewebmastercentral.blogspot.de/2011/09/pagination-with-relnext-and-relprev.html publisher_url_s for http://support.google.com/plus/answer/1713826?hl=de all fields are disabled by default and not written to the index.	12 years ago
Michael Peter Christen	566d6c980c	checking of document signature for a double-document check now refers only to documents within the same domain	12 years ago
Michael Peter Christen	1d30082446	added hindi translation configuration	12 years ago
Michael Peter Christen	d05dc07cff	setting of new default values for ranking	12 years ago
Michael Peter Christen	97775fbebc	fixed ranking for add-function queries: this did not work. The option was removed. All function queries are now boosts (multiplies the score according to a function). This is also the recommended way to boost rankings based on functions as explained in http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/	12 years ago
Michael Peter Christen	ac5fa9fe48	fix for result counter logging	12 years ago
Michael Peter Christen	7ab5093321	added new solr title_exact_signature_l and description_exact_signature_l to be able to identify unique title and unique description fields.	12 years ago
Michael Peter Christen	f24ac518e6	redesign of exists()-query (can now be called with query) and the CachedSolrConnector which based its cache on the key value. This will be used to correct the title_unique_b and description_unique_b field.	12 years ago
Michael Peter Christen	27d6222880	added new field host_extent_i which, after a crawl and postprocessing, holds the number of documents for the host where the document is hosted. This is necessary for ranking and the norming of references per local host in the ranking computation.	12 years ago
reger	518b20147c	skip postprocessing during document.store if no citation index connected (prevent null pointer exception)	12 years ago
Marc Nause	ac478384d3	*) did some long overdue refactoring	12 years ago
Michael Peter Christen	ada3f27de7	added three new field for a better ranking: references_internal_i, references_external_i and references_exthosts_i. These can be used to count and evaluate the number of external links to every web page. An experimental ranking function can be i.e.: div(add(references_internal_i,product(references_external_i,references_exthosts_i)),add(clickdepth_i,1))	12 years ago
Michael Peter Christen	082e3274d6	- setting the same default ranking in the solr interface as for YaCy search interfaces if no other ranking attributes are given - using the YaCy ranking in the GSA interface only if there was not given a GSA-style sort attribute - to avoid confusion about correct ranking attributes, only the default '0'-ranking profile is used and not scenario-adopted (site, date) because that should be configurable in the web interface before it is used actually for ranking.	12 years ago
Michael Peter Christen	a20941c067	resume paused crawls on startup; user expects that restarts 'heal' everything	12 years ago
Michael Peter Christen	edc0b33f6d	- showing references count and clickdepth in host browser - fixed generation and presentation of both values	12 years ago
reger	566a3b0294	fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search - removed duplicate QueryParams.hashes2Handles , redundant with .hashes2Set	12 years ago
Michael Peter Christen	cf0acd2cb4	upgrade to solr 4.2.1	12 years ago
reger	e89491271f	- fix opensearch discover err msg - webgraph not enabled - if no opensearchdescription link found in index - remove search2.net from sample config (is down)	12 years ago
reger	6a9d0b60a3	make sure configured port is reported on recreated mySeed.txt	12 years ago
Michael Peter Christen	870aedf3c6	fixes for better search interface integration in yaml templates	12 years ago
Michael Peter Christen	5512be6673	fix in GSA result writer which evaluates result context fields as String. After the migration to Solr 4.1.0 'some' of these fields suddenly are stored as String[]; this patch compensates this confusion.	12 years ago
Michael Peter Christen	342ba1049b	- callback fix - memory allocation problem in RowCollection: if memory is too low, do not to try to increase by 1 because this leads to very long execution time and at the end to the same OOM as if we allocate the memory at the moment we need it even if the resource observer states that this memory is not there. To compensate this, the increase size is reduced.	12 years ago
orbiter	65d73e5652	renamed callback function to 'callback' because that is a standard for jsonp which is also used in backbone.js/jquery	12 years ago
orbiter	17ae51e741	increased number of links limitation from 1000 to 10000 for rss feeds and html documents	12 years ago
orbiter	e4d26d1cb4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	940c6849ee	enhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements)	12 years ago
reger	d57b221921	add: reset Solr schema filed selection to default button in IndexSchema_p	12 years ago
Michael Peter Christen	9406a2e438	fixed NPE during index abstract computation	12 years ago
Michael Peter Christen	16e9d4d1dd	added a restart hint	12 years ago

... 2 3 4 5 6 ...

6575 Commits (daebeb93aa9ffa850b8af445246e132cb73e398b)