yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	76afcccaaf	fix for default boolean post values: the default value MUST NOT be TRUE, because it's normal that a boolean value is missing in the post argument if a checkbox is not selected. Added also some style enhancements to IndexFederated, removed the Solr attachment manual and replaced it with a link to the wiki which explains this in more detail.	11 years ago
orbiter	252c525709	fixed feed api servlet and enhanced RSSReader class	11 years ago
Marc Nause	112836dcc9	Improved external links. *) image links will not be marked (if they have class "yacylogo" or "forceNoExternalIcon") *) external links in menu on left (and "fork me"-banner) will open in new tab/window now	11 years ago
Marc Nause	d64a094f0e	External links in HTML interface are marked as external with small icon. *) added new icon *) added CSS rules to mark all external links except search results (target="_self")	11 years ago
Michael Peter Christen	58fe986cca	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	cf12835f20	replaced the single-text description solr field with a multi-value description_txt text field	11 years ago
sixcooler	7d53ac86a3	fix for Blacklist (-Administration)	11 years ago
orbiter	f425b2c61c	re-try to fetch url after a soft commit	11 years ago
orbiter	bf0ad04e1b	apply load limitation also to dht-in	11 years ago
Roland Haeder	b58ca8622d	Some cleanups: - added SKINS_PATH_DEFAULT as same as LISTS_PATH_DEFAULT was added - Added 'final' keyword to a string	11 years ago
Roland Haeder	e2ee412160	Use SwitchboardConstants.LISTS_PATH_DEFAULT instead of 'DATA/LISTS' Conflicts: htroot/api/blacklists_p.java	11 years ago
Roland Haeder	ae19401af0	Removed another duplicate occurrence of Blacklist.BLACKLIST_FILENAME_FILTER	11 years ago
Roland Haeder	59225487ea	Fix for blacklist export, also applied the filename filter here	11 years ago
Roland Haeder	952fc0e7bd	Removed superfluous check for files ending '.black' as the previous commit already excluded all other files (e.g. .ser dumps), added logging in catch-all block	11 years ago
Roland Haeder	060fec1577	Reuse Blacklist.BLACKLIST_FILENAME_FILTER	11 years ago
Roland Haeder	29049c71f5	Possible fix for ticket http://bugs.yacy.net/view.php?id=270 , the filter for only including *.black must be applied	11 years ago
Michael Peter Christen	4c242f9af9	always use a default value for boolean options to have transparency for the outcome if the attribute is missing in servlets	11 years ago
orbiter	9c681cc00d	added segment sizes, postprocessing status and cpu load to crawler monitor	11 years ago
orbiter	86b514cf46	added load info to status_p.xml	11 years ago
orbiter	056b42f5aa	- added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end	11 years ago
orbiter	6fb2811e68	fixes for problems with remote solr and non-activated webgraph index	11 years ago
orbiter	e24016e30a	added the property federated.service.solr.indexing.timeout to yacy.init to provide a configurable time-out for solr; see also: http://bugs.yacy.net/view.php?id=254	11 years ago
orbiter	232100301c	removed double-occurring value assignments	11 years ago
Roland Haeder	aaedc0405d	Fixes and avoid of catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Conflicts: source/net/yacy/repository/LoaderDispatcher.java	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Felix Ableitner	376f9cd9d0	Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure	11 years ago
Michael Peter Christen	89c0aa0e74	added collection_sxt to error documents	11 years ago
Michael Peter Christen	0df5195cb0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	1fd006cc56	fixes using the embedded connector	11 years ago
orbiter	aba7cc5de7	added cpu load information to status page	11 years ago
Roland Haeder	59b4fdd5ad	Merge remote-tracking branch 'upstream/master'	12 years ago
orbiter	5493389576	stealth mode shall only be available for authorized users, because unauthorized users can otherwise be monitored by authorized users	12 years ago
Roland Haeder	ebbb3bc5c1	Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet	12 years ago
Michael Peter Christen	bcc623a843	refactoring of load_delay: this is a matter of client identification	12 years ago
orbiter	2be456e7fb	added a postprocessing field into api/status_p.xml to show if the postprocessing task is running at that time (status: busy) or not (status:idle)	12 years ago
orbiter	575f913154	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	c4efb612e2	added list of crawls to status_p.xml	12 years ago
Lotus	bb6caa346c	Do not allow automatic update in case YaCy is installed to the Program Files folder on Windows. There are no permissions to write that folder and update would fail.	12 years ago
orbiter	dac88561ae	minimum access time has a tight connection to ClientIdentification, therefore it is defined there.	12 years ago
Felix Ableitner	a020697d64	Fixed problems with blacklist entry insertion.	12 years ago
sixcooler	bff8c753c6	re-insert this file - was deleted by mistake + correct another case-typo	12 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based logger tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is a add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	c79f687110	enhanced the network scanner: find more hosts automatically by removal of common subdomains before application of protocol-specific prefix	12 years ago
orbiter	b4677d1cad	fix for bug #252 the naming of the servlet was wrong, the bug may not be present on systems where upper/lowercase matching is lazy (windows)	12 years ago
Michael Peter Christen	07261fe274	Merge remote-tracking branch 'nutomics/blacklist_structure'	12 years ago
Michael Peter Christen	dea71851d2	- better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index	12 years ago
orbiter	9f0cc9b401	enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains	12 years ago
orbiter	f8c28efd66	fix for rssTerminal coloring	12 years ago
Felix Ableitner	44f8fcf62e	Changed class structure of Blacklist.	12 years ago
Michael Peter Christen	3054a6d4b9	added a patch from Sebastian M.B., submitted by email for coloring of rss terminal	12 years ago
Michael Peter Christen	78af998f8f	Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594'	12 years ago
Michael Peter Christen	57ffdfad4c	added a crawl option to obey html-meta-robots-noindex. This is on by default.	12 years ago
Felix Ableitner	fd90fcc4e0	Fixes #196 .	12 years ago
Michael Peter Christen	f1c5338210	preparation for greedy crawl profiles and refactoring	12 years ago
Michael Peter Christen	e6f361f474	adding the canonical tag to crawl queues	12 years ago
Michael Peter Christen	203921006a	redesign of citation index storage	12 years ago
Michael Peter Christen	e92b9275ce	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	56cdcfa2fa	fixed greedy learning mode - global is not a search attribute in searchitems	12 years ago
Michael Peter Christen	32aa1d4569	removed unused option for queries	12 years ago
Michael Peter Christen	0c5bed7e2c	added configuration option for greedy learning function to ConfigPortal servlet	12 years ago
sixcooler	5d1f619f07	possible helpful closing of solr-requests	12 years ago
Michael Peter Christen	9d291764d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
sixcooler	e5abccdfe4	added optimize-option	12 years ago
Michael Peter Christen	8ea6ddf636	removed attributes from ConfigPortal.html which are redundant to ConfigSearchPage_p.html	12 years ago
Michael Peter Christen	64140f35cd	fix for solr requests if no query part is given (prevent npe)	12 years ago
Michael Peter Christen	23fb458963	- fix to gsa searchresult answer in case that no query part is given - fix to gsa default number of results (is 'num')	12 years ago
Michael Peter Christen	660a196989	refactoring	12 years ago
Michael Peter Christen	54024958ac	added url_file_name_s in query for live-search of urls	12 years ago
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which has now not the file name as last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	12 years ago
Michael Peter Christen	f542cf7d9c	fix for daterange: the to-date is inclusive	12 years ago
Michael Peter Christen	c36720d45f	added daterange option to gsa api	12 years ago
Michael Peter Christen	4e3007f4a0	typo	12 years ago
Michael Peter Christen	2cb6b6bc21	added target="_blank" to shutdown links	12 years ago
orbiter	c8e94ad7c7	fix for citation search in case that the citation is very fresh	12 years ago
orbiter	57dcf68665	added a feed-back message inside the shutdown page	12 years ago
Michael Peter Christen	0600d510e1	show the citation report also in ViewFile	12 years ago
Michael Peter Christen	1a92b61d69	fixed usage of ViewFile which needs a commit before showing latest crawl result pages.	12 years ago
Michael Peter Christen	570511f3c8	removed fields references_internal_id_sxt and references_internal_url_sxt because they had been shown to be superfluous. The citation of referrer in the host browser is possible without them. Therefore now the host browser does not only show internal, but also external referrer to each link.	12 years ago
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	12 years ago
Michael Peter Christen	1762911f57	added synchronizations and timeouts in solr api; missing synchronizations in index modification methods causes deadlocks inside solr.	12 years ago
Michael Peter Christen	2fd7bbb450	reduced load on solr; no seed update in Status and no exists-check in HTTPLoader in case of redirects, that can be done using the htcache.	12 years ago
Michael Peter Christen	7ee71c2354	changed administration page headline to 'admnistration'	12 years ago
Michael Peter Christen	efd973d29d	changed p2p/stealth mode text and links a bit	12 years ago
Michael Peter Christen	6115bef335	added a 'greedy learning' mechanism which causes a 'fresh' yacy to load linked web pages from search results until the total number of web pages reaches 15000. This shall give fresh peers a 'boost' to get a personalized search index faster.	12 years ago
Michael Peter Christen	a5e328d7c5	new icons	12 years ago
Michael Peter Christen	b85db72a73	added another response writer which can present search result with texts, separated by sentences. Then, these sentences can be used to search again in the index for the same sentence. This can be used to provide a tool for plagiarism-search. (not finished yet). Try the following: http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml .. to search for 'flut' and show only sentences in the result documents which contain the word 'wasser'. Consider this like using a grep-tool on documents: you select the documents by a search query and you grep sentences inside the found documents with the 'grep' attribute.	12 years ago
Michael Peter Christen	5132bf719c	added new buttons to search result page in p2p mode which show the switch between p2p search and the 'stealth mode' which is simply a non-p2p search within the p2p network. The functionality was there all the time, but the switch to this was not very visible.	12 years ago
orbiter	2b320313d9	replaced yacydoc servlet usage by a solr result output using an html output writer. This made the creation of a html result writer necessary which is included in this commit. The yacydoc servlet was used to present all metadata to a document, but the solr interface can serve for this purpose in a much better way. All usages (except one) of yacydoc were replaced by a solr call. This affects also the 'metadata' link attached to search results.	12 years ago
orbiter	200769d0c6	show the cache link in search results only if there is actually a cache entry stored in HTCACHE	12 years ago
Michael Peter Christen	f7e77a21bf	Added a citation reference computation for intra-domain link structures. While the values for the reference evaluation are computed, also a backlink-structure can be discovered and written to the index as well. The host browser has been extended to show such backlinks to each presented link. The host browser therefore can now show information about where a document is linked. The new citation reference is computed as likelihood for a random click path with recursive usage of previously computed likelihood. This process is repeated until the likelihood converges to a specific number. This number is then normalized to a ranking value CRn, 0<=CRn<=1. The value CRn can therefore be used to rank popularity within intra-domain link structures.	12 years ago
Michael Peter Christen	fdcd4e6a6f	fixes to index deletion: quoting of host name (a '-' may be part of the url) and disabling the engage button when changing the url field at 'Delete by URL matching'	12 years ago
reger	7480e87386	- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247 - append language setting specific stopword list - remove unused OVERHANG stack type	12 years ago
orbiter	5c7ddc67fe	in GSA api enable usage of solr fq-attribute together with GSA site-attribute	12 years ago
Michael Peter Christen	eb9d0ba5b1	ranking and boost function update, small bugfixes, better default search field for solr	12 years ago
Michael Peter Christen	5f92c68f1f	removed block rank ranking and all YBR files in /ranking	12 years ago
Michael Peter Christen	164603b946	cleanup	12 years ago
Michael Peter Christen	0c1a018bbd	removed 'later' tactic because it used too much RAM, reduced number of soft commits, reduced caching size of search events, ensured that solr results are processed before connection is closed to keep that stuff not too long in RAM	12 years ago
Michael Peter Christen	709e9b8ce7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	9e07447d47	added new link for SMW	12 years ago
Michael Peter Christen	3c04dd11de	removed dead link	12 years ago
Michael Peter Christen	281959a2d7	added option to re-boot the embedded solr during run-time. Added also API recording for this method so it can be repeated automatically. The index dump generation is now also available for API recording. Added some synchronization in backend which was necessary for this.	12 years ago
Michael Peter Christen	80a7989e8c	fixed ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.util.List; in robots.txt servlet	12 years ago
orbiter	da621e827e	prevent NPE in case RWI is disabled	12 years ago
Michael Peter Christen	7300d81f40	include API Table deletion requests to the API recorder	12 years ago
Michael Peter Christen	d2ade87b49	fixed missing thisaddress in yacysearch.html which caused that the opensearch link was not working	12 years ago
Michael Peter Christen	179d032181	added a (badly formatted) delete button for process scheduler entries	12 years ago
reger	c03f75ebc3	fix DHT url receive see http://bugs.yacy.net/view.php?id=242	12 years ago
Marc Nause	8fb1b1e290	*) simplified banner creation code	12 years ago
Marc Nause	cd0b5f31b4	*) updated links to description of regex	12 years ago
Michael Peter Christen	8f2d3ce2f9	reduced locking situation in crawler: shifted synchronized location and reduced time-out of robots.txt load limit	12 years ago
Michael Peter Christen	f93501e6e0	nice crawl name if crawl is started with file:// (was: null)	12 years ago
Michael Peter Christen	b4f0cac102	added the reindexing job servlet to the submenu structure	12 years ago
Michael Peter Christen	8dbc80da70	redesign of index.exist-test: this shall now not be done using a single id to be tested, but with a collection of ids. This will cause only a single call to solr instead of many. The result is a much better performance when testing the existence of many urls. The effect should cause very much less IO during index transmission, both on sender and receiver side.	12 years ago
Michael Peter Christen	c91c67c3cd	reject bad solr requests	12 years ago
Michael Peter Christen	44e363f37f	refactoring of WorkflowProcessor, added process counter, update of process counter if a blocking thread dies. Added also a new column in PerformanceConcurrency_p servlet to show the actual number of concurrent processes.	12 years ago
reger	79401cb938	added reindex option for documents with disabled or obsolete fields to Solr Schema Editor page (IndexSchema_p.html) this allows to remove obsolete fields from the index (according to current schema config) by selecting all documents containing disabled fields.	12 years ago
Michael Peter Christen	b24d1d18e4	removed synchronization and concurrency in Fulltext class, concurrent deletions are now handled in ConcurrentUpdateSolrConnector	12 years ago
Michael Peter Christen	f965d04496	added new peer icons for Mentor peers and Mentee peers (not used yet)	12 years ago
Michael Peter Christen	b9b446bca6	- added ssl configuration sign (a lock) to network statistic/table - fixed a bug in bitfield	12 years ago
Michael Peter Christen	7095446ad3	added checkbox (near port) to switch on ssl support (https access) to the admin interface.	12 years ago
Michael Peter Christen	e6c8b545c2	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	4baa0d4a97	Added a default keystore for ssl encryption of the YaCy web interface. This will enable https-access to YaCy, but this feature is disabled by default using the new server.https=false attribute. This has two purposes: - make it easier for everyone to use https (just set server.https=true) - provide the basis for secure yacy-to-yacy communication in the future	12 years ago
Michael Peter Christen	038f956821	fix for sitemap detection: the sitemap url was not visible if it appeared after the declaration of robots allow/deny for the crawler because the sitemap parser terminated after the allow/deny rules had been found. Now the parser reads the robots.txt until the end to discover also sitemap rules at the end of the file.	12 years ago
Michael Peter Christen	e26bdd4a52	fixes to deletion methods (removed unnecessary concurrency and added removal of crawl queue entries)	12 years ago
Michael Peter Christen	f7f3e28c5e	prevent that the size of the index is computed too many times. Because the index size is now provided by solr, and the only way to do that is a match for [* TO *], a size computation is quite complex and time-consuming. Therefore this patch prevents that the method is called at all and if necessary puts a DOS-preventing barrier in front of it.	12 years ago
Michael Peter Christen	cca19d94d4	re-declared some fields to be of type string rather than text which makes them more efficient and less large	12 years ago
Michael Peter Christen	ed1d5bace6	draw the names of other peers which receive/send dht into the network graphic	12 years ago
Michael Peter Christen	b528448332	enlarge network graph circle according to image height and reduce the image height in the Network servlet. Overall, the image is now larger but takes less space on the web page.	12 years ago
Michael Peter Christen	f1bb54943e	typo	12 years ago
Michael Peter Christen	d7fd346917	- added regular-expression based deletions - on-demand collection-list generation for collection-based deletions instead of a default collection-list presentation (this makes calling the interface much faster since the computation of collections lists for large indexes may take some seconds)	12 years ago
Michael Peter Christen	3841854c97	abstraction of catchall term	12 years ago
sixcooler	e145afb8d6	fix for PerformanceMemory showing UNRESOLVED_PATTERN by removing solr-cache-stuff, which is not available anymore	12 years ago
Michael Peter Christen	1b102d98d8	- added index deletion to index administration submenu - added index deletion processes to the process scheduler/recorder	12 years ago
Michael Peter Christen	0e2ee00fea	added an index deletion servlet and some style changes for the 'dangerous' engage-button	12 years ago
Michael Peter Christen	e4f7e5bcfe	fixed bad css change	12 years ago
Michael Peter Christen	3502b4c697	refactoring (renaming) of yacy-solr api	12 years ago
Michael Peter Christen	3a0fcfbeda	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	25499eead5	- added a new field for the regular expression in crawl start - added the field in crawl profile - adopted logging end error management - adopted duplicate document detection - added a new rule to the indexing process to reject non-matching content - full redesign of the expert crawl start servlet The new filter field can now be seen in /CrawlStartExpert_p.html at Section "Document Filter", subsection item "Filter on Content of Document"	12 years ago
reger	0a9b0992f3	RankingSolr_p: include warning if boost field not in local index	12 years ago
orbiter	e1bfe9d07a	- reduction of the concurrently running processes to make YaCy more adjusted to smaller and 1-core devices. - the workflow processor now starts no process at all. these are started as soon as parser/condenser/indexing queues are filled. - better abstraction	12 years ago
Michael Peter Christen	c091000165	added collection attribute also to the rss feed reader	12 years ago
orbiter	f7571386a3	added a 'collection' property attribute in yacysearch.html which can be used to select between different collections as defined during a crawl start with the 'collection' attribute. This actually implements the ability to prepare search tenants which restrict their search results to a specific collection. The main use for this is to provide tenants to the yaml4 interface (at this time).	12 years ago
orbiter	3e79bd4b1f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d571e739b6	increased row limitation for authorized users from 10000 to 100000000 in solr interface	12 years ago
Michael Peter Christen	a1fffe8e86	fixed default ranking values	12 years ago
Michael Peter Christen	1d30082446	added hindi translation configuration	12 years ago
Michael Peter Christen	97775fbebc	fixed ranking for add-function queries: this did not work. The option was removed. All function queries are now boosts (multiplies the score according to a function). This is also the recommended way to boost rankings based on functions as explained in http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/	12 years ago
Michael Peter Christen	298bf2deb5	fix to ranking configuration servlet	12 years ago
Michael Peter Christen	2db058b551	added in RankingSolr_p.html a select box to switch between different ranking situations. By default, four situations can be configured.	12 years ago
Michael Peter Christen	6fbca35215	fixed api table navigation	12 years ago
Michael Peter Christen	f24ac518e6	redesign of exists()-query (can now be called with query) and the CachedSolrConnector which based its cache on the key value. This will be used to correct the title_unique_b and description_unique_b field.	12 years ago
Michael Peter Christen	27d6222880	added new field host_extent_i which, after a crawl and postprocessing, holds the number of documents for the host where the document is hosted. This is necessary for ranking and the norming of references per local host in the ranking computation.	12 years ago
Michael Peter Christen	579eb01a49	showing now the details of references count in host browser: external (ext), internal (int) and external hosts (hosts) for each indexed document.	12 years ago
reger	0f4237d8e5	add admin option to delete load errors from index	12 years ago
Marc Nause	e99c8789ff	*) fixed encoding of query in link to map (in case geolocalization is enabled, "Show search results for "köln" on map") *) applied suggestions of Checkstyle plugin	12 years ago
Michael Peter Christen	082e3274d6	- setting the same default ranking in the solr interface as for YaCy search interfaces if no other ranking attributes are given - using the YaCy ranking in the GSA interface only if there was not given a GSA-style sort attribute - to avoid confusion about correct ranking attributes, only the default '0'-ranking profile is used and not scenario-adopted (site, date) because that should be configurable in the web interface before it is used actually for ranking.	12 years ago
Michael Peter Christen	edc0b33f6d	- showing references count and clickdepth in host browser - fixed generation and presentation of both values	12 years ago
orbiter	2c3b024196	if the crawl was paused (automatically), show the reason for pausing in the Crawler_p servlet.	12 years ago
reger	566a3b0294	fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search - removed duplicate QueryParams.hashes2Handles , redundant with .hashes2Set	12 years ago
reger	40b3f2c5fe	comment out dead menu link	12 years ago
reger	bf1e1ddca1	fix typo in prev commit	12 years ago
reger	d4d93be779	uncomment "used time" calculation for remote search log	12 years ago
reger	36202f27b0	improve remote search log, set "Returned Results" to transmitcount (instead of no value)	12 years ago
reger	254074b11d	Merge branch 'master' of git://gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	870aedf3c6	fixes for better search interface integration in yaml templates	12 years ago
Michael Peter Christen	735eb70525	better search timing; prevents '0 results' for very large local indexes >> 10 mio documents	12 years ago
Michael Peter Christen	342ba1049b	- callback fix - memory allocation problem in RowCollection: if memory is too low, do not try to increase by 1 because this leads to very long execution time and at the end to the same OOM as if we allocate the memory at the moment we need it even if the resource observer states that this memory is not there. To compensate this, the increase size is reduced.	12 years ago
reger	31d16f20d7	fix invisible icon not found	12 years ago
orbiter	243b66ae6d	Merge branch 'master' of git://gitorious.org/~frankensteen91/yacy/frankensteen91s-yacy	12 years ago
Frank	7763f2554f	add the new PPMbar in Crawler_p for a better style and better use.	12 years ago
orbiter	e4d26d1cb4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	940c6849ee	enhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements)	12 years ago
reger	d57b221921	add: reset Solr schema field selection to default button in IndexSchema_p	12 years ago
Michael Peter Christen	9406a2e438	fixed NPE during index abstract computation	12 years ago
Michael Peter Christen	d725782440	turned severe message to warning message about network failure events	12 years ago
Michael Peter Christen	2d36a7eaf5	- do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries	12 years ago
Michael Peter Christen	2080fc7406	removed unused tag fields	12 years ago
reger	7804c12976	fix error msg in ConfigHeuristics_p	12 years ago
reger	230a12bfe2	adjust Opensearch discover function to new webgraph Solr schema	12 years ago
orbiter	47114910d5	fix for possible memory leaks	12 years ago
Michael Peter Christen	addba047e2	changes in ranking computation - an existing ranking servlet for solr was extended. It is now possible to set boost values for fields, boost functions and boost queries. - The ranking can have different instances, but currently only the first one is used - added an abstraction layer for fields which can be used for search and those fields can be edited in the solr ranking configuration - the ranking value from solr within the field score is used to combine remote search requests, which all are created using the same locally defined boost values - reduced the number of fields which are used for search (makes it faster) - replaced some text fields by string fields (makes indexing faster) - removed classes which had no use - made a large number of experiments for a better ranking and created a temporary setting which prefers hits inside titles - adjusted also the RWI-based ranking computation to 'prefer title' - made special cases like for portal search where no post-processing and post-ranking is wanted: this keeps the original ranking order as done by Solr - fixed many bugs with old settings for ranking	12 years ago
Michael Peter Christen	68e739a90b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	3d9ce9cd04	- added more selection criteria for network seed list - enhanced up script	12 years ago
orbiter	168e8d9b4d	added/fixed missing DOCTYPE line (submitted by Thomas)	12 years ago
Michael Peter Christen	25300913fa	fixes to search debugging after testing with the different search debugging options	12 years ago
Michael Peter Christen	2d472a39f4	DHT-transferred metadata and crawl receipts now also use the delayed search cache to prevent that too much IO load is on the peer during search.	12 years ago
Michael Peter Christen	221ed7d764	- enhanced concurrency during search without IO blocking - introduced a second queue to flush remote search results (now: old metadata structure from DHT peers) - fixed result counters	12 years ago
Marc Nause	2714b59f38	*) For some reason this seems to fix a ClassCastException on my system (OpenJDK).	12 years ago
orbiter	0f7ea7ad9f	- enhanced solr.add procedure for mass adds - removed unused solr access classes - made snippet generation for documents from YaCy RWI/DHT concurrent (as it was before the search process removal) - reduced the number of remote results in settings file because the processing of such mass documents add is too CPU-intensive (in Solr)	12 years ago
orbiter	7ff10bdb1b	fix of page navigation for formatted totalcount numbers	12 years ago
orbiter	a734fbc4a5	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d74472f562	corrected result counter	12 years ago
orbiter	aa3c26c62e	added recrawl/reload to CrawlStartSite for a timeout of 3 days	12 years ago
orbiter	c1b7e61882	added option to create empty vocabularies	12 years ago
bubu	e0edad689d	fix link to IndexSchema_p.html	12 years ago
Michael Peter Christen	c95a84103a	complete redesign of search process: - removed 'worker' processes - no internal time-out behaviour: methods either are successful or return null - waiting is only done on top-level - removed snippet-production; this is replaced by solr snippets - removed statistics based on solr size queries (they had been VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you like to have statistics again then you must switch on the rwis again in this mode. - fixed many bugs regarding correct page counter	12 years ago
Michael Peter Christen	35fa718b77	testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js	12 years ago
Michael Peter Christen	008288719c	fix for schema export to consider also automatically generated coordinate fields	12 years ago
Michael Peter Christen	089dee1770	- generalized SchemaConfiguration into super-class Configuration and adapted other classes which used the configuration-only access for that class - removed many warnings - adjusted logging	12 years ago
Michael Peter Christen	56d5946a59	- added flags in IndexFederated_p.html to switch on or off the webgraph index (new solr core webgraph) .. this is now off by default - completely redesigned this servlet - added description how to attach a remote solr - adjusted naming of servlet and menus - moved 'lazy initialization' attribute from IndexSchema to IndexFederated (this is a general option) back again.	12 years ago
Michael Peter Christen	14cceb6b17	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/IndexFederated_p.html source/net/yacy/cora/federate/solr/YaCySchema.java source/net/yacy/peers/Protocol.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/Segment.java also moved portalsearch-dev to yacy-portalsearch to be able to fix problems with new attachment to solr of the search widget	12 years ago
Michael Peter Christen	58e1e6fa2b	fixes to schema	12 years ago
reger	d31a109efe	remove obsolete Solr "commit within" input field from IndexFederated see `4111606654`	12 years ago
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resulting search index has now the following properties: - webgraph size will have about 40 times as many entries as default index - the complete index size will increase and may be about double the current size As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also available for p2p operation but client access is not yet implemented.	12 years ago
Michael Peter Christen	89ede0fe84	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	91a0401d59	introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To have the opportunity of a second core, multi-core functionality had to be implemented to the deep-embedded solr: - migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1 - added solr_40/webgraph subdirectory as second core - added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html - added instance handling as addition to solr connections: all solr connectors are now instances of a solr 'instance' object; this required a complete re-design of the solr embedding - migrated also caching and sharding on top of new instance handling - migrated the search apis to handle now the access to a specific core, the default core named 'collection1' - migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address - migrated the solr backup and restore process: old backups cannot be used after this migration! - redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; they must retrieve the actual connection object every time they want to write to it (this solves also some bugs when switching the index/network) - added another schema 'solr.webgraph.schema', the old solr.keys.list is replaced by solr.collection.schema	12 years ago
orbiter	594ed63f2a	fixed interactive search which caused an error if pubDate is not present in a search result	12 years ago
Michael Peter Christen	98a4a4aa97	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	b6de1f42dc	Full redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distinguish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, one for every solr core. This caused that the IndexFederated servlet had to be split into two parts, the new Servlet for the Schema editor is now in the IndexSchema Servlet.	12 years ago
Marc Nause	efb6cf7d21	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	12 years ago
Marc Nause	ce5b7afab2	*) removed Skype online indicator (was not working anymore) *) updated ICQ URLs	12 years ago
Michael Peter Christen	4111606654	removed the commitWithin attribute because that is not the right way to update the index for us. May also be superfluous with the solr 4.0 softcommit.	12 years ago
Michael Peter Christen	c20fa3640d	fix to unbalanced tag and license for null objects	12 years ago
Michael Peter Christen	3a6097966d	added jsonp option to yjson result writer	12 years ago
Michael Peter Christen	de58043205	Added image license generation for solr image search results when results are generated within yjson result writer. This makes it possible to view images in yacyinteractive from solr.	12 years ago
Michael Peter Christen	d3508fa8ff	fixed json search, quotes, auto-facets, urls etc. for yacyinteractive.html	12 years ago
Michael Peter Christen	02fa31b5bf	better filesearch layout	12 years ago
Michael Peter Christen	e55ec3071d	reduced number of facets in yacyinteractive (only filetype necessary)	12 years ago
Michael Peter Christen	16d90859b7	reverted put-semantics back to as-usual in serverObjects and introduced an add-method to put in several objects for the same key	12 years ago
Michael Peter Christen	c34af7fe94	extended JSON Response Writer and Opensearch Response Writer for the Solr search interface in such way that it is possible to use this interface for the yacyinteractive search. This search interface is now much faster using the Solr search directly. For the Solr interface it was necessary to create a translation from the YaCy search modifiers to the Solr facet selection. This was added in such a way that it becomes generic for the normal YaCy search and as an on-top evaluation for Solr queries.	12 years ago
Michael Peter Christen	762b687e47	extended the serverObjects to be able to hold multiple values for a single key. This is done using the solr class MultiMapSolrParams. That class is needed in the OpensearchResultWriter to get multiple facet requests.	12 years ago
Michael Peter Christen	d70d99fab5	added more metadata fields and facets to OpensearchResponseWriter. This should make it possible to replace the original and enriched yacy opensearch result with a solr output in opensearch format.	12 years ago
Michael Peter Christen	51e7ab4f70	moved bookmarks back to more prominent location (even if this does not fit to the 'Search Interfaces' headline)	12 years ago
Michael Peter Christen	dee8b24d3c	better error handling for bookmarks	12 years ago
Marc Nause	27894d2c1a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	12 years ago
Marc Nause	75f9568472	*) only install files from the RELEASE directory *) minor changes	12 years ago
Michael Peter Christen	eb80405a16	added a disable function in RemoteCrawl_p servlet which prevents setting of remote crawl if peer is not a senior or principal peer	12 years ago
Michael Peter Christen	1e3d8cc235	show a link for the host in the host browser; see	12 years ago
Michael Peter Christen	7de502f43d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Marc Nause	3bc5ee6e3d	*) added protection against CSRF in update download page (http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release does not work anymore)	12 years ago
Michael Peter Christen	3834829b37	bugfixes and more logging for solr connector	12 years ago
Michael Peter Christen	d1cb4cbc84	enhanced network scanner, is faster and more flexible now - start more processes - remove superfluous host name resolution - better/more flexible subnet ip range calculation - prefer ipv4 makes better usable ip pre-settings in servlet - extended servlet by new subnet /20 - option - redesign of scanner start process in servlet (generalization)	12 years ago
Michael Peter Christen	7dfcc92b71	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	0b6566a389	optimizations when starting large crawl requests with many start urls in one request: - allow larger match-fields in html interface - delete all host hashes at once from zurl - when deleting by host, do not count size of deleted entries since that was the reason it took so long	12 years ago
orbiter	a2160054d7	ability to create vocabularies also without any objectspace: this iterates over all urls in the index to create terms	12 years ago
Michael Peter Christen	be27567b53	allow more links when starting a crawl by file	12 years ago
reger	3777b338c7	bugfix: location url for migrate urldb button onclick	12 years ago
reger	8447814a31	correct header menu in migrateurldb_p.html - update NetBeans project path	12 years ago
Michael Peter Christen	99185d7048	one more fix for author_sxt	12 years ago
Michael Peter Christen	b6ae6262f6	- add the copyField author_sxt only if author exists - set the solr default search field according to existing fields	12 years ago
Michael Peter Christen	088373b4ea	catch exception if solr connection change fails	12 years ago
Michael Peter Christen	e23a596c1d	added a copyField for author_sxt for automated schema generation	12 years ago
Michael Peter Christen	f1a4feda3e	security fix for suggest (don't let users ask for too much)	12 years ago
Michael Peter Christen	244b157299	fix for external solr schema definition	12 years ago
Michael Peter Christen	0fe7b6fd3b	migrated the index export methods from the old metadata to solr. Now exports are done using solr queries. removed superfluous methods and servlets.	12 years ago
Michael Peter Christen	8eebeea533	fix for search result link in ViewFile	12 years ago
Michael Peter Christen	31e854bef6	Merge remote-tracking branch 'copro/master'	12 years ago
Michael Peter Christen	4735bd47f4	- changed solr commit call and added an optimize option. Since Solr 4.0.0 there is a new softcommit feature which implements a near-real-time (NRT) search option. The softcommit does not do IO and does not cause performance issues. YaCy has now an extension in its solr connectors to use the softcommit feature. The softcommit call now replaces all places where a hard commit was used. Furthermore the commit strategy when doing a search from the web interface was changed (it's done every time before a search is done). The softcommit feature was implemented because it was needed for the following changes (customer demands), which are also included in this git commit: - added a feature to identify all documents which have unique titles and/or unique descriptions. These unique flags are disabled by default. - added also a feature to set a flag when the url from a canonical tag is equal to the document url. This is also disabled by default. To support the new softcommit strategy, the commitWithinMs option was set to -1 to disable automatic commit based on document insert times. If documents are inserted permanently then also a commit would happen permanently whenever the commitWithinMs time is reached. This would conflict with the regular autocommit of 10 minutes and the new softcommit strategy.	12 years ago
Copro	0025983993	Fix typo embedd -> embed	12 years ago
Copro	3ea8380959	Adding Vimeo tag to wiki commands to embed Vimeo video with id	12 years ago
Copro	ee9d7fd93d	Added feature to embed YouTube videos to wiki commands for usage in Wiki, Blog or other servlets	12 years ago
Michael Peter Christen	9ccdd21d76	Merge remote-tracking branch 'aleksejs/fixtrans' Conflicts: locales/ru.lng Tried to merge this but I had to do this 'blind'. Sorry if I deleted something that was right.	12 years ago
Michael Peter Christen	aa067da86b	set the 'all' option as the option at the end of the list because the all option currently also selects lists which cannot be exported in xml correctly	12 years ago
Michael Peter Christen	edbc86d2b0	integrated search term into opensearch result title. this makes better bookmark names when subscribing multiple search results from the same peer	12 years ago
Michael Peter Christen	4faa07c214	added a timeout for topic computation (solr is here much slower than the old metadata-db)	12 years ago
Michael Peter Christen	d2d5be032d	added an 'inlink' search option according to the suggestion in the YaCy forum at http://forum.yacy-websuche.de/viewtopic.php?f=18&t=4572#p27410 The feature was not called 'haslink' but 'inlink' to have a naming analogous to 'inurl'. With this you can now search for words in links of the document, like: * inlink:yacy searches all documents which link to pages which have 'yacy' in the url.	12 years ago
Michael Peter Christen	76e1e91b11	with strict compiler settings, IndexFederated_p does not compile without @SuppressWarnings("deprecation")	12 years ago
reger	3897bb4409	added (manual) urldb migration (link on: Index Administration -> Federated Solr Index) - migrates all entries in old urldb Metadata coordinate (lat / lon) NumberFormatException still occurs relatively often (see excerpt below), - added try/catch for URIMetadataRow (seems not to be needed in URIMetaDataNode, as Solr internally checks for number format) - removed possible type conversion for lat() / lon() comparison with 0.0f, changed to 0.0 (leaving it to the compiler/optimizer to choose number format) current log excerpt for NumberFormatException: W 2013/01/14 00:10:07 StackTrace For input string: "-" java.lang.NumberFormatException: For input string: "-" at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525) at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279) at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277) at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329) at transferURL.respond(transferURL.java:152) ... Caused by: java.lang.NumberFormatException: For input string: "-" at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525) at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279) at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277) at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329) at transferURL.respond(transferURL.java:152)	12 years ago
reger	3b6e08b49f	prevent checking of urldb if empty - disconnect urlIndexFile if empty - add missing lock class in submenuSearchConfiguration	12 years ago
reger	1fb452174a	read defaults from yacy.init for "Set to Defaults" button	12 years ago
reger	f143804382	fix configuration for search page navigators - added additional config page (ConfigSearchPage_p) for easy setup of search page layout (to not overload ConfigPortal page) - currently redundant setting with part of ConfigPortal page - added missing config for filetype and protocol navigator - adjusted init of SearchEvent to check navigation config setting - renamed RankingProcess.getTopicNavigator to getTopics (to distinguish it from the added SearchEvent.getTopicNavigator)	12 years ago
Michael Peter Christen	24db2fcd9d	fix for Network info	12 years ago
Michael Peter Christen	fc47109608	added 'Last Hour' to network statistics	12 years ago
Michael Peter Christen	38d3feae65	added separate delete commands for the local+remote solr index, the old metadata and old rwi and for the citation index. The important advancement is the separation of the citation index deletion because that index is responsible for the linkdepth calculation. Now a search index can be deleted without the citation index and that should cause that less clickdepths must be post-processed.	12 years ago
Michael Peter Christen	6f0baaa309	added the clickdepth post-processing: some links may have 'shortcuts' to already calculated click depths. These are then calculated if the crawl buffer is empty and therefore no new 'shortcuts' can be discovered. The status of the clickdepth stack (to-be-processed) can be seen using a solr search command like this: http://localhost:8090/solr/select?q=process_sxt:[%20TO%20]&start=0&rows=30&fl=sku,clickdepth_i,process_sxt	12 years ago
Michael Peter Christen	0f5b6f38c1	enhanced root-url detection	12 years ago
Michael Peter Christen	8ae08a2cac	moved HTCache, Heuristics and Parser servlet to a more appropriate menu location	12 years ago
Michael Peter Christen	5c0c56cfe1	Preparations to produce a click depth attribute in the search index. This attribute can be used for ranking and for other purposes (demand by customer). The click depth is computed in two steps: - during indexing the current fill-state of the reverse link index is used to backtrack the current page to the root page. The length of that backtrack is the clickdepth. But this does not discover the shortest click depth. To get this, a second process to check again is needed - added a process tag that can be used to do operations on the existing index after a crawl; i.e. calculating the shortest clickpath. Added a field to control this operation but not a method to operate on this. - added a visualization of the clickpath length in the host browser	12 years ago
Michael Peter Christen	295884fd54	- Merge commit '168b1d130d9d67b5e8855a0b50c4ba7ad4a416f8' - fixed conflict in htroot/yacysearch.java - removed nedres check because that causes that the remote server is not called at all in most cases (local index has already results but we want more) - fixed a regex bug (a '=' too much)	12 years ago
reger	276e63401e	small sanitary fixes - exclude unix shell scripts in NSIS windows install archive - replace link to env/grafics/yacy.gif to yacy.png (build.nsi) - remove unused code lines (Blacklist_p, Response, WordReferenceVars) - type & xhtml (RankingSolr_p.html)	12 years ago
reger	f301336adf	fix: no results with configuration citation reference index switched off - urlcitationindex != null check added to ResultEntry.referencesCount - plus other places where conflicting procedure was used (and urlcitationindex not already checked != null)	12 years ago
orbiter	fe50702eb0	added a filterscannerfail attribute to QueryParams which causes that a check to the network scanner fail/success status can be used/suppressed for search results. This is a feature that comes with the port scanner.	12 years ago
reger	168b1d130d	Adding heuristic to get search results from configured systems which support opensearch specification - any system supporting opensearch specification can be configured - search query is only forwarded to remote system if not enough results available on local peer - discover function provided, checking the local Solr index for links to opensearchdescription files, to add to the config - sample config file with some general search engines with opensearch support	12 years ago
reger	7761b60325	fix: Broken Link on Crawler_p.html - issue 218 http://bugs.yacy.net/view.php?id=218 - reduced Solr logging (/select)	12 years ago
reger	e9e0d63897	Add config option to show HostBrowser link in search result - ConfigPortal: added checkbox Host Browser - yacy.init: added search.result.show.hostbrowser as default = on (true) - fix HostBrowser: broken link to protected WebStructurePicture for public user	12 years ago
Michael Peter Christen	4a9182ae16	use the search configuration to default the cacheStrategy to the value as given in the search configuration	12 years ago
Michael Peter Christen	e1f89efd0d	- made image search in interactive search using the ViewImage servlet - that enables viewing of images for intranet SMB servers. - added a filter search for protocol, tld and ext again; otherwise p2p search produces a lot of rubbish	12 years ago
reger	fbf84e9ff3	fix SeedUpload setting property name for include template file	12 years ago
Michael Peter Christen	9e4033f229	fix for event starter: delete start time when event is removed	12 years ago
Michael Peter Christen	99edbf6f14	fix for config basic: do not accept empty peer names	12 years ago
Michael Peter Christen	24c9bb35f7	extended the Scheduler: introduced scheduled events - an event type (once, regular) can be selected - for this event type, a fixed time can be selected. This may be either directly after startup or at one of the full hours of a day (==25 options) The main point about this feature is the opportunity to start an action directly after startup. That makes it possible to create YaCy distributions which, when started for the first time, begin to index parts of the intranet/internet by themselves.	12 years ago
Michael Peter Christen	433143ba40	removed protocol, tld, ext from the urlmask and created specific navigation field for these	12 years ago
Michael Peter Christen	84f82541e8	search process enhancements	12 years ago
Michael Peter Christen	02020b590b	- removed all extension types from extension navigation which are not proper/known - automatically show the protocol navigation if there is more than http and https - automatically show the extension navigation if there is some media content	12 years ago
Michael Peter Christen	01200f06cc	using the author field as solr-native facet. this makes it necessary to introduce a copy-field for the author field to be copied to a string field. This field is then used to generate facets. Without this field, the facet would consist only of the words of the author names, not of the full author string.	12 years ago
Michael Peter Christen	7ad5457db0	using the solr facets as navigation in yacyinteractive.html instead of counting locally result types	12 years ago
Michael Peter Christen	1052263af3	- added a new solr field references_i which stores the number of INCOMING links to the corresponding web page. This information is taken from the reverse link index (a 'little sister' of the RWI index). - this field can be of use to enhance the ranking because a web page with more incoming links can be more important than others. But this is not true for typical link pages like menus. Therefore the number of outgoing links is needed. - added a new solr attribute 'bf' to solr queries which is a boost function extension. this field can contain a formula which computes the boost according to given field values. After some experiments the following formula is now default: div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4 This takes the number of references and the inbound links. Further experiments are needed to enhance that formula.	12 years ago
Michael Peter Christen	34f8786508	removed dependency of vocabulary navigation from Jena and its triplestore; the vocabulary search is now done using generic solr fields which are created on-the-fly during runtime.	12 years ago
reger	664499bb10	PerformanceQueues: disable input for hardcoded httpd performance values	12 years ago
Michael Peter Christen	9319b90d8a	- fixes for host navigation - fixes for filetype navigation - removed unused code	12 years ago
Michael Peter Christen	cb5cbec14d	distinguishing modified query string and original query string	12 years ago
Michael Peter Christen	fb0fa9a102	- fixed 'delete from subpath' during crawl start which deleted nothing; now works; - changed some crawl start html design details	12 years ago
orbiter	54e193a2b8	you can now search for '*' to get just ALL entries in the search index as result list. This makes sense if you intend to search just by using the navigation tools to cut the data set into navigation 'slices'.	12 years ago
orbiter	7f5526e6ef	allow larger no-proxy expressions	12 years ago
reger	e80dfeca23	- making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171 ) - blacklist test adding explicit response text "not blocked" if no blacklist match	12 years ago
Michael Peter Christen	4491072256	- clear the search cache when altering the solr boosts - better positions for submit buttons	12 years ago
Michael Peter Christen	2b7d46bc1f	using a filter query for the site parameter in GSA api	12 years ago
Michael Peter Christen	10527e28ae	fix for wrong display of error urls in HostBrowser	12 years ago
Michael Peter Christen	5f5d66921e	patch for funny symbols in url paths (like tilde)	12 years ago
Michael Peter Christen	8aa08261a7	update to Solr Boost handling	12 years ago
Michael Peter Christen	908ad2f174	Added a new servlet to configure the solr ranking using field boosts	12 years ago
Michael Peter Christen	a598fb6227	renamed Ranking_p.html to RankingRWI_p.html because there will be another Ranking servlet as well next	12 years ago
Michael Peter Christen	72f165d58b	added a Boost class which stores solr query boost values. The class can be configured using the yacy.init file. The boost information is taken from the configuration each time when a query to solr is done.	12 years ago
reger	bb20691d4f	fix: respect config setting of "show Nav Top-Menu" in HostBrowser.html for public users (as hostbrowser is now available in search results)	12 years ago
Michael Peter Christen	3de784c8dd	replaced more split and replaceAll missing pattern pre-compilation with pre-compiled pattern	12 years ago
Michael Peter Christen	8fc3679c66	using more pre-compile pattern for split methods	12 years ago
Michael Peter Christen	d48e9788d2	enhanced search result processing behavior - query less at one time; query more often - in between the small queries, evaluate results - remove fields from search results which are not needed	12 years ago
Michael Peter Christen	eca68fa197	added debug code to crawler monitor	12 years ago
Michael Peter Christen	205f8b222b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	c54cb85422	added link to http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html to the /RegexTest.html servlet	12 years ago
Michael Peter Christen	b7004043ea	- added a field cache for solr queries which call only for a single value - fixed a version conflict exception within a solr add request	12 years ago
Michael Peter Christen	bf42179982	introduced more structure in HostBrowser, table view, better counting, distinguishing of error cases (fail/excluded)	12 years ago
Michael Peter Christen	4eab3aae60	removed overhead by preventing generation of full search results when only the url is requested	12 years ago
Michael Peter Christen	a114bb23bb	- using edismax in gsa interface - generating less field data for gsa search results - using a boost query in gsa interface to move double content to the end of the result list	12 years ago
Michael Peter Christen	d6b82840f8	added a feature to find similarities in documents. This uses an enhanced version of the Nutch/Solr TextProfileSignature. As a result, a signature of the document is written to the solr search index. Additionally for each time when a signature is written, it is checked if the signature exists already in the index. If the signature does not exist, the document is marked as unique. The unique attribute can now be used to sort document lists and bring duplicates to the end of a result list. To enable this, a large portion of the search api to Solr had to be changed. This affected mainly caching of 'exists' searches to enhance the check for existing signatures and do this without actually doing a solr query. Because here the first time a long number is used as value in the Solr store, also the value naming in the YaCySchema had to be adapted and normalized. This caused that many files had to be changed.	12 years ago
Michael Peter Christen	f5ca5cea44	- added field options to all solr queries. This can be used to restrict the actual data which is fetched from solr. - used the new field options to reduce generic options like getting the load date or the count of search results. should increase overall speed - used the new field options to reduce overhead in the host browser during acquisition of links. - used the field options to make checking of links in crawler faster - if the crawler is paused, the crawl queue is not cleaned	12 years ago
Michael Peter Christen	46be4af5b9	Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'	12 years ago
Michael Peter Christen	952e143580	FINALLY YaCy can now search for full strings using double- or singlequoted strings in the search query line!!!	12 years ago
orbiter	5dfd6359cb	redesign of the QueryParams class: introduced QueryGoal which holds the query string parser. This shall be used to create a proper full-string matching which is handled then by QueryGoal.	12 years ago
Michael Peter Christen	5fd3b93661	added deletion of hosts during crawl start if deleteold option was given	12 years ago
Michael Peter Christen	d64445c3cb	because we have the inurl:<term> - searchmodifier, we don't actually need regular expressions as search attributes. They have now been removed from the advanced search page while they are still created internally. The filter is then expressed against solr as regular expression filter query. If the expression points out a selection of a specific protocol, host or filetype this is then translated into a faceted query.	12 years ago
orbiter	b55ea2197f	- redesign of crawl start servlet - for domain-limited crawls, the domain is deleted now by default before the crawl is started	12 years ago
orbiter	1c66de4bd4	- removed scheduled crawling options in crawl start because it is superfluous there; it can be changed in the scheduler servlet. It's also confusing in the presence of the delete-option, which will be implemented next. - removed unused crawl start servlet - some refactoring to make the time parser reusable	12 years ago
Michael Peter Christen	2e7219f9fd	removed highlighting of search results within collections in GSA interface	12 years ago
Michael Peter Christen	074dfd297b	added icons and a selection for hosts with urls pending for crawler or with errors	12 years ago
cominch	21df1ad9e0	update and generalization of the SMW import and content control routines	12 years ago
Michael Peter Christen	4c4e0eece2	added new submenu 'Target Analysis' with three servlets which are useful to analyse the target servers: robots.txt table, mass target analysis and a regex tester	12 years ago
Michael Peter Christen	61995d508e	do the commit anyway before calling a search interface	12 years ago
Michael Peter Christen	86ec199126	using a better file name	12 years ago
Michael Peter Christen	5105256927	update to search result logging (this was a remaining issue from the solr 4.0.0 migration)	12 years ago
Michael Peter Christen	570e42c4e3	fix for filetype navigator	12 years ago
Michael Peter Christen	71ed8e5e07	bugfixes for crawler	12 years ago
Michael Peter Christen	29fbbb49dc	better colors for host browser and corrected document count	12 years ago
Michael Peter Christen	6244b084cd	fixed wrong order of result count values	12 years ago
Michael Peter Christen	631b08e7e2	update to HostBrowser	12 years ago
Michael Peter Christen	51f420e4f5	removed location search because it is only working in special cases	12 years ago
Michael Peter Christen	15d1460b40	added information about the reason of pausing of crawls	12 years ago
Michael Peter Christen	2371ef031c	added solr faceted search support to YaCy search results; added solr highlighting / YaCy snippets to YaCy search results - facets are now much more complete - facets are computed and searched much faster - snippet computation is done by solr if solr knows the snippet	12 years ago
Michael Peter Christen	d481abd087	added the visualization of error-urls to host browser - only visible for admins - a faceted search generates a huge list for all hosts in the host list - the faceted search algorithms had to be modified for that - within the browsing of the directory path, the error cause is written to the url which is presented as error-url - the errors are also accumulated for directory sums	12 years ago
Michael Peter Christen	a15819fbec	fix for some interface problems	12 years ago
Michael Peter Christen	791e1dcfdf	when a new crawl is started, delete all entries about error-urls for crawl-start domains	12 years ago
Michael Peter Christen	c6a6f4c4e6	added a hack which makes the HostBrowser more performant when the given host has a lot of urls. If the number of urls is > 1000, then the list of documents is restricted to those which have no subpath, if the root path is selected. However, this can cause a problem if no documents on the root path exist but only on paths below that root path.	12 years ago
Michael Peter Christen	64ac2b7b7d	new submenu template	12 years ago
Michael Peter Christen	5e77801aac	update to web interface structure	12 years ago
Michael Peter Christen	8fb370d9f8	renovated the way search results are counted. should be correct now...	12 years ago
orbiter	354ef8000d	- added 'deleteold' option to crawler which causes documents to be deleted which are selected by a crawl filter (host or subpath) - site crawl uses this option by default now - made option to deleteDomain() concurrency	12 years ago
Michael Peter Christen	19d1f474ce	host browser now shows also number of pending files per subdirectory + bugfixes	12 years ago
Michael Peter Christen	75dd706e1b	update to HostBrowser: - time-out after 3 seconds to speed up display (may be incomplete) - showing also all links from the balancer queue in the host list (after the '/') and in the result browser view with tag 'loading'	12 years ago
Michael Peter Christen	e2c4c3c7d3	migration to solr 4.0.0	12 years ago
Michael Peter Christen	9330ad4838	- fixed the delete option in host browser - added a delete method which can be used to delete a full subpath in solr.	12 years ago

4724 Commits (d1091e79f83591502fdc08444aca84b733300a71)