URIMetadataNode, which makes it possible to access Solr objects
directly and use their full information richness
- lazy initialization of the URIMetadataNode object, which should
reduce computation and memory usage during search (see the sketch
after this list)
- removed dead code
- also showing outbound links to other domains, if any exist
- the outbound link browser also shows the link structure image
- also showing inbound links if the web structure graph has information
about them
- removed the left menu and made the HostBrowser part of the top search
menu
- also moved the file search to the top menu
- added hover information in the HostBrowser to explain what a click
does
- because the HostBrowser also links to the Metadata viewer ViewFile,
there should be a button to switch back to the HostBrowser: added that
as well.
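A minimal sketch of the lazy-initialization idea mentioned above; the
field name 'sku' for the url is taken from YaCy's Solr schema, while the
exact shape of the class is an assumption:

    import org.apache.solr.common.SolrDocument;

    // keep the raw Solr document and decode fields only on first access
    public class URIMetadataNode {
        private final SolrDocument doc;
        private String url; // decoded lazily

        public URIMetadataNode(final SolrDocument doc) {
            this.doc = doc;
        }

        public String url() {
            if (this.url == null) {
                this.url = (String) this.doc.getFieldValue("sku");
            }
            return this.url;
        }
    }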
MultiProtocolURI during normalform computation, because that should
always be done, including during initialization of the
MultiProtocolURI object. The new normalform method takes only one
argument, which should be 'true' unless you know exactly what you are
doing.
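A hedged usage sketch of the one-argument call; the method name
toNormalform and the boolean semantics follow the description above,
the rest is illustrative:

    import java.net.MalformedURLException;
    import net.yacy.cora.document.MultiProtocolURI; // package location may differ by version

    // normalize a url string; pass 'true' unless you know exactly what you are doing
    public static String canonical(final String u) throws MalformedURLException {
        final MultiProtocolURI uri = new MultiProtocolURI(u);
        return uri.toNormalform(true);
    }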
IndexControlURLs, where it appears more naturally. Because RWI
administration is less important in the presence of Solr,
IndexControlURLs is now the default servlet when the Index
Administration button in the main menu is selected.
there is no need for the RWI index if the index is not transferred to
another peer. Therefore the creation of RWI index data is now suppressed
if DHT is disabled. This applies to all intranet and portal mode
configurations, but not to public robinson modes, because a robinson
peer may switch back to public mode and then transmit its data. That
means that someone who never wants to use DHT mode should rather choose
the portal mode.
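A sketch of the resulting write logic; all names here are hypothetical,
only the control flow reflects the change described above:

    // hypothetical interface, only to make the control flow concrete
    interface Segment {
        void putDocument(Object solrDocument); // always written
        void storeRWI(Object wordReferences);  // only needed for DHT transfer
    }

    // suppress RWI creation when the index cannot be transferred anyway
    static void store(final Segment segment, final Object doc,
                      final Object rwi, final boolean dhtEnabled) {
        segment.putDocument(doc);
        if (dhtEnabled) segment.storeRWI(rwi);
    }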
this new search interface is something completely new for search, but
completely common on desktops: browsing a web space the way one would
browse a file system in a file browser. The file listing is created
using the search index with a faceted restriction to specific domains.
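A sketch of the kind of Solr query such a file listing can be built
from; the field names host_s and sku are taken from YaCy's Solr schema,
everything else is an assumption:

    import org.apache.solr.client.solrj.SolrQuery;

    // build a query that lists the documents of one host;
    // facet counts over host_s can populate the domain list
    static SolrQuery hostListing(final String host) {
        final SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("host_s:\"" + host + "\""); // restriction to one domain
        q.setFields("sku");                          // sku carries the document url
        q.setFacet(true);
        q.addFacetField("host_s");
        q.setRows(1000);
        return q;
    }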
solr and the gsa search interface. That means that JavaScript in any
browser can now access all YaCy search interfaces cross-origin, which
opens up the 'YaCy Client in Browser' and 'End-Point Fail-over'
concepts.
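On the server side, enabling cross-origin access essentially means
adding one response header; sketched here with the standard servlet API
(YaCy's own response plumbing differs):

    import javax.servlet.http.HttpServletResponse;

    // allow scripts from any origin to read the search interface responses
    static void addCORSHeader(final HttpServletResponse response) {
        response.setHeader("Access-Control-Allow-Origin", "*");
    }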
Therefore a property like solrConnected() must be true at all times.
Removing this method removes all write operations to the old metadata
index.
create a document entry. This makes remote search much slower.
- removed synchronization of the add method if ip_s is activated, to
prevent a user configuration from causing bad behavior. The
disadvantage is that an index dump can cause data loss if indexing is
running during the dump
- caught more exceptions and NPEs
- better abstraction in MirrorSolrConnector
- slight performance enhancement when only the index count is requested
(rows=0 is sufficient to get a total count)
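A minimal SolrJ sketch of the rows=0 trick:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // request zero documents; numFound still carries the total hit count
    static long getCount(final SolrServer server, final String querystring) throws Exception {
        final SolrQuery q = new SolrQuery(querystring);
        q.setRows(0);
        final QueryResponse rsp = server.query(q);
        return rsp.getResults().getNumFound();
    }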
dump commands
- adjusted the apicall.sh script to write the downloaded text to
stdout, which is necessary to parse content out of it
- added an indexdump.sh script which creates a Solr dump and prints the
storage path of the index dump
- added synchronization to the Fulltext class to prevent data from
being written to a non-existing Solr index while that index is disabled
during storage of the dump
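A sketch of the synchronization idea; besides the class name Fulltext,
all names and the method bodies here are assumptions:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class Fulltext {
        private SolrServer solr; // null while the index is disconnected

        // writes and the dump share the object monitor, so no document can
        // be added while the index is offline for dumping
        public synchronized void putDocument(final SolrInputDocument doc) throws Exception {
            if (this.solr != null) this.solr.add(doc);
        }

        public synchronized File dump(final File target) throws Exception {
            final SolrServer s = this.solr;
            this.solr = null;      // disable the index during the dump
            // ... copy the index files to 'target' here ...
            this.solr = s;         // re-enable the index afterwards
            return target;
        }
    }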
- indexUrlMustMatch and indexUrlMustNotMatch, which can be used to
select loaded pages for indexing. The default patterns are chosen such
that all loaded pages are also indexed (as before), but in an expert
crawl start the user may select only specific urls to be indexed (see
the sketch after this list)
- crawlerNoDepthLimitMatch is a new pattern that can be used to remove
the crawl depth limitation. This filter is a never-match by default (so
the crawl depth applies), but the user can select paths which will be
loaded completely even if the crawl depth is reached.
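A sketch of the filter semantics with the defaults described above; the
concrete pattern strings in YaCy's crawl profile may differ:

    import java.util.regex.Pattern;

    class IndexFilter {
        // defaults: index everything, never lift the depth limit
        static final Pattern INDEX_MUST_MATCH     = Pattern.compile(".*");     // match all
        static final Pattern INDEX_MUST_NOT_MATCH = Pattern.compile("(?!.*)"); // match never
        static final Pattern NO_DEPTH_LIMIT_MATCH = Pattern.compile("(?!.*)"); // match never

        // a loaded page is indexed if it passes both index patterns
        static boolean shallIndex(final String url) {
            return INDEX_MUST_MATCH.matcher(url).matches()
                && !INDEX_MUST_NOT_MATCH.matcher(url).matches();
        }

        // the crawl depth limit is ignored for urls matching this pattern
        static boolean ignoreDepthLimit(final String url) {
            return NO_DEPTH_LIMIT_MATCH.matcher(url).matches();
        }
    }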
- the list of urls is entered in the expert crawl start in a text area:
the former one-line input field was replaced with a text box
- start urls can also be given on one single line with the urls
separated by a '|' character (see the parsing sketch after this list)
- as an effect, the crawl profile cannot carry a single start url for
identification because there may be several. Therefore the url was
removed from the crawl profile
- this affects all servlets which display a crawl profile: the url
field was removed from all of these servlets
- to work consistently with several start urls and with the other crawl
starts which compute start url lists from sitelists or sitemaps, the
crawl start servlet was restructured completely
- new rules for must-match patterns were created to make it possible
for site crawl starts to also work with several start urls at once
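A sketch of the start url parsing described in the items above; the
method itself is illustrative:

    import java.util.ArrayList;
    import java.util.List;

    // accept start urls one per line, or separated by '|' on a single line
    static List<String> parseStartURLs(final String input) {
        final List<String> urls = new ArrayList<String>();
        for (String u : input.split("[\\n|]")) {
            u = u.trim();
            if (!u.isEmpty()) urls.add(u);
        }
        return urls;
    }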