yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	167c5a51f0	IPv6 fix	10 years ago
Michael Peter Christen	fe537679de	fix for exact_signature_unique_b, exact_signature_copycount_i, fuzzy_signature_unique_b and fuzzy_signature_copycount_i: apply same criteria for 'valid document' as for title and description uniqueness test.	10 years ago
sixcooler	eb9d2705d2	fix for ConnectionInfo.cleanup of server-connections	10 years ago
Michael Peter Christen	2e5214eb21	added field postprocessing.partialUpdate to settings which can be used to switch on or off partial updates. Both options should cause the same result. Default is on.	10 years ago
Michael Peter Christen	11074d8d24	fix for a ssl bug that appear only in java 7. The bug was reported in http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5407&p=30956#p30956 a solution was described in http://teknosrc.com/javax-net-ssl-sslprotocolexception-handshake-alert-unrecognized_name-solved/ which worked for this example given in the yacy forum	10 years ago
Michael Peter Christen	e96490e3a1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	77662e08e1	concurrently initialize the error cache; extended also the cache by factor 10 up to 1000 entries. This error cache is only used to catch up paused crawls between shutdown+startup	10 years ago
sixcooler	d8fcc4a2f5	added a timeout on Jetty connectors	10 years ago
Michael Peter Christen	0f0b60404b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
sixcooler	72561926aa	do not overwrite yacy.conf in case of an exception may be a fix for http://mantis.tokeek.de/view.php?id=180	10 years ago
Michael Peter Christen	07c5b57953	removed warnings	10 years ago
orbiter	fa2ad101ec	enhanced graphics computation (avoiding long string parsing for colours)	10 years ago
orbiter	ef813cec91	added proper copyright notice to OSM tiles presented at the search result page	10 years ago
Michael Peter Christen	fca11701f0	better profiling of solr queries	10 years ago
Michael Peter Christen	2e09da9832	npe fix	10 years ago
Michael Peter Christen	d80418f1b1	added partial updates to solr during postprocessing: during postprocessing the solr documents are now not completely retrieved. instead, only fiels, needed for the postprocessing are extracted. When Solr document are written, this is done using partial updates. This increases postprocessing speed by about 50% for embedded Solr configurations. For external Solr configurations the enhancement should be much higher because the postprocessing with remote Solr is very slow. When doing partial updates to a remote Solr, this method should perform much better than before, it is expected that this is even much higher than the increase with local Solr.	10 years ago
Michael Peter Christen	b1cfbc4a04	added new solr field url_paths_count_i which can be used to enhance the index browser and maybe also for ranking; possibly also for SEO-with-YaCy applications.	10 years ago
Michael Peter Christen	e69883d5ab	fix-fix for `30d4402cd1`	10 years ago
Michael Peter Christen	30d4402cd1	fixed location search	10 years ago
Michael Peter Christen	6983dff334	explain crawl denial when not switched to intranet mode	10 years ago
Michael Peter Christen	f818f84adb	more ipv6 fixes	10 years ago
Michael Peter Christen	afd5bd5f5f	slightly enhanced Network table computation by using a lazy initialized bitfield for peer flags	10 years ago
Michael Peter Christen	2c2b50e65d	refactoring (class name should start with uppercase letter)	10 years ago
Michael Peter Christen	bc275dca07	added network history graph image /NetworkHistory.png which can show many different statistics about the history of the peer.	10 years ago
Marc Nause	ce9368246b	Merge branch 'master' of gitorious.org:yacy/rc1	10 years ago
Marc Nause	5603809deb	Minor changes: ) reduced visibility of a method ) updated comments	10 years ago
Michael Peter Christen	d8beafba3a	fix for values in CrawlProfileEditor table and xml; now the full profile is available in the xml.	10 years ago
Michael Peter Christen	ec95dfa2e6	fixed crawl profile xml result which did not show the correct crawl status.	10 years ago
Michael Peter Christen	8c1a89cb34	added another decoration flag to switch off network graphics in crawler monitor and index browser: decoration.grafics.linkstructure Please set this to false to remove the graphics from the interface.	10 years ago
Michael Peter Christen	ee27be3399	misc bugfixes (concurrency, memory protection)	10 years ago
Michael Peter Christen	9b1958e8ca	more ipv6 bugfixes	10 years ago
Michael Peter Christen	7817fc50c9	added a high cpu cycle monitor to PerformanceQueues	10 years ago
Michael Peter Christen	5082feb103	less volume for effect sounds	10 years ago
Michael Peter Christen	e8392e2ff2	fix for local search	10 years ago
Michael Peter Christen	0bfc69b29b	more ipv6 bugfixes	10 years ago
Michael Peter Christen	a27563e5c3	removed the atmo sound clips because they had been too large	10 years ago
Michael Peter Christen	883622306e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/peers/Protocol.java	10 years ago
Michael Peter Christen	97995a1dd9	fix for remote search process	10 years ago
Michael Peter Christen	0843b12ef3	ipv6 fix: avoid that shrinked own ip set is overwritten with (non-valid) set of local IPs	10 years ago
Michael Peter Christen	92c5d97486	fix for bad node flag setting with IPv6	10 years ago
orbiter	c27bad9326	more ipv6 fixes	10 years ago
orbiter	cddf884bc4	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	10 years ago
Michael Peter Christen	460858fb22	more ipv6 fixes	10 years ago
Michael Peter Christen	5cef88a315	argh.. adding missing java class for latest audio feature	10 years ago
Michael Peter Christen	74957f3760	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	2a052f446a	Added an experimental audio feedback system. This is the first element of a new 'decoration' component which may hold switches for different external appearance parameters. The first switch in that context is decoration.audio (as usual in yacy.init). This value is set to false by default, that means the audio feedback element is switched off by default. To switch it on, set decoration.audio = true (using /ConfigProperties_p.html). You will then hear sounds for the following events: - remote searches - incoming dht transmissions - new documents from the crawler Sound clips are stored in htroot/env/soundclips/ which is done so because a future implementation will read these files using the http client and with configurable urls which will make it very easy for the user to replace the given sounds with own sounds.	10 years ago
Marc Nause	1e6e69bc40	Finished implementation of UPNP: ) will try other ports if YaCy standard ports are not available ) distinguish between internal and external port (not sure if this works 100%) Still to add: propery in config to enter own external port (in case of manually configured NAT)	10 years ago
Michael Peter Christen	d0358e568b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	e1bc768f9d	more IPv6 bugfixes	10 years ago
reger	59c6532a65	add link extraction to pdfParser this extracts clickable links in pdf and adds it to the list of links include a test case for this function this is the corrected comment for commit: `aa2e15d846`	10 years ago
reger	aa2e15d846	allow url parameter in worktable apicall allow url=wwwl?param=a&param=b (with ?, & encoded) fix: http://mantis.tokeek.de/view.php?id=100 fix double adding of '&' in MultiProtocolURL.escape()	10 years ago
orbiter	f3a12801f0	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	10 years ago
orbiter	d93325a578	lazy handling of process_sxt field (part of postprocessing)	10 years ago
Michael Peter Christen	b31db00010	toString fixes	10 years ago
Michael Peter Christen	961f06c0b6	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
reger	209e0f2fe8	allow url parameter in worktable apicall allow url=wwwl?param=a&param=b (with ?, & encoded) fix: http://mantis.tokeek.de/view.php?id=100 fix double adding of '&' in MultiProtocolURL.escape()	10 years ago
reger	b5ca20de15	preserve content_type (mime) if supplied in preference of construct in from file type. (this eventually can benefit image search by using mime only) reduce redundant field assignment for Solrdocuments created from URIMetadataNode (URIMetadataNode = SolrDocument with partially assigned fields)	10 years ago
reger	fe9f1c594e	fix char encoding parameter in UrlProxy	10 years ago
reger	b0c87d8240	fix image search expand box, cut-off of 2nd capture line height tested with IE11 and Firefox 32 (change worked for both to show 2nd line without cutting off height) +fix charset parameter in metadataImageParser +update start errMsgTxt to "java 1.7"	10 years ago
Michael Peter Christen	2c2ed8bf4e	typo in javadoc	10 years ago
Michael Peter Christen	528f583d72	ipv6 fixes	10 years ago
Michael Peter Christen	6ee5b4352d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	247e626083	IPv6 host parsing bugfixes	10 years ago
reger	fb1fcc2b03	handle noarchive tag, skip writing page to cache http://mantis.tokeek.de/view.php?id=44	10 years ago
Michael Peter Christen	fe917deb2d	when pinging other peers, be able to select the right IP option	10 years ago
Michael Peter Christen	65e6ae52fb	IPv6-enhanced Network monitoring page	10 years ago
Michael Peter Christen	3073c69aee	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	6491270b3a	large IPv6 redesign of peer ping methods! removed preferred IPv4 in start options and added a new field IP6 in peer seeds which will contain one or more IPv6 addresses. Now every peer has one or more IP addresses assigned, even several IPv6 addresses are possible. The peer-ping process must check all given and possible IP addresses for a backping and return the one IP which was successful when pinging the peer. The ping-ing peer must be able to recognize which of the given IPs are available for outside access of the peer and store this accordingly. If only one IPv6 address is available and no IPv4, then the IPv6 is stored in the old IP field of the seed DNA. Many methods in Seed.java are now marked as @deprecated because they had been used for a single IP only. There is still a large construction site left in YaCy now where all these deprecated methods must be replaced with new method calls. The 'extra'-IPs, used by cluster assignment had been removed since that can be replaced with IPv6 usage in p2p clusters. All clusters must now use IPv6 if they want an intranet-routing.	10 years ago
reger	eaccce3467	added metadataImageParser for tif and psd (Photoshop) images. This is a modified genericImageParser adding tif (and psd) support even if java ImageIO plugin for tif is not installed in JDK. Adds just tif and psd to the available parsers. Uses the same library to extract metadata, so could eventually be merged with genericImageParser. All detected metadata are added to the parsed document (potentially some more as with genericImageParser)	10 years ago
reger	a69f5358ff	use javax ImageIO getReader to add supported image extension/mime genericImageParser uses javax ImageIO, supported images depend on available plugins for ImageIO package (this is JDK installation specific). Jpeg, png and gif are availabel by default. Tif and others only on avalable plugin (in classpath). Add supported image type dynamically on startup.	10 years ago
reger	8b1ce49ee6	remove unused variable timeout	10 years ago
reger	48aed15c48	skip loader wait cycle on concurrent access in nocache configuration. In nocache config resource is loaded online, leaving no benefit to wait for a faster cache hit.	10 years ago
Michael Peter Christen	67cd4c37bd	activated the new apk parser which was already ready but not included in the parser initialization. To make the apk parser usable, the handling of application type links had to be modified. Now all documents which have not a parser attached are placed to the noload-queue while all other documents are parsed using the associated parser class. This may have side-Effects on other parsers and the display of different file classes (images, apps, videos).	10 years ago
orbiter	a922b122a3	added a hack to forward solr search results from an external attached solr to the YaCy built-in solr search servlet. Its not complete and not fully correct (there is still a utf8 encoding problem) but it is a way to get easily requests forwarded through YaCy to an external Solr.	10 years ago
Michael Peter Christen	025516f682	fix for crawl limit for number of pages fail	10 years ago
Michael Peter Christen	2645dc816a	added warning for not well-formed postprocessing queries	10 years ago
Michael Peter Christen	437ce3b8a0	added internal api for partial updates to Solr	10 years ago
orbiter	3ac31614a3	added option to reverse-sort YaCy tables (internal API change only)	10 years ago
Michael Peter Christen	6d3d4c4ea6	changed the concurrent enumeration of query results in such a way that it is now possible to get the results in two steps: - first retrieve all IDs as given for a query - then retieve each document individually This was necessary for very large result sets where a query may run for hours and is possibly terminated by a solr-internal timeout. This occurs regulary during postprocessing and therefore this commit may fix unwanted postprocessing terminations.	10 years ago
Michael Peter Christen	ad35d9294f	added a 'stats' table which records some peer statistics twice every hour. The table can be shown with http://localhost:8090/Tables_p.html?table=stats The entries have the following meaning: aM: activeLastMonth aW: activeLastWeek aD: activeLastDay aH: activeLastHour cC: countConnected (Active Senior) cD: countDisconnected (Passive Senior) cP: countPotential (Junior) cR: count of the RWI entries cI: size of the index (number of documents) The entry keys are abbreviated to reduce the space in the table as the name is written again for every row. This is the beginning of a 'yacystats' micro-alternative als built-in function in YaCy. Graphics may follow after some time if enough test data is available.	10 years ago
reger	8284ea751a	catch TimeoutException during ping and do not delete yacy.conf during prereadconfigfile found a situation after crash (reboot) with existing running semaphore but YaCy not running. Ping generated exception which finally deleted the conf file (during pre-read procedure) - change to ping (catch exception solved it) - additionally removed delete yacy.conf file (if needed we need to make a backup)	10 years ago
reger	ffa7c7116f	better fix for NPE in image search replace `8931e14514`	10 years ago
Michael Peter Christen	759e7d9538	fix for http://forum.yacy-websuche.de/viewtopic.php?p=30720#p30720	10 years ago
Michael Peter Christen	bf18a39d0e	replaced warning with info	10 years ago
Michael Peter Christen	f1032fb8fe	more enhancements to image search in case that a restriction to a single domain is done	10 years ago
Michael Peter Christen	475125f9d7	hack to get more results when doing a remote site search	10 years ago
Michael Peter Christen	81f9b34da7	increaesed ability ot search for all images on a single server within the p2p remote search	10 years ago
Michael Peter Christen	2c26013c50	better contentdom abstraction	10 years ago
Michael Peter Christen	6a8fb8190b	changed default value for maximum number of connections to 50	10 years ago
Michael Peter Christen	ca8b2bf099	removed www and welcome servlet, these had been demo servlets and are not needed any more	10 years ago
reger	03a7a29db3	limit OAI import urn resolver try for Deutsche National Library The resolver service of National Library uses name space nbn, limit use of nbn-resolving.de accordingly to urn:nbn: - add resolver for rfc's	10 years ago
Michael Peter Christen	0838326a76	changed error message, see http://mantis.tokeek.de/view.php?id=439	10 years ago
reger	b5e0f70197	- remove repositoryPath post from ConfigBasic (obsolete) - remove static snippetComputationTime from ResultEntry (not used)	10 years ago
reger	8931e14514	fix NPE in image search	10 years ago
Michael Peter Christen	1735dbc9d9	enhanced image search: bugfixes and performance enhancements	10 years ago
Michael Peter Christen	ebd0be2cea	fixes and speed updates for search process	10 years ago
Michael Peter Christen	7611bf79bd	Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1 Conflicts: locales/ru.lng	10 years ago
Michael Peter Christen	524bedc00a	fixed text in startup tray icon and added shutdown icon during shutdown	10 years ago
Michael Peter Christen	4709d8417c	npe fix for non-tray users	10 years ago
orbiter	5b5635e187	replaced font for boot tray icon with image and added some more images for further tray icon displays	10 years ago
orbiter	aa6cdc4ab5	speed-up of start process if remote DNS waits for timeout	10 years ago
orbiter	40b3977c21	added an animation of the tray icon during the boot phase of YaCy. Additionally, there is a tooltip and a new headline at the tray menu which states the current booting status.	10 years ago
Michael Peter Christen	ec6082c872	very bad language detection hack fix hack	10 years ago
Michael Peter Christen	39615de3f9	adding the buffer size is not wrong but may cause confusing information when the buffer is cleaned after a buffer flush which is not then available in Solr since that is waiting for a commit. In such cases the counter would run backwards which is prevented by ignoring the buffer size.	10 years ago
Michael Peter Christen	395edec6f1	changed strategy to count the number of documents: get the max of solr+buffer and the hit cache. This shall help during first crawls to see a running document counter even if there was no commit meanwhile to solr. To support that strategy, the hit cache must be written earlier.	10 years ago
Michael Peter Christen	e87dc08c0d	set the correct fail time in error docs	10 years ago
Michael Peter Christen	cfb20bc0ce	removing the [] for ipv6 addresses may be a bad idea..	10 years ago
orbiter	b6d57f06eb	enhanced the apk parser (up to beeing production-ready). The parser is not yet activated and will be after the next release step.	10 years ago
Michael Peter Christen	a7dd89c4de	changed method to write the citation index: do not catch up references during document parsing; instead use the same references that would also be written into the webgraph. That should cause that the webgraph and the citation index express the exact same semantic.	10 years ago
Michael Peter Christen	57ce7eeff3	fixed localhost authorization and replaced the adminRealm with an info string which is visible in the browser. That makes it possible that the browser instructs the user how to change a forgotten admin password (during runtime).	10 years ago
orbiter	f318d7c285	enhanced date-ordered ranking	10 years ago
reger	a6891ff7f8	fix Querygoal.parse exception on +/-null-term covers http://mantis.tokeek.de/view.php?id=452	10 years ago
reger	c7335318eb	remove unused legacy procedure from httpserver (deleted generateSocketAddress(port) )	10 years ago
Michael Peter Christen	eab0d3e1a9	bugfix for wrong lock display, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5321&p=30484#p30484	10 years ago
orbiter	49d4f95faf	bugfix to latest commit	10 years ago
orbiter	68211f8244	enable Crawler_p servlet if a rss feed or a wiki dump import was submitted.	10 years ago
orbiter	a65df4ce7e	do not push noindex errors into log if in intranet mode. noindex attributes are attached to artificial constructed index.html files which list directories. Such files are naturally rejected by the crawler and should not appear in the error log because these files are part of the construction of file crawlers and confuse users if they see them in the error log.	10 years ago
orbiter	688c6d8954	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	10 years ago
orbiter	4ae7aead28	addon to latest fix	10 years ago
Marc Nause	2af56fa37d	Improved UPnP. (still not perfect) ) set HTTPS port if enabled ) improved data structures (may not be final) *) moved UPnP to own package	10 years ago
orbiter	b3ebd38079	removed the HTDOCS repository concept because the concept to host files on the YaCy http server is obsolete; YaCy can index file:// and smb:// paths	10 years ago
reger	1fdcc2d67b	change seedfile upload ip check to allow intranet ip in intranet mode - this allows to setup a principal peer in intranet environment	10 years ago
reger	e31b0e6d67	- update javadoc Seed.getIP - default mySeed.ip to hostip in SeedDB.initMySeed() if Intranetmode this allows to become senior status in intranet hosted search network with view peers, otherwise peer would stay junior because of default init with loopback ip as public (dna) ip.	10 years ago
reger	350c6b8250	in IntranetMode allow intranet hosted seedlist with Network_Domain "any" - so far intranet seedlist hosts are always denied but need to be allowed in intranet mode	10 years ago
orbiter	d68438c3d9	make sure that the postprocessing background thread never dies by any exception	10 years ago
orbiter	b4f2a1db6e	added a unlock icon for all protected pages that are unlocked because the administrator is logged in.	10 years ago
reger	ea6c9e9b07	reduce mem buffer overhead for gap files during r/w (they are typically small compared to idx allowing to use smaller buffersize -> set to 16k records)	10 years ago
reger	e88537522d	allow single quote " ' " in query see http://mantis.tokeek.de/view.php?id=379 -add QueryGoal test case for this	10 years ago
orbiter	487021fb0a	snippet computation update	10 years ago
orbiter	1c2f1f233a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	10 years ago
reger	5a4995ded3	fill solr rss writer dc:subject tag with keyword content	10 years ago
orbiter	927aaa95a6	concurrency bugfix	10 years ago
orbiter	c9e593cf78	removed warnings	10 years ago
reger	7584352e7b	use more predefined Solr query parameter constants - use CommonParams and DisMaxParams constants - fix typo in get sort parameter - getDocumentCountByParams redundant implementation and risk of not optimized call (row parameter unspecified) -> as only used from getCountByQuery removed from interface	10 years ago
reger	f9db5dd6c5	reduce doublecontent check document (prevent out of memory) see http://mantis.tokeek.de/view.php?id=437 test result (concurrency=7) 2000 docs = eom always 1000 docs = eom always 100 docs = eom never chosen -> 200 docs (eom not encountered during test with 1GB mem setting)	10 years ago
reger	e9eae45b55	simplify rssreader and improve atom feed link extraction - type detection (rss/atom) - init type parameter overwritten during parse, parameter obsolete - detection by endtag changed to simpler first-tag evaluation - channel image not used, removed related extra parser handling - remove unused code (set/getImage) in rssfeed - atom link extraction to account for possible multipe link tags - spec limits link to one with rel="alternate" or one without rel attribute not accounting for the follwing type & hreflang exception yet: o atom:entry elements MUST NOT contain more than one atom:link element with a rel attribute value of "alternate" that has the same combination of type and hreflang attribute values.	10 years ago
reger	a8508417d1	catch NPE during crawl (OAI import) - condenseDocument mime=null (allowed) - collectionconfiguration responseheader = null (allowed)	10 years ago
reger	3dde94422f	center searchevent lines on network graph (PerformanceSearch_p.html)	10 years ago
Michael Peter Christen	3860711aef	fix for possible interruption of concurrent queries	10 years ago
Michael Peter Christen	6344718f8b	reducing the concurrent query stack size and reduced concurrency of postprocessing to avoid OOM situations	10 years ago
Michael Peter Christen	eca9380e3d	bugfix for crawler double-check: if an url is redirected, the redirect-target was not double-checked. This is now done by replacing the redirect-URL on the crawl queue again (where it is double-checked)	10 years ago
Michael Peter Christen	9ac0c93f17	fix for subpath crawl filter	10 years ago
Michael Peter Christen	66106bdaf0	fix for crawler attribute maxdompages	10 years ago
Michael Peter Christen	49d91b94c3	npe fix in crawler	10 years ago
Michael Peter Christen	b7183a7321	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
reger	ea2e627662	fix ConfigAccounts del user with uppercase letter in name (usernames are case sensitive, userdb.delete used toLower)	10 years ago
Michael Peter Christen	c465b791af	typo	10 years ago
Michael Peter Christen	191ec8c82a	added concurrency to postprocess rewrite process	10 years ago
Michael Peter Christen	a1e8bdd5e9	log ppm instead of docs/second	10 years ago
Michael Peter Christen	cc0ded7abd	set process type of web graph according to fields as defined in the schema	10 years ago
Michael Peter Christen	12fb9d7cd1	log postprocessing constraints in case that postprocessing is not performed	10 years ago
Michael Peter Christen	3c23b89823	less logging	10 years ago
Michael Peter Christen	a0c53174c5	better solr query logging to detect unnecessary sort requests for more performance profiling	10 years ago
Michael Peter Christen	338f574bdc	no sorting if http/www unique fields are not demanded (makes query faster) and some code restrucuring	10 years ago
Michael Peter Christen	1609763be5	toString fix	10 years ago
Michael Peter Christen	b983e68254	more retries, less sleep	10 years ago
Michael Peter Christen	1503ba7794	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
reger	8f77719091	fix "Ljava.lang.String" in crawl queue anchor name (e.g. IndexCreateQueues_p.html?stack=LOCAL with images in queue)	10 years ago
Michael Peter Christen	0ceeceb35e	more logic on Solr queries; usage of the query terms in posprocessing, saving one query for double document detection now per document	10 years ago
orbiter	38864ae004	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	10 years ago
orbiter	4099296b45	added new classes which shall reduce call overhead to Solr (stub)	10 years ago
reger	d0c02e1de7	adjust rss lat/lon to double (common format across other classes)	10 years ago
orbiter	3491ab4c38	removed unused images from webgraph edge computation	10 years ago
orbiter	2371d6b8db	target linktexts must be string to enable search facets on these fields	10 years ago
Michael Peter Christen	001e05bb80	do not store failure of loading of robots.txt into the index as a fail document	10 years ago
Michael Peter Christen	05d58e4df0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	98f45c9032	fix for image alt attachment to AnchorURLs in html parser.	10 years ago
orbiter	22ce4fb4dd	better error handling for remote solr queries and exists-checks	10 years ago
Marc Nause	9df14fc126	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Marc Nause	477be17c51	Replaced old UPNP library with Weupnp. UPNP should work now, at least it does on my network. UPNP code in YaCy can still be improved though (see TODO comment: make port on gateway configurable or find free one). ) removed old code ) added new lib *) changed code to work with new lib	10 years ago
orbiter	738989aab7	reverted commit `f94c91315b` because the webgraph has not enough performance for that	10 years ago
orbiter	e9163e7e10	fix for malformed hostpath names in crawl balancer	10 years ago
Michael Peter Christen	c115f3869c	enhanced snippet computation and test method in ViewFile	10 years ago
reger	6c10b59f3e	move bootstrap peers test systems to its test class var assignment not needed elsewhere.	10 years ago
orbiter	1027f3d04a	fix for the usage of ready-prepared solr queries, some queries are formulated as edismax query but this was not set as query attribut. The defType=edismax property needs a qf-field, so this was added as well. Do not remove that field again! This fixes also a problem with title-unique computation.	10 years ago
Michael Peter Christen	f94c91315b	if the webgraph is used, then use it also for reference computation to avoid contradictions with references_i in the collection index.	10 years ago
Michael Peter Christen	6e1dc444c3	added a snippet test function in ViewFile: you can now search for a specific word on the document; the servlet returns the snippet in the same way as it would be shown in a search result.	10 years ago
orbiter	4b06adb751	fix for file urls	10 years ago
orbiter	08409ec680	no idea why the words max was an ordered one. This change increaes speed dunring document processin a bit	10 years ago
reger	e5854a5cdb	fix localhost link to opensearchdescription.xml	10 years ago
Michael Peter Christen	b44626e55b	fixed target_alt_t in webgraph	10 years ago
Michael Peter Christen	504327b15c	fix for condition for writing the webgraph	10 years ago
Michael Peter Christen	542c20a597	changed handling of crawl profile field crawlingIfOlder: this should be filled with the date, when the url is recognized as to be outdated. That field was partly misinterpreted and the time interval was filled in. In case that all the urls which are in the index shall be treated as outdated, the field is filled now with Long.MAX_VALUE because then all crawl dates are before that date and therefore outdated.	10 years ago
Michael Peter Christen	4eec1a7452	refactoring (change Metadata name of load time data structure to avoid confusion with Node data which is also called metadata)	10 years ago
reger	c95ba52cf0	improve logexception info - log a message or class name insted of msgtxt "null"	10 years ago
orbiter	e441831a24	reverted toString() change in AnchorURL to prevent mistakenly used toString(). This fixes also the update link bug.	10 years ago
reger	47f201a6b8	Add Solr default query fields (&qf) to select servlet according to the ranking profiles boost fields defined by the peer (if df/qf is not specified in query). This allows for pretty simple queries ( q=word) without the need to know about the specific index configuration. Making sure all relevant fields (as determined by the index owner) are searched, still maintaining the option to query specific fields and does not relay on the duplication of text to text_t. - add author to reset-default boost fields (support results for author nav)	10 years ago
reger	f96cfdc84d	prevent array out of bound exception on getRankingProfile(x) on faulty &profileNr= query parameter	10 years ago
reger	5f5fb4ecdc	remove unused static (RSS)search from protocol	10 years ago
reger	7c1706d83a	use CRLF in generated bat command scripts for windows - for easier viewing with standard viewers	10 years ago
reger	a2cb366b25	Combine /heuristic search modifier with opensearch configured targets - with search modifier /heuristic a request is send to all configured opensearch target systems (old /heuristic/blekko modifier not longer valid) - this allows to use opensearch heuristic on individual search request (in contrast to configuration HEURISTIC_OPENSEARCH=true which sends a osd request on all global searches - the index.html searchoption text adjusted to be displayed only if option configured - add Archive-It to predefined systems	10 years ago
Michael Peter Christen	2de159719b	added an option to set 'obey nofollow' for links with rel="nofollow" attribute in the <a> tag for each crawl. This introduces a lot of changes because it extends the usage of the AnchorURL Object type which now also has a different toString method that the underlying DigestURL.toString. It is therefore not advised to use .toString at all for urls, just just toNormalform(false) instead.	10 years ago
Michael Peter Christen	bf1b6b93e7	do not write CR values to webgraph if no CR values are computed	10 years ago
Michael Peter Christen	e039e78210	small bugfixes	10 years ago
Michael Peter Christen	32a2ff925c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	d07cdd8c3b	added SolrCloud access mode and configuration	10 years ago
Michael Peter Christen	8514bffc22	enhanced postprocessing status report	10 years ago
reger	b24572f304	fix GSA filter query assignment - use more parameter constants	10 years ago
Michael Peter Christen	b5fc2b63ea	removed exist() retrieval functions from error cache and replaced it with metadata retrieval from connectors directly. This should cause better usage of the cache. Automatically increase the metadata cache if more memory is available.	10 years ago
Michael Peter Christen	62c72360ee	cleanup of checkAcceptanceInitially in CrawlStacker, should avoid double-calling of solr	10 years ago
Michael Peter Christen	dd5cdfe212	reverted filter query hack, it did not work	10 years ago
Michael Peter Christen	b5d78ba156	reduced number of solr queries during crawling	10 years ago
Michael Peter Christen	5326970d6c	enhanced solr queries for single document extraction	10 years ago
Michael Peter Christen	525575bd97	added debugging of filter queries in thread dump thread names	10 years ago
Michael Peter Christen	f319ef268f	testing filter queries instead of queries to retrieve documents by id	10 years ago
Michael Peter Christen	fd87fa1613	removed more unnecessary exist-checks in ErrorCache	10 years ago
Michael Peter Christen	f2b476e08b	don't do a double check to solr for failed documents if they are not written to solr	10 years ago
Michael Peter Christen	06ab72d1af	enhanced crawler host round-robin strategy	10 years ago
orbiter	dab9a0786a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	51bf5c85b0	Renamed the transmission cloud to buffer in dispatcher since the name 'cloud' was a bad idea. Changed also the accumulation process for peer targets so that every dht chunk is not assigned the set of redundant targets but they are assigned to redundant targets individually. This enhances the granularity of the target accumulation and should enhance the efficiency of the process. Finally the dht protocol client was enriched with the ability to remove the 'accept remote index' flag from peers or remove peers completely if they do not answer at all.	11 years ago
Michael Peter Christen	a694b6a8fc	another fix for unique field computation	11 years ago
Michael Peter Christen	fb3dd56b02	fix for processing of noindex flag in http header	11 years ago
Michael Peter Christen	b0d941626f	fixed bugs in canonical, robots and title/description unique calculation	11 years ago
reger	d9472d043a	cleanup older unused classes	11 years ago
reger	665e12f88e	move startup time from old serverCore to switchboard (most used here) to make servercore eventually obsolete.	11 years ago
reger	336425912a	remove unused localSearchThread from SearchEvent	11 years ago
reger	32bd2a61c1	add local ip to AbstractRemoteHandler local hostname cache	11 years ago
Michael Peter Christen	f3a6b6e21e	fix for bad URL decoding	11 years ago
Michael Peter Christen	1092e798a5	fixed double content postprocessing	11 years ago
Michael Peter Christen	aee5b108e5	added linkScraperParser, a parser which ignores the text like the generic parser but extracts links like the htmlParser. This should be used for ASCII documents without known text format annotation like source code files or json documents. Probably also good for xml files without known schema.	11 years ago
reger	2b8cc5832c	fix seek error for 0 file size records file by add extra check for file size = 0 in cleanlast() - (http://mantis.tokeek.de/view.php?id=411)	11 years ago
reger	2ba394333f	fix Crawler HostQueue release of stackfile - close stackfile inputstream at end of ChunkIterator This should solve startup delay while unfinished crawl jobs exist (maybe also too many open file situation)	11 years ago
reger	40133ba2d0	fix NPE in Condenser, discovered by calling IndexControlRWI, "Word Deletion" with "for every resolvable and deleted URL reference"	11 years ago
orbiter	59160984cc	timeline performance update	11 years ago
orbiter	54bea96e67	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
Michael Peter Christen	841cc77391	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	e09218129c	remove check for local solr. This check was made during a time when Solr was optional and another alternative metadata store was available. Since that store is now removed, Solr is always available (internally or externally)	11 years ago
orbiter	2073e69034	fix for long periods in timeline	11 years ago
reger	1f94df29e7	fix NPE in solr rss where snippet contains only the title text and adjusted xslt, for solr snippets (&hl=true) to decode the xml encoded html <b> tag by adding disable-output-escaping (still open item description may be double as dc: tag and rss.description tag)	11 years ago
Michael Peter Christen	09dcdb9b19	update to solr 4.9.0	11 years ago
Michael Peter Christen	1cd4b2e8be	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	8c52f0651b	refactoring of AccessTracker events & timeline fix	11 years ago
reger	431a5f9c4e	added test case for TextSnippet, removed obsolete/unused parameter and reference to MediaSnippet	11 years ago
Michael Peter Christen	5b94a257ce	no timeout for large reference collections	11 years ago
Michael Peter Christen	f5b817bac4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	cb2c17d236	extract author and keywords in .doc and .ppt parser	11 years ago
reger	a5707cd2eb	enable proper Author navigator - author facet is based on omitted author_sxt field - adjust to make author nav available on exist of author field but keep using author_sxt to construct the facet (why!?) - add check for querymodifier author in searchevent	11 years ago
Michael Peter Christen	74206a10c7	refactoring	11 years ago
orbiter	fec673c9d1	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	4a66af716d	added apkParser stub (work in progress)	11 years ago
orbiter	c59da9fe7a	added access tracker log reader stub	11 years ago
reger	2d67f29244	adjust mergeDocument after parsing to - preserve charset and languages - fix merge of author	11 years ago
Michael Peter Christen	0d29b972cc	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	36e623d8bf	enhanced metadata enrichment for media file type search: - Web servers may now deliver YaCy-specific http header field with a title and keywords. The new http header fields are: X-YaCy-Media-Title - to be used for media (image, audio, video) titles X-YaCy-Media-Keywords - to be used for media (image, audio, video) keywords - both fields are written to document fields title and keywords and are searched also during image search. - to make the usage of arbitrary http header fields (including this new fields) possible in the /api/push_p.json servlet, a new POST argument is also introduced to push http header fields. The new POST attribute is named "responseHeader-X" (where X is the counter). It is allowed to use this attribute as multi-attribute several times, each can be filled with a http header line. - see /api/push_p.html for examples	11 years ago
Michael Peter Christen	49886fab08	enhanced debugging	11 years ago
Michael Peter Christen	b893c42a0f	bugfix for image search	11 years ago
Michael Peter Christen	c7995d3e2a	increased fixed limit for http POST request sizes to 100MB	11 years ago
reger	7847a93558	fix AbstractParser.singleList not adding null strings - prevents null titles in oo... parser (as detected by ParserTest) - correct ParserTest dc_description check (dc_description allowed to return 0 length array)	11 years ago
Michael Peter Christen	8acae852a0	write <em>-tagged texts also into the bold_txt field	11 years ago
reger	90c4576361	add a link to recrawl index entry to metadata html page - to allow manually renew index content for this url (e.g. in case it is a remote search result with metadata only) - use simply a QuickCrawlLink_p javascript snippet (minimalistic 1st solution)	11 years ago

... 3 4 5 6 7 ...

7610 Commits (f989f955dc377278806f5a74fae3bcdc297564e7)