yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	2a4d826d9e	adjust servlet RequestHeader.getLocale init jvm defaultLocale matching UI language	8 years ago
reger	9db68acb4f	remove obsolete X_YACY... header declarations not in use (no writes, only remove and try to read). Obsolete parameter setupHttpClient	8 years ago
reger	8e9aece786	more use of RequestHeader constant referer, authorization in Jetty9YaCySecurityHandler	8 years ago
reger	d631fbc019	make more use of the new ServletRequest interface methods getScheme, getServerPort (in QuickCrawlLink_p & YaCyDefaultServlet)	8 years ago
reger	395f2e8946	Make ServletRequest implement the standardized HttpServletRequest interface, to make all readily available information from the original ServletRequest available to YaCy servlets (without converting data to internal structures). The implementation of the common interface allows easier integration of YaCy servlets with the servlet standard (e.g. shared login service with the servlet container etc.)	8 years ago
luccioman	74fec066f4	Converted more URLs to pure relative ones. Easier YaCy peer configuration behind a reverse proxy subfolder : no need for the reverse proxy to rewrite HTML links or URLs in css files. Tested on Debian Jessie with an apache2 reverse proxy. See related mantis issues http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
luccioman	0f0393e5e3	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
luccioman	7296e3884f	Switched even more URLs to pure relative ones. Thus a YaCy peer can run behind a reverse proxy subfolder without need for the reverse proxy to rewrite HTML links (a CPU costly operation). Tested on Debian Jessie with an apache2 reverse proxy. See related mantis issues http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
reger	49eae79c01	fix Tables.hasIndex check for tablename = key; apply same functionality to hasHeap (to not create a new table on calling hasHeap)	8 years ago
luccioman	84b81c1af0	Switched more URLs to relative ones when possible. This permits an easier and more flexible reverse proxy configuration. Some related mantis issues : http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
luccioman	731684105a	Improved absolute URLs rendering in OpenSearch desc and RSS feeds. When the peer is behind a reverse proxy providing SSL/TLS encryption, the rendered absolute URLs should start with https when the user browser requested https : added limited support to the X-Forwarded-Proto HTTP header notably provided on Heroku platform. Also added some unit tests.	8 years ago
reger	669f60223e	upd Column.toString to output encoder "{bytes}" used for String and binary Column types	8 years ago
reger	c9e81d2fa0	fix Column parsing from celldefinition string, without cellwidth def. (outofbound exception)	8 years ago
reger	e0816ef2e5	use human readable date format in CrawlStacker error message "double in: local index, oldDate = "	8 years ago
luccioman	54d879a9b3	Generate HTML relative (to each peer) links from hosted WikiCode. When WikiCode inserted in a peer hosted Blog, Wiki, Messages or Profile contains relative links (images or any content, hosted in DATA/HTDOCS), it is more reliable to keep these links relative, especially when the peer is behind any kind of reverse Proxy.	8 years ago
luccioman	2da5f339f8	Fixed /News.html and /Wiki.html pages in Search Portal mode (issue #87). Also fixes these pages rendering when the peer is not online. Re-factored code in common with /opensearchdescription.xml and ConfigPortal.html.	8 years ago
reger	8fe28a83f2	harmonize used lastmodified date for rwi and fulltext in storeDocument	8 years ago
reger	3d1d297308	refactor namespace navigator as part of navigatorplugin map, this allows the navigator to include counts of all matches (rwi+fulltext). Fixing also unresolved_pattern in navigators title (of the counter). The use of inurl: query modifier as filter has not been changed, keeping it as soft (unsharp) filter facet. Upd StringNavigator to prevent empty string from multivalued solr fields, removed date value conversion (better handled elsewhere, not needed here).	8 years ago
reger	67f660523b	Make navigators underlying indexfield name accessible in interface, use interface in declaration and extend facet check to include navigator field.	8 years ago
reger	5eb3ee4e20	Add search navigator interface to allow for additional navigators (plugins). Prepared the first basic navigators (for authors and collections) for the list of SearchEvent.navigatorPlugins and adjusted servlet to use these. - this allows to configure display order of these navigators (by ordering config string) - eventually allows for additional and/or custom navigators using any available index field without need for changing servlets - the Collection navigation has been adjusted to exclude the internal, default robot_* and dht collections from displaying - rwi results are now also checked for navigator by the refactored navi's. So far no config options were added to customize or add navigators (may come later if route of upcoming modularization/plugin system is defined).	8 years ago
reger	fd3f58fcaa	improve query modifier parsing of "collection:" and possible collision with "on:" in case multiple collection modifier were entered (by mistake) http://mantis.tokeek.de/view.php?id=702	8 years ago
reger	af39a76bf6	Reduce number of default max. search navigator lines (from 10000) to 100 + make it configurable	8 years ago
reger	20a1b29ed3	add simple test case for ReferenceContainer helpful for debugging calculated ranking parameter	8 years ago
reger	3c7220bc7b	Refactor rwi reference word position and word distance calculation used for rwi ranking. Main changes: - introduce a posintext() to access the stored value. This also reduces mem alloc of position array for WordReferenceRow (index access) - use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null) - adjust assignments and the min() max() and distance() calculation accordingly	8 years ago
luccioman	f0639d810c	Customized name for Threads still using the default "Thread-n" pattern. This makes threads monitoring easier to read.	8 years ago
luccioman	db3b9db9c2	Crawl from local file : faster task end when manually terminating crawl.	8 years ago
reger	4c67ed3f8d	catch rwi ranking div by zero exception during rwi search result processing. Worddistance calculation is affected by concurrent update (normalization) of min/max ranking parameter for wordpositions. On update of min/max the exception is raised in distance calc and is now caught. This concurrent update and change of ranking results is needed for speed but should be further checked for optimization	8 years ago
luccioman	47af33a04c	Advanced Crawl from local file : better processing of large files. Applied strategy : when there is no restriction on domains or sub-path(s), stack anchor links once discovered by the content scraper instead of waiting for the complete parsing of the file. This makes it possible to handle a crawling start file with thousands of links in a reasonable amount of time. Performance limitation : even if the crawl starts faster with a large file, the content of the parsed file is still fully loaded in memory.	8 years ago
luccioman	ee92082a3b	Updated javadocs : warning about closing stream responsibility.	8 years ago
luccioman	6f49ece22f	Fixed redirected URLs processing as crawl start point. See mantis 699 (http://mantis.tokeek.de/view.php?id=699) for details.	8 years ago
reger	68217465fe	div by null in word distance calculation (again, description in http://mantis.tokeek.de/view.php?id=698) as root cause was not seen, added just workaround reducing in favour over a try catch (for easier followup).	8 years ago
luccioman	7263d17436	Removed mentions of deprecated LURL-db. Thanks to LA_FORGE asking about it on the YaCy forum (http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5895)	8 years ago
reger	8b74a6bf57	fix min/max calculation of WordReferenceVars.distance() Issue was the calculation in AbstractReference with positions.clear() call, this made distance result always 0 (distance needs min 2 positions) and created concurrency issues. + unit test of changes	8 years ago
luccioman	da362628fb	Added fine log level for too long blacklist matching processing.	8 years ago
reger	aaae7c6462	adjust ConcurrentScoreMap internal value map to interface and use parameter Long -> Integer (saves some bytes)	8 years ago
reger	31d2a5645e	remove obsolete query variable leftover from `8fb370d9f8`; harmonize ranking shift parameter to 0xFF; correct addresult weight parameter to long	8 years ago
luccioman	a588ed7628	Applied image headers customization to the new ViewFavicon servlet.	8 years ago
luccioman	7717a3d43d	Fixed license headers on files created to improve favicon management.	8 years ago
luccioman	6e1959f469	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Conflicts: htroot/yacysearchitem.java source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java source/net/yacy/search/schema/CollectionConfiguration.java source/net/yacy/server/serverObjects.java	8 years ago
reger	685d8e86bf	Avoid frequent data type casting (float/long) for rwi score; refactor to using long in URIMetadataNode too (and related call parameters). As remote rwi scores are not used (since v1.83) skip reading float-score, but keep in toString() for communication with older versions.	8 years ago
luccioman	3ccd89e274	Fixed MultiProtocolURL.resolveBackpath to handle remaining '..' segments	8 years ago
luccioman	4b699c469a	Blacklist refactoring : extracted a function for easier unit testing	8 years ago
luccioman	54cfcc3f56	CrawlCheck_p.html : also display info about disallowed URLs.	8 years ago
luccioman	8b341e9818	Robots : properly handle URLs including non ASCII characters This fixes GitHub issue 80 ( https://github.com/yacy/yacy_search_server/issues/80 ) reported by Lord-Protector.	8 years ago
reger	e68b00678e	prevent negative score on URIMetadataNode - in the special case where no solr score is supplied. + assert before use & test case	8 years ago
luccioman	242707f9b4	Fixed loadFromCache with strategy IFFRESH. This fixes mantis 695 ( http://mantis.tokeek.de/view.php?id=695 ) : crawl start with 'Link-List of URL' option on websites using cookies.	8 years ago
reger	b752bcfecb	adjust date in text detection to ignore some program version strings like "3.1.2.0102" see http://mantis.tokeek.de/view.php?id=650 + expand test case	8 years ago
reger	b017e97421	optimize condenser language detection a little. langdetect probabilities take letter case into account, add words from description and anchors etc. as is. + add it to javadoc	8 years ago
reger	ae3717d087	adjust Tokenizer sentence count to ignore repeated punctuation (like !!!!) + remove unused sentenceword map (we use only the count) + upd test case for sentence count	8 years ago
reger	474f0476c6	adjust Tokenizer sentence count on trailing text after last recognized sentence + upd test case for rwi multi-word-query (leaving results known to fail untested)	8 years ago
reger	3861ac9293	upd maven dependency-check plugin to reflect changes of https://nvd.nist.gov + upd unknown ant script with current lib/jsch version	8 years ago
reger	681a61dafb	adjust rwi index result word position handling used for rwi ranking - correct WordReferenceVars.toRowEntry posintext parameter to set expected min posintext (the difference is on multi-word queries, while positions are ordered by search word order). - modified posofphrase/posinphrase join operation - to set min posofphrase - and keep posinphrase if not same posofphrase (was set to 0, no differentiation during ranking) + fix compiler msg (missing type declaration)	8 years ago
reger	14f7577231	add support for older Word versions (Word6/Word95) to docParser	8 years ago
reger	1a79c64495	generalize DateDetection with holiday date rules readily available in icu to make sure current dates are recognized (was fixed to 2014 - 2016) + adjust holiday date parser from pattern.match to pattern.find to deal with leading and trailing text + moved relative date recognition (morgen, tomorrow) to parseline (used by query parser only), as not working and problematic for indexing + add test case for parseline (used by query parser)	8 years ago
reger	6f68f08354	correct DateDetection Silvester date add Thanksgiving	8 years ago
reger	32a2e3a22a	have RSSFeed.getChannel return empty message on missing channel element, a) required b) prevent NPE in rss servlets + add test	8 years ago
luccioman	8d57b5b970	Added some javadocs.	8 years ago
luccioman	60df09fff9	Fixed some HTML validation errors : Illegal character in query. Now encode space characters in URLs query part.	8 years ago
reger	862f28eaa6	display number of documents/rss-items for label "docs" in load_rss_p servlet (as replacement for the rarely used "docs" rss-tag for a url to the rss-specification)	8 years ago
luccioman	dcdea2d02f	Fixed shutdown for crawler.MaxActiveThreads value greater than 200. Shutdown was hanging in CrawlQueues.close() at this.workerQueue.put(POISON_REQUEST) when config value crawler.MaxActiveThreads was greater than 200. Revealed by "Collision" thread dumps in mantis 689 (http://mantis.tokeek.de/view.php?id=689#c1312). Fixed consistency between this.worker.length and this.workerQueue capacity, and made the process more reliable using non-blocking offer() function.	8 years ago
luccioman	d286ba2c3e	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
luccioman	b8f6458152	Prevent yacy main thread from hanging on browser opening process. First fix for mantis 689 (http://mantis.tokeek.de/view.php?id=689). On Debian Linux, with a headless jre and no open browser, browser.openBrowserClassic() was called and waited forever for the browser process to end (p.waitFor()). YaCy shutdown was therefore not working until the browser was closed. Also modified browser opening command for Unix platform to open the default browser (with xdg-open util) instead of Firefox. xdg-open also has the advantage to be asynchronous (not blocking).	8 years ago
reger	70e1eb30a5	prevent StringIndexOutOfBounds in getLocalFile() + tighten patching of DOS path w/o protocol to drive "LETTER":	8 years ago
luccioman	1bb0b135ac	Avoid duplication of various MS Windows file URLs flavors Fix for mantis 692 (http://mantis.tokeek.de/view.php?id=692)	8 years ago
luccioman	b9a8476f02	Removed unused import	8 years ago
reger	e73c1eea8c	remove unused rootpattern, leftover from commit `9a5ab4e2c1`	8 years ago
reger	6f8c3ccea4	improve url hash computation for file path with mixed java & windows file.separator to compute equal hashes (by normalizing path for computation) + expand test case to check mixed java / windows file url notation like e.g. file:///c:/test/file.html vs. file:///c:\test/file.html - relates partially to http://mantis.tokeek.de/view.php?id=692	8 years ago
reger	efcb6a1e74	fix supported mime XML -> xml for rssParser (mime normalized to lower case for comparison) + add mime text/xml as in use for rss in the wild	8 years ago
luccioman	b3b75b0498	Accessibility : add a customizable alternative text to YaCy logo. Applied W3C recommendations : https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image and https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems	8 years ago
luccioman	f2bc1b268d	Updated URL fragment validation rules according to current standards See RFC 3986 (https://tools.ietf.org/html/rfc3986) or URL living standard (https://url.spec.whatwg.org/)	8 years ago
luccioman	b1b8e69da8	Fixed NullPointerException cases	8 years ago
luccioman	3ee4f56c39	Improved ErrorCache behavior when switching networks. Even after network switch, ErrorCache was still holding a reference to the previous Solr cores, thus becoming useless until next YaCy restart. Initial error cache filling with recent errors from the index was also missing after the switch.	8 years ago
luccioman	7d5ba2afa4	Added some JavaDoc and moved crawlStacker close at the right place.	8 years ago
luccioman	8edbcd8ad4	Log possible Solr instances close errors. We do not want to block on this kind of error, but this should not silently fail as it may have later consequences.	8 years ago
reger	330768c8a2	fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686 The embedded core holds a lock on the index and must be closed. Earlier commit comment states that core should be closed with solr instance instead of on close of connector. Adjusted the InstanceMirror.close() to take care of closing the embedded instance to release the lock. In 2 routines of fulltext this was already explicitly implemented (disconnectLocalSolr). Now this disconnect is part of the InstanceMirror.close().	8 years ago
reger	585d2a6441	test case: for NewsPool to check the id modificator (for unique id) and observe the distribution order .. hands on. + add test/DATA to gitignore	9 years ago
luccioman	de5c873e38	Removed unused JavaScript file docs.min.js. This file is used by the Bootstrap documentation website (http://getbootstrap.com/) but is not part of the Bootstrap distribution and need not be included in a Bootstrap based application.	9 years ago
Michael Peter Christen	df51e4ef07	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	e063aaf97f	enable fuzzy search, solr style (append a ~ to get a fuzzyness on the word)	9 years ago
reger	ff6589fc0f	test case: simulating multi word query for local rwi index Purpose of the test case is to be able to (controlled) analyse the rwi ranking for multi word searches (with focus on posintext and word-distance ranking)	9 years ago
reger	e990297d2e	avoid NPE on hello message with missing "yourip" key http://mantis.tokeek.de/view.php?id=684	9 years ago
reger	e51ab8c7aa	hack to generate a unique message-id for messages created in the same second by optionally add a 1 second offset counter to the current time (which is used as the unique id part)	9 years ago
Michael Peter Christen	b82300358a	removed version number check because it does not work any more if version numbers are expressed in a different way as we expect. That could cause that YaCy does not run on systems which are appropriate but we simply do not understand the version string.	9 years ago
Michael Peter Christen	2107674999	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	0d28f563f4	fix for java version "9-ea"	9 years ago
reger	3b694b3935	add some javadoc to rwi wordreference distance, position to remember facts for http://mantis.tokeek.de/view.php?id=683 Init missing word position to 0 like in other non text body words	9 years ago
reger	a4465c97d6	as requested, disable/remove old swf parser http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5861#p33098	9 years ago
reger	7f63fc50f3	prepare an IndexSegment test case for RWI index testing + prevent NPE in Segment.clear() on missing embedded solr instance.	9 years ago
reger	96467c5467	remove not needed counter in Tokenizer (completing last changes) including a small change, word posintext counting. We remember/store 1st posintext. Previously following words got a handle (posintext) excluding found. Now it just counts and assigns true posintext as handle (posintext)	9 years ago
luccioman	d66b0f7b7b	Fixed french messages encoding in YaCy tray. Also added the missing french translations.	9 years ago
reger	7efb66ee10	adjust the WordReference.join wordsintext calc to take the max (instead of sum) The reference is for the same url (add same for title and phrases). + del redundant join() procedure	9 years ago
luccioman	0a9ff14d96	Fixed NullPointerException case and added Javadoc	9 years ago
luccioman	06d4f93d03	Merged master into postprocessing branch	9 years ago
Michael Peter Christen	b73d2db914	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	25a3c7a6d0	catch exception and write end of object	9 years ago
reger	272cdd496a	reactivate sentence counter in WordTokenizer for phrasepos ranking, by counting punctuation (delivered as 1 char word) again.	9 years ago
Michael Peter Christen	5e165a8150	removed unused imports	9 years ago
Michael Peter Christen	c716648c78	enhanced json encoding of strings	9 years ago
Michael Peter Christen	6139bd85a8	fix for broken facet names	9 years ago
Michael Peter Christen	5060f9fee9	fix for too long snippets	9 years ago
Michael Peter Christen	8681cee3f3	fix for bad comma	9 years ago
Michael Peter Christen	db6d8fc197	fix for bad json	9 years ago
Michael Peter Christen	8f4a341735	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	9934f546bb	added default fl to solr query, removed large texts retrieval and changed snippet to description tag if no other description is available	9 years ago
reger	120bf7e6e2	implemented RWI WordReference to return the word position value (was always left empty). This is needed and enables existing word position ranking for RWI. The upcoming concurrency issues in word position min/max calculation were eliminated by iterator.hasNext check before next() access.	9 years ago
reger	e310ec5f70	fix posInText ranking calculation to score 0 on no position info + fix Word posInText calc in Tokenizer to start with 1 + test case	9 years ago
luccioman	74f9927ddc	Merge remote-tracking branch 'origin/master' into dist_macOS	9 years ago
reger	51c077f493	adjust the getTopics() and getTopicNavigator() to current usage - move the maxcount limit restriction completely to getTopicNavigator (as there not used in getTopics) - let search servlet use getTopics by default (w/o RWI connected check, as of now, Topics are available w/o any additional index interaction)	9 years ago
reger	39dd244693	fix ConcurrentScoreMap.set() calculation of totalCount() + test case	9 years ago
reger	ebf818ad95	log an error on aborted news publish (due to duplicate news.id) + change printed err msg to log entry in PeerAction.processPeerArrival	9 years ago
reger	cc2d9dd3f1	reactivate the use of included-in-topwords boost in postRanking + changed the postRanking to add one score only if word appears more than once. + getTopics() unused code block rem'd (save performance) -> routine needs rework !	9 years ago
luccioman	39ea28adfd	Merged master to dist_macOS branch.	9 years ago
luccioman	8255e91c99	Fixed serverClassLoader.findClass method. htroot is supposed to be a subfolder of appPath and not of dataPath, as assumed in other places where htroot is loaded. This issue was not visible when dataPath and appPath are equal.	9 years ago
reger	6801673a07	apply postranking media search boost only on media queries	9 years ago
luccioman	1dc4306058	Fixed indentation for better readability.	9 years ago
luccioman	8c49a755da	Postprocessing refactoring Added Javadocs to refactored methods. Added log warnings instead of silently failing some errors. Only fill collection1hosts when required ( shallComputeCR true).	9 years ago
luccioman	42f45760ed	Refactored postprocessing For easier understanding and performances profiling.	9 years ago
reger	4386e84b55	correct NewsPool retention calculation (was still clearing everything after one day)	9 years ago
reger	5e72d37f0a	TransNews_p: add ad-hoc translation of target file on positive vote (addition to local translation) + errmsg on language=default	9 years ago
reger	9462a32244	Added news service for easy, community driven UI translation support. New or modified translation (via /Translator_p.html) can be shared/distributed via the YaCy internal news service. Remote peers can see and vote on the translation via the new http://localhost:8090/TransNews_p.html servlet. A positive vote will add the received translation to the local translation list and post a voting message to the news service. (at this point no processing of received votes is implemented) + fixed the msg service retention time check (NewsPool.automaticProcessP)	9 years ago
reger	f8d6543a23	Rename class CreateTranslationMaster to TranslationManager and add additional routines and the capability to handle translation maps internally (to reduce complexity of handling translation maps for calling servlets)	9 years ago
reger	19b4509d54	speed-up reading of xliff language file, by using xmlparser (stax) instead of jaxb, making xliff-core-1.2-1.1.jar obsolete	9 years ago
Michael Peter Christen	e1fac86f53	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	a9316ceff6	force browser-caching of favicons from search results	9 years ago
Orbiter	503312ca43	Merge pull request #61 from luccioman/heroku_experiments Deploy YaCy on Heroku	9 years ago
reger	33bf35d90f	missing file for previous commit "Introduction of additional language setting browser"	9 years ago
reger	16e8ed3f01	Introduce additional language setting "browser/Browser Language" for UI internationalization. If language is set to "browser" the client/user browser language is used to choose from available translation. simply: one users browser speaks English -> YaCy responds in English, other users browser speaks French -> YaCy responds in French. ! To make a translation/language available you have to activate the language once ! (or manually use the utility class TranslateAll) In ConfigBasic.html available translations are marked green on setting language=Browser. The client language is determined by http header Accept-Language (checked in DefaultServlet)	9 years ago
reger	3b47a07dd1	change unused servletProperties entry CONNECTION_PROP_CLIENT_REQUEST_HEADER to use directly HttpServletRequest. This is used to get the http protocol version in HTTPDProxyHandler.fulfillRequestFromWeb() for error response to client. - adjust YaCyProxyServlet and UrlProxyServlet accordingly - use more http_version constants in headerframework and httpdeamon - equalize servlets (3) use of HeaderFramework.CONNECTION_PROP_HOST to HeaderFramework.HOST	9 years ago
reger	036c1dc6ef	fix CookieTest_p formatting (output of <br> as text), change to dataoutput only by servlet, leave formatting to html. + removed link to obsolete env/grafics gif	9 years ago
Michael Peter Christen	bf6709d196	fixed missing browser activation in linux	9 years ago
Michael Peter Christen	d8504418b6	enhanced browser-caching of static content	9 years ago
Michael Peter Christen	079112358c	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	efeb592661	don't do solr optimization, this creates high IO load. We should leave this task to solr to do on its own instead of forcing it.	9 years ago
luccioman	46b8836548	Copy image resources contained in donation iframe. Handle possible image loading errors.	9 years ago
reger	4c7a77662a	eliminate dependency on file-extension in storeDocument but use supported mime-type to also support handling of urls w/o corresponding file-extension. For this refactor use of document.getParserObject() to always return a Parser (for clean logic) and define/move the scraperObject as local var of AbstractParser. Adjust related calls to getParserObject (where actually a scraperObject is wanted). Additionally skip appending url token to parsed text for dht metadata entries (by default returned as result by rwi index).	9 years ago
reger	ebde21079a	refactor xlsParser to include Excel file attribute (like author) in parser result doc. Similar to ppt and doc parser, completing a TODO in xlsParser.	9 years ago
luccioman	744c9a2615	Opensearch desc : handle https protocol url with default port (443) This completes modifications made for mantis 669 (http://mantis.tokeek.de/view.php?id=669)	9 years ago
luccioman	b9c28893ee	Merged master to 'heroku' branch.	9 years ago
Michael Peter Christen	103a8348b3	fix for NPE and small performance enhancement	9 years ago
reger	2910fe35c1	add missing scheduler calc of next exec_date (call of calculateAPIScheduler) - after last_exec_date is altered, next_exec_date should be recalculated - makes the recalculation of next_exec in advance (without api call surely made) in Switchboard.schedulerJob() obsolete. Slightly modify next_exec calc. on missed event to now+schedule_time (from fixed 10min)	9 years ago
reger	70d47ae38a	keep scheduler selection by repeat entry from `07311020d4` to allow exec schedule on actual exec event. Iterate on exec date (of advantage after interruption/shutdown) to schedule older or missed events first.	9 years ago
reger	7c3f932e5d	revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)	9 years ago
reger	07311020d4	postpone apicall exec date init until actual call fix for http://mantis.tokeek.de/view.php?id=677 The difference is on scheduling a large number of rss feeds and loading is not finished before shutdown of YaCy. The change makes sure not already loaded RSS will be loaded by the scheduler on next startup.	9 years ago
reger	5e335b32da	fix Blacklist.contains() matching path pattern to string similar to `5e9e871192` + add proof testcase	9 years ago
reger	5e9e871192	fix Blacklist.remove by using pattern.toString to find pattern to remove, parameter String path did never equal Pattern. + delete unused removeAll, as it does not persist changes after restart	9 years ago
reger	1843ea7e69	on Blacklist.add pattern to source file also update internal entry maps as in Blacklist.add(blacklistType) to make entry effective w/o restart fix for http://mantis.tokeek.de/view.php?id=676	9 years ago
reger	bf6ce33da3	Correct use of _htDocsPath config in YaCyDefaultServlet to use servlet config variable + add some javadoc and remove a not useful static declaration	9 years ago
luccioman	480027ec98	Merge remote-tracking branch 'origin/master' into heroku_experiments	9 years ago
reger	fcad2d0744	add uses of config constant INDEX_RECEIVE_ALLOW	9 years ago
reger	226f81cfcf	declare poison pill url MultiProtocolURL() as protected to make sure not used from outside. After double checking use of poison url revert path init from commit `f8632ad292`	9 years ago
reger	f8632ad292	prevent string index out of bounds in MultiProtocolURL.getPaths as path may be an empty string + init path to "" also in init for poison url (to guarantee success for all existing uses of path w/o check for null)	9 years ago
reger	35a7d57260	update lucenematchversion to current (5.2.0 -> 5.5.0) there should be no need for reindex by the update	9 years ago
reger	9b07bbf955	deprecate newurl(), not used and already replaced, instead of making it handle all the supported protocols	9 years ago
luccioman	47d486298f	Merged changes from master.	9 years ago
reger	774b3906a9	fix GenericFormatter.parse ("time","timeoffset") change: UTC offset internally expected in minutes	9 years ago
reger	27163af0e1	improve detection of referenced links by taking http and https link protocol into account + correct query start detection of commit `f89d4eb51d`	9 years ago
reger	f89d4eb51d	fix MultiProtocolURL init (assign of host) for urls with '/' in query part + add to test case	9 years ago
reger	87fcfc6d78	Adjusted hash computation and toNormalform for file:// protocol to deliver the same hash for the same file on Windows filesystem path with forward- and backslash in path. Background see http://mantis.tokeek.de/view.php?id=671 +Test case	9 years ago
luccioman	d6bf90803f	Merged from main master branch.	9 years ago
luccioman	9b9c112263	Handle local port configuration by system property more properly, and prefixed property with "net.yacy" to avoid ambiguity.	9 years ago
reger	3811184abd	fix GSA servlet clientIP retrieval	9 years ago
reger	7ab41d4ff1	use directories original lastmodified date in file- & smbloader in response	9 years ago
reger	708bcbb042	one more replacement to use cached hosthash vs. calculated	9 years ago
luccioman	b57a06d88e	Let Heroku decide which http port to use	9 years ago
reger	22db449f2a	to prevent crawler to concurrently access and alter same crawl queue after restart, put hosthash in queue's filename (which is used as primary key for crawl queue. Hint: initial hosthash from url and recalculated hosthash from just hostname:port are not the same. fixes http://mantis.tokeek.de/view.php?id=668 (partially)	9 years ago
luccioman	893a40995a	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	9 years ago
Orbiter	50c5ddf1a1	Merge pull request #56 from luccioman/LibreJS LibreJS compliance : YaCy JavaScript license information	9 years ago
Michael Peter Christen	7466d390b2	small refactoring + do not accept too old peers during bootstrap	9 years ago
luccioman	6e96c7341a	Merge remote-tracking branch 'origin/master' Conflicts: htroot/Load_MediawikiWiki.java htroot/Load_PHPBB3.java htroot/ViewImage.java	9 years ago
reger	8d58a48029	remove wrong log line in CrawlSwitchboard + don't allow CrawlSwitchboard to exit application making network param unused	9 years ago
reger	5aaa057c65	ignore empty input lines in FileUtils.getListArray() to poka joke blacklist read. equalizes behavior with getListString() improves: case were blacklist file contained a undesired empty line, not fixed by blacklist-cleaner.	9 years ago
reger	41c36ffd75	exclude rejected results from result count (by using the resultcontainer.size instead of input docList.size) skip waiting for write-search-result-to-local-index (by removing the Thread.join - which will bring a small performance increase)	9 years ago
reger	d4da4805a8	internal wiki code, require header line to start with markup (to allow something like "one=two" as text) + incl. test case	9 years ago
reger	e952e355a2	have Translator servlet adhoc apply added translation by translating a single file + fix NPE in Translator, coming from translation read by TranslatorXliff which allows null content for not translated key's	9 years ago
reger	b119ff65be	clean out not used Switchboard variables counter indexedPages, const xstackCrawlSlots	9 years ago
reger	223071337b	Translator to take caution of word boundaries to identify text portion to be translated. To avoid key="TEST" sourcetext="this is a myTESTcase for it" translation of partial terms/words. Add check of word boundary before and after sourcetext (incl. take care of current praxis for key to be delimetered by > < + add test case	9 years ago
luccioman	009657791e	Merge remote-tracking branch 'origin/master' into LibreJS	9 years ago
luccioman	a73c9327a5	JavaScript License fixes for LibreJS compatibility	9 years ago
reger	0c40401d28	fix MessageBoard test for null data	9 years ago
reger	5b22c63030	Adjust TranslatorXliff to load default 1st and merge downloaded or modified local translation. process 1. load default from locales/. 2. load and merge(overwrite) from DATA/LOCALE/. (can be partial translation as it is merged) - include all entries from DATA/LOCAL to be edited in Translator servlet and save just modifications (instead of full list) to DATA/LOCALE This shall make it easy to share modifications.	9 years ago
reger	a2e0f00456	optimize Translator - translateFilesRecursive: load translation once (reduce io), return true on complete success - remove resulting unused translateFiles() variant - translate: use StringBuilder parameter (skip toString conversion) - remove not needed static declaration - upd some javadoc	9 years ago
reger	a6ba1faa80	introduce a translation edit servlet Translator_p.html YaCy's UI text translation This is the 1st rudimentary approach to support the translatio utilities. It allows currently to edit untranslated text and save it in a local translation file in the DATA/LOCALE directory. + refactor Translator (less static's) to leverage on class overrides and support garbage collection for this 1 time routine + adjust TranslatorXliff to check for local translations in DATA/LOCALE, this includes storing manually downloaded translation files in DATA as well (to keep default untouched) + on 1st call of Translator_p a master tanslation file is generated, checking the supported languages for missing translation text (later this masterfile is planned to part of the distribution, to harmonize translation key text between the languages) Outlook: the local modifications (possibly as translation fragments instead of complete file) to be shared with maintainer using xlif features.	9 years ago
reger	b3c9041f79	remove with localHostNames redundant (but unused) publicIPv4HostNames and publicIPv6HostNames to free unused resources	9 years ago
reger	bd8f7c11f5	Use transparent addToCrawler in AutoSearch instead of addToIndex This would likely also be of advantage for RSS import/schedule as following bug-reports suggest http://mantis.tokeek.de/view.php?id=569 http://mantis.tokeek.de/view.php?id=655	9 years ago
reger	f23d8ab47b	fix 2 more servlet RuntimeException in intranet mode thrown due to seed.getIP() returning null in intranet mode (in servlets: ConfigSearchBox, Load_PHPBB3 +remove unused (const ∅) seed.IPTYPE	9 years ago
reger	bb0076c3dd	fix: assure close inputstream in TranslatorXliff after reading xlf file by using try-wiht-resource block	9 years ago
reger	6384b7d82e	fix NPE in Load_MediawikiWiki servlet in intranet mode - in intranet mode getip returns null causing a NPE - adjust starturl (which was set to http://localip/repository) which is never the start url for the Mediawiki + correct javadoc for seed.getIP()	9 years ago
Michael Peter Christen	596b5dfa59	add the JRE version in the seed. Purpose: identify if it is possible to migrate to new JRE version	9 years ago
reger	4cc38e979d	add InputStream close after reading input file (Vocabulary_p servlet)	9 years ago
reger	6bf9c55584	adjust Solr select servlet to lates bugfix for boostquery (bq param) to split query into multiple parameter on line separator in input query. e.g. split "crawldepth_i_0^10.0 \n crawldepth_i:1^5.0" but allow "url_file_ext_s:jpg OR url_file_ext_s:png" to be unsplitted	9 years ago
Burkhard	9a18e2297b	Merge pull request #51 from JeremyRand/multiple-boost-query Fix multiple boost queries	9 years ago
reger	f0d7b93372	make use and activate autodetect charset in Vocabulary input from file + revert mistake of empty cn.lng	9 years ago
JeremyRand	433217b33e	Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)	9 years ago
JeremyRand	58824dfa6c	Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.	9 years ago
reger	9e94989237	upd to PDFBox 2.0.1	9 years ago
reger	d0a571bed2	del cytag trail for own index.html (save resource not used by default)	9 years ago
reger	de46879637	fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut)	9 years ago
reger	24b0fa2a38	extend snapshot Html2Image.pdf2image to use PDFBox image export capability if no external tool installed (and for Win) Resulting jpg are not always perfect (if graphic included) but imho sufficient.	9 years ago
reger	eb2a00b1d8	fix NPE on missing crawldepth_i	9 years ago
reger	efb9f1a8b7	save resource for unused blacklistFiles map	9 years ago
reger	5f113be760	cleanup connectPeer & yacyVersion.latestRelease usage obsolete since `527b3decde`	9 years ago
reger	7097dcbdbd	cleanup hack for partial Solr update on multivalued datefields has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050	9 years ago
reger	f10ea3c155	clean-out unused SwitchboardConstants	9 years ago
reger	ef24593347	delete obsolete SEARCHRESULT busythread constants not used since 29.05.2013 18:27:27 `0c1a018bbd`	9 years ago
reger	125b5e26a5	apply bugfix for ChartPlotter from Pullreq 42 https://github.com/yacy/yacy_search_server/pull/42 thanks to otteresk (https://github.com/otteresk)	9 years ago
reger	06ce9ae711	prevent "unchecked conversion" compiler message + include "translate" property in xlf "trans-unit" export	9 years ago
reger	b4a576dbdf	exclude unused protocol param "duetime" (receiver interpretes param "time" only)	9 years ago
reger	3bd6ae8d8b	keep addon/Notepad++ keyword marker on lng export (length of remarks devider line) + harmonize status_p.inc lng text	9 years ago
reger	16837d60c7	fix version in locale version file (it's compared to full version)	9 years ago
reger	0fb01e429e	fix migration, account for ssl port in config (for auto-disable https)	9 years ago
reger	7be1c7a05a	fix logger name	9 years ago
reger	1d940e5a94	upd commons-compress 1.11	9 years ago
reger	7789c32c82	delete crawl queue on init exception (happens occasionally on path name vaiolation and will never get resolved)	9 years ago
reger	f781b9dd47	revert call condition f. migration.installSkins (a bug introduced in `fb8ae14b21` , see comment on that commit )	9 years ago
reger	3adb670f44	remove never used Domains.myHostNames set	9 years ago
reger	6ecc180299	fix rwi doubledom return best (highest) ranking	9 years ago
reger	2343e3f1cd	keep and update existing xlf translation master instead of create new in utility CreateTranslationMasters + small fixes in lng's	9 years ago
reger	a1935f485f	Added utility class CreateTranslationMasters to create a language independant translation master as source to harmonize individual translation files Included a main to create masters in YaCy an xliff format for testing + restrict TranslatorXliff to use only entries with State=translated P.S. used https://open-language-tools.java.net/editor/about-xliff-editor.html to experiement with xlf output (haven't a Pootle avail.)	9 years ago
reger	acaf51b296	keep ConfigLanguage_p as 1st entry in exported translation file + rem untranslated text & some typo fixes in several translations (considering to create a translation master file to harmonize entries)	9 years ago
reger	61c5b6b403	fix empty drop down list in ConfigLanguage after wrong/empty download + add xliff translated attribut + append japanese lng name	9 years ago
reger	4eddabee42	translate Network History screen -> de + remove leftover debug line	9 years ago
reger	90c79014ae	remove unused translator routine which also doesn't handle rel path input + correct some language file match issues	9 years ago
reger	902e79e261	Introduce a TranslatorXliff wich can read/write xliff from/to internal translation map. This eases up suggested initatives from http://mantis.tokeek.de/view.php?id=649 Allows longer term also to store translation maps for the htroot files in standardized/reuseable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ). + added test case creating and comparing xliff file with internal custom prop file. (currently the introduced class is not used in core code)	9 years ago
reger	d9adc2c255	load handler for Transparent Proxy on startup only if feature is activated to save the resources and keep handler chain small if the feature is not used. +add a warning message on settingsack_p page to restart on first activation	9 years ago
reger	ec24a0c85a	add test case for optimized toTokens()	9 years ago
reger	cada24f918	adjust utility ListNonTranslatedFiles for path compare on windows (backslash replace)	9 years ago
reger	fb8ae14b21	make migration version safe	9 years ago
reger	258cd41577	reduce logging (EmbeddedSolrConnector.query) mainly to reduce the frequent metadat checks like > EmbeddedSolrConnector.query QUERY: q={!cache=false raw f=id}xXxXxX&rows=1&start=0&fl=id,load_date_dt (p.s. direct servlet queries logged via AccessTracker.addToDump)	9 years ago
reger	6783ef5540	move example code SearchClient out of yacycore package to example directory	9 years ago
Michael Peter Christen	b89465d952	0N - basic dump upload servlet infrastructure, to share index dumps within an experimental new sharing model	9 years ago
Michael Peter Christen	f12a900f3e	harmonization of http post of files for one and several files - this had been differently - and wrong for several files. also: base64-encoding for gzipped push files because our data structures currently only supports ASCII POST pushes..	9 years ago
Michael Peter Christen	849ab671a9	0n: modified the p2p bootstraping process - rules had been too tight and did not support the re-start of a network with just one principal peer.	9 years ago
reger	764f5100f0	fix delete of temp file after odt % ooxml parser Close zipfile after parsing	9 years ago
reger	379e9b330d	use supplied url port to get robots.txt in crawlers hostqueue	9 years ago
reger	58a959403d	fix mixed logfactory in UrlProxyServlet, Class doesn't use functions of declared ancestor, change to extend on httpservlet	9 years ago
Michael Peter Christen	2494a820c7	0N - added recording of dump exports if given time frame is not negative	9 years ago
Michael Peter Christen	ef2cc4f690	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	a6bf0b1649	0N - added option to generate index export files for a specific number of minutes in the past and reverted latest change. The export file dump will now contain four data elements: f - first date of index entry write date, l - last date of index write date, n - now-date of index dump time, c - count of numbers inside the dump. '0N' denotes a series of changes which will lead to the opportunity to exchange index data dumps in a way that is needed to integrate ZeroNet index data. This will be based on index dump sharing; that causes this commit.	9 years ago
reger	6d56beaed8	fix assertion exception in toString of MultiProtocolURL toString of AnchorURL and MultiProtocolURL are identical code (no need to override or to protect call to parent) as reported in https://github.com/yacy/yacy_search_server/issues/43	9 years ago
reger	42a7bdb2af	fix SolrSelectServlet authentication to default to true	9 years ago
reger	dbb28bb4f3	del unused statistic parameter (from status servlet)	9 years ago
reger	06d0e2aeb9	result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. - Above brought up that parser start url parameter, declared as AnchorURL uses only methodes of parent object DigestURL (changed parameter declaration accordingly).	9 years ago
reger	caf9e98f09	put metadata dc_publisher in corresponding schema field	9 years ago
reger	38e2b054d4	remove servlet classloder internal cache map (to save the resources, cache hits marginal) - DefaultServlet includes already a class cache "templateMethodCache" which is emptied on low mem status - avoid classloader cache gets has no hits but over time holds all (used) servlet classes	9 years ago
luc	3f338777f7	Also check and index eventual icon url information from metadata.	9 years ago
luc	9f712146df	Display icons in ViewFile "links" mode.	9 years ago
luc	26f1ead57c	Created ViewFavicon class specialized in favicon viewing. Main image processing is now in ImageViewer, used by both ViewImage and ViewFavicon. Fixed URIMetadataNode.getFavicon to use non-standard icons with no size ass fallback.	9 years ago
reger	6f0b073bf3	override detected language (statistic langdetect) only with TLD determided language if langdetect probability is not high. + additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh used by YaCy	9 years ago
reger	b65e2b527d	include use of condenser's content text for language detection. Language identification may show poor performance on documents with short or no title but clear lang indication in text content. Using content text too improves lang detection. + remove double caching of text in Identificator	9 years ago
luc	07222b3e1a	Added favicon url transmission in RWI chunks.	9 years ago
luc	480772c070	Fixed json search results from commit "Improved URLLicence reliability"	9 years ago
reger	937fbb0b9f	correct isHidden() for smb from last commit	9 years ago
reger	535d4bf75f	respect hidden attribute for file and smb directory listing (hidden directories are not listed, effects crawling of local file system)	9 years ago
luc	3cc5619d93	Improved HTML icons indexing and rendering in search results. See http://mantis.tokeek.de/view.php?id=629	9 years ago
luc	edef6cd0dc	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	c28142095a	add findClass() to servlet class loader (used in YaCyDefaltServlet) In the 2 cases where servlet calls servlet the jvm classloader chain is invoked and servlet class loaded by jvm loader (successful while requiring htroot in system classpath). This patch uses the standard override design for loaders to handle these cases (making in not longer crucial to have htroot in system classpath, as this classLoader is mainly used for servlets and looks in this case for the class in the configured path). + As the default classloader is parallelcapable we should register this too.	9 years ago
luc	f7b854465b	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	a6617ad887	expand initRemoteCrawler() to terminate worker threads if called to deactivate remote crawl. On startup we save the resources for remote crawler if disabled. Once started threads are running idle after disable remote crawl. Now threads are terminated to save the resources also while disabeling during runtime. + remove empty class Channels	9 years ago
reger	2048b7e057	support scraping start-/enddate from html tag with property "datetime" This may be used in html5 <time> tag (which we don't explicite support yet for date in content scraping).	9 years ago
reger	900d4584ba	complet resource cleanup of lists in contentscraper's close()	9 years ago
luc	aa60ad1dbc	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	1f18653de0	pass parsed swf content trough htmlscraper Swf may contain subset of html tags which shoul'd appear as text. Especially <font> tag may totally screw up metadata servlet if not filtered out.	9 years ago
reger	18ecf57792	add support of compressed swf to swfParser from JavaSWF2 (source compatible to WebCat). Moved swf file signature check to parser Changed use of synced vector to list swf InStream	9 years ago
sixcooler	5cb7ba0dc4	fix for connections not getting closed to get favicon.ico during seach	9 years ago
luc	ef83e34b8a	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	ed3e16e092	apply remote result count config value to Bookmark Autosearch + prepare to make the widely unused Bookmark feature optional	9 years ago
Ryszard Goń	a98c395023	Add the Autocrawl thread	9 years ago
Ryszard Goń	1728cd30c6	Create autocrawl profiles	9 years ago
luc	41767a01c2	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	ff27824964	fix swfParser reading file signature before passing to library (current version expects data w/o signature)	9 years ago
luc	7aa1a29e33	Return more accurate HTTP status 400 with detail message when some error occurs on ViewImage : - missing required parameters - url licence invalid	9 years ago
luc	bd9dc2f32b	Corrected NullPointerException cases occuring in YJsonResponseWriter when no description is available.	9 years ago
luc	0076f9f97d	Updated documented sample url	9 years ago
luc	cfdbc2b487	Improved URLLicence reliability for use by conccurrent non authaurized users. Removed URLLicence generation when unnecessary (authorized users)	9 years ago
reger	c91e712178	further refactor using standard java / (one) utf-8 charset variable extending initiative of commit `9a25751850`	9 years ago
luc	571bc55937	Refactoring : use StandardCharsets constants instead of hard-coded charset names.	9 years ago
reger	1af0e9ef74	remove workaround for Solr bug regarding multivalued date fields fixed in 5.4.0 http://issues.apache.org/jira/browse/SOLR-8050	9 years ago
sixcooler	5a35f9383a	bump to solr/lucene 5.4.0	9 years ago
reger	a58d34a4e8	check error URL cache before adding errorDoc to index - del obsolete related switchboardconstant	9 years ago
reger	e9539b1086	reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart - add filename to parameter fieldname - add filecontent to special parameter fieldname$file (some servlets use this $file parameter) fix for http://mantis.tokeek.de/view.php?id=542	9 years ago
reger	cd26717ba2	fix low memory status hint (dht-in disabled) http://mantis.tokeek.de/view.php?id=619	9 years ago
reger	a5faf73afa	remove obsolete yacy.init entries interaction.* (related to removed triplestore)	9 years ago
sixcooler	dce1cb65c4	Merge remote-tracking branch 'choose_remote_name/master'	9 years ago
reger	46ac0867ff	fix poison mediawikiimporter output queue also after ExecutionException in worker thread. Writer of importer keeps needs a poison to close the file. On exception (e.g. OOM) add a poison marker in outer most try/catch to assure output queue will terminate in this condition too (and closes+renames the surrogate/in/xxx.prt file)	9 years ago
reger	a7591d3ed0	fix mediawikiimporter number format exception on coordinate parsing handle uncomplete metadata like "NS=43/50//N". For other {expr ... } type entries a try catch added	9 years ago
reger	9da1712a31	increase http header EXPIRES for css and images in DefaultServlet to increase browser cache hits for not changing content	9 years ago
reger	6d54eb3d36	skip loading document on crawl start for YMark bookmarks by adding a constructor giving the already loaded document as parameter.	9 years ago
reger	80e2c82249	fix NPE on empty blog importfile parameter	9 years ago
reger	e84d94f8ca	fix mime table for ms office / open office documents (causing wrong parser detect in intranet mode)	9 years ago
reger	45b9bd8403	adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters, and feeding hyperlinks to webgraph processing.	9 years ago
reger	d5fd031449	fix reading of ippattern config array in URLProxy	9 years ago
reger	b7e8358645	make use of header.getContentType where possible (mime is normalized afterwards) otherwise use header.mime() differentiated in prev. commit.	9 years ago
reger	7a8c077838	fix HeaderFramework.mime() to strip charset parameter. Differentiate mime() and getContentType() which gives the raw header field. This improves parser detection if charsets are included in http content-type field.	9 years ago
reger	b4b6910d60	fix (todo): correct doc.id of remote search result if no match with newly calculated doc hash if different. Testing showed that in some cases delivered url doesn't match the local calculated hash. In this case replace doc.id (and host_id_s) with calculation from url.	9 years ago
reger	dec3e6ad96	fix: adjust urlstub for mailto links (skip protocol)	9 years ago
reger	cb83e65f89	drop returning document language "en" if unknown (fix todo) which also harmonizes handling of query.modifier for rwi and solr results (to result must match a given language filter)	9 years ago
reger	0c5548a7ff	fix (todo) remove redundant holding of email link nameproperty in parser document	9 years ago
reger	71c416f383	show mailto links in ViewFile.html linklist	9 years ago
reger	6b7c10cef8	fix dc:date in mediawikiimporter/document.writexml to use lastmodified	9 years ago
reger	14803d58cd	let html scraper accept html5 <link rel="icon"> for favicon links	9 years ago
luc	b4cdacee76	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
luc	ba0a293f5c	Corrected another case of org.apache.lucene.store.AlreadyClosedException" occuring when SearchEvent.cleanup() was called while committing local solr index.	9 years ago
reger	4d2b934487	prevent mailto links getting into parser result document's in/outbound link collection by checking mailto scheme early. - fix upper case mailto protocol assignment - add test case for getProtocol	9 years ago
luc	8c4ab9c76b	Added an option to eventually limit size of remote solr documents put to local index. See mantis #626.	9 years ago
luc	a2c08402af	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
luc	70595d05d0	Modified MemoryControl.main() test to properly end for better results displaying.	9 years ago
sixcooler	1be67d9ab6	CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years ago - time to let it go Commented out unused table of cache-objects	9 years ago
reger	28b8bc290a	fix use of NETWORK_SEARCHVERIFY for rwi verification was not used to set the searchevent parameter (done in SearchEventCache.getEvent) - remove unused corresponding QueryParams.filterfailurls param.	9 years ago
reger	020630efd8	remove unused network scanner parameter from queryparameter Search event is not using networkscanner (removed filterscannerfail param always init to false)	9 years ago
luc	ad5586f8f6	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
luc	8ebefa4233	Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was failing. Looks like it was broken since Commit `b43811d38c`	9 years ago
luc	7736ee5a42	Updated MediaWimporter main() : display usage in console and stop properly without calling System.exit	9 years ago
reger	cdb8f3b10d	make current ranking score value avail. to search interface / api Update the result score result field with the result queue ranking value to reflect the actual calculated/used score, for rwi & solr stack results. (calc. etc. is unchanged, it's just that result entry carries the latest val as api retrieves the number from it)	9 years ago
luc	27d11f8671	Fixed isSolrDump function : PushBackInputStream was not unread when returning false (for example with a WikiMedia dump).	9 years ago
Michael Peter Christen	135a123a77	less logging in new language detection	9 years ago
Michael Peter Christen	ef8cd80593	fix for npe	9 years ago
reger	0347bfa71f	Apply collection query constraint/modifiert to rwi result stack. Collection is not available in pure rwi entries (but in local solr metadata) But if user wishes to filter by query constraint also rwi shall adhere to this (even if only rwi entries with parsed or solr received metadata may fit)	9 years ago
luc	2a67d2ba6f	Corrected error management for unsupported image formats, parsing errors, and unavailable resources : avoid logging to much Exceptions as these errors easily occur when searching images.	9 years ago
Michael Peter Christen	d6e9834040	Merge branch 'master' of https://github.com/Scarfmonster/yacy_search_server # Conflicts: # .classpath # build.xml	9 years ago
Michael Peter Christen	d82d311995	Merge branch 'master' of https://github.com/luccioman/yacy_search_server # Conflicts: # .classpath	9 years ago
reger	b5371ea8c1	read/init crawl queue in a thread to speed-up YaCy start on large existing crawler queues	9 years ago
reger	1160b13172	remove unused md5 from ViewFile servlet params	9 years ago
reger	e163ea88f6	fix vsdParser (Visio) parser return statement (final block un-necessary throw)	9 years ago
reger	b2c8bc0ae6	remove md5_s from default index fields it is not assigned a value / not used Due to above also excluded from transfer protocol.	9 years ago
luc	e40ae0943b	- No max dimensions specified : render raw image data when source and target image format are the same. - Corrected scaling condition.	9 years ago
reger	90686a75a2	fix flux factor (additional crawl delay by access count) calculation	9 years ago
luc	4af27289e5	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	297fdb60d3	throw exception if crawler hostqueue can't create hostpath directory. In rare cases hostname may not be a valid filesystem directory name, which can't be created (e.g. containing '*' char). To prevent crawl queue looping on this invalid entry by throwing a malformedurlexception.	9 years ago
luc	755efac17d	Use same max file size when loading all resource bytes or opening stream content	9 years ago
luc	bc6c79fc12	Corrected scaling function for non RGB images.	9 years ago
luc	1565559df8	Refactoring : extracted write InputStream method.	9 years ago
luc	f0478bb14d	BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys imageio-bmp-3.2 library. - better BMP format flavours support - handle PNG encoded icons - handle transparency Added some javadoc url references to .classpath	9 years ago
luc	07437986e7	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	97cc03ef6a	start using a template for urlproxy header It is included as iframe /proxmsg/urlproxyheader.html to allow full servlet functionallity and flexibility to display some index/meta data in future.	9 years ago
luc	f01d49c37a	Process large or local file images dealing directly with content InputStream.	9 years ago
luc	3c4c77099d	If available, check content length before downloading. Check also content length is not over Integer.MAX_VALUE.	9 years ago
luc	5bbb2e1730	Ensure resource is closed when reading a full file InputStream	9 years ago
luc	6291a57300	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	0d3c5b223e	have psParser cleanup temp file	9 years ago
reger	7d0d19cb8e	avoid File.deleteOnExit() on temp files JVM registers each file in a list regardless of already deleted and never cleans up the list during runtime. This accumulates to a considerable amount of mem during large crawls and/or long uptime. To tackle this, all temp files are now created in a subdir of java.io.tmpdir and the jvm tmpdir property is set to this subdir, which is deleted by code on shutdown. Additionally let pdfParser use this tmp subdir too.	9 years ago
luc	bfe51001e3	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	02e4489a23	set tmpfile.deleteOnExit by default, to make sure files are removed on shutdown.	9 years ago
reger	2985baaa01	Exclude repetitive protocol part in tokenized url used as description if none is avail. from parser.	9 years ago
reger	ca3d26a401	harmonize wordsintitle & CollectionSchema.title_words_val calculation, remove obsolete partial init of wordreference from urimetadata	9 years ago
reger	52a9040ae6	Sort out double keywords (dc_subject) early in parsed documents - by direct using Set vs. List - remove not neede String[] getter	9 years ago
luc	49331dc523	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	47d70732f6	improve locale translator - skip empty line - robustness file section detection (space independant)	9 years ago
sixcooler	646afe9183	do not store subfield *_coordinate + make all num-fields being docvalues	9 years ago
sixcooler	194df613de	not using 'location' as defaultfacetfield - since we removed it being default.	9 years ago
sixcooler	d3b9349b6f	simplification / speedup of GenerationMemoryStrategy	9 years ago

8507 Commits (8a48f80909b2f2885a6cb23a9c488ea1c7548123)