yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	49d91b94c3	npe fix in crawler	11 years ago
Michael Peter Christen	b7183a7321	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	ea2e627662	fix ConfigAccounts del user with uppercase letter in name (usernames are case sensitive, userdb.delete used toLower)	11 years ago
Michael Peter Christen	c465b791af	typo	11 years ago
Michael Peter Christen	191ec8c82a	added concurrency to postprocess rewrite process	11 years ago
Michael Peter Christen	a1e8bdd5e9	log ppm instead of docs/second	11 years ago
Michael Peter Christen	cc0ded7abd	set process type of web graph according to fields as defined in the schema	11 years ago
Michael Peter Christen	12fb9d7cd1	log postprocessing constraints in case that postprocessing is not performed	11 years ago
Michael Peter Christen	3c23b89823	less logging	11 years ago
Michael Peter Christen	a0c53174c5	better solr query logging to detect unnecessary sort requests for more performance profiling	11 years ago
Michael Peter Christen	338f574bdc	no sorting if http/www unique fields are not demanded (makes query faster) and some code restructuring	11 years ago
Michael Peter Christen	1609763be5	toString fix	11 years ago
Michael Peter Christen	b983e68254	more retries, less sleep	11 years ago
Michael Peter Christen	1503ba7794	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	8f77719091	fix "Ljava.lang.String" in crawl queue anchor name (e.g. IndexCreateQueues_p.html?stack=LOCAL with images in queue)	11 years ago
Michael Peter Christen	0ceeceb35e	more logic on Solr queries; usage of the query terms in postprocessing, saving one query for double document detection now per document	11 years ago
orbiter	38864ae004	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	4099296b45	added new classes which shall reduce call overhead to Solr (stub)	11 years ago
reger	d0c02e1de7	adjust rss lat/lon to double (common format across other classes)	11 years ago
orbiter	3491ab4c38	removed unused images from webgraph edge computation	11 years ago
orbiter	2371d6b8db	target linktexts must be string to enable search facets on these fields	11 years ago
Michael Peter Christen	001e05bb80	do not store failure of loading of robots.txt into the index as a fail document	11 years ago
Michael Peter Christen	05d58e4df0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	98f45c9032	fix for image alt attachment to AnchorURLs in html parser.	11 years ago
orbiter	22ce4fb4dd	better error handling for remote solr queries and exists-checks	11 years ago
Marc Nause	9df14fc126	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Marc Nause	477be17c51	Replaced old UPNP library with Weupnp. UPNP should work now, at least it does on my network. UPNP code in YaCy can still be improved though (see TODO comment: make port on gateway configurable or find free one). *) removed old code *) added new lib *) changed code to work with new lib	11 years ago
orbiter	738989aab7	reverted commit `f94c91315b` because the webgraph has not enough performance for that	11 years ago
orbiter	e9163e7e10	fix for malformed hostpath names in crawl balancer	11 years ago
Michael Peter Christen	c115f3869c	enhanced snippet computation and test method in ViewFile	11 years ago
reger	6c10b59f3e	move bootstrap peers test systems to its test class var assignment not needed elsewhere.	11 years ago
orbiter	1027f3d04a	fix for the usage of ready-prepared solr queries, some queries are formulated as edismax query but this was not set as query attribute. The defType=edismax property needs a qf-field, so this was added as well. Do not remove that field again! This fixes also a problem with title-unique computation.	11 years ago
Michael Peter Christen	f94c91315b	if the webgraph is used, then use it also for reference computation to avoid contradictions with references_i in the collection index.	11 years ago
Michael Peter Christen	6e1dc444c3	added a snippet test function in ViewFile: you can now search for a specific word on the document; the servlet returns the snippet in the same way as it would be shown in a search result.	11 years ago
orbiter	4b06adb751	fix for file urls	11 years ago
orbiter	08409ec680	no idea why the words max was an ordered one. This change increases speed during document processing a bit	11 years ago
reger	e5854a5cdb	fix localhost link to opensearchdescription.xml	11 years ago
Michael Peter Christen	b44626e55b	fixed target_alt_t in webgraph	11 years ago
Michael Peter Christen	504327b15c	fix for condition for writing the webgraph	11 years ago
Michael Peter Christen	542c20a597	changed handling of crawl profile field crawlingIfOlder: this should be filled with the date, when the url is recognized as to be outdated. That field was partly misinterpreted and the time interval was filled in. In case that all the urls which are in the index shall be treated as outdated, the field is filled now with Long.MAX_VALUE because then all crawl dates are before that date and therefore outdated.	11 years ago
Michael Peter Christen	4eec1a7452	refactoring (change Metadata name of load time data structure to avoid confusion with Node data which is also called metadata)	11 years ago
reger	c95ba52cf0	improve logexception info - log a message or class name instead of msgtxt "null"	11 years ago
orbiter	e441831a24	reverted toString() change in AnchorURL to prevent mistakenly used toString(). This fixes also the update link bug.	11 years ago
reger	47f201a6b8	Add Solr default query fields (&qf) to select servlet according to the ranking profiles boost fields defined by the peer (if df/qf is not specified in query). This allows for pretty simple queries ( q=word) without the need to know about the specific index configuration. Making sure all relevant fields (as determined by the index owner) are searched, still maintaining the option to query specific fields and does not rely on the duplication of text to text_t. - add author to reset-default boost fields (support results for author nav)	11 years ago
reger	f96cfdc84d	prevent array out of bound exception on getRankingProfile(x) on faulty &profileNr= query parameter	11 years ago
reger	5f5fb4ecdc	remove unused static (RSS)search from protocol	11 years ago
reger	7c1706d83a	use CRLF in generated bat command scripts for windows - for easier viewing with standard viewers	11 years ago
reger	a2cb366b25	Combine /heuristic search modifier with opensearch configured targets - with search modifier /heuristic a request is sent to all configured opensearch target systems (old /heuristic/blekko modifier no longer valid) - this allows using the opensearch heuristic on individual search requests (in contrast to configuration HEURISTIC_OPENSEARCH=true which sends an osd request on all global searches) - the index.html searchoption text adjusted to be displayed only if option configured - add Archive-It to predefined systems	11 years ago
Michael Peter Christen	2de159719b	added an option to set 'obey nofollow' for links with rel="nofollow" attribute in the <a> tag for each crawl. This introduces a lot of changes because it extends the usage of the AnchorURL Object type which now also has a different toString method than the underlying DigestURL.toString. It is therefore not advised to use .toString at all for urls, just use toNormalform(false) instead.	11 years ago
Michael Peter Christen	bf1b6b93e7	do not write CR values to webgraph if no CR values are computed	11 years ago
Michael Peter Christen	e039e78210	small bugfixes	11 years ago
Michael Peter Christen	32a2ff925c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	d07cdd8c3b	added SolrCloud access mode and configuration	11 years ago
Michael Peter Christen	8514bffc22	enhanced postprocessing status report	11 years ago
reger	b24572f304	fix GSA filter query assignment - use more parameter constants	11 years ago
Michael Peter Christen	b5fc2b63ea	removed exist() retrieval functions from error cache and replaced it with metadata retrieval from connectors directly. This should cause better usage of the cache. Automatically increase the metadata cache if more memory is available.	11 years ago
Michael Peter Christen	62c72360ee	cleanup of checkAcceptanceInitially in CrawlStacker, should avoid double-calling of solr	11 years ago
Michael Peter Christen	dd5cdfe212	reverted filter query hack, it did not work	11 years ago
Michael Peter Christen	b5d78ba156	reduced number of solr queries during crawling	11 years ago
Michael Peter Christen	5326970d6c	enhanced solr queries for single document extraction	11 years ago
Michael Peter Christen	525575bd97	added debugging of filter queries in thread dump thread names	11 years ago
Michael Peter Christen	f319ef268f	testing filter queries instead of queries to retrieve documents by id	11 years ago
Michael Peter Christen	fd87fa1613	removed more unnecessary exist-checks in ErrorCache	11 years ago
Michael Peter Christen	f2b476e08b	don't do a double check to solr for failed documents if they are not written to solr	11 years ago
Michael Peter Christen	06ab72d1af	enhanced crawler host round-robin strategy	11 years ago
orbiter	dab9a0786a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	51bf5c85b0	Renamed the transmission cloud to buffer in dispatcher since the name 'cloud' was a bad idea. Changed also the accumulation process for peer targets so that every dht chunk is not assigned the set of redundant targets but they are assigned to redundant targets individually. This enhances the granularity of the target accumulation and should enhance the efficiency of the process. Finally the dht protocol client was enriched with the ability to remove the 'accept remote index' flag from peers or remove peers completely if they do not answer at all.	11 years ago
Michael Peter Christen	a694b6a8fc	another fix for unique field computation	11 years ago
Michael Peter Christen	fb3dd56b02	fix for processing of noindex flag in http header	11 years ago
Michael Peter Christen	b0d941626f	fixed bugs in canonical, robots and title/description unique calculation	11 years ago
reger	d9472d043a	cleanup older unused classes	11 years ago
reger	665e12f88e	move startup time from old serverCore to switchboard (most used here) to make servercore eventually obsolete.	11 years ago
reger	336425912a	remove unused localSearchThread from SearchEvent	11 years ago
reger	32bd2a61c1	add local ip to AbstractRemoteHandler local hostname cache	11 years ago
Michael Peter Christen	f3a6b6e21e	fix for bad URL decoding	11 years ago
Michael Peter Christen	1092e798a5	fixed double content postprocessing	11 years ago
Michael Peter Christen	aee5b108e5	added linkScraperParser, a parser which ignores the text like the generic parser but extracts links like the htmlParser. This should be used for ASCII documents without known text format annotation like source code files or json documents. Probably also good for xml files without known schema.	11 years ago
reger	2b8cc5832c	fix seek error for 0 file size records file by add extra check for file size = 0 in cleanlast() - (http://mantis.tokeek.de/view.php?id=411)	11 years ago
reger	2ba394333f	fix Crawler HostQueue release of stackfile - close stackfile inputstream at end of ChunkIterator This should solve startup delay while unfinished crawl jobs exist (maybe also too many open file situation)	11 years ago
reger	40133ba2d0	fix NPE in Condenser, discovered by calling IndexControlRWI, "Word Deletion" with "for every resolvable and deleted URL reference"	11 years ago
orbiter	59160984cc	timeline performance update	11 years ago
orbiter	54bea96e67	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
Michael Peter Christen	841cc77391	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	e09218129c	remove check for local solr. This check was made during a time when Solr was optional and another alternative metadata store was available. Since that store is now removed, Solr is always available (internally or externally)	11 years ago
orbiter	2073e69034	fix for long periods in timeline	11 years ago
reger	1f94df29e7	fix NPE in solr rss where snippet contains only the title text and adjusted xslt, for solr snippets (&hl=true) to decode the xml encoded html <b> tag by adding disable-output-escaping (still open item description may be double as dc: tag and rss.description tag)	11 years ago
Michael Peter Christen	09dcdb9b19	update to solr 4.9.0	11 years ago
Michael Peter Christen	1cd4b2e8be	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	8c52f0651b	refactoring of AccessTracker events & timeline fix	11 years ago
reger	431a5f9c4e	added test case for TextSnippet, removed obsolete/unused parameter and reference to MediaSnippet	11 years ago
Michael Peter Christen	5b94a257ce	no timeout for large reference collections	11 years ago
Michael Peter Christen	f5b817bac4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	cb2c17d236	extract author and keywords in .doc and .ppt parser	11 years ago
reger	a5707cd2eb	enable proper Author navigator - author facet is based on omitted author_sxt field - adjust to make author nav available on exist of author field but keep using author_sxt to construct the facet (why!?) - add check for querymodifier author in searchevent	11 years ago
Michael Peter Christen	74206a10c7	refactoring	11 years ago
orbiter	fec673c9d1	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	4a66af716d	added apkParser stub (work in progress)	11 years ago
orbiter	c59da9fe7a	added access tracker log reader stub	11 years ago
reger	2d67f29244	adjust mergeDocument after parsing to - preserve charset and languages - fix merge of author	11 years ago
Michael Peter Christen	0d29b972cc	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	36e623d8bf	enhanced metadata enrichment for media file type search: - Web servers may now deliver YaCy-specific http header fields with a title and keywords. The new http header fields are: X-YaCy-Media-Title - to be used for media (image, audio, video) titles X-YaCy-Media-Keywords - to be used for media (image, audio, video) keywords - both fields are written to document fields title and keywords and are searched also during image search. - to make the usage of arbitrary http header fields (including these new fields) possible in the /api/push_p.json servlet, a new POST argument is also introduced to push http header fields. The new POST attribute is named "responseHeader-X" (where X is the counter). It is allowed to use this attribute as multi-attribute several times, each can be filled with a http header line. - see /api/push_p.html for examples	11 years ago
Michael Peter Christen	49886fab08	enhanced debugging	11 years ago
Michael Peter Christen	b893c42a0f	bugfix for image search	11 years ago
Michael Peter Christen	c7995d3e2a	increased fixed limit for http POST request sizes to 100MB	11 years ago
reger	7847a93558	fix AbstractParser.singleList not adding null strings - prevents null titles in oo... parser (as detected by ParserTest) - correct ParserTest dc_description check (dc_description allowed to return 0 length array)	11 years ago
Michael Peter Christen	8acae852a0	write <em>-tagged texts also into the bold_txt field	11 years ago
reger	90c4576361	add a link to recrawl index entry to metadata html page - to allow manually renew index content for this url (e.g. in case it is a remote search result with metadata only) - use simply a QuickCrawlLink_p javascript snippet (minimalistic 1st solution)	11 years ago
Michael Peter Christen	2626c8f6db	using concurrency to do base64 encoding in file POST commands	11 years ago
Michael Peter Christen	e132689818	fixed and enhanced Base64 (en)coder (again)	11 years ago
Michael Peter Christen	2415e3db43	enhanced ASCII byte[] -> String conversion	11 years ago
Michael Peter Christen	4751ed974f	enhanced base64 encoding	11 years ago
Michael Peter Christen	e949071160	removed superfluous date method	11 years ago
Michael Peter Christen	501d55cd35	removed superfluous assert	11 years ago
orbiter	0bbb5040b8	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	9d5d86cd03	Added filter query options to the ranking servlet /RankingSolr_p.html. Filter queries are not actually related to ranking, but user requests have pointed out that specific boost queries to move results to the end of the result list are not sufficient. Such boost filters may be better executed as actual filter and therefore such a filter can now be statically applied to every search request. A typical use could be the expression "http_unique_b:true AND www_unique_b:true" which uses the recently introduced fields http_unique_b and www_unique_b which are true only for one of the alternatives with/without http(s) and with/without prefix 'www.' in host names.	11 years ago
Michael Peter Christen	d2151857f1	Added collection navigation: The collection field (can be filled e.g. in Crawl Start) can be used to add categories to YaCy index entries. The usage of that field was restricted to solr searches and post argument filters as implemented in commit `f7571386a3`. This commit extends collections to a full navigation option in the standard YaCy search interface. The field is not active by default but can be activated easily in the /ConfigSearchPage_p.html servlet (just check the 'Collection' facet field). Collections can now be used for (at least) two purposes: - to provide search tenants (through post argument collection) - to provide self-made category navigation Search requests may now have (independently from switched on or off collection facet) a "collection:<collection-name>" modifier attached; furthermore collection names may use disjunctions using the '|' pipe symbol. For example, this is a valid search request: www collection:user|proxy	11 years ago
Michael Peter Christen	74c249288a	added a push api to make it possible to upload files directly without crawling to the YaCy indexer. Files are uploaded using POST multipart requests; multiple file uploads are possible as well. Each file has attached the file date and mime type which is used to get the right parser for the submitted data. Also an url is submitted which is assigned to the document. The CrawlSwitchboard has a new option for default Crawl Profiles which are assigned dynamically from the new push interface.	11 years ago
Michael Peter Christen	f13c8aa7dd	re-implementation of file push option in the context of POST http requests. The internal representation of post-arguments is String and therefore not appropriate for byte[] objects as submitted by file pushes. Therefore all pushed files are encoded to base64 _after_ uploading with an http form (you do not need to do that encoding yourself) to hand-over the byte[] as string in the post argument. Servlets which read such files must decode the base64 data to get the original byte[] array. This is considered as a temporary solution for file uploads and a proper implementation would need to consider all attributes as handed over as Objects with either String or byte[] Object instances. This would be a major code change and is not done at this time. The feature was submitted to realize a feature as pushed with the next commit.	11 years ago
Michael Peter Christen	ba6ffddefc	refactoring	11 years ago
reger	982601017e	crawling of filenames with + fails due to url decoding modified UTF8.decodeURL to apply x-www-form-urlencoded ( space -> + ) to the query part of the url only.	11 years ago
reger	3b559e7846	optimize pdfParser skip starting reader thread if all content already read	11 years ago
reger	09f73b790f	fix pdfParser not closed warning from pdfbox for encrypted pdf on exit due to missing permission to extract	11 years ago
reger	92d1604a31	Crawler hostbalancer does not delete finished queue files, use alternative delete to fight the symptom (and fix deletion of host dirs on startup) Root cause (which class holds a lock on .stack) not found. http://mantis.tokeek.de/view.php?id=404	11 years ago
Michael Peter Christen	0c324d735c	NPE fix for postprocessing without term index	11 years ago
Michael Peter Christen	922979aae1	added option to prefer http over https in unique-protocol ranking	11 years ago
Michael Peter Christen	b3b174e2b8	fixed webgraph postprocessing and status display in Crawler_p servlet	11 years ago
Michael Peter Christen	e6b28f5958	removed check on protocol for double content (user request)	11 years ago
reger	d8d318233e	fix logging settings - add missing .level - remove obsolete jena settings - set default level=INFO to prevent debug logging of not explicitly specified classes	11 years ago
Michael Peter Christen	698f053658	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	f23c4142e0	added option to configure a custom user agent within allip networks	11 years ago
reger	8e233e2eb4	- fix typo in Message_p (defaultpath) - use more existing switchboardconstants for getproperties - replace deprecated call defaultservlet	11 years ago
orbiter	d7d38f9135	made number of open files in crawler configurable and increased default maximum number of open files from 100 to 1000. This number can be changed with the attribute crawler.onDemandLimit	11 years ago
Michael Peter Christen	8ad41a882c	fixed several problems with postprocessing: - unique-postprocessing was destroying results from other postprocessings; removed cross-updates as they had been not necessary - unique-postprocessing did not restrict on same protocol - inefficient concurrent update cache was redesigned completely - increased limits for concurrent blocking queues to prevent early time-out	11 years ago
reger	ca5437dd50	fix crawl of file:// , also http://mantis.tokeek.de/view.php?id=149 local files can be crawled (intranet mode) url parsing fixed according to RFC 1738 (for unix and windows) for win like file:///c:/tmp or file://localhost/c:/tmp for linux like file:///tmp or file://localhost/tmp Host is ignored and path must be absolute	11 years ago
Michael Peter Christen	ff5b3ac84d	added new fields http_unique_b and www_unique_b which can be used for ranking to prefer urls containing a www subdomain or using the https protocol	11 years ago
sixcooler	5b1c4ef191	Monitoring and limit connection-count for Jetty	11 years ago
Michael Peter Christen	f0db501630	better handling of ranking parameters and new default values for date navigation which is done using ranking in solr.	11 years ago
Michael Peter Christen	53948da7d0	tried to make last_modified recognition smarter	11 years ago
Michael Peter Christen	2d03037965	'Last-Modified', not 'Last-modified' according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html	11 years ago
Michael Peter Christen	3dc5fb0050	fix for operator precedence bug (cast binds stronger than bitwise AND) in peer hash hashing. This should not change anything if java casts long to int by masking with 0xFFFFFFFFL but you never know. The important thing is, that the hashCode() should not return numbers that have the same order as the hash code order because hashing of seeds is used to remove the order in some places.	11 years ago
Michael Peter Christen	6634b5b737	debug code for index distribution testing	11 years ago
orbiter	49e344e8d9	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	7705e36703	fix for latest generic warning fix	11 years ago
sixcooler	10326892a8	avoid errors from ConnectHandler, correction for #6d16fa9	11 years ago
orbiter	97983ba89f	fixed generics warnings for generic array instantiation that appeared after migration to Java 7	11 years ago
sixcooler	830057d788	lower Segment-size (hope to get Segments of 10GB) see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5216&p=30036#p30034	11 years ago
orbiter	c028ae9b09	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
reger	e31493e139	"Use remote proxy for yacy" has no function, remove option and related config item see/fix bug http://mantis.tokeek.de/view.php?id=23 http://mantis.tokeek.de/view.php?id=189	11 years ago
orbiter	181784a5cb	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
reger	0587077d06	cleanup obsolete and not used serverswitch Authentify code as auth is mostly delegated to Jetty container.	11 years ago
orbiter	c9f66be20b	move unnecessary nested else out of condition	11 years ago
orbiter	0d8072aa99	removed warnings	11 years ago
orbiter	88f4af90da	removed warnings	11 years ago
orbiter	0f425e01ca	another circle computation enhancement	11 years ago
reger	a8d162810c	Exclude = from percent-encoding in MultiProtocolURL fix http://mantis.tokeek.de/view.php?id=185 and http://mantis.tokeek.de/view.php?id=280	11 years ago
reger	024f8e9b33	fix truncated urls containing "," addressing http://mantis.tokeek.de/view.php?id=58 Exclude comma from percent-encoding in MultiProtocolURL (see RFC 1738 2.2 and RFC 3986 2.2)	11 years ago
Michael Peter Christen	9112f0a2df	enhanced circle tool initialization	11 years ago
Michael Peter Christen	a1ac4c3b76	automatically clear graphics cache	11 years ago
Michael Peter Christen	505f58c79c	enhanced circle computation time and memory footprint	11 years ago
reger	cd8c0dbda9	assign serialVersionUID for proxyservlet, too.	11 years ago
reger	b300d7f4ce	set serialVersionUID on urlproxyservlet to skip compiler warning - remove commented out code	11 years ago
reger	e9060d31bd	update to Jetty 9 besides adjustments in code it makes the servlet settings in web.xml significant. This applies to solr, gsa and proxy servlet. There is no longer a default setup in code during init (as jetty 9 checks for double definition).	11 years ago
reger	1432a817dd	respect "index media" switched off in CrawlStartExpert.html fix http://mantis.tokeek.de/view.php?id=64	11 years ago
orbiter	39e1913585	next development step: migration to java 1.7 This includes also a small code change to test generic type inference, a java 1.7 feature	11 years ago
Michael Peter Christen	4e734815e8	enhanced snippets: remove lines which are identical to the title and choose longer versions if possible. Prefer the description part.	11 years ago
Michael Peter Christen	e84e07399a	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	89f76da24b	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
sixcooler	390f03e041	do not check for segments-count on optimize: this is also done in Solr and our getSegmentsCount() does not return up-to-date values	11 years ago
reger	8a7c68e4c7	content of surrogates/out never accessed (remove) After import the content is never accessed but may take up a lot of disk space, also the getLoadedOAIServer (which lists the files in surrogate out) is not used. Making the surrogate.out obsolete. Removed keeping of xmls after import.	11 years ago
sixcooler	b8cee9b7d8	remove tables from tabletracker on close to avoid lots of dead entries in /PerformanceMemory_p.html	11 years ago
reger	1600414450	fix NPE on continuing crawls after YaCy restart (Agent is then null)	11 years ago
Michael Peter Christen	229f2248b8	added configuration option for maximum load and minimum ram for postprocessing	11 years ago
orbiter	f15c832587	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
Marc Nause	c97da1a0d8	First draft of a blacklist API.	11 years ago
Michael Peter Christen	d4f65833a1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	c1c1be8f02	fix for slow crawling and better logging in balancer	11 years ago
Michael Peter Christen	3acf416335	npe fix	11 years ago
reger	2eb7682772	add html5 audio/video <source> tag to html content scraper - <source src=.. type=..> tag content is added to embed collection	11 years ago
reger	0b6db04e40	fix contentscraper img height/width parsing prevent numberformat exception on common "100px" property - include in test case	11 years ago
reger	ffc5b75c73	optimize and fix lat / lon assignment	11 years ago
reger	9313447de2	reimplement tighter lat/lon calc in URIMetadataNode from old MetadataRow, considering http://mantis.tokeek.de/view.php?id=272	11 years ago
reger	d812f80784	add exit proxy link to UrlProxy on proxied pages a link to exit proxy is added to top of page. Link text can be configured in web.xml init-parameter (see default/web.xml). If missing no link is displayed.	11 years ago
reger	78d08998db	throw MalformedURLException on unknown protocol on other than the supported http https ftp file smb \\ mailto	11 years ago
reger	bb8181b2be	fix: resolve url without path but searchpart e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test" now host="yacy.net" path="/" fixes http://mantis.tokeek.de/view.php?id=47 added test case for getHost	11 years ago
orbiter	a3542f29b4	npe fix	11 years ago
orbiter	c48d2a2a02	npe fix	11 years ago
reger	121d25be38	recover sax fatal error on OAI-PMH import of xml with entity error this allows to continue loading next resumptionToken even if import file caused sax parser error fix http://mantis.tokeek.de/view.php?id=63	11 years ago
reger	81dc2aa536	add current css to HTMLResponseWriter to fix metadata view (using css from metas.template except js links)	11 years ago
orbiter	2fd8a0ead6	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	8e5ce7cd51	fixed a situation where finished crawls had not been detected.	11 years ago
orbiter	2f63bd0261	enhanced Host Balancer strategy: fair round robin	11 years ago
orbiter	0c88a32c36	do not apply lazy value instantiation for numeric or boolean values because that is misleading and confusing in case of 0- or false-values and may cause NPEs in retrieval functions.	11 years ago
orbiter	8e04030596	in case of short memory, do not cut down robinson peers to 1, just reduce by 50%	11 years ago
reger	86f6975edc	exclude html tags in in/outboundlinks_anchortext_txt parsed text - some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags, remove all tags for text property (inline img tags are still parsed) - added test case for above (to htmlParserTest) - fix solr test case	11 years ago
orbiter	ccb1864d55	catch IllegalArgumentException for wrong process types (that is needed for migrations when new process types are introduced or disappear)	11 years ago
orbiter	4ee4ba1576	fix for NPE in IndexCreateParserErrors_p.html caused by bad handling of lazy value instantiation of 0-value in crawldepth_i	11 years ago
orbiter	12ba890205	removed warnings	11 years ago
reger	d51f9cc863	add custom Jetty errorhandler to provide custom error page footer line - remove redundant mime check in UrlProxyServlet	11 years ago
reger	c193a02023	defer creation of new ArrayList after possible early return (to skip not used object allocation)	11 years ago
reger	727dfb5875	refactor URIMetadataNode to further unify interaction with index - URIMetadataNode extending SolrDocument - use language as stored (String), reducing conversion to string - optimize debug code in transferIndex	11 years ago
reger	79e7947442	- remove empty http0_9 status text array and unused default_charset = ISO-8859-1	11 years ago
reger	2dabe2009d	- remove unused manual http KeepAlive config (reducing references to obsolete httpdemon) - add port info to settings_http	11 years ago
Michael Peter Christen	5746aae3db	add canonical links to the same crawldepth, not the next crawldepth	11 years ago
Michael Peter Christen	74ab5ef9fa	increased runtime for postprocessing query job	11 years ago
Michael Peter Christen	8b32dd5f9e	special strategy for balancer: do not remove targets with zero wait time from the queue	11 years ago
Michael Peter Christen	9c6228d948	fix for deadlocks in crawler	11 years ago
Michael Peter Christen	10cf8215bd	added crawl depth for failed documents	11 years ago
Michael Peter Christen	7fefebaeca	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	c2f62e783f	- better subgraph handling, less overhead for crawls without the webgraph - usage of crawler crawldepth cache for the linkgraph target depth computation	11 years ago
Michael Peter Christen	06afb568e2	new Strategies in Balancer: - doublecheck cache now records the crawl depth as well - doublecheck cache is available from the outside (made static) - no more need to crawl hosts with lowest depth first, instead all hosts which have only singleton entries are preferred to reduce the number of files.	11 years ago
Michael Peter Christen	1aea01fe5b	fix for Table in case that requested file does not exist and paths also do not exist	11 years ago
reger	710054bb37	implement gzip input handling directly in defaultservlet (making reference to legacy httpdemon obsolete)	11 years ago
Michael Peter Christen	9a5ab4e2c1	removed clickdepth_i field and related postprocessing. This information is now available in the crawldepth_i field which is identical to clickdepth_i because of a specific crawler strategy.	11 years ago
Michael Peter Christen	da86f150ab	- added a new Crawler Balancer: HostBalancer and HostQueues: This organizes all urls to be loaded in separate queues for each host. Each host separates the crawl depth into its own queue. The primary rule for urls taken from any queue is that the crawl depth is minimal. This produces a crawl depth which is identical to the clickdepth. Furthermore, the crawl is able to create a much better balancing over all hosts which is fair to all hosts that are in the queue. This process will create a very large number of files for wide crawls in the QUEUES folder: for each host a directory, for each crawl depth a file inside the directory. A crawl with maxdepth = 4 will be able to create 10,000s of files. To be able to use that many file readers, it was necessary to implement a new index data structure which opens the file only if an access is wanted (OnDemandOpenFileIndex). The usage of such on-demand file reader shall prevent that the number of file pointers is over the system limit, which is usually about 10,000 open files. Some parts of YaCy had to be adapted to handle the crawl depth number correctly. The logging and the IndexCreateQueues servlet had to be adapted to show the crawl queues differently, because the host name is attached to the port on the host to differentiate between http, https, and ftp services.	11 years ago
Michael Peter Christen	075b6f9278	refactoring of the crawl balancer: the balancer is turned into an interface and the old balancer class is moved into LegacyBalancer to make room for a fresh implementation of a crawl balancer.	11 years ago
Michael Peter Christen	8470dfe3f8	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	46016fa153	autoupdate fails to download latest release (1.71) due to default release blacklist - removed the default version blacklist regex from init (for future versions) !!! left existing update blacklist setting untouched !!! (existing installation wanting autoupdate for 1.71 need to change blacklist in ConfigUpdate_p.html) - moved old blacklist patch to migration.java	11 years ago
Michael Peter Christen	8aeef73d49	fix for virtual root nodes	11 years ago
Michael Peter Christen	7c7fbb9818	find depth-matches also for edge targets	11 years ago
Michael Peter Christen	dd12dd392f	introduction of a data structure for HyperlinkEdges which should use less memory as it does no double-storage of source links for each edge of the graph.	11 years ago
Michael Peter Christen	6ea8bb7348	using MultiProtocolURL for edge data which is faster (hash computation is now much easier) and smaller in size	11 years ago
Michael Peter Christen	b21c208b4d	enhanced hashcode computation for MultiProtocolURL	11 years ago
Michael Peter Christen	ce1d1b2fa0	fix for maximum tag length in parser	11 years ago
Michael Peter Christen	17e0956312	refactoring of SystemLoad calls (only one backend tool)	11 years ago
Michael Peter Christen	a37d067692	refactoring	11 years ago
orbiter	95780eed32	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
Michael Peter Christen	67beef657f	strong redesign of html parser: object recursion is now made using a stack on html tag objects, not using a recursive parse-again method which may cause bad performance and huge memory allocation. The new method also produced better parsed image objects with exact anchor text references.	11 years ago
Michael Peter Christen	6bd8c6f195	fix for wrong status codes of error pages	11 years ago
Michael Peter Christen	9e503b3376	also delete the robots.txt file from the cache when a new crawl is started	11 years ago
orbiter	67501c9dda	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
Michael Peter Christen	1c21b3256d	fix for robots.txt handling: delete old entry before starting a new crawl.	11 years ago
orbiter	c250fac9f4	linkstructure refactoring to get more options for clickdepth analysis	11 years ago
Michael Peter Christen	8068e68474	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	bd886054cb	new structure and enhancements for link graph computation: - added order option to solr queries to be able to retrieve document lists in specific order, here: link length - added HyperlinkEdge class which manages the link structure - integrated the HyperlinkEdge class into clickdepth computation - extended the linkstructure.json servlet to show also the clickdepth and other statistic information	11 years ago
reger	f326a67561	fix: typo in default charset in metadata2solr update pom and NB build to Solr 4.7.1 libs	11 years ago
Michael Peter Christen	df138084c0	do solr optimization independently from memory and load constraints: - not doing an optimization will likely cause a too many files exception - without optimization performance will be even worse which would prevent optimization in the future as well (prevent a deadlock situation)	11 years ago
Michael Peter Christen	ebd44a7080	replaced solr 4.6.1 with solr 4.7.1 and added index migration to lucene_47	11 years ago
Michael Peter Christen	734778c0c8	fixed a time-out problem in the default servlet which is also a logging problem because the error log showed the wrong reason (file not found) instead of the actual reason (time-out).	11 years ago
Michael Peter Christen	466d90ad42	fixed a problem with resource observer; probably coming from uncaught exceptions within the apache library which appear only in concurrency environments.	11 years ago
Michael Peter Christen	e8ddd415a8	enhanced the new link structure graph	11 years ago
Michael Peter Christen	926d28dd3f	fixed a bug which prevented crawl starts after a network switch	11 years ago
Michael Peter Christen	3ce8eff21b	another fix for inbound/outbound detection	11 years ago
Michael Peter Christen	d4b5c457e4	NPE fix	11 years ago
Michael Peter Christen	36a66b0704	fix for parsing of numeric value in case that boolean values are given	11 years ago
orbiter	41730c8048	better logging in template engine: shows filename of servlets where errors in templates occur	11 years ago
orbiter	3c1274057d	fixed thread dump in case of wrong seeds	11 years ago
orbiter	18f9c40302	moved Edge class out of linkstructure servlet as this does not work on non-eclipse driven environments (all non-dev cases)	11 years ago
orbiter	de95e5e524	reduced search activity corona strength in network image	11 years ago
reger	da413af664	move baseurl after parsing orig source in urlproxyservlet to calculate absolute href links for rewrite from unmodified source.	11 years ago
reger	af6ad20728	fix: remove obsolete ref to yacy.home (use Switchboard instead)	11 years ago
Michael Peter Christen	74ab094587	fix for solr query size; too many documents had been retrieved in case that less than _pagesize_ had been requested.	11 years ago
Michael Peter Christen	c64c10ef00	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	48fbfa60c1	bugfix to inbound/outbound identification	11 years ago
reger	227c42bc96	eliminate obsolete URIMetaDataRow class by joining it with/into URIMetaDataNode.	11 years ago
Michael Peter Christen	cca851a417	introduced new solr field crawldepth_i which records the crawl depth of a document. This is the upper limit for the clickdepth_i value which may be shorter in case that the crawler did not take the shortest path to the document.	11 years ago
orbiter	b1ba764d81	fix for first start options and added german translation for popup texts	11 years ago
orbiter	429a874222	- added COLS field in GSA response (non-gsa standard by customer request) - updated document link in GSA response writer	11 years ago
Michael Peter Christen	1b9ec9a1c5	- added popover to p2p/stealth mode button to explain the peer mode and privacy issues. - added popover to first-time use case to explain that specific servlets are only visible after customization and/or crawl starts	11 years ago
Michael Peter Christen	62a36fa584	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	c9f92abddc	fix: application link count (URIMetadataNode)	11 years ago
Michael Peter Christen	a267c46e1a	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	5b83887da8	npe fix	11 years ago
Michael Peter Christen	63c9fcf3e0	free configuration of postprocessing clickdepth maximum depth and time	11 years ago
Michael Peter Christen	39b641d6cd	added tutorial mode - some menu items will only appear if you 'qualify' for them. Thus, the first-time user will only see four menu items. The other items will unfold as the user interacts.	11 years ago
sixcooler	f06775850f	fix receiving DHT / parse multipart + another close to fix possible resource leak warning	11 years ago
reger	49e76a1c55	make use of detected charset in htmlParser if none is given.	11 years ago
reger	e11504309f	adding a hint to javascript browser short cut on Url-Proxy page (AugmentedBrowsing_p.html)	11 years ago
reger	b12200cafe	alternative UrlProxyServlet (for /proxy.html) using different url rewrite rules - use JSoup parser for selective rewrite of html body <a href= links only, instead of regex which rewrites also header href/src links - this improves display of pages which use header <base> tag - tags with src attribute are taken from original location (like css) improving display and are not routed through the indexer Disadvantage: scripting links will drop out of proxy Setting of the servlet through web.xml exclusively (in case one would like to quickly switch back to the YaCyProxyServlet, leaving the existing code of YaCyProxyServlet untouched available)	11 years ago
reger	2953ebe701	fix: port in local target address & button style	11 years ago
Michael Peter Christen	fda591695c	fixed visibility of custom icon	11 years ago
Michael Peter Christen	a9b9950d7f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	b488f33975	added close to fix possible resource leak warning	11 years ago
Michael Peter Christen	56710ecb26	prevent opening of new files as that could be a cause for the latest too-many-open-files exception. The old file is just truncated if the table is cleaned.	11 years ago
Michael Peter Christen	8b44fcf0f4	added missing @Override annotation	11 years ago
reger	d7055904a6	fix: proxyservlet path header setting	11 years ago
Michael Peter Christen	e515dd460d	added linkscount_i and linksnofollowcount_i to the default solr schema	11 years ago
Michael Peter Christen	1a764135be	one more Thread Dump fix for new bootstrap css style	11 years ago
Michael Peter Christen	bb21d825f9	fix for thread dump line spacing	11 years ago
Michael Peter Christen	cbdfef7ce1	changed protocol facet to show also all other counts if one facet is selected	11 years ago
reger	b9056ef2db	remove unused private header entries (HeaderFramework) X_YACY_ORIGINAL_REQUEST_LINE X_YACY_KEEP_ALIVE_REQUEST_COUNT CONNECTION_PROP_REQUESTLINE	11 years ago
sixcooler	6d16fa993d	make transparent proxy handle https-connections: the implemented handle for connect did not work for me - so lets try the connectHandler	11 years ago
Michael Peter Christen	61ad194065	fix for source and target clickdepth in webgraph index	11 years ago
Marc Nause	809b4e1fd9	Team added support for URLs with unicode characters in host part to blacklist. Punycode is used to handle unicode characters.	11 years ago
reger	b126b9ba17	add some InputFileStream close at end of reads to make sure file is released	11 years ago
reger	ca7444dbdf	limit filetype nav to known extension also on image/media search - on text search we limit filetype nav already to known extension, apply filter to image search	11 years ago
reger	651d057e93	surrogate import translate dc:language 3-char codes OAI records often use 3-char language codes, start converting some 3-char lang's to the internal ISO639-1 2-char code	11 years ago
orbiter	22618e3ba2	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	01989f6af9	restrict write buffer size to a limit	11 years ago
Michael Peter Christen	d1091e79f8	- added stealth button to navigation menu - more fixes to progress bar	11 years ago
reger	c297de5145	remove check for unused virtual path /currentyacypeer/ - del jqueryheader.template (not used)	11 years ago
orbiter	3c8d6e1eee	added adminAccount switch to ConfigAccounts_p servlet to switch on protection of all pages; some refactoring as well	11 years ago
orbiter	7d24bcb98d	added flag to require that all web pages, even such without a "_p" extension require authorization. (default off)	11 years ago
Michael Peter Christen	7a6658abec	removed synchronization in embedded solr connection (that was probably a mistake?)	11 years ago
Michael Peter Christen	a7d4379ef9	fixed shutdown of solr cores in case that more than one local core is to be closed (this happens if webgraph is enabled and the index is dumped using /IndexControlURLs_p.html	11 years ago
Michael Peter Christen	453bfd0f17	removed unused variables and warnings	11 years ago
Michael Peter Christen	05655d98df	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	9f02d2c47b	fix: remove link to triplestore in Vocabulary_p (triplestore no longer exists) - should be investigated in more detail to look for additional implications Remove "yacyaction" from proxyservlet as it was only needed for removed interaction routines.	11 years ago
reger	81a846ec33	fix: set YaCy CONNECTION_PROP_HOST Header in ProxyServlet to host incl. port	11 years ago
reger	251be9ecfa	remove unused ProxySettings ref. from loader clean unused whois test code	11 years ago
reger	82dc815af9	cleanup: remove unrelated and unused code	11 years ago
Michael Peter Christen	85a427ec54	support for multiple sitemaps in robots.txt	11 years ago
reger	a373fb717d	remove more unused from legacy server.http - triggerOnlineAction not used - useTemplateCache not used	11 years ago
reger	749d020aeb	remove redundant url string manipulation in HTTPDProxyHandler (still used by ProxyServlet)	11 years ago
reger	612294cf84	use servletPath in ProxyServlet instead of fixed name to allow servlet-mapping via web.xml	11 years ago
reger	1d01672bd3	fix DCEntry.getIdentifier on successful url parameter	11 years ago
Michael Peter Christen	b08375da33	fix for bad/missing values of size_i	11 years ago
reger	6306d28a6a	OAI import get multivalued keywords (dc:subject)	11 years ago
reger	0a8c8102de	allow YaCy to start w/o ssl if JKS init fails	11 years ago
sixcooler	0b2101c59c	Speed up the ProxyHandler: simplified cache-storing and make it concurrent in order to free the clientconnection asap let other processes wait on proxy-access like it was before	11 years ago
reger	516f8c2489	fix: to allow unix scripts (bin/*.sh) to always submit http admin apicalls using auth via config hash (legacy requirement)	11 years ago
Michael Peter Christen	ea3aa30593	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	dd5bf0b71b	cleanup old reference to HTTPDemon.setAlternativeResolver optimize .yacyh check in AbstractRemoteHandler	11 years ago
Michael Peter Christen	51800007c4	- added concurrency to postprocessing of webgraph document - bundled separate webgraph postprocessing steps into one	11 years ago
Michael Peter Christen	5f4a6892c1	enhanced RowSet re-sort limit for small sets	11 years ago
reger	351c2be68d	fix: make sure adminAccount changes made via ConfigAccounts_p are effective immediately force to remove current credentials from knownuser cache	11 years ago
reger	5c9dcc269d	improve OAI-PMH import identifier recognition - find best fitting identifier (url) by checking all given dc:identifier in record (many entries provide several identifiers) as identifier is currently a multivalued field use "getParams" in preference of splitting the 1st string by ";" - add resolve DOI:... identifier via http://dx.doi.org/	11 years ago
Michael Peter Christen	0e7d249a69	fixed another shutdown problem (only occurs if webgraph core is enabled)	11 years ago
Michael Peter Christen	e485fbd0ce	- let crawl loader jobs die after 10 seconds without new jobs - corrected shutdown order to prevent a deadlock during shutdown	11 years ago
Michael Peter Christen	bcd9dd9e1d	enhanced concurrent loading by using a fixed set of concurrent loader processes in favor of throwaway-processes. The control mechanism does less often report a 'queue full' message to the busy loop which then does not perform a long busy waiting; instead all requests are queued and new loader processes are started if necessary up to a given limit (as set before)	11 years ago
orbiter	051328271c	bugfix-bugfix	11 years ago
orbiter	eedcbcd906	bugfix to proxy handler: recognize the own yacyh-host	11 years ago
orbiter	d68e5ad0c4	NPE fix for Thread name (just committed yesterday, sorry)	11 years ago
reger	6878c90f99	fix: IPv6 INTRANET_PATTERNS for local ip (see http://bugs.yacy.net/view.php?id=378 ) requiring following ":" for fc and fd prefix and made pattern match case insensitive - add some more ipv6 test cases to MultiProtocolURLTest.java	11 years ago
reger	a2e5ea2026	status panel link to set max mem +url proxy same error text as in transparent	11 years ago
Michael Peter Christen	6ed9c0164e	attaching names to all Threads to get a better view in profiling tools like VisualVM	11 years ago
Michael Peter Christen	fdaeac374a	- enhanced postprocessing speed and memory footprint (by using HashMaps instead of TreeMaps) - enhanced memory footprint of database indexes (by introduction of optimize calls) - optimize calls shrink the amount of used memory for index sets if they are not changed afterwards any more	11 years ago
reger	ba49ff81ed	little more verbose proxy 403 error message	11 years ago
Michael Peter Christen	d325cb8912	fixes and enhancements for postprocessing	11 years ago
Michael Peter Christen	7c1b968378	another fix for the shutdown exceptions	11 years ago
orbiter	133d41386c	(again) full redesign of ConcurrentUpdateSolrConnector to remove out-of-order transactions regarding add and delete operations. Now all operations (add and delete) are executed concurrently in-order.	11 years ago
Michael Peter Christen	a632b0d2a4	added a forced commit to index deletion to enable synchronized index updates	11 years ago
Michael Peter Christen	1d069c5861	make sure that postprocessed documents are overwritten	11 years ago
Michael Peter Christen	0d2342575e	Merge branch 'master' of ssh://gitorious.org/yacy/rc1	11 years ago
Michael Peter Christen	3cc5c0ffdd	a concurrency enhancement which was not used because tests showed worse indexing speed. I leave the code there since it may be useful in SolrCloud environments.	11 years ago
Michael Peter Christen	e644981697	added one more postprocessing low memory check	11 years ago
reger	5e645f4449	Merge origin/master	11 years ago
reger	3b89176b9f	use config value htroot in Jetty init (was hardcoded) - move htroot exist check from old httpdfilehandler to startup, remove from filehandler and legacy proxyhandler - use SwitchboardConstant.htroot where appropriate	11 years ago
Michael Peter Christen	e1bf65c892	added short memory protection during postprocessing	11 years ago
Michael Peter Christen	90b47e83e6	fixed shutdown error when closing solr connectors	11 years ago
Michael Peter Christen	7640834b37	removed double concurrency to put Solr documents into the index. The writings to the solr index are also buffered in ConcurrentUpdateSolrConnector	11 years ago
Michael Peter Christen	0f6b72f24b	do not use luke requests for remote solr servers if the result is different from normal requests. This happens if the remote solr is actually a solrCloud; in such cases the luke request returns only the result of the single solr peer, not the whole cloud. also done: some refactoring.	11 years ago
Michael Peter Christen	c57026e242	recover from OOM	11 years ago
Michael Peter Christen	907db8b7a6	fix for bad query shortcut hack	11 years ago
Michael Peter Christen	a2b66fe2eb	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	9f6be762a6	- better logging for postprocessing - fixed collection bug in postprocessing	11 years ago
orbiter	da5d4128bf	prevent npe	11 years ago
orbiter	a878c7982c	prevent npe	11 years ago
orbiter	e4eb87d924	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	ced1a96f9c	fixed error cache	11 years ago
reger	3ba81bd08a	Merge origin/master	11 years ago
reger	4d896383db	fix: use timeout = proxy.ClientTimeout in ProxyHandler (was 10sec fix) see http://bugs.yacy.net/view.php?id=236	11 years ago
orbiter	cfb647db6e	- introduced a miss cache in ConcurrentUpdateSolrConnector - better usage of cache - bugfix for postprocessing	11 years ago
orbiter	a87d8e4a8e	changed caching of ConcurrentUpdateSolrConnector: it caches now also the url along with the load date. While this takes much more memory, it eliminates database lookups for getURL() requests, which happen equally often. This speeds up remote solr configurations.	11 years ago
orbiter	f6e441dd77	refactoring	11 years ago
orbiter	76c53faeb2	removed unused code (HostStat)	11 years ago
orbiter	d3a88eaecb	introducing ConcurrentUpdateSolrServer for remote solr servers. Scaling of write buffers and update queue size is made according to assigned memory.	11 years ago
reger	809e976578	remove unused java imports from yacy.java	11 years ago
reger	a9b06f8719	add a -config command line parameter e.g. -config "port=9090" "port.ssl=8043" - useful for remote installation to set any config file property - multiple parameter can be set at once, on Windows enclose parameter in doublequotes - special handling "adminAccount=adminuser:adminpwd" sets adminusername and md5 encoded admin-pwd - adjusted windows startbatch to allow command line parameter handling - remove not needed classpath calculation from startYACY_debug.bat	11 years ago
reger	0923b09216	fix: allow 4 character admin user name (was min 5 char)	11 years ago
Michael Peter Christen	254a7ac66c	fixed cleaning of index	11 years ago
Michael Peter Christen	28a7b42e6b	removed warning "sun.misc.BASE64Encoder is internal proprietary API and may be removed in a future release"	11 years ago
Michael Peter Christen	046f5a03cb	one more SolrIndexSearcher bugfix	11 years ago
sixcooler	78c01b3eff	fix for 'AlreadyClosedException: this IndexReader is closed'	11 years ago
Michael Peter Christen	1b5e3d523a	better control over close-state of remote solr connections	11 years ago
Michael Peter Christen	1a364572a5	fix for "org.apache.solr.core.SolrCore Too many close [count:-1] on org.apache.solr.core.SolrCore@51af7c57" -error	11 years ago
Michael Peter Christen	69391e5d9e	changed strategy to test existence of documents in Solr: using the update time. The reason for that is a better caching for the crawler double-check, which needs the update time for crawler steering.	11 years ago
Michael Peter Christen	790f103f32	delete fail-docs during postprocessing to prevent that they will appear again and stay in postprocessing forever.	11 years ago
Michael Peter Christen	ff656ce860	explicit call to optimize to add a expungeDeleted flag	11 years ago
Michael Peter Christen	9eb668e951	enhanced the resource observer The resource observer is now able to recognize free disk space AND available space for YaCy. The amount of space which is assigned for YaCy is defined in new settings in the configuration file. Furthermore, there is now a cleanup process which deletes files in case that an autodelete is activated. The autodelete is now BY DEFAULT ON if the disk space is low, which means that YaCy starts to delete documents when the disk is full!	11 years ago
Michael Peter Christen	fbee98c06f	fixed shortcut self-reference bug	11 years ago
Michael Peter Christen	e7a29a2851	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	bf97e38b83	removed clearURLIndex, which is a stub remaining from the old metadata database and not needed any more	11 years ago
orbiter	14764632b5	clear solr caches in case that an exception occurs. The reason behind this hack is the occurrence of Exceptions like: W 2014/02/11 18:51:33 ConcurrentLog GC overhead limit exceeded java.io.IOException: GC overhead limit exceeded at net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:334) at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getDocumentById(MirrorSolrConnector.java:173) at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getDocumentById(ConcurrentUpdateSolrConnector.java:415) at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:331) at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:317) at net.yacy.search.query.SearchEvent.pullOneRWI(SearchEvent.java:1024) at net.yacy.search.query.SearchEvent.pullOneFilteredFromRWI(SearchEvent.java:1047) at net.yacy.search.query.SearchEvent$3.run(SearchEvent.java:1263) Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOfRange(Arrays.java:3077) at java.lang.StringCoding.decode(StringCoding.java:196) at java.lang.String.<init>(String.java:491) at java.lang.String.<init>(String.java:547) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:187) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:351) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276) at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110) at org.apache.lucene.index.IndexReader.document(IndexReader.java:436) at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:657) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.SolrQueryResponse2SolrDocumentList(EmbeddedSolrConnector.java:230) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getDocumentListByParams(EmbeddedSolrConnector.java:320) at net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:330) ... 7 more This problem was analysed with the Eclipse Memory Analyser after a heap dump, where the following problem was reported as the main Problem Suspect: One instance of "org.apache.solr.util.ConcurrentLRUCache" loaded by "sun.misc.Launcher$AppClassLoader @ 0x42e940a0" occupies 902.898.256 (61,80%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "<system class loader>". This memory is part of the result cache of Solr. Flushing this cache appears the most appropriate solution to that problem.	11 years ago
Michael Peter Christen	bc28247089	Added methods in resource observer to calculate the available and the occupied disc space. These values are also shown on the status page. The disc space calculation shall be used for a disk-limitation of the search index.	11 years ago
Michael Peter Christen	0dda979801	adapted network image drawing to increased number of peers	11 years ago
Michael Peter Christen	ca8b100f96	run the cleanup process even when load is high, do postprocessing even if load > 1 (but < 2) but only if there is enough memory (now: 0.5 GB RAM available). The memory amount of the postprocessing is the cause that systems block because they run into a frequent-GC chain which almost locks the peer. If running with enough memory, the postprocessing is fast and not damaging to the system. Because the required RAM of 0.5 GB is never available in default setting, the postprocessing will not run if the peer is not reconfigured to use more memory.	11 years ago
Michael Peter Christen	195e5868d3	catch solr close exceptions	11 years ago
Michael Peter Christen	751c128544	extra sleep for remote searches enhances search results because there is more time for more remote peers to contribute on the first result page	11 years ago
Michael Peter Christen	0cabcbbe83	more efficient wordcount	11 years ago
Michael Peter Christen	3d474a843e	added memory protection for postprocessing	11 years ago
Michael Peter Christen	412d55523c	enhanced memory protection and OOM exception handling in Solr connector	11 years ago
Michael Peter Christen	d9858e1b8a	removed warnings and superfluous logging	11 years ago
Michael Peter Christen	acc8d7faa7	fixed setting of shortMemoryStatus in MemoryControl	11 years ago
Michael Peter Christen	94245ce0a8	fixed "Size in KBytes" calculation in PerformanceQueues_p.html, see http://bugs.yacy.net/view.php?id=362	11 years ago
Michael Peter Christen	726e8c3ad5	removed unused classes and servlets	11 years ago
Michael Peter Christen	6e59ca4ebf	removed jena library and all code that depended on jena. When jena was introduced, it was also used for search facets. The generic search facets are now deduced from generic solr fields which makes jena as tool for facet semantics superfluous.	11 years ago
Michael Peter Christen	9228214f9b	enrichment of PerformanceMemory display of SolrInfoMBean table	11 years ago
Michael Peter Christen	e8bdf16ea7	added statistic information for solr resources in PerformanceMemory	11 years ago
Michael Peter Christen	931541d198	re-inserted default value re-set button to performance queues and patched missing values for recent new queues	11 years ago
Michael Peter Christen	456e52e0d5	enhanced strategy to clear solr caches - redesigned the instance mirror class (which was a mess) - added final method to close a searcher (which otherwise keeps a cache) - changed cache clear method which iterates over resources and calls clear to all caches in the searcher resources	11 years ago
reger	bd1685c94a	fix not needed getFileExtension().toLower (double) add missing .getFileExtension	11 years ago
orbiter	a11f072504	enhanced didyoumean	11 years ago
Michael Peter Christen	c0e6a65ec3	enhanced didyoumean	11 years ago
Michael Peter Christen	6d2dab7b21	fixed 'resource leak' warning	11 years ago
orbiter	22e3524797	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	c40ba51ca6	added new suggest method which replaces more-than-one suggestions: instead of computing suggest permutations of the given words, the completion of a phrase using the given words is searched in the fulltext index.	11 years ago
reger	ad4b213145	remove unused static var from HTTPDProxyHandler	11 years ago
reger	b693ce9759	allow combining selection of different search nav's (facets) - selecting more than one nav combines the 2 selections (with AND) - unselecting one nav clears all selected (e.g. select filetype:pdf and /language/fr shows ~ french pdf's only)	11 years ago
reger	cb71413d19	fix page nav to keep modifier (was new issue)	11 years ago
orbiter	416481c33e	added a boost on appearance of combined words (in the same order the user submitted that) when searching for more than one word	11 years ago
reger	c589ee8c6e	URLproxy access check too tight respect config ip pattern (was own ip)	11 years ago
Michael Peter Christen	ebfaf753b7	- faster initialization of index files - removal of not used space if index files shrink (rare, but possible)	11 years ago
Michael Peter Christen	d2b8f2b477	enhancements for staticIP and ipv6 handling	11 years ago
reger	a71718a459	add config value for ssl/https port (default=8443) adjust server routines to use config	11 years ago
reger	a3e2cca8e9	improve isOlder check to not overwrite node index with metadata on equal load date	11 years ago
reger	9b24dae2b7	add language navigation filter clause to rwi results	11 years ago
reger	f307d65dcf	prepare for a language navigator works fine to restrict language for local solrSearches. More work needs to be done to make rwi/remote searches respect the modifier.language restriction.	11 years ago
reger	cf553e5045	added hint to web.xml and for completeness the full set of hardcoded mappings	11 years ago
Michael Peter Christen	c84bcc878a	first try to add a generic solr servlet as luke request servlet	11 years ago
Michael Peter Christen	4cb7e2a2ca	refactoring: renamed the SolrServlet to SolrSelectServlet for better naming of more Solr Servlets	11 years ago
Michael Peter Christen	dc06e407ce	added two virtual instances of solr for the both cores: collection1 and webgraph. These cores are now accessible at /solr/collection1/select instead /solr/select?core=collection1 and /solr/webgraph/select instead /solr/select?core=webgraph in addition to the old behavior to support compatibility to the old peers. These new paths are fully solr standard-conform and will allow the cross-linking between YaCy peers using their public solr API.	11 years ago
Michael Peter Christen	8b14e92ba4	added button in host browser to re-load 404/failed documents	11 years ago
orbiter	771d8261c1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	c351e47a84	fix for bad-formatted lonlat	11 years ago
reger	4c603b216e	optimize parse ServerSideInclude	11 years ago
orbiter	5ec0c969c9	fix for http://bugs.yacy.net/view.php?id=354	11 years ago
orbiter	0002abd583	fix for OOM during remote search and too high load protection	11 years ago
sixcooler	5a917e13c6	use less ram on dht-URL transfer by not using a URIMetadataNode[]	11 years ago
Michael Peter Christen	c87cdfca2e	do not set a load prerequisite that prevents the start of one-time-jobs	11 years ago
sixcooler	4d77ca52c9	workaround to let dht-out run on small systems like a Pi	11 years ago
Michael Peter Christen	6ada0daae9	making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering.	11 years ago
Michael Peter Christen	489c3fbc90	code simplifications / removed warnings	11 years ago
Michael Peter Christen	0168f80c28	new crawling factors can now be changed during runtime	11 years ago
Michael Peter Christen	be5e808236	- removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request)	11 years ago
sixcooler	40a4030b55	configurable max-load values for YaCy-Threads: try lower values on small systems like a Pi	11 years ago
sixcooler	6d8c023a5e	lower client-connection for single-cpu-systems	11 years ago
Michael Peter Christen	77531850b5	reverted crawling strategy from latest commit.	11 years ago
Michael Peter Christen	c0da966dfa	enhanced crawler speed	11 years ago
Michael Peter Christen	79809342fa	added synchronization to exists() call because the concurrent call to that method showed in thread dump close to deadlock situations. It's also better to synchronize IO operations because they become faster then.	11 years ago
Michael Peter Christen	9a6912f2e6	if a http client thread is still running but we do not wait for it any more, call an interrupt	11 years ago
Michael Peter Christen	0d235a565b	cleanup crawl loader jobs	11 years ago
Michael Peter Christen	1ea17bd9f3	- removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class	11 years ago
reger	d3de309953	fix IOexception logging issue in DefaultServlet reason not sure but .logException triggers another exception	11 years ago
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since specific heuristic Twitter & Blekko is no longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	11 years ago
Michael Peter Christen	022c6d3ce1	do YaCy p2p connections using a timeout-request which moves the http request into a separate thread and ignores the further result of a request if that does not answer within the requested time-out. This is a try to solve a problem with the peer-ping, which hangs whenever a peer appears to be dead or blocked.	11 years ago
Michael Peter Christen	42f3733a05	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	25a6c05008	experimental removal of synchronization. This should work for all cases where the size() and isEmpty() method is used only for statistics, which happens at many locations in YaCy. If these methods are used for structural reasons (like accessing the last element in an array) then it may fail or cause other problems. As far as visible, this is not the case.	11 years ago
Michael Peter Christen	5695280edd	removed superfluous synchronization	11 years ago
Michael Peter Christen	a1977b7a75	removed debug code	11 years ago
orbiter	fd4abc0565	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	d5b8e473c8	added load limit for DHT transfer: RWI acceptance only if local load is not too high	11 years ago
reger	2614fa7aeb	Skip remote Solr search if last try showed error As the solr servlet may not be available (e.g. no public search page, old version, individual access setting) a /solr/select error is remembered in the seed.dna of the remote peer. This is not permanent, as flag is not stored and the seed is reloaded on several occasions, it is just a memory of the recent past status. Might also be set to "not available" on time-out of last try.	11 years ago
orbiter	a07e9b3582	concurrency-solid version of transmission limitation	11 years ago
orbiter	60ead31273	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	52bf7d1ac8	reduce load during dht transfer	11 years ago
sixcooler	f0587d4af5	NP-fix, which was found on a Pi under 'heavy' load	11 years ago
Michael Peter Christen	0bf3cab8c7	- better 'extra'-peer selection - logging of health status for 'extra'-peer selection - concurrency for remote peer IO and interrupting the threads if time-out occurs	11 years ago
orbiter	e3c4456c8e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	7f21d21d1d	added synchronization to deeply-embedded solr connector EmbeddedSolrConnector because deadlock situations show that methods in lucene class seem to block.	11 years ago
reger	9b06774414	fix role name in GSA servlet	11 years ago
reger	0c754dd794	implemented DIGEST authentication, which is for remote login more secure than BASIC, where pwd is transmitted in near clear text (B64enc). This has some implication as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. !!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash - default authentication is still BASIC - configuration at this time only manually in (DATA/settings) or defaults/web.xml (<auth-method> - the realmname is in defaults/yacy.init adminRealm=YaCy-AdminUI - fyi: the realmname is shown on login screen - changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin) - implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST - to differentiate old / new hash the in Jetty used hash-prefix "MD5:" is used for new pwd-hashes ( "MD5:hash" )	11 years ago
Michael Peter Christen	ba44eb1160	when scaling the number of remote peers, also consider the machine load and the number of cores	11 years ago
Michael Peter Christen	f8ce7040ab	remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are non 'extra' peers, which are queries using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware.	11 years ago
Michael Peter Christen	47a82e471c	less blocking in SeedDB which caused deadlocks in peer ping	11 years ago
Michael Peter Christen	ec10ed45bd	better logging in logger	11 years ago
Michael Peter Christen	a5d7961812	replaced old caching in SolrConnector with a new one which is better for concurrency and should prevent from 100% CPU usage after a long run of a peer with a large number of documents.	11 years ago
reger	6e2fe777af	simulate Authorization cookie for yacy servlet header	11 years ago
reger	ea7cef5d05	fix NPE in TemplateEngine StackTrace For input string: "" java.lang.NumberFormatException: For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:504) at java.lang.Integer.parseInt(Integer.java:527) at net.yacy.server.http.TemplateEngine.writeTemplate(TemplateEngine.java:241) at net.yacy.server.http.TemplateEngine.writeTemplate(TemplateEngine.java:199) at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:896)	11 years ago
reger	cb6d0c2113	implementing YaCy legacy role names - taking out customized SecurityHandler code as the original/default seems to just work fine - with this individual sec. constraints can be applied via web.xml (using legacy role names)	11 years ago
reger	f09dbbef96	make SecurityHandler webappcontext ready	11 years ago
reger	37f2a82a5d	making root context (htroot) a WebAppContext - this allows additional features, like servlet configuration via web.xml and many more things. - currently the standard servlets are still configured in the code (so the supplied defaults/web.xml is not really needed, yet), but could be expanded - lookup for web.xml - 1. in /DATA/SETTINGS then in /defaults	11 years ago
reger	28eae57e8b	spend CrawlQueues a fremem routine - clears errorStack - will not get hit often (but better little than nothing on low mem)	11 years ago
reger	b931bf6b48	fix use of url proxy access pattern pattern of transparent was used.	11 years ago
reger	280c4a3ac1	exclude terms with " for didYouMean suggestion causes Solr error (and wordindex likely finds suggestion) org.apache.solr.core.SolrCore org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'text_t:""d"': Lexical error at line 1, column 12. Encountered: <EOF> after : "" at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.query(EmbeddedSolrConnector.java:179) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector$DocListSearcher.<init>(EmbeddedSolrConnector.java:345) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getCountByQuery(EmbeddedSolrConnector.java:364) at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getCountByQuery(MirrorSolrConnector.java:326) at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getCountByQuery(ConcurrentUpdateSolrConnector.java:440) at net.yacy.search.index.Segment.getWordCountGuess(Segment.java:464) at net.yacy.data.DidYouMean.getSuggestions(DidYouMean.java:181) at suggest.respond(suggest.java:73)	11 years ago
reger	fbc1071f6d	Merge origin/master	11 years ago
reger	7b800a0c8e	fix: NPE on shutdown via script	11 years ago
Michael Peter Christen	ce4d42d77c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	644573cfc4	using the adminAccountUserName from yacy.conf within apicall.sh	11 years ago
reger	6932aa4d7a	use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. TODO: the bin/apicall.sh script should likely take that into account.	11 years ago
orbiter	2ead4e44d9	introduced a new storage path ARCHIVE inside of DATA which will be used as path for solr index dumps (instead of the SEGMENTS path). This will make maintenance of index backups easier. It will also provide a tool to migrate from a freeworld index to a webportal index.	11 years ago
sixcooler	add0e42804	fix double-escaped urls from proxy-usage	11 years ago
sixcooler	865ce6f974	check blacklist proxyClient config	11 years ago
sixcooler	345f9aba27	make use of our DNS-cache again - this really speeds up the lookup	11 years ago
reger	e6d284fe1e	better solution for prev. commit with MultiMapSolrParams.getFieldInt not returning default parameter	11 years ago
reger	0bc2fc14ab	improve NPE chance on missing parameters java.lang.NullPointerException at net.yacy.http.servlets.SolrServlet.service(SolrServlet.java:145) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)	11 years ago
reger	f06cef5d5b	reimplement proxy access by configured whitelist pattern was currently limited to own ip.	11 years ago
reger	05d6cc6ea3	setting of IPv4Stack moved earlier it seems even better to call system.setproperty before isrunning check (if nothing helps we have to set it in startup script)	11 years ago
reger	30d925a96e	reimplemented server access restriction via Jetty IPAccessHandler to allow only configured IP's to access. Handler is only loaded if a restriction is configured. Since IPAccessHandler (Jetty 8) does not support IPv6 system property java.net.preferIPv4Stack=true Testing showed system.setProperty seems to be sensitive to point of calling (earliest possible time seems to be best = early in yacy.main). Moved the "isrunning..." just open browser check also to the new routine to preread the yacy.config only once.	11 years ago
orbiter	3cb6c7861f	fixed shutdown authentication problem	11 years ago
Michael Peter Christen	ed06b5b94b	set a realm message to log-in input window which explains that a password for the account 'admin' can be (re-)set with the script bin/passwd.sh	11 years ago
Michael Peter Christen	7005ecdabd	cleanup	11 years ago
Michael Peter Christen	2939b47986	removed non-working realm setting in http client (auth for localhost was added in previous commit)	11 years ago
orbiter	9d52b337f3	added http authentication to YaCy http client for all localhost accesses to enable self-steering of the peer using the API table. This is necessary in case that a password for the administration pages is set.	11 years ago
Michael Peter Christen	c951945666	modified log-in detail to enable admin-login from localhost with stored hash even if localhost access is disabled. This is urgently needed for the apicall.sh script since that is used for high-availability set-up (checkalive and indexdump for index mirroring)	11 years ago
Michael Peter Christen	9bd71fdbb4	made the access tracker class static because it shall be used by the jetty auth module	11 years ago
Michael Peter Christen	1c56befb93	fixed mess with test on localhost (which means local hosts for some cases)	11 years ago
Michael Peter Christen	7d6fc79eb8	refactoring (usage of constant names for attributes of authentication check)	11 years ago
Michael Peter Christen	b9d36e45e0	removed the &amp explicit encoding of ampersand character since this is double-translated within the template replacement process.	11 years ago
reger	e2ccb6ce9d	modified DefaultServlet parameter on invoke templates call response with post=0 (if post empty) simulating previous behavior. (template servlets typically test for post==null, found one more Crawler.p.java were empty post caused problem, = defaults not correctly set)	11 years ago
reger	4c38bceafc	handle http connect for proxy refactor header cleanup (reuse existing code)	11 years ago
reger	cfabe8f67a	harmonize access restriction for urlproxy servlet with proxy handler, what is currently - use switched on in config - access from a local IP / hostname fix shutdown exception for crashprotection handler on interrupted connections.	11 years ago
reger	e6b9643fd6	extended request for local peer check to by hostname resolved ip the current islocal() check did not detect a domain.com address as request for the local peer.	11 years ago
reger	c797f108a1	add error response on denied proxy access send http 403 response	11 years ago
reger	0583f44306	reimplement proxy access log (to Jetty ProxyHandler) - using existing HTTPDProxyHandler logger - allow local loopback ip to access proxy	11 years ago
reger	8cbc1c970a	Security Hot-Fix: for transparent proxy.	11 years ago
reger	58ecf5e4dd	add to blacklist button in CrawlResults http://bugs.yacy.net/view.php?id=220 introduced Blacklist.add with sourcefile only parameter	11 years ago
reger	e9081c0f17	moved startup execAPIActions call after Jetty startup execAPIActions require http to be up. The 10s sleep was sufficient to allow Jetty to start, but it's more robust to place the call after http is assigned to switchboard/serverSwitch.	11 years ago
reger	19c1a7a5ca	change SolrServlet from Filter to Servlet (as no multicore required) this allows to simplify context/servlet initialization in Jetty init.	11 years ago
reger	14c977dd26	fix NPE GSAresponseWriter on query=null java.lang.NullPointerException at net.yacy.cora.federate.solr.responsewriter.GSAResponseWriter.highlight(GSAResponseWriter.java:328) at net.yacy.cora.federate.solr.responsewriter.GSAResponseWriter.write(GSAResponseWriter.java:263) at net.yacy.http.servlets.SolrServlet.service(SolrServlet.java:235)	11 years ago
orbiter	c3dee2d6bd	added security patch	11 years ago
orbiter	dcf46ce8f6	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	343d2ef49a	new data type for access tracker (unfinished)	11 years ago
reger	dd8ea0cdd6	fix "add to blacklist" button style in IndexControlRWIs_p - added default filename filter to select field (as only addition to *.black list is permanent) - modified Blacklist_p header/legend to show all active blacklists (to support understanding that all configured lists are active) - removed obsolete code in Blacklist_p servlet	11 years ago
reger	abbf487023	fix QueryGoal Image query (missing space) see query log example .. url_file_ext_s:(jpg OR png OR gif) ORcontent_type:(image/*)) ..	11 years ago
reger	26e9d7e066	fix NPE in IndexControlRWIs_p.html - metatags may be null Caused by: java.lang.NullPointerException at net.yacy.search.query.QueryParams.getFacets(QueryParams.java:445) at net.yacy.search.query.QueryParams.getBasicParams(QueryParams.java:400) at net.yacy.search.query.QueryParams.solrTextQuery(QueryParams.java:345) at net.yacy.search.query.QueryParams.solrQuery(QueryParams.java:334) at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:290) at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:176) at IndexControlRWIs_p.genSearchresult(IndexControlRWIs_p.java:641) at IndexControlRWIs_p.respond(IndexControlRWIs_p.java:141)	11 years ago
reger	7f9b9315fe	Merge origin/master	11 years ago
reger	8eaabb9600	remove dependency from old serverCore.java - remaining getPortNr not needed (as current release allows only to set plain integer as port, see ConfigBasic)	11 years ago
orbiter	2018e55f8b	switched back on index deletion (was accidentally off because new jetty framework never delivers null to post arguments .. there may be more of that kind of problems)	11 years ago
orbiter	3961b643a3	write solr searches to search log	11 years ago
orbiter	15882beb19	fix for strange NPE java.lang.NullPointerException at net.yacy.search.Switchboard.updateMySeed(Switchboard.java:3667) at net.yacy.peers.Network.peerPing(Network.java:195) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107) at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165)	11 years ago
orbiter	f3ac923a7e	ftp client shall be able to open non-anonymous ftp servers if login details are given	11 years ago
reger	3d913558ab	display configured adminUserName in ConfigAccounts_p - fix read default username in in loginservice	11 years ago
reger	fbdd89e198	Merge origin/master	11 years ago
reger	65a2f3d5e7	tweak Jetty credentials to work with YaCy UserDB - user entry in UserDB with admin right can login to access protected pages - dto. admin user, chosen username is stored in conf (adminAccountUserName=)	11 years ago
Michael Peter Christen	ffdfe5fb9b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	7d6b34a89f	Merge origin/master	11 years ago
reger	45e8750ba5	nasty quick fix for admin login with other username as admin - userDB is not sync'ed with Jetty credentials as of now only the std. admin account can login switched initial browser open with ssl active back to std. http port	11 years ago
Michael Peter Christen	ee17bd0b69	added option to attach remote solr servers in read-only mode	11 years ago
Michael Peter Christen	25f9c35033	add patch which shall prevent that naive search mistakes like usage of regular expressions cause no results. Usage of '*' followed by a dot or any expression will now cause that this expression is used as a filetype search.	11 years ago
Michael Peter Christen	667a6adddb	- use default files from yacy.init property "defaultFiles" if no jetty-configuration is given for default files. - fix a problem with default paths if no path is given (i.e. http://localhost:8090 instead of http://localhost:8090/). Without this patch the path was resolved automatically to http://localhost:8090//	11 years ago
Michael Peter Christen	77aeb288a2	suppress deprecation warning (for now); TODO: find alternatives	11 years ago
reger	fca7f1d043	run SSL/HTTPS port (8443) ping test in migration only if SSL/HTTPS is on - see last commit	11 years ago
reger	71cac1a278	added SSL/HTTPS connector to support SSL/https connection on port 8443 !!! attention !!! to make sure YaCy can start, https will be disabled if port 8443 is used - added ping test for above to migration - as of now port for https is hardcoded to default 8443 - if not urgently required I'd leave it this way (it's standard) to use different ports for http and https - post https port on ConfigBasic.html (if active)	11 years ago
Michael Peter Christen	82c0525e71	wrong logger fix	11 years ago
Michael Peter Christen	e17624b6dd	added html retrieval from alternative DATA/HTDOCS path	11 years ago
Michael Peter Christen	07cee6b99c	removed more unused code	11 years ago
Michael Peter Christen	20b48f894f	refactoring: moving all servlets to the same package (the solr servlet is currently actually a filter which should be changed somehow)	11 years ago
Michael Peter Christen	84167adb49	removed unused anomichttpd code after migration to jetty	11 years ago
Michael Peter Christen	b461a27abb	fixed the SolrServlet	11 years ago
Michael Peter Christen	7603e879dc	Merge branch 'master' into HEAD Conflicts: .classpath source/net/yacy/cora/federate/solr/SolrServlet.java	11 years ago
Michael Peter Christen	25250405f1	solr servlet preparation for join with jetty branch	11 years ago
Michael Peter Christen	2f16770681	migrated to solr 4.6.0	11 years ago
Michael Peter Christen	57f0f71ac6	added patch to allow binary response writer	11 years ago
orbiter	937273d4e3	added parsing of metadata to surrogate reading: a dublin core record inside of surrogate input files may now contain tokens within the namespace 'md' (short for: metadata). The token names must be valid within the namespace of the solr field names. All md-tokens inside of surrogate files then overwrite values within solr documents before they are written to the solr index. This makes it possible to assign collection names to each surrogate entry and also ranking information can be added. Please see the example file.	11 years ago
reger	18497f6475	remove unused init parameter from DefaultServlet - remove "RelativeResourceBase" parameter	11 years ago
orbiter	4de3fefdb5	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	7e346e1d79	using stringbuilder in query construction	11 years ago
reger	c84c313fe1	Merge origin/master into jetty	11 years ago
Michael Peter Christen	2702d9e56b	- added a SolrQueryResponse2SolrDocumentList method which is able to work around the unfolding process in Solr's BinaryResponseWriter. This was a huge performance bottleneck in the embedded solr connector and the problem is actually on Solr side, but we have now a workaround. - This made it possible to abstract a high-performance index access method which is implemented as method getDocumentListByParams. That method is also implemented in the SolrServerConnector and provides a very efficient access to a solr index if the index is embedded. - a popular use of the document list retrieval is a result count which can now also make use of the new method, via getDocumentCountByParams. - enhanced the Error cache which now does not store error documents within the ram cache if the document is also written to solr. When documents are retrieved from the cache, they are partly read from the ram cache and if not existent there, from the Solr index.	11 years ago
Michael Peter Christen	74466d731a	use pre-compiled patterns in ymark	11 years ago
Michael Peter Christen	34633044b4	made pattern computation static	11 years ago
Michael Peter Christen	ef7ddbc933	added date parser caches to prevent re-calculation of costly date parsing	11 years ago
Michael Peter Christen	552ef9f18e	fix for bad ErrorCache.exists test (bug from latest commit)	11 years ago
Michael Peter Christen	09412ea3a4	counting search requests in solr interface	11 years ago
Michael Peter Christen	303f5694ba	avoid usage of existsByQuery. If a document can be loaded by the ID before testing other fields from the existsByQuery request, then a document cache fills and queries after that one can be avoided.	11 years ago
reger	b43bbd3cc4	join DefaultServlet and Jetty8 implementation - removing Jetty 8 specific dependencies	11 years ago
reger	089c5007ee	move conditionalHeader to DefaultServlet - by removing Jetty specific implementation detail	11 years ago
Michael Peter Christen	79771c60c0	IPv6 fixes	11 years ago
reger	92d9c56f9f	Merge origin/master into jetty	11 years ago
Michael Peter Christen	78eac85161	better calibration of caches and queue maximum sizes	11 years ago
Michael Peter Christen	c8af19bd37	removed unnecessary check which causes a NPE when searching with empty search string	11 years ago
Michael Peter Christen	e3c2f09de9	- reduce computation in case that specific postprocessing fields are not selected - de-select citation rank computation	11 years ago
Michael Peter Christen	cfa08024c7	removed optimization before postprocessing because that may cause a time-out which will cause that postprocessing fails.	11 years ago
Michael Peter Christen	6f3a923691	fixed urlmask which was not able to combine several constraints	11 years ago
Michael Peter Christen	9a27bf6e82	removed filter computation in Protocol class for remote searches because that is already done in the QueryParams class	11 years ago
Michael Peter Christen	f1b5db2c45	- performance graph does not show peer ping in memory monitor any more - after a forced GC, the PerformanceMemory view switches to automatic update by default	11 years ago
Michael Peter Christen	a125904a1c	fixed a NPE in surrogate processing	11 years ago
Michael Peter Christen	0db8e34625	enhanced webgraph processing	11 years ago
reger	ac067b5236	clean-up Jetty handler classes	11 years ago
reger	b75e92aac3	add read queryparameter in gsaservlet	11 years ago
reger	1e94719084	fix NPE on mime detection of unknown file extension	11 years ago
reger	effea4bca0	Merge origin/master into jetty Conflicts: source/net/yacy/cora/federate/solr/SolrServlet.java	11 years ago
sixcooler	2c2ebb0d92	tried some hardening in order not letting any Solr-Searchers open	11 years ago
Michael Peter Christen	a16534cb0a	tried to fix timeout and connection-lost problems when using an outside solr.	11 years ago
Michael Peter Christen	c3dcbdc8d5	try to recover from an OOM during citation index reading and fail-over to second solr core in case of unrecoverable OOM.	11 years ago
Michael Peter Christen	9932c441c8	fixed a problem with Date fields parsing Solr results if a remote Solr is attached.	11 years ago
sixcooler	94db054aff	memory-leak-fix: the DocListSearcher fires a query in its constructor and it is highly recommended to close every SolrRequest. Every Request which is not closed leaves a Searcher with its caches and cannot be garbage-collected.	11 years ago
reger	26bb1e37b7	implement core selection in SolrServlet - making initcore() obsolete	11 years ago
Michael Peter Christen	ae55d69ef6	include/exclude size NPE fix (recently added)	11 years ago
Michael Peter Christen	2c39b65409	fixes for searches containing stopwords. The fix was done using a reconstruction of the search word set access method to protect that words are deleted from the sets from the outside of the QueryGoal class.	11 years ago
Michael Peter Christen	5592ea57f0	hack to remove compiler warnings about deprecated classes. It would be better to remove the deprecated usage but to do this the Solr core must adopt the latest apache http core changes as well .. this is not our fault.	11 years ago
orbiter	037cd0a57c	using the BinaryResponseWriter which is supported within the YaCy solr servlet since YaCy 1.63. This is much more performant for the client than using the XMLResponseWriter because parsing of XML data is very CPU intensive. Older YaCy peers are still requested using the XMLResponseWriter but the majority of YaCy peers already respond with the binary writer. This makes remote searches much faster and less CPU intensive.	11 years ago
orbiter	61409788eb	less word hash computations (removing some overhead because of MD5 calcs) using the clear word in a normalized form.	11 years ago
reger	f23471c471	add check to prevent index entries containing url_file_ext_s with ";jsession=xyz". Note: the check could be implemented in MultiProtocolURL (but at this time the possible implications could not be foreseen)	11 years ago
reger	5c4a3d1c01	Merge origin/master into jetty	11 years ago
reger	444a9ae674	remove unused options and attributes from DefaultServlet; clean up obsolete class files	11 years ago
reger	8da75a4b0c	fix contentType definition for Solr html response writer from xml to html (hint: value is currently not used, but is in SolrServlet)	11 years ago
Michael Peter Christen	ccf2f4e43b	refactoring of seed attributes (introduced more constants)	11 years ago
Michael Peter Christen	1f0bfa8fec	added test to Base64Order (runs successfully!)	11 years ago
orbiter	b7f1e5af51	added new servlet which generates the same file as the principal peers upload to a bootstrap position. You can call it either with http://localhost:8090/yacy/seedlist.html or, to generate json (or jsonp), with http://localhost:8090/yacy/seedlist.json or http://localhost:8090/yacy/seedlist.json?callback=seedlist	11 years ago
orbiter	3e552550d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	c2d720cdaf	purge a lucene cache - possible memory leak fix	11 years ago
reger	e4f49fb175	for search results with an empty title, use the filename as title. To avoid storing a title in the index which isn't extracted from the source, the empty-title check is only added to the ResultEntry class	11 years ago
reger	b1dc9a6f52	- disable Jetty servlet defaultUseCache (prevent double caching) - include short memory status check for class cache in DefaultServlet - remove obsolete Resource interface for Jetty8YaCyDefaultServlet	11 years ago
reger	f111f30ace	Merge origin/master into jetty	11 years ago
reger	94293176a3	use writeOptionHeaders with ServletResponse parameter only	11 years ago
orbiter	ff86cb683f	fixed some XSS bugs reported by Marius from http://ctf365.com/	11 years ago
orbiter	da33ee0d77	also extended timeout for webgraph postprocessing	11 years ago
orbiter	74f9e40747	extended timeout during postprocessing of 30 minutes.	11 years ago
orbiter	19a051bec8	more monitoring for postprocessing and enhanced layout in Crawler monitor page	11 years ago
Michael Peter Christen	9cf9727685	fix for wrong counter	11 years ago
Michael Peter Christen	fceac8cffd	more monitoring for postprocessing	11 years ago
Michael Peter Christen	6842783761	fixed and enhanced postprocessing	11 years ago
Michael Peter Christen	219d5934a4	fixed termination bug in Solr Connector	11 years ago
Michael Peter Christen	bf1bdd52a6	prevent requesting of 0-facets (which actually exist)	11 years ago
Michael Peter Christen	9d5895f643	enhanced and fixed postprocessing	11 years ago
Michael Peter Christen	f86fe90eda	enhanced mass storage speed to remote solr servers	11 years ago
Michael Peter Christen	6ed9821209	fixed several problems in solr connectors	11 years ago
Michael Peter Christen	191fd3d7e7	added an optimization option to HandleSet mass data storage structure	11 years ago
Michael Peter Christen	94b565ea0d	fixed keepalive min value	11 years ago

7817 Commits (fb75fea446db62403c243471255b646b95e1043b)