yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	9b4c699526	enhanced location search: - search requests are now made using a map boundary - search results are only computed for the map boundary - the number of results is adapted to the results in the visible range - added double-buffering for the search result markers - added a search query option for the search results: /radius/<lat>/<lon>/<radius>	13 years ago
Michael Peter Christen	4d3cc02168	replaced old bzip2 library with the better-documented commons-compress package from http://commons.apache.org/compress/	13 years ago
Michael Peter Christen	c15fcde1c8	add-on to latest commit	13 years ago
Michael Peter Christen	81737dcb18	removed stack trace from swf parser since we can't do anything there	13 years ago
Michael Peter Christen	acf8d521a2	fix for http://bugs.yacy.net/view.php?id=126	13 years ago
Michael Peter Christen	89142d1e8d	removed (not all) warnings	13 years ago
Roland 'Quix0r' Haeder	a093ccf5eb	Now used synchronization in all close() methods to make sure all objects are 'closed' in an ordered way Conflicts: source/de/anomic/http/server/ChunkedInputStream.java source/de/anomic/http/server/ChunkedOutputStream.java source/de/anomic/http/server/ContentLengthInputStream.java source/net/yacy/cora/protocol/Domains.java source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java source/net/yacy/document/content/dao/PhpBB3Dao.java source/net/yacy/document/parser/html/AbstractTransformer.java source/net/yacy/kelondro/blob/BEncodedHeap.java source/net/yacy/kelondro/blob/HeapReader.java source/net/yacy/kelondro/index/RAMIndexCluster.java source/net/yacy/kelondro/io/ByteCountInputStream.java source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java source/net/yacy/kelondro/table/SQLTable.java	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Peter Christen	09484955dc	added new entry class for embed tags	13 years ago
Michael Peter Christen	453010bd68	- solved problems with backpath normalization - redesigned in/outbound link handover - removed iframe links from inbound/outbound in solr scheme	13 years ago
Michael Peter Christen	659178942f	- Redesigned crawler and parser to accept embedded links from the NOLOAD queue and not from virtual documents generated by the parser. - The parser now generates nice description texts for NOLOAD entries which shall make it possible to find media content using the search index and not using the media prefetch algorithm during search (which was costly) - Removed the media-search prefetch process from image search	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can retrieve now ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	a1a5b015d8	refactoring: moved document Classification to cora package	13 years ago
Michael Peter Christen	4d5da75814	fix for parser problem if an <a>-tag is 'within' html tags with unclosed tags. That prevented the <a> tags from being recognized. This is a fix for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516	13 years ago
Michael Peter Christen	046f3a7e8d	check if httpc has decompressed the release file and rename the file from .tar.gz to .tar if that happened	13 years ago
Michael Peter Christen	e101c2e0e2	added changes from copperdust (submitted by email): 1. Improved and fixed language detection: 1.1 Identificator.java - recognition fix (improved) 1.2 DCEntry.java - fix (changed detection order due to detection from tld in many cases is incorrect) 1.3 MultiProtocolURI.java - fixed and enhanced language from tld detection (all currently used top-level domains; ccTLD added but not tested). 2. Ukrainian language update. 3. Main Slavic languages langstats (tested and works fine).	13 years ago
Michael Peter Christen	8d63a5887c	bugfixes	13 years ago
Michael Peter Christen	9ad1d8dde2	complete redesign of crawl queue monitoring: do not look at a ready-prepared crawl list but at the stacks of the domains that are stored for balanced crawling. This also affects the balancer since it no longer needs to prepare the pre-selected crawl list for monitoring. As an effect: - it is no longer possible to see the correct order of next to-be-crawled links, since that depends on the actual state of the balancer stack the next time another url is requested for loading - the balancer works better since the next url can be selected according to the current situation and not according to a pre-selected order.	13 years ago
Michael Peter Christen	7e4e3fe5b6	free some memory after parsing html	13 years ago
Michael Peter Christen	4540174fe0	memory hacks	13 years ago
Michael Peter Christen	2e5cd6a1b2	fixed parser extension deny list generation and usage	13 years ago
Michael Peter Christen	8bee1472c9	there is no noindex, only nofollow in links	13 years ago
Michael Peter Christen	c560a582ac	fix for single-word vocabulary lines	13 years ago
Michael Peter Christen	ef78f22ee1	performance hack	13 years ago
Michael Peter Christen	1f4f60654a	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/document/parser/pdfParser.java	13 years ago
reger	32104360ce	PDFParser - return at least first 3 pages of PDF fix for pdf parsing without returning parsed text due to interruption by time out.	13 years ago
Michael Peter Christen	eadb58dd87	small enhancements in pdf parser	13 years ago
reger	b616de5973	PDFParser - return at least first 3 pages of PDF fix for pdf parsing without returning parsed text due to interruption by time out.	13 years ago
Michael Peter Christen	7f9b6b7a0c	added switches to ConfigParser to accept/deny documents by their extension	13 years ago
Michael Peter Christen	4901cee3cc	suppress auto-tagged subject entries when sending out or receiving metadata from other peers	13 years ago
Michael Peter Christen	83009d86f7	added the vocabulary navigator. It can be very simply tested by switching on the locale dictionaries.	13 years ago
Michael Peter Christen	a58dc4a91f	added autotagging to document condenser: - tags that are automatically generated now enrich the dc:subject - auto-generated tags have a '$' at the beginning of the tag - auto-generated tags lead the tag name with a vocabulary name each tag has the form $<vocabulary-name>:<tag-printname-space-replaced-by-'_'>	13 years ago
Michael Peter Christen	254adea51c	small fixes	13 years ago
Michael Peter Christen	b7bb84c0bb	set a limit to CharBuffer object size to fight against bad/too large content	13 years ago
Michael Christen	e6d51363ee	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Marek Otahal	72adbeae90	!Important: move from Hashtable to HashMap Hashtable is an obsolete collection v1, now since v2 offers HashMap with same or better functionality. Please review, almost all code was already moved, so only a few changes. That is not the issue, but I found notices that some (ugly big) helper classes had to be created in past to compensate missing Hashtable's functionality. I'd like input if we can remove some of them. look for //FIX: if these commits Signed-off-by: Marek Otahal <markotahal@gmail.com>	13 years ago
Michael Christen	fa8da7f89d	vocabularies are now also used as source for a did-you-mean computation	13 years ago
Michael Christen	eaec14ecc4	Dictionaries from words caches can now be used as autotagging vocabulary	13 years ago
Michael Peter Christen	91940fdf56	redesign of WordCache to be prepared to hold multiple independent dictionaries. Such dictionaries can then be also used as simplified vocabularies.	13 years ago
Michael Christen	bd40a10230	added autotagging stub .. only reading and parsing of vocabularies at this time	13 years ago
Michael Christen	c04bfaa51b	refactoring	13 years ago
Michael Christen	1f4afb4dc0	performance hacks	13 years ago
Michael Christen	762e0ecfb6	fixed localization dictionaries, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3418&view=next	13 years ago
Michael Christen	9cd469e6d6	added pull request from als plus an NPE fix	13 years ago
Al Sutton	39898cb94a	Added try/finally protection to ensure streams are closed. Added initial size guess for the CharBuffer	13 years ago
Al Sutton	4c67a964a1	Added try/finally protection to ensure streams are closed. Added initial size guess for the CharBuffer	13 years ago
Al Sutton	3f9b9f953f	Added close() to ensure buffer close actions are invoked	13 years ago
Al Sutton	d73c84f9a0	Allow initial buffer size definition in TransformWriter, and use available() method to set it in htmlParser. In this situation a ByteArrayInputStream is used so the available() method gives a good size estimation and avoid the buffer needing to be continually grown	13 years ago
Al Sutton	f02ea27b31	Added missing closure of ByteArrayInputStream	13 years ago
Al Sutton	8993cac4d8	Initial performance improvements	13 years ago
orbiter	ebd840ebf6	- enhanced description on search front page - fixed language and heuristic modifier - added hint to crawl start that we can do also ftp and smb crawls - added a protocol extension to remote crawls to transport all search modifiers to remote peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8108 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e22f8497c9	- tested the ARC methods - removed strict authentication (if password is empty; this was buggy and not useful; can be switched on if necessary globally and not for each interface method) - increased speed of CrawlResults page (no dns lookup any more) - increased speed of favicon display (removed dns lookup) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8104 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5a55397f99	some last-minute performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	564374d1fe	- included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand. - reworked bookmark creation on crawlstart - many smaller adjustments to ymarks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8072 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	804e48888b	smaller bug fixes for search behavior; should produce fewer unnecessary removals and an exact number of results as shown in the counter; should also be a little bit faster git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8057 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	85d6bf4ac4	fixed urls to media content during indexing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8021 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	0d858d48ec	replaced String with StringBuilder in suggestion process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8020 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
low012	277b454a62	*) added comments *) minor refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7971 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	6b22865dbc	- removed some warnings - removed a dead update location git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	8a428d3e77	ensure termination of pdf parser to avoid deadlocking of other processes during search result preparation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7958 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	85a5487d6d	YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7943 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	0819e1d397	protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7942 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	49e5ca579f	added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	610b01e1c3	- added a 'add every media object linked in a html document as a new document' to the html parser. This causes that all image, app, video or audio file that is linked in a html file is added as document. In fact that means that parsing a single html document may cause that a number of documents is inserted into the search index. - some refactoring for mime type discovery git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	b5252ef91f	added new word recommendation library in DictionaryLoader_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	1c007188ad	bugfixes in html parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	231074bf0a	fixed a parsing bug by reverting SVN 7766 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7910 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
low012	24e76a7b69	*) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.) *) Added description of where to place MediaWiki dump for import. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7905 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5dd2efc9a2	- bugfixes in html parser - new fields in solr - extended file viewer to debug parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	51cf697acd	refactoring: moved all score-related classes to new ranking package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	eb14111200	encapsulate potential expensive objects in TextSnippet to allow GC them asap this reduces chance of OOMs at massive search & snippet-fetching git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	a311596881	finishing up my commits (7855-7858) which could be helpful for not declaring inside loops (helps GC of some VMs) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	9170a434ed	throwing an exception again in FileUtils.copy(reader, writer); OOMs could occur here and should not be ignored git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7858 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	ce248cc8dd	fewer byte-arrays of response-content, fewer byte-array <-> stream conversions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7856 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	59b767eebd	stop loading via http at a defined maximum of bytes - even if the size is unknown before loading; using max-file-size of type int for parsing documents (since content is used as byte-arrays, 'integer' should be maximum) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	299af4943c	added another memory protection hack git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7849 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b06faab9d3	do not allocate a StringBuilder object in case that there is not enough memory for that git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7846 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	2d4bb139d3	- added counting of links with noindex tag for solr index - bugfixes for solr index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	bda3eec0ff	added parsing of canonical link element to html parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9706fc55aa	enhanced content scraper (should discover urls much faster in case of very large plain texts) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7787 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f667b9c289	enhanced identificator: using AtomicInteger for counter git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7785 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	115abc8917	- more attributes for search progress bar - moved cache strategy to cora package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	77fe69395d	added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7774 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	0c1b29f3c9	- applied many small performance hacks - added a memory limitation in the zip parser and the pdf parser - added a search throttling: if too many search queries are still to be computed, then new requests are not accepted for some time. If after one second there is still no space to perform another search, the search terminates with no results. This case should only happen in DoS-like situations and in case of strong load on a peer, e.g. if it is integrated in metager. - added a search cache deletion process that removes search requests in case that throttling happens git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7766 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4bea3f9714	hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes). The new ASCII String <-> byte[] conversion methods have less computation overhead than the UTF8 conversion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e28bd0d038	fix for some possible causes of memory leaks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7741 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	10e2f588f8	- enhanced ybr ranking computation - many speed/performance hacks - added solr sharding and new sharding web interface - added option to switch off the yacy index when using solr - added new fail-url categories which are used to make a distinction which fail-urls to be sent to solr - refactoring/renaming of some method names to distinguish host/url hashes better - a large number of bug/npe fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3ed4a09368	small features, some bug fixes and performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	205cc75157	abstraction of surrogate main element (xmlns:geo was missing for wiki extracts) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7727 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	021840e5ba	removed (almost) deadlocks and unnecessary CPU load git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7726 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9248a4eef4	reduce the effect of 'Bildersuche findet generierte HTML-Seiten als Bilder' see http://bugs.yacy.net/view.php?id=9 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7705 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	76f2817e00	a fix for the snippet computation and hopefully better snippets git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7701 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	deda54d684	- relaxed matching of string-search (this is now case-insensitive) - added transport of string-search pattern to remote search protocol - fixed a problem parsing snippets with a '-' inside git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7700 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	15e3a57b4e	removed unused functions in condenser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7698 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e3d19d0a90	fix in Document inboundlinks/outboundlinks sorting git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7690 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4e8fa03514	added more attributes to html evaluation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7688 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	528da7c9ea	removed unused class and added license header for new class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7680 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f6077b3cc0	added more attributes for html parser and enhanced data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7679 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	d8e934c085	better abstraction of http client identification git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7675 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b77b8cac0c	- enhanced html parser: recognizes many more details in the content - added more properties to solr index - refactoring - more constants in switchboard - fix for some NPEs - recognition of more images - removed synchronization in HandleMap (obviously not necessary?) - added a nolocal configuration to remove excessive dns lookup (works only on allip - default off). Indexes produced with this setting are all flagged with 'local' and are (on purpose) not usable for freeworld because they will be rejected as being local. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7672 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3d5104d357	- fixed a bug in crawl start with file name (npe in new url) - added deletion of solr index in IndexControlRWIs - added asynchronous adding of large url lists (happens when crawls are started with file) - fixed npe in Image display - replaced language warning with fine logging - added a domain name cache in Domains that helps to speed up the isLocal property (less DNS lookups) - added a new storage class for this new cache: KeyList. The domain key list is stored in DATA/WORK/globalhosts.list - added concurrent solr updates and chunked transfers (50 documents until a commit is done) for high-speed feeding (> 40000 ppm) - fixed a bug in content scraper that chopped off large parts of crawl lists (using crawl start from file) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7666 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	958ff4778e	enhanced location search: search is now done using verify=false (instead of verify=cacheonly) which will cause that much more targets can be found. This showed a bug where no location information was used from the metadata (and other metadata information) if cache=false is requested. The bug was fixed. Added also location parsing from wikimedia dumps. A wikipedia dump can now also be a source for a location search. Fixed many smaller bugs in connection with location search. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7657 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	c17d102bd8	enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7653 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b788182954	some enhancements to scoring speed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7652 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	01690eab86	fix for mediawiki importer and wikicode parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7651 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4c013d9088	more UTF8 getBytes() performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7649 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	564184909a	enhanced the surrogate parser: better reading of UTF-8 characters git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7634 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	156cf02703	- added an index constraint 'has location' to the condenser - added evaluation of the 'has location' constraint to search using the /location operator git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7633 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	0430a94eaa	the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages - added parser for in-text appearing geo-locations - added geo-locations to rss search result - added evaluation of metadata-attached geo-locations in yacysearch_location to show search results within a map git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7631 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9b25d07295	- added geo information parsing to html parser - extended metadata information in index with geolocalisation - added display of location in yacydoc and ViewFile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7629 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f3baaca920	- enhancements to DNS IP caching and crawler speed - bugfixes (NPEs) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7619 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	78d4c45d09	enhancement during search process: fast fail of search in case that all index feeder have terminated. This change should affect filtering and navigators and should cause that search navigation gets faster git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7614 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a50f28e6e7	- fixed missing save operation for peer name change - fixed import of mediawiki dump files - added script to add mediawiki dump files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7609 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	1989ebc24b	removed more warnings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7598 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	8f11d3a5bb	redesigned the ScoreMap classes: - new concurrent score map using atomic operations from java concurrency classes - redesigned difference between StaticScore and DynamicScore into ScoreMap and ReversibleScoreMap; this allows many classes to use simple ScoreMap objects, which work better in concurrent environments using the ConcurrentScoreMap - switched from DynamicScore to ConcurrentScoreMap usage wherever possible git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7586 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	694fa3a2a5	- replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion - changed menu structure slightly git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	30aed9824a	moved getBytes() to UTF8.getBytes() to use a default String encoding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7580 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
lotus	cb6d307bba	adding extension for parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7579 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3820525464	more memory protection: auto-flush of caches in case of memory shortage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7575 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e1b6916423	always try to guess the size of a StringBuilder to prevent too many memory re-allocations git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7572 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	3b40b98256	*) set SVN properties *) minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7567 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	cb1f49d0f2	replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	8d14916c74	more patches for a better out-of-memory management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7555 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f8d0454c53	small bug fixes and experiments with search speed enhancement git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7549 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5e186e0122	continuing the fight against deadlocks during time formatting: better caching. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a92d80a545	performance enhancements using an alternative to an insensitive collator (a complex string compare): - fewer synchronizations - better speed ..in the most important and commonly used classes: http headers, url parsing and html parsing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7526 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e717bf74ba	more logging, more care about OOMs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7503 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5892fff51f	introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a multi-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants all robinson peers to be asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the number of maximum httpc connections was also increased. Please see new comments in yacy.network.freeworld.unit for details of the new DHT selection methods. The number of maximum peers is now not fixed to a specific number but may increase with - the partition exponent - the number of redundant peers - the robinson burst percentage - the multiword burst percentage The maximum can then be the number of senior peers (all visible peers). git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4588b5a291	- fixed document number limitation for crawls that restrict the number of documents per domain - some restructuring of the document counting and logging structures was necessary - better abstraction of CrawlProfiles - added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation - more refactoring to get the LibraryProvider more clean - some refactoring of the Condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	0cdfb82963	replaced more appearance of double values by float values git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7461 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	eb12e15738	moved all Double values to Float values because of http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/ YaCy does not really need double-precision floating point computation anywhere, so this should not affect any feature git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7460 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	88773e4daa	changed the default port from 8080 to 8090 see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	9f38c0023d	*) Minor changes, mainly cleaning up a little bit, no functional changes. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7428 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	54e77e6255	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7426 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	10ae8d961b	- cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring) - cleaned up (removed special code and documentation for 27c3) - added remote search functions to be used within cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	9eae33f886	*) Ooops... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7406 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	a001e8075c	*) minor enhancements git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7405 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	11ea966f9e	*) added SID file (Commodore 64) sound file parser *) minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7403 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3ca06d6290	patch for http://forum.yacy-websuche.de/viewtopic.php?p=21460#p21460 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7399 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	936e976c23	*) added FreeMind (http://freemind.sourceforge.net/) mindmap parser *) minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7397 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	3d95981f7d	*) cleaning up the code a little bit *) minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7396 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	2a6499364d	*) minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7395 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	c0274bd123	*) minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7394 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	59b70a5a92	another fix to the ftp crawler: now correct directory listings according to rfc2640 (path with spaces) and better title names for such files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7386 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9b25a33fd9	- fixed numerous bugs - better document names - fixed problem with ftp crawling - added automatic removal of search results from services that are not online according to the latest network scan: this does not delete the index but just does not show them. After the next network scan, when the server is available again, the results are shown again. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7385 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	7bdb13bf7f	more fixes to smb crawling: better file names git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7384 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	c288fcf634	redesigned CrawlStartScanner user interface and added more features: - multiple hosts for environment scans can be given (comma-separated) - each service (ftp, smb, http, https) for the scan can be selected - the scan result can be accumulated or refreshed each time a network scan is made - a scheduler was added to repeat a scan and add all found urls to the indexer automatically git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7378 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	9d2159582f	* fix system update if urls are in blacklist (for example for very general blacklists like *.de) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7375 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	56264dcc17	- added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls - integrated new parser into loader processes: enrich document parser - fixed a concurrent modification exception in kelondro iterator - hand-over of document size from crawler to indexer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7374 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago

382 Commits (354f0d9acdcdaba63cf499c6481a80586b4f9930)