yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
low012	277b454a62	) added comments ) minor refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7971 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	6b22865dbc	- removed some warnings - removed a dead update location git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	8a428d3e77	ensure termination of pdf parser to avoid deadlocking of other processes during search result preparation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7958 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	85a5487d6d	YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7943 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	0819e1d397	protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7942 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	49e5ca579f	added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	610b01e1c3	- added a 'add every media object linked in a html document as a new document' to the html parser. This causes that all image, app, video or audio file that is linked in a html file is added as document. In fact that means that parsing a single html document may cause that a number of documents is inserted into the search index. - some refactoring for mime type discovery git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	b5252ef91f	added new word recommendation library in DictionaryLoader_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	1c007188ad	bugfixes in html parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	231074bf0a	fixed a parsing bug by reverting SVN 7766 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7910 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
low012	24e76a7b69	) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.) ) Added description of where to place MediaWiki dump for import. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7905 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5dd2efc9a2	- bugfixes in html parser - new fields in solr - extended file viewer to debug parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	51cf697acd	refactoring: moved all score-related classes to new ranking package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	eb14111200	encapsulate potential expensive objects in TextSnippet to allow GC them asap this reduces chance of OOMs at massive search & snippet-fetching git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	a311596881	finishing up my commits (7855-7858) which could be helpful for not declaring inside loops (helps GC of some VMs) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	9170a434ed	throwing an exception again in FileUtils.copy(reader, writer) OOMs could occur here and should not be ignored git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7858 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	ce248cc8dd	less byte-arrays of response-content, less byte-array <-> stream conversion git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7856 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	59b767eebd	stop loading via http at defined maximum of bytes - even size is unknown before loading using max-file-size of type int for parsing documents (since content is used as byte-arrays, 'integer' should be maximum) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	299af4943c	added another memory protection hack git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7849 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b06faab9d3	do not allocate a StringBuilder object in case that there is not enough memory for that git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7846 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	2d4bb139d3	- added counting of links with noindex tag for solr index - bugfixes for solr index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	bda3eec0ff	added parsing of canonical link element to html parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9706fc55aa	enhanced content scraper (should discover urls much faster in case of very large plain texts) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7787 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f667b9c289	enhanced identificator: using AtomicInteger for counter git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7785 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	115abc8917	- more attributes for search progress bar - moved cache strategy to cora package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	77fe69395d	added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7774 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	0c1b29f3c9	- applied many small performance hacks - added a memory limitation in the zip parser and the pdf parser - added a search throttling: if too many search queries are still to be computed, then new requests are not accepted for some time. if after one second there is still no space to perform another search, the search terminates with no results. this case should only happen in case of DoS-like situations and in case of strong load on a peer like if it is integrated in metager. - added a search cache deletion process that removes search requests in case that throttling happens git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7766 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4bea3f9714	hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes). The new ASCII String <-> byte[] conversion methods have less computation overhead than the UTF8 conversion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e28bd0d038	fix for some possible causes of memory leaks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7741 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	10e2f588f8	- enhanced ybr ranking computation - many speed/performance hacks - added solr sharding and new sharding web interface - added option to switch off the yacy index when using solr - added new fail-url categories which are used to make a distinction which fail-urls to be sent to solr - refactoring/renaming of some method names to distinguish host/url hashes better - a large number of bug/npe fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3ed4a09368	small features, some bug fixes and performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	205cc75157	abstraction of surrogate main element (xmlns:geo was missing for wiki extracts) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7727 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	021840e5ba	removed (almost) deadlocks and unnecessary CPU load git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7726 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9248a4eef4	reduce the effect of 'Bildersuche findet generierte HTML-Seiten als Bilder' ("image search finds generated HTML pages as images") see http://bugs.yacy.net/view.php?id=9 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7705 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	76f2817e00	a fix for the snippet computation and hopefully better snippets git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7701 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	deda54d684	- relaxed matching of string-search (this is now case-insensitive) - added transport of string-search pattern to remote search protocol - fixed a problem parsing snippets with a '-' inside git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7700 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	15e3a57b4e	removed unused functions in condenser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7698 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e3d19d0a90	fix in Document inboundlinks/outboundlinks sorting git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7690 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4e8fa03514	added more attributes to html evaluation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7688 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	528da7c9ea	removed unused class and added license header for new class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7680 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f6077b3cc0	added more attributes for html parser and enhanced data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7679 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	d8e934c085	better abstraction of http client identification git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7675 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b77b8cac0c	- enhanced html parser: recognized much more details in the content - added more properties to solr index - refactoring - more constants in switchboard - fix for some NPEs - recognition of more images - removed synchronization in HandleMap (obviously not necessary?) - added a nolocal configuration to remove excessive dns lookup (works only on allip - default off). Indexes produced with this setting are all flagged with 'local' and are (on purpose) not usable for freeworld because they will be rejected as being local. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7672 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3d5104d357	- fixed a bug in crawl start with file name (npe in new url) - added deletion of solr index in IndexControlRWIs - added asynchronous adding of large url lists (happens when crawls are started with file) - fixed npe in Image display - replaced language warning with fine logging - added a domain name cache in Domains that helps to speed up the isLocal property (less DNS lookups) - added a new storage class for this new cache: KeyList. The domain key list is stored in DATA/WORK/globalhosts.list - added concurrent solr updates and chunked transfers (50 documents until a commit is done) for high-speed feeding (> 40000 ppm) - fixed a bug in content scraper that chopped off large parts of crawl lists (using crawl start from file) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7666 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	958ff4778e	enhanced location search: search is now done using verify=false (instead of verify=cacheonly) which will cause that much more targets can be found. This showed a bug where no location information was used from the metadata (and other metadata information) if cache=false is requested. The bug was fixed. Added also location parsing from wikimedia dumps. A wikipedia dump can now also be a source for a location search. Fixed many smaller bugs in connection with location search. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7657 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	c17d102bd8	enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7653 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b788182954	some enhancements to scoring speed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7652 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	01690eab86	fix for mediawiki importer and wikicode parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7651 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4c013d9088	more UTF8 getBytes() performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7649 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago

225 Commits (d871812621554588abfc82a3d4087db0f1d4374c)