Commit Graph

7993 Commits (3a44ee74ba8089a54c30fd6e4d669e33f90e88c8)

Author SHA1 Message Date
reger a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
9 years ago
reger 2048b7e057 support scraping start-/enddate from html tag with property "datetime"
9 years ago
reger 900d4584ba complet resource cleanup of lists in contentscraper's close()
9 years ago
reger 1f18653de0 pass parsed swf content trough htmlscraper
9 years ago
reger 18ecf57792 add support of compressed swf to swfParser
9 years ago
sixcooler 5cb7ba0dc4 fix for connections not getting closed to get favicon.ico during seach
9 years ago
reger ed3e16e092 apply remote result count config value to Bookmark Autosearch
9 years ago
Ryszard Goń a98c395023 Add the Autocrawl thread
9 years ago
Ryszard Goń 1728cd30c6 Create autocrawl profiles
9 years ago
reger ff27824964 fix swfParser reading file signature
9 years ago
reger c91e712178 further refactor using standard java / (one) utf-8 charset variable
9 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
sixcooler 5a35f9383a bump to solr/lucene 5.4.0
9 years ago
reger a58d34a4e8 check error URL cache before adding errorDoc to index
9 years ago
reger e9539b1086 reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart
9 years ago
reger cd26717ba2 fix low memory status hint (dht-in disabled)
9 years ago
reger a5faf73afa remove obsolete yacy.init entries interaction.*
9 years ago
sixcooler dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger 46ac0867ff fix poison mediawikiimporter output queue also after ExecutionException
9 years ago
reger a7591d3ed0 fix mediawikiimporter number format exception on coordinate parsing
9 years ago
reger 9da1712a31 increase http header EXPIRES for css and images in DefaultServlet
9 years ago
reger 6d54eb3d36 skip loading document on crawl start for YMark bookmarks
9 years ago
reger 80e2c82249 fix NPE on empty blog importfile parameter
9 years ago
reger e84d94f8ca fix mime table for ms office / open office documents
9 years ago
reger 45b9bd8403 adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
9 years ago
reger d5fd031449 fix reading of ippattern config array in URLProxy
9 years ago
reger b7e8358645 make use of header.getContentType where possible (mime is normalized afterwards)
9 years ago
reger 7a8c077838 fix HeaderFramework.mime() to strip charset parameter.
9 years ago
reger b4b6910d60 fix (todo): correct doc.id of remote search result if no match with newly
9 years ago
reger dec3e6ad96 fix: adjust urlstub for mailto links
9 years ago
reger cb83e65f89 drop returning document language "en" if unknown (fix todo)
9 years ago
reger 0c5548a7ff fix (todo) remove redundant holding of email link nameproperty in parser document
9 years ago
reger 71c416f383 show mailto links in ViewFile.html linklist
9 years ago
reger 6b7c10cef8 fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago
reger 14803d58cd let html scraper accept html5 <link rel="icon"> for favicon links
9 years ago
luc b4cdacee76 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc ba0a293f5c Corrected another case of
9 years ago
reger 4d2b934487 prevent mailto links getting into parser result document's in/outbound link collection
9 years ago
luc 8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
9 years ago
luc a2c08402af Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 70595d05d0 Modified MemoryControl.main() test to properly end for better results
9 years ago
sixcooler 1be67d9ab6 CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years
9 years ago
reger 28b8bc290a fix use of NETWORK_SEARCHVERIFY for rwi verification
9 years ago
reger 020630efd8 remove unused network scanner parameter from queryparameter
9 years ago
luc ad5586f8f6 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 8ebefa4233 Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
9 years ago
luc 7736ee5a42 Updated MediaWimporter main() : display usage in console and stop
9 years ago
reger cdb8f3b10d make current ranking score value avail. to search interface / api
9 years ago
luc 27d11f8671 Fixed isSolrDump function : PushBackInputStream was not unread when
9 years ago
Michael Peter Christen 135a123a77 less logging in new language detection
9 years ago
Michael Peter Christen ef8cd80593 fix for npe
9 years ago
reger 0347bfa71f Apply collection query constraint/modifiert to rwi result stack.
9 years ago
luc 2a67d2ba6f Corrected error management for unsupported image formats, parsing
9 years ago
Michael Peter Christen d6e9834040 Merge branch 'master' of
9 years ago
Michael Peter Christen d82d311995 Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger b5371ea8c1 read/init crawl queue in a thread
9 years ago
reger 1160b13172 remove unused md5 from ViewFile servlet params
9 years ago
reger e163ea88f6 fix vsdParser (Visio) parser return statement
9 years ago
reger b2c8bc0ae6 remove md5_s from default index fields
9 years ago
luc e40ae0943b - No max dimensions specified : render raw image data when source and
9 years ago
reger 90686a75a2 fix flux factor (additional crawl delay by access count) calculation
9 years ago
luc 4af27289e5 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 297fdb60d3 throw exception if crawler hostqueue can't create hostpath directory.
9 years ago
luc 755efac17d Use same max file size when loading all resource bytes or opening stream
9 years ago
luc bc6c79fc12 Corrected scaling function for non RGB images.
9 years ago
luc 1565559df8 Refactoring : extracted write InputStream method.
9 years ago
luc f0478bb14d BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
9 years ago
luc 07437986e7 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 97cc03ef6a start using a template for urlproxy header
9 years ago
luc f01d49c37a Process large or local file images dealing directly with content
9 years ago
luc 3c4c77099d If available, check content length before downloading. Check also
9 years ago
luc 5bbb2e1730 Ensure resource is closed when reading a full file InputStream
9 years ago
luc 6291a57300 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 0d3c5b223e have psParser cleanup temp file
9 years ago
reger 7d0d19cb8e avoid File.deleteOnExit() on temp files
9 years ago
luc bfe51001e3 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 02e4489a23 set tmpfile.deleteOnExit by default,
9 years ago
reger 2985baaa01 Exclude repetitive protocol part in tokenized url
9 years ago
reger ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
9 years ago
reger 52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
9 years ago
luc 49331dc523 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 47d70732f6 improve locale translator
9 years ago
sixcooler 646afe9183 do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler 194df613de not using 'location' as defaultfacetfield - since we removed it being
9 years ago
sixcooler d3b9349b6f simplification / speedup of GenerationMemoryStrategy
9 years ago
sixcooler 4a905ec134 fix to not let the AccessTracker-Log grow to much, but have enough data
9 years ago
reger 20e18d79f8 harmonize document title for archive parsers
9 years ago
luc f11b5e8309 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 112ae013f4 update bzip and bzip parser process,
9 years ago
reger e76a90837b update zip and tar parser process,
9 years ago
luc 4e673ffc9a Ensure closing of InputStream even when an exception occurs.
9 years ago
luc 10696b53f7 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 8532565c7d optimize order of parsers to try
9 years ago
reger 681889ae64 use current tar library for untar files
9 years ago
reger 5d71fc70e3 fix tarParser early exit on looping content
9 years ago
luc bcc2e7cb5b Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 2fcf6f104c fix bzipParser recognition
9 years ago
luc 745e97a575 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt()
9 years ago