Commit Graph

3511 Commits (caf9e98f09b144933c9f23840cebfc8b5739a931)

Author SHA1 Message Date
reger caf9e98f09 put metadata dc_publisher in corresponding schema field
9 years ago
reger 38e2b054d4 remove servlet classloader internal cache map (to save resources; cache hits marginal)
9 years ago
reger 6f0b073bf3 override detected language (statistical langdetect) only with TLD-determined language
9 years ago
reger b65e2b527d include use of condenser's content text for language detection.
9 years ago
reger 937fbb0b9f correct isHidden() for smb from last commit
9 years ago
reger 535d4bf75f respect hidden attribute for file and smb directory listing
9 years ago
reger c28142095a add findClass() to servlet class loader (used in YaCyDefaultServlet)
9 years ago
reger a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
9 years ago
reger 2048b7e057 support scraping start-/enddate from html tag with property "datetime"
9 years ago
reger 900d4584ba complete resource cleanup of lists in contentscraper's close()
9 years ago
reger 1f18653de0 pass parsed swf content through htmlscraper
9 years ago
reger 18ecf57792 add support for compressed swf to swfParser
9 years ago
sixcooler 5cb7ba0dc4 fix connections not getting closed when fetching favicon.ico during search
9 years ago
reger ed3e16e092 apply remote result count config value to Bookmark Autosearch
9 years ago
Ryszard Goń a98c395023 Add the Autocrawl thread
9 years ago
Ryszard Goń 1728cd30c6 Create autocrawl profiles
9 years ago
reger ff27824964 fix swfParser reading file signature
9 years ago
reger c91e712178 further refactor using standard java / (one) utf-8 charset variable
9 years ago
luc 571bc55937 Refactoring: use StandardCharsets constants instead of hard-coded
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
sixcooler 5a35f9383a bump to solr/lucene 5.4.0
9 years ago
reger a58d34a4e8 check error URL cache before adding errorDoc to index
9 years ago
reger e9539b1086 reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart
9 years ago
reger cd26717ba2 fix low memory status hint (dht-in disabled)
9 years ago
reger a5faf73afa remove obsolete yacy.init entries interaction.*
9 years ago
sixcooler dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger 46ac0867ff fix: poison mediawikiimporter output queue also after ExecutionException
9 years ago
reger a7591d3ed0 fix mediawikiimporter number format exception on coordinate parsing
9 years ago
reger 9da1712a31 increase http header EXPIRES for css and images in DefaultServlet
9 years ago
reger 6d54eb3d36 skip loading document on crawl start for YMark bookmarks
9 years ago
reger 80e2c82249 fix NPE on empty blog importfile parameter
9 years ago
reger e84d94f8ca fix mime table for ms office / open office documents
9 years ago
reger 45b9bd8403 adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
9 years ago
reger d5fd031449 fix reading of ippattern config array in URLProxy
9 years ago
reger b7e8358645 make use of header.getContentType where possible (mime is normalized afterwards)
9 years ago
reger 7a8c077838 fix HeaderFramework.mime() to strip charset parameter.
9 years ago
reger b4b6910d60 fix (todo): correct doc.id of remote search result if no match with newly
9 years ago
reger dec3e6ad96 fix: adjust urlstub for mailto links
9 years ago
reger cb83e65f89 drop returning document language "en" if unknown (fix todo)
9 years ago
reger 0c5548a7ff fix (todo) remove redundant holding of email link nameproperty in parser document
9 years ago
reger 71c416f383 show mailto links in ViewFile.html linklist
9 years ago
reger 6b7c10cef8 fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago
reger 14803d58cd let html scraper accept html5 <link rel="icon"> for favicon links
9 years ago
luc b4cdacee76 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc ba0a293f5c Corrected another case of
9 years ago
reger 4d2b934487 prevent mailto links getting into parser result document's in/outbound link collection
9 years ago
luc 8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
9 years ago
luc a2c08402af Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 70595d05d0 Modified MemoryControl.main() test to properly end for better results
9 years ago
sixcooler 1be67d9ab6 CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years
9 years ago