Commit Graph

8419 Commits (3cedbbd4ed035682f4aab584535d1aa78e799c9a)

Author SHA1 Message Date
reger 4cc38e979d add InputStream close after reading input file (Vocabulary_p servlet)
9 years ago
reger 6bf9c55584 adjust Solr select servlet to latest bugfix for boostquery (bq param)
9 years ago
Burkhard 9a18e2297b Merge pull request #51 from JeremyRand/multiple-boost-query
9 years ago
reger f0d7b93372 make use and activate autodetect charset in Vocabulary input from file
9 years ago
JeremyRand 433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
9 years ago
JeremyRand 58824dfa6c Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.
9 years ago
reger 9e94989237 upd to PDFBox 2.0.1
9 years ago
reger d0a571bed2 del cytag trail for own index.html (save resource not used by default)
9 years ago
reger de46879637 fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut)
9 years ago
reger 24b0fa2a38 extend snapshot Html2Image.pdf2image to use PDFBox image export capability
9 years ago
reger eb2a00b1d8 fix NPE on missing crawldepth_i
9 years ago
reger efb9f1a8b7 save resource for unused blacklistFiles map
9 years ago
reger 5f113be760 cleanup connectPeer & yacyVersion.latestRelease usage
9 years ago
reger 7097dcbdbd cleanup hack for partial Solr update on multivalued datefields
9 years ago
reger f10ea3c155 clean-out unused SwitchboardConstants
9 years ago
reger ef24593347 delete obsolete SEARCHRESULT busythread constants
9 years ago
reger 125b5e26a5 apply bugfix for ChartPlotter from Pullreq 42
9 years ago
reger 06ce9ae711 prevent "unchecked conversion" compiler message
9 years ago
reger b4a576dbdf exclude unused protocol param "duetime"
9 years ago
reger 3bd6ae8d8b keep addon/Notepad++ keyword marker on lng export
9 years ago
reger 16837d60c7 fix version in locale version file
9 years ago
reger 0fb01e429e fix migration, account for ssl port in config (for auto-disable https)
9 years ago
reger 7be1c7a05a fix logger name
9 years ago
reger 1d940e5a94 upd commons-compress 1.11
9 years ago
reger 7789c32c82 delete crawl queue on init exception
9 years ago
reger f781b9dd47 revert call condition f. migration.installSkins
9 years ago
reger 3adb670f44 remove never used Domains.myHostNames set
9 years ago
reger 6ecc180299 fix rwi doubledom return best (highest) ranking
9 years ago
reger 2343e3f1cd keep and update existing xlf translation master instead of create new
9 years ago
reger a1935f485f Added utility class CreateTranslationMasters to create a language independent
9 years ago
reger acaf51b296 keep ConfigLanguage_p as 1st entry in exported translation file
9 years ago
reger 61c5b6b403 fix empty drop down list in ConfigLanguage after wrong/empty download
9 years ago
reger 4eddabee42 translate Network History screen -> de
9 years ago
reger 90c79014ae remove unused translator routine which also doesn't handle rel path input
9 years ago
reger 902e79e261 Introduce a TranslatorXliff which can read/write xliff from/to internal translation map.
9 years ago
reger d9adc2c255 load handler for Transparent Proxy on startup only if feature is activated
9 years ago
reger ec24a0c85a add test case for optimized toTokens()
9 years ago
reger cada24f918 adjust utility ListNonTranslatedFiles for path compare on windows
9 years ago
reger fb8ae14b21 make migration version safe
9 years ago
reger 258cd41577 reduce logging (EmbeddedSolrConnector.query)
9 years ago
reger 6783ef5540 move example code SearchClient out of yacycore package
9 years ago
Michael Peter Christen b89465d952 0N - basic dump upload servlet infrastructure, to share index dumps
9 years ago
Michael Peter Christen f12a900f3e harmonization of http post of files for one and several files - this had
9 years ago
Michael Peter Christen 849ab671a9 0n: modified the p2p bootstrapping process - rules had been too tight and
9 years ago
reger 764f5100f0 fix delete of temp file after odt & ooxml parser
9 years ago
reger 379e9b330d use supplied url port to get robots.txt in crawlers hostqueue
9 years ago
reger 58a959403d fix mixed logfactory in UrlProxyServlet,
9 years ago
Michael Peter Christen 2494a820c7 0N - added recording of dump exports if given time frame is not negative
9 years ago
Michael Peter Christen ef2cc4f690 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen a6bf0b1649 0N - added option to generate index export files for a specific number
9 years ago
reger 6d56beaed8 fix assertion exception in toString of MultiProtocolURL
9 years ago
reger 42a7bdb2af fix SolrSelectServlet authentication to default to true
9 years ago
reger dbb28bb4f3 del unused statistic parameter (from status servlet)
9 years ago
reger 06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
9 years ago
reger caf9e98f09 put metadata dc_publisher in corresponding schema field
9 years ago
reger 38e2b054d4 remove servlet classloader internal cache map (to save the resources, cache hits marginal)
9 years ago
luc 3f338777f7 Also check and index eventual icon url information from metadata.
9 years ago
luc 9f712146df Display icons in ViewFile "links" mode.
9 years ago
luc 26f1ead57c Created ViewFavicon class specialized in favicon viewing.
9 years ago
reger 6f0b073bf3 override detected language (statistic langdetect) only with TLD determined
9 years ago
reger b65e2b527d include use of condenser's content text for language detection.
9 years ago
luc 07222b3e1a Added favicon url transmission in RWI chunks.
9 years ago
luc 480772c070 Fixed json search results from commit "Improved URLLicence reliability"
9 years ago
reger 937fbb0b9f correct isHidden() for smb from last commit
9 years ago
reger 535d4bf75f respect hidden attribute for file and smb directory listing
9 years ago
luc 3cc5619d93 Improved HTML icons indexing and rendering in search results.
9 years ago
luc edef6cd0dc Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger c28142095a add findClass() to servlet class loader (used in YaCyDefaultServlet)
9 years ago
luc f7b854465b Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
9 years ago
reger 2048b7e057 support scraping start-/enddate from html tag with property "datetime"
9 years ago
reger 900d4584ba complete resource cleanup of lists in contentscraper's close()
9 years ago
luc aa60ad1dbc Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 1f18653de0 pass parsed swf content through htmlscraper
9 years ago
reger 18ecf57792 add support of compressed swf to swfParser
9 years ago
sixcooler 5cb7ba0dc4 fix for connections not getting closed to get favicon.ico during search
9 years ago
luc ef83e34b8a Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger ed3e16e092 apply remote result count config value to Bookmark Autosearch
9 years ago
Ryszard Goń a98c395023 Add the Autocrawl thread
9 years ago
Ryszard Goń 1728cd30c6 Create autocrawl profiles
9 years ago
luc 41767a01c2 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger ff27824964 fix swfParser reading file signature
9 years ago
luc 7aa1a29e33 Return more accurate HTTP status 400 with detail message when some error
9 years ago
luc bd9dc2f32b Corrected NullPointerException cases occurring in YJsonResponseWriter
9 years ago
luc 0076f9f97d Updated documented sample url
9 years ago
luc cfdbc2b487 Improved URLLicence reliability for use by concurrent non authorized
9 years ago
reger c91e712178 further refactor using standard java / (one) utf-8 charset variable
9 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
sixcooler 5a35f9383a bump to solr/lucene 5.4.0
9 years ago
reger a58d34a4e8 check error URL cache before adding errorDoc to index
9 years ago
reger e9539b1086 reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart
9 years ago
reger cd26717ba2 fix low memory status hint (dht-in disabled)
9 years ago
reger a5faf73afa remove obsolete yacy.init entries interaction.*
9 years ago
sixcooler dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger 46ac0867ff fix poison mediawikiimporter output queue also after ExecutionException
9 years ago
reger a7591d3ed0 fix mediawikiimporter number format exception on coordinate parsing
9 years ago
reger 9da1712a31 increase http header EXPIRES for css and images in DefaultServlet
9 years ago
reger 6d54eb3d36 skip loading document on crawl start for YMark bookmarks
9 years ago
reger 80e2c82249 fix NPE on empty blog importfile parameter
9 years ago
reger e84d94f8ca fix mime table for ms office / open office documents
9 years ago
reger 45b9bd8403 adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
9 years ago
reger d5fd031449 fix reading of ippattern config array in URLProxy
9 years ago
reger b7e8358645 make use of header.getContentType where possible (mime is normalized afterwards)
9 years ago
reger 7a8c077838 fix HeaderFramework.mime() to strip charset parameter.
9 years ago
reger b4b6910d60 fix (todo): correct doc.id of remote search result if no match with newly
9 years ago
reger dec3e6ad96 fix: adjust urlstub for mailto links
9 years ago
reger cb83e65f89 drop returning document language "en" if unknown (fix todo)
9 years ago
reger 0c5548a7ff fix (todo) remove redundant holding of email link nameproperty in parser document
9 years ago
reger 71c416f383 show mailto links in ViewFile.html linklist
9 years ago
reger 6b7c10cef8 fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago
reger 14803d58cd let html scraper accept html5 <link rel="icon"> for favicon links
9 years ago
luc b4cdacee76 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc ba0a293f5c Corrected another case of
9 years ago
reger 4d2b934487 prevent mailto links getting into parser result document's in/outbound link collection
9 years ago
luc 8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
9 years ago
luc a2c08402af Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 70595d05d0 Modified MemoryControl.main() test to properly end for better results
9 years ago
sixcooler 1be67d9ab6 CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years
9 years ago
reger 28b8bc290a fix use of NETWORK_SEARCHVERIFY for rwi verification
9 years ago
reger 020630efd8 remove unused network scanner parameter from queryparameter
9 years ago
luc ad5586f8f6 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 8ebefa4233 Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
9 years ago
luc 7736ee5a42 Updated MediawikiImporter main() : display usage in console and stop
9 years ago
reger cdb8f3b10d make current ranking score value avail. to search interface / api
9 years ago
luc 27d11f8671 Fixed isSolrDump function : PushBackInputStream was not unread when
9 years ago
Michael Peter Christen 135a123a77 less logging in new language detection
9 years ago
Michael Peter Christen ef8cd80593 fix for npe
9 years ago
reger 0347bfa71f Apply collection query constraint/modifier to rwi result stack.
9 years ago
luc 2a67d2ba6f Corrected error management for unsupported image formats, parsing
9 years ago
Michael Peter Christen d6e9834040 Merge branch 'master' of
9 years ago
Michael Peter Christen d82d311995 Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger b5371ea8c1 read/init crawl queue in a thread
9 years ago
reger 1160b13172 remove unused md5 from ViewFile servlet params
9 years ago
reger e163ea88f6 fix vsdParser (Visio) parser return statement
9 years ago
reger b2c8bc0ae6 remove md5_s from default index fields
9 years ago
luc e40ae0943b - No max dimensions specified : render raw image data when source and
9 years ago
reger 90686a75a2 fix flux factor (additional crawl delay by access count) calculation
9 years ago
luc 4af27289e5 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 297fdb60d3 throw exception if crawler hostqueue can't create hostpath directory.
9 years ago
luc 755efac17d Use same max file size when loading all resource bytes or opening stream
9 years ago
luc bc6c79fc12 Corrected scaling function for non RGB images.
9 years ago
luc 1565559df8 Refactoring : extracted write InputStream method.
9 years ago
luc f0478bb14d BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
9 years ago
luc 07437986e7 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 97cc03ef6a start using a template for urlproxy header
9 years ago
luc f01d49c37a Process large or local file images dealing directly with content
9 years ago
luc 3c4c77099d If available, check content length before downloading. Check also
9 years ago
luc 5bbb2e1730 Ensure resource is closed when reading a full file InputStream
9 years ago
luc 6291a57300 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 0d3c5b223e have psParser cleanup temp file
9 years ago
reger 7d0d19cb8e avoid File.deleteOnExit() on temp files
9 years ago
luc bfe51001e3 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 02e4489a23 set tmpfile.deleteOnExit by default,
9 years ago
reger 2985baaa01 Exclude repetitive protocol part in tokenized url
9 years ago
reger ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
9 years ago
reger 52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
9 years ago
luc 49331dc523 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 47d70732f6 improve locale translator
9 years ago
sixcooler 646afe9183 do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler 194df613de not using 'location' as defaultfacetfield - since we removed it being
9 years ago
sixcooler d3b9349b6f simplification / speedup of GenerationMemoryStrategy
9 years ago
sixcooler 4a905ec134 fix to not let the AccessTracker-Log grow too much, but have enough data
9 years ago
reger 20e18d79f8 harmonize document title for archive parsers
9 years ago
luc f11b5e8309 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 112ae013f4 update bzip and bzip parser process,
9 years ago
reger e76a90837b update zip and tar parser process,
9 years ago
luc 4e673ffc9a Ensure closing of InputStream even when an exception occurs.
9 years ago
luc 10696b53f7 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 8532565c7d optimize order of parsers to try
9 years ago
reger 681889ae64 use current tar library for untar files
9 years ago
reger 5d71fc70e3 fix tarParser early exit on looping content
9 years ago
luc bcc2e7cb5b Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 2fcf6f104c fix bzipParser recognition
9 years ago
luc 745e97a575 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger 11f3666660 increase use of predefined CATCHALL_QUERY string
9 years ago
reger a58ee49307 Optimize internal imagequery focus on using content_type to select images
9 years ago
luc fc3294382e Updated javadocs for warning on target encoding format potential errors.
9 years ago
luc aa70ff4ff6 Corrected images alpha channel rendering
9 years ago
reger d223cf0ae4 adjust MediaWiki importer geo coordinate calculation
9 years ago
reger 2b775d5be6 fix typo in WikiCode coordinate calculation
9 years ago
reger bbe9df2bb3 fix MediawikiImporter for bz2 dump
9 years ago
reger c6687dd560 fix a system.out to log.fine
9 years ago
reger e53c6bbd51 fix init of peer flags
9 years ago
Michael Peter Christen ac034db8bc Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger 826f14f37f fix unnecessary set null of peer flags, causing reread
9 years ago
luc 5902ce032e Corrected NullPointerException case when ImageIO reader is not found for
9 years ago
reger c6495a5b62 add a log entry on parsing ajax crawling scheme snapshot
9 years ago
reger 9252e36aeb implement ajax crawling scheme for ajax sites which adhere to the proposed use of hash-bangs to provide html content
9 years ago
Michael Peter Christen d1ae999ef9 replaced HashMap with LinkedHashMap to preserve the object order
9 years ago
Michael Peter Christen 7d075a1d76 added log lines
9 years ago
Michael Peter Christen 092dac086e Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger 7a64bebb86 init Recrawl job chunk size to max crawl loader during job start, to use some system preferences
9 years ago
luc d6522fa4a2 Integrated haraldk/TwelveMonkeys library to first add TIF image format
9 years ago
Michael Peter Christen 9244694e64 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 151ccd50a9 fix for image size field values (must be multi-valued)
9 years ago
reger c9937973e3 unescape MultiProtocolURL getAttributes() return values.
9 years ago
reger 78e8c6f3e5 refactor special handling (static override) of SUPPORTED_EXTENSIONS/MIME_TYPES
9 years ago
reger d54c5d310a add links with image extension not automatically to image links.
9 years ago
reger 851e8f6c8a check jpeg file signature in genericImageParser
9 years ago
reger fb75fea446 use recrawljob w/o sort results by date
9 years ago
reger 43c27aa550 upd to solr/lucene 5.3.1
9 years ago
reger 688f7b2a5c allow/display svg images in image results previews
9 years ago
reger d5330391de remove some unused var allocation in parser
9 years ago
Michael Peter Christen 3d7dd9d3aa follow-up to latest commit: also flush the search cache if all crawls
9 years ago
Michael Peter Christen c737ff235d in case that the include_string contains several entries including
9 years ago
Michael Peter Christen 8e555d79a3 add also 1-character tokens to the token list because that could be also
9 years ago
reger 7c82cd4415 add an end condition to svgParser for wrong content
9 years ago
reger 356d4d1301 remove rdfParser from init (current function identical with genericParser)
10 years ago
reger c647d899e3 add svgParser to parse metadata from svg images
10 years ago
reger bad34804fe optimize parseInt for <img> tag attribute parsing
10 years ago
Michael Peter Christen 6ebc2451a9 Merge pull request #14 from luccioman/master
10 years ago
reger 2f51baff4f check for loading error (includes unsupported formats)
10 years ago
luc 5578886f6f Merge branch 'master' of https://github.com/luccioman/yacy_search_server.git
10 years ago
luc c38d6c1f37 Correction for mantis 535: inurl: parameter doesn't work on URLs with
10 years ago
reger 52e3eb4ce8 harmonize/correct assignment to Ymarkmeta.mime
10 years ago
Michael Peter Christen 87f358058e Fix for index entries which have id's not computed as hash from the url.
10 years ago
reger 3f2b8ab5e5 optionally include mime in p2p url exchange string
10 years ago
reger a3195d78ae add Portuguese month names to date recognition
10 years ago
reger d2cc11ea8f fix html parser taking <style> content as text.
10 years ago
Michael Peter Christen 5f706797cb patch for a bug inside of solr since solr 5.0 when using a boost
10 years ago
reger 7889fc2389 Hack to prevent Solr issue on partial update on a document containing multivalued date field
10 years ago
reger b4cbdea1e7 adapt SolrServerConnector.add to handle error on partial update input document.
10 years ago
reger 98ab655917 on reindex delete index document with invalid url
10 years ago
reger 1e8369e18b use a parsed date in Document.toString
10 years ago
luccioman 199b2ce52d Translator refactoring : to simplify locale files writing, process keys
10 years ago
luccioman 4dd9c0d5d9 Merge from main repository
10 years ago
reger 3428b6f13b improve filtering by filetype navigator.
10 years ago
reger e37a4f0b3d prevent metadata records in index w/o valid url
10 years ago
reger 41c4eade51 extract modification date from vCard (vcfParser)
10 years ago
reger 8768896975 extract lastmodified from openoffice doc
10 years ago
Michael Peter Christen c40c302748 when many crawl queues are generated, this NPE can occur; probably
10 years ago
reger 367fe388b9 fix exception throw after sendError in DefaultServlet
10 years ago
luccioman 9752bd5f88 Added utils to help translation without launching full YaCy application
10 years ago
luccioman 2f0f0180e2 Added a function to list files recursively.
10 years ago
luccioman 7e4c1d2282 Translator refactoring :
10 years ago
reger 802ccaead6 fix init of error cache, use latest faildates => load_date_dt
10 years ago
reger dba7f15073 apply same size constraint on result image from doc
10 years ago
reger 4cf875336c complete TODO: getFileExtension handle dot in query part
10 years ago
sixcooler 87e4abe393 fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has
10 years ago
reger eaf0e8ff2c start recording/indexing pixel size for image document
10 years ago
reger c33229fc0c check mime prior to ext for metadata modification for images
10 years ago
reger 19f1308bf0 enforce the result images limit to > 16x16px
10 years ago
reger 0e4ba0360b fix NPE on .yacyh result url of disconnected peer
10 years ago
reger 7ed812a2bf log missing seed.port
10 years ago
reger 206883f80d fix: Preserve protocol in url proxy
10 years ago
reger f7b0b3b7b3 avoid runtime exception by earlier testing for seed.ip=null
10 years ago
Michael Peter Christen 906b5fd742 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen 8f90767889 fix for filesystem crawl
10 years ago
sixcooler a3dd4be749 added / corrected charset to be 1.7 compatible.
10 years ago
Michael Peter Christen 8028410ab7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen df3314ac1a added a new facet type based on a probabilistic classifier using
10 years ago
reger 1409cabe8b exclude more default search fields from text copy to text_t
10 years ago
reger e2e73258ca remove obsolete interface SearchAccumulator
10 years ago
Michael Peter Christen dbbad23e12 removed warnings
10 years ago
Michael Peter Christen 500cfa9457 enhanced logging
10 years ago
Michael Peter Christen c14bc8d9b7 revert of fq transformation (recent fix)
10 years ago
Michael Peter Christen 203df5a750 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
reger fa08ca207e ! finish running crawls before applying !
10 years ago
reger ee77f24e52 use some more declared HeaderFramework constants
10 years ago
Michael Peter Christen 11a848da5a Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen b94bd7f20a a collection of search query enhancements:
10 years ago
reger dbe2594c38 replace deprecated myPublicLocalIP() in AbstractRemoteHandler
10 years ago
reger 6d3534e725 remove unused Transmission hit counter
10 years ago
reger cb67eb7baf use more absolute path for config file opening
10 years ago
Michael Peter Christen 1ccbf739b1 added bayes filter from Philipp Nolte, originally taken from
10 years ago
Michael Peter Christen 1bced1ae60 using latest enhanced (un/)gzip methods from loklak for yacy
10 years ago
Michael Peter Christen 3e6657288d Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen de8cfbe1d7 added export option to export the fulltext of the search index text only
10 years ago
reger 2fb6ebe88a move the Java environment parameter that disables SNI (Server Name Indication) support for https connections from code to the startup script, allowing the admin to easily and transparently alter the YaCy default FALSE setting.
10 years ago
Michael Peter Christen fbeae20b3a try a healing of the cache if the index file is corrupted
10 years ago
Michael Peter Christen 03ea723889 added log lines for query performance profiling
10 years ago
Michael Peter Christen 0e87a99ab8 more fixes for special windows paths
10 years ago
Michael Peter Christen e5b6424eed patch for bad windows file paths
10 years ago
Michael Peter Christen 0aa6fcf259 remove old vocabularies and synonyms before adding new
10 years ago
Michael Peter Christen 289018b559 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen 7b412e8c07 added msg (text emails) format; should be handled by html parser.
10 years ago
reger f91298d3b6 fix one implicit Integer/Long type conversion
10 years ago
reger 821262a179 add CommonPattern for multiple spaces
10 years ago
Ryszard Goń 59096935d0 Use language-detection library for increased accuracy
10 years ago
Michael Peter Christen 90f75c8c3d added enrichment of synonyms and vocabularies for imported documents
10 years ago
Michael Peter Christen 7829480b82 refactoring: separated condenser and tokenizer
10 years ago
Michael Peter Christen 593de05922 enhanced surrogate import process speed (dramatically!)
10 years ago
Michael Peter Christen 3c4c69adea fix for
10 years ago
Michael Peter Christen 1fec7fb3c1 suppress access to solr when doing search suggestions in case that the
10 years ago
Michael Peter Christen 694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
10 years ago
sixcooler e427efbe54 Next Try for a fix for upload-connection staying in blocked state.
10 years ago
reger 0fab445b19 Resourceobserver log warning - deleting releases files - only on actual deletes
10 years ago
sixcooler ef6a64b2a4 Fix for upload-connection staying in blocked state.
10 years ago
reger c973f94936 add log entry on release file delete by ResourceObserver
10 years ago
reger 121972752c implement deleteOldDownloads in ResourceObserver on low diskspace
10 years ago
Michael Peter Christen 9c12555be5 added link to Snapshots in search results if the snapshot exists and
10 years ago
reger 72f6a0b0b2 enhance recrawl job
10 years ago
reger 7478338a40 remove augmented parsing activation from frontend
10 years ago
reger 11aa2edfe1 remove RDFa parser activation from frontend
10 years ago
reger 49b79987c9 remove obsolete searchfl work table
10 years ago
Michael Peter Christen d0aff91f23 fix for index import
10 years ago
Michael Peter Christen 34de1e8cbc gzip compression will perform more efficiently and with better compression
10 years ago
Michael Peter Christen 98be59ce9c full solr xml exports will now be automatically compressed during
10 years ago
Michael Peter Christen a1a8edfc0a wrap HeapReader close() in a catch Throwable block to prevent that an
10 years ago
Michael Peter Christen b43811d38c added surrogate import process for exported solr dumps.
10 years ago
Michael Peter Christen b77537294d prevent disc usage when showing tray animation
10 years ago
Michael Peter Christen eec78e1b0c added intensity option to graphics
10 years ago
Michael Peter Christen a5007f345e re-licensing some of my old visualization classes under LGPL 2.1
10 years ago
Michael Peter Christen c99a665593 adding a 3-pixel font generator made some time ago..
10 years ago
Michael Peter Christen c7576d6028 added a full solr export to the IndexControlURLs_p.html servlet. The
10 years ago
Michael Peter Christen 197f7449e5 All entities of crawl profiles are now editable in the crawl profile
10 years ago
reger 1d8e1e4bac - Image search expand box, adjust javascript hs padtominsize parameter, to make sure expand box doesn't shrink on small images
10 years ago
reger 8b35656007 remove hard throw exception in makeResultEntry
10 years ago
reger af57fbefad use available mime (instead null) on imageresult from metadatanode
10 years ago
reger dd7782bac0 revert deletion of BinSearch
10 years ago
reger 000dde9511 Eliminate duplication of values for search ResultEntry
10 years ago
reger 29c4aa3991 fix compiler notification of missing serialID
10 years ago
reger 3d53da8236 refactor ResultEntry to be based on MetadataNode/SolrDocument
10 years ago
reger d882991bc5 Implement sharing of ioDispatcher for term & citation index
10 years ago
reger 370ba9da71 On imageSearch prefer mime to sort out non-image documents
10 years ago
reger cd31633369 improve MultiprotocolURL.getFileExtension()
10 years ago
reger c60ccdfbcf Increase IODispatcher dumpQueue size to 2 to reduce risk of concurrent emergency dump,
10 years ago
reger 8a9622c31c fix string OoB on getImagelinks with long alttext
10 years ago
reger 3e742d1e34 Init remote crawler on demand
10 years ago
reger 13f013f64a Limit extra sleep of BusyThread on LowMemCycle
10 years ago
reger cd7c0e0aae detail optimization of RecrawlThread
10 years ago
reger ace71a8877 Initial (experimental) implementation of index update/re-crawl job
10 years ago
reger 141cd80456 correct log msg text
10 years ago
reger f3ce99bfb8 fix extract of inboundlinks_protocol_sxt
10 years ago
reger 2bc9cb5828 fix early return in addToCrawler
10 years ago
Michael Peter Christen f5f88272e4 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen 5c67c4d460 fix for latest commit, see
10 years ago
reger c37dda8849 fix NPE on MultiProtocolURL on url with parameter value and '='
10 years ago
Michael Peter Christen f810915717 added crawl start from a clone with very, very large url: they are now
10 years ago
Michael Peter Christen 51de86c992 disabled debug thread dumps
10 years ago
Michael Peter Christen d524a9d77c Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen 0710648c31 enable api calls with very long urls
10 years ago
reger 31346e873b upd library reference of missing jsch-0.1.21 in seeduploadscp.xml
10 years ago
reger 609c52e987 refactor getBookmark
10 years ago
reger 1481a8ab56 add opensearch rss results to dht collection (due to text = snippet)
10 years ago
reger f134aa7f7f persist bookmark timestamp
10 years ago
reger 752eec6697 fix NPE in addToIndex when used outside searchEvent
10 years ago
Michael Peter Christen fbf85a1561 added temporary debug output in http client
10 years ago
Michael Peter Christen ff29b0e503 added option to re-index exported xml snapshot dumps to
10 years ago
Michael Peter Christen 6f4fe4b175 revert of 8a7c68e4c7
10 years ago
Michael Peter Christen 97930a6aad added must-not-match filter to snapshot generation.
10 years ago
Michael Peter Christen 9d8f426890 adding a try-catch to link graph processing to prevent that a single
10 years ago
reger 8a5b8f8789 on bookmarking of search result, remember orig. query in separate bookmark property
10 years ago
reger 7224209486 break out of NormalizeDistributor loop on timeout
10 years ago
reger 47e61f8325 fix typo in image filter query
10 years ago
reger 4b4ab6799f fix String out of range in Collection Nav
10 years ago
reger 572cfe8fd4 improve character encoding for urlproxy servlet
10 years ago
reger 6bc8a9b11e make Quality of Service Servlet available to prioritize requests from local host
10 years ago
Ryszard Goń ca1a70aec8 fix for Accept '?' URLs column in Crawl Profile List
10 years ago
reger 5408448a56 skip redundant add. of keywords to text
10 years ago
reger 296e97c78e put https port in peers dna
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen b060ba900d added parsing of contentprop attribute in html tags for
10 years ago
Michael Peter Christen 4cb4f67f38 added parsing of dd, dt and article html fields. The parsed result is
10 years ago
reger 1395f10e95 fix typecast for css links
10 years ago
Michael Peter Christen 3288489fd2 more logging during start-up
10 years ago
Michael Peter Christen abaaaef5f1 fix for filter queries
10 years ago
Michael Peter Christen 4d00175157 <experimental> added parsing of <article> html element.
10 years ago
Michael Peter Christen 1df6492019 enhanced suggestions
10 years ago
Michael Peter Christen ae02c92fd0 logging fix
10 years ago
Michael Peter Christen 5651713134 better debugging of fq
10 years ago
Michael Peter Christen f5a032f293 split query into filter query and text query to get better ranking
10 years ago
Michael Peter Christen 2e88028c1a when selecting collections in navigation, do show the un-selected
10 years ago
Michael Peter Christen 1de9b21c65 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger 5f4cd8d6f5 replace deprecated getIP with getIPs in AbstractRemoteHandler
10 years ago
Michael Peter Christen fa7edc9f7a refactoring of filter queries (several queries instead only one)
10 years ago
Michael Peter Christen 40389987ec Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen f9ba50379d added an expansion option to search facets on result page:
10 years ago
reger 1f0f77bb77 make location facet return results
10 years ago
reger b1ec0644e5 fix NPE in location search on missing/empty PubDate in underlying rss data
10 years ago
reger c1dcc8c456 fix display and limit of max server connections after startup
10 years ago
reger 839b962c20 correct percent encoding for '%' char
10 years ago
Michael Peter Christen 9bf0d7ecb9 added a new collection type 'dht' to all documents from the peer-to-peer
10 years ago
reger 796770e070 prevent overwrite of crawled or received full documents by (newer) metadata
10 years ago
Michael Peter Christen ee2490ab98 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger 431311df42 fix get fresh_date_dt to allow returned value to be date in future
10 years ago
otter 74c7e8b686 Fixes hanging FlushThread (see
10 years ago
reger f63fff9008 fix snippet containing number with comma as decimal point http://mantis.tokeek.de/view.php?id=344
10 years ago
reger b241264632 fix error on *abc query input
10 years ago
reger 2ef8ffdb60 apply UTF-8 encoding
10 years ago
reger 7120ea42f1 fix for path with char code > 255
10 years ago
reger 1d81bd0687 fix url encoding for path see http://mantis.tokeek.de/view.php?id=559
10 years ago
reger 62087fb8b2 fix MultiProtocolURL mailto protocol detection
10 years ago
reger 2e8c24e02a fix link to DeReWo download file
10 years ago
reger 706f75ddc2 try to fix hang on index blob merge on shutdown
10 years ago
reger f94e34058c fix url (path) %-decoding http://mantis.tokeek.de/view.php?id=519
10 years ago
reger 7e09bff4a1 exclude default search fields from text copy to text_t
10 years ago
reger 86073a5ba3 For remote crawlReceipt add document abstract/description
10 years ago
reger 8af70950d9 harmonize snippet computation
10 years ago
Michael Peter Christen fd4e2c809a Show dates in the content of a document in the search result:
10 years ago
Michael Peter Christen 893889bc7b added special terms for on: - Date modifier: tomorrow, today; i.e.:
10 years ago
Michael Peter Christen 710a0efa1b generalized time period computations
10 years ago
Michael Peter Christen d9d3111d10 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 535f1ebe3b added a new way of content browsing in search results:
10 years ago
reger d7259419f3 postpone raw snippet html encoding upon use
10 years ago
reger de56d934b2 apply query parameter getQueryFields() to GSA servlet
10 years ago
reger 2d2299f484 fix mimetype of rss items in rss parser
10 years ago
Michael Peter Christen b432049d59 enhanced date parsing time
10 years ago
reger 9b0de2de64 introduce getQueryFields to return default query fields (query parameter QF)
10 years ago
reger a0f04db9ea add extracted description/subject to pptParser
10 years ago
reger 8ec1db76ee url unescape add check for inconsistent utf8 multibyte parsing
10 years ago
reger 4b97ddb9ec stop sending crawl receipts if receiver got offline
10 years ago
reger 7e35518787 add extracted description/subject to docParser
10 years ago
reger f0a5188e11 replace deprecated HTTPClient setStaleConnectionCheckEnabled with setValidateAfterInactivity()
10 years ago
reger 7b569d2dbe replace deprecated HTTPClient ALLOW_ALL_HOSTNAME_VERIFIER with NoopHostnameVerifier()
10 years ago
reger fba34e12ef fix formatting issue if snippet contains html code
10 years ago
reger e48720a58c fix NPE in snippet computation
10 years ago
reger eda0aeaf26 allow/recognize host in file: protocol crawl target
10 years ago
reger df83fcc4fc disable optimistic GC assumption in StandardMemoryStrategy
10 years ago
Michael Peter Christen 8ff76f8682 the cleanup process experienced a 100% CPU load situation and the loop
10 years ago
Michael Peter Christen 1f5b5c0111 npe fix for latest scraper feature
10 years ago
Michael Peter Christen ee97302a23 hack to make date detection faster (while it becomes a bit incomplete
10 years ago
Michael Peter Christen 6578ff3ddb enhanced suggest function
10 years ago
reger fe6f5a395d fix Umlaut handling in blekko heuristic search term
10 years ago
reger 23924348e2 url with semicolon or comma handling in proxy request
10 years ago
reger 9025fe3518 upd error message for proxy
10 years ago
Michael Peter Christen 97ba5ddbb7 configuration option for maxload limit for remote search
10 years ago
reger c454ef69c6 add shortMemory check to heuristic search
10 years ago
reger 9e1ec5fec4 refactor: just some more usages of constant for term ":[* TO *]"
10 years ago
reger 8c491f51a5 remove hardcoded initialization of language nav if not used
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 1cb290170e refactoring of autotagging code (combined same code pieces)
10 years ago
Michael Peter Christen c3b55455fc enhanced initialization speed of vocabularies by using better
10 years ago
Michael Peter Christen 68c605d637 replace with CommonPattern.SPACE for split
10 years ago
Michael Peter Christen de3e373913 using precompiled CommonPattern.TAB for split
10 years ago
Michael Peter Christen 1f5047b15f using precompiled pattern CommonPattern.SEMICOLON for splits
10 years ago
Michael Peter Christen a8a2b7a803 persistency for vocabulary facet switch
10 years ago
Michael Peter Christen efbc9a3561 introducing a new getConfig method which parses comma-separated lists
10 years ago
Michael Peter Christen 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
10 years ago
Michael Peter Christen ac19690d30 refactoring with CommonPattern.COMMA
10 years ago
Michael Peter Christen cf9b22ca5c do not reindex based on vocabulary fields (there are meanwhile many of
10 years ago
Michael Peter Christen 5a060c9f26 refactoring of reindexSolr (just replaced constant string)
10 years ago
Michael Peter Christen b5a55c8b3d fix for wkhtmltopdf (custom header does not work)
10 years ago
Michael Peter Christen 3d717b749a fix for urlmaskfilter
10 years ago
Michael Peter Christen bee5ee7cce removed some warnings
10 years ago
Michael Peter Christen 783cf6fbc7 the LinkedBlockingQueue is much faster than the ArrayBlockingQueue
10 years ago
Michael Peter Christen 6390454652 fix for vocabulary on/off setting
10 years ago
Michael Peter Christen a3c5995bde Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger 5ca0762179 fix: eom on parsing ico file by genericImageParser
10 years ago
Michael Peter Christen 4cd2d68e03 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen dc5700148f update to latest code changes from json.org
10 years ago
reger 42b0672be3 Let auto-disabled crawls recover if low resource condition vanished.
10 years ago
Michael Peter Christen 287c528f46 replaced old JavaApplicationStub for Mac Application framework with new
10 years ago
Michael Peter Christen 4c9d2a7c64 reverted 'do not show all options' strategy. This is actually confusing
10 years ago
Michael Peter Christen 7db2888336 fixed font size and print page generation in pdf snapshots
10 years ago
reger 24f68a4eb7 refactor opensearch heuristic
10 years ago
Michael Peter Christen 3b51636ecb fix for mediawiki import
10 years ago
Michael Peter Christen b07afbc115 a test with http://validator.w3.org/feed/#validate_by_input shows that
10 years ago