Commit Graph

7980 Commits (7789c32c82885c6bdb08c9adf1cbf4d745b92519)

Author SHA1 Message Date
reger 297fdb60d3 throw exception if crawler hostqueue can't create hostpath directory.
9 years ago
luc 755efac17d Use same max file size when loading all resource bytes or opening stream
9 years ago
luc bc6c79fc12 Corrected scaling function for non RGB images.
9 years ago
luc 1565559df8 Refactoring : extracted write InputStream method.
9 years ago
luc f0478bb14d BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
9 years ago
luc 07437986e7 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 97cc03ef6a start using a template for urlproxy header
9 years ago
luc f01d49c37a Process large or local file images dealing directly with content
9 years ago
luc 3c4c77099d If available, check content length before downloading. Check also
9 years ago
luc 5bbb2e1730 Ensure resource is closed when reading a full file InputStream
9 years ago
luc 6291a57300 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 0d3c5b223e have psParser cleanup temp file
9 years ago
reger 7d0d19cb8e avoid File.deleteOnExit() on temp files
9 years ago
luc bfe51001e3 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 02e4489a23 set tmpfile.deleteOnExit by default,
9 years ago
reger 2985baaa01 Exclude repetitive protocol part in tokenized url
9 years ago
reger ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
9 years ago
reger 52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
9 years ago
luc 49331dc523 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 47d70732f6 improve locale translator
9 years ago
sixcooler 646afe9183 do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler 194df613de not using 'location' as defaultfacetfield - since we removed it being
9 years ago
sixcooler d3b9349b6f simplification / speedup of GenerationMemoryStrategy
9 years ago
sixcooler 4a905ec134 fix to not let the AccessTracker-Log grow too much, but have enough data
9 years ago
reger 20e18d79f8 harmonize document title for archive parsers
9 years ago
luc f11b5e8309 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 112ae013f4 update bzip and bzip parser process,
9 years ago
reger e76a90837b update zip and tar parser process,
9 years ago
luc 4e673ffc9a Ensure closing of InputStream even when an exception occurs.
9 years ago
luc 10696b53f7 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 8532565c7d optimize order of parsers to try
9 years ago
reger 681889ae64 use current tar library for untar files
9 years ago
reger 5d71fc70e3 fix tarParser early exit on looping content
9 years ago
luc bcc2e7cb5b Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 2fcf6f104c fix bzipParser recognition
9 years ago
luc 745e97a575 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger 11f3666660 increase use of predefined CATCHALL_QUERY string
9 years ago
reger a58ee49307 Optimize internal imagequery focus on using content_type to select images
9 years ago
luc fc3294382e Updated javadocs for warning on target encoding format potential errors.
9 years ago
luc aa70ff4ff6 Corrected images alpha channel rendering
9 years ago
reger d223cf0ae4 adjust MediaWiki importer geo coordinate calculation
9 years ago
reger 2b775d5be6 fix typo in WikiCode coordinate calculation
9 years ago
reger bbe9df2bb3 fix MediawikiImporter for bz2 dump
9 years ago
reger c6687dd560 fix a system.out to log.fine
9 years ago
reger e53c6bbd51 fix init of peer flags
9 years ago
Michael Peter Christen ac034db8bc Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger 826f14f37f fix unnecessary set null of peer flags, causing reread
9 years ago
luc 5902ce032e Corrected NullPointerException case when ImageIO reader is not found for
9 years ago
reger c6495a5b62 add a log entry on parsing ajax crawling scheme snapshot
9 years ago
reger 9252e36aeb implement ajax crawling scheme for ajax sites which adhere to the proposed use of hash-bangs to provide html content
9 years ago
Michael Peter Christen d1ae999ef9 replaced HashMap with LinkedHashMap to preserve the object order
9 years ago
Michael Peter Christen 7d075a1d76 added log lines
9 years ago
Michael Peter Christen 092dac086e Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger 7a64bebb86 init Recrawl job chunk size to max crawl loader during job start, to use some system preferences
9 years ago
luc d6522fa4a2 Integrated haraldk/TwelveMonkeys library to first add TIF image format
9 years ago
Michael Peter Christen 9244694e64 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 151ccd50a9 fix for image size field values (must be multi-valued)
9 years ago
reger c9937973e3 unescape MultiProtocolURL getAttributes() return values.
9 years ago
reger 78e8c6f3e5 refactor special handling (static override) of SUPPORTED_EXTENSIONS/MIME_TYPES
9 years ago
reger d54c5d310a add links with image extension not automatically to image links.
9 years ago
reger 851e8f6c8a check jpeg file signature in genericImageParser
9 years ago
reger fb75fea446 use recrawljob w/o sort results by date
9 years ago
reger 43c27aa550 upd to solr/lucene 5.3.1
9 years ago
reger 688f7b2a5c allow/display svg images in image results previews
9 years ago
reger d5330391de remove some unused var allocation in parser
9 years ago
Michael Peter Christen 3d7dd9d3aa follow-up to latest commit: also flush the search cache if all crawls
9 years ago
Michael Peter Christen c737ff235d in case that the include_string contains several entries including
9 years ago
Michael Peter Christen 8e555d79a3 add also 1-character tokens to the token list because that could be also
9 years ago
reger 7c82cd4415 add an end condition to svgParser for wrong content
9 years ago
reger 356d4d1301 remove rdfParser from init (current function identical with genericParser)
9 years ago
reger c647d899e3 add svgParser to parse metadata from svg images
9 years ago
reger bad34804fe optimize parseInt for <img> tag attribute parsing
9 years ago
Michael Peter Christen 6ebc2451a9 Merge pull request #14 from luccioman/master
9 years ago
reger 2f51baff4f check for loading error (includes unsupported formats)
9 years ago
luc 5578886f6f Merge branch 'master' of https://github.com/luccioman/yacy_search_server.git
9 years ago
luc c38d6c1f37 Correction for mantis 535: inurl: parameter doesn't work on URLs with
9 years ago
reger 52e3eb4ce8 harmonize/correct assignment to Ymarkmeta.mime
9 years ago
Michael Peter Christen 87f358058e Fix for index entries which have id's not computed as hash from the url.
9 years ago
reger 3f2b8ab5e5 optionally include mime in p2p url exchange string
9 years ago
reger a3195d78ae add Portuguese month names to date recognition
9 years ago
reger d2cc11ea8f fix html parser taking <style> content as text.
9 years ago
Michael Peter Christen 5f706797cb patch for a bug inside of solr since solr 5.0 when using a boost
9 years ago
reger 7889fc2389 Hack to prevent Solr issue on partial update on a document containing multivalued date field
9 years ago
reger b4cbdea1e7 adapt SolrServerConnector.add to handle error on partial update input document.
9 years ago
reger 98ab655917 on reindex delete index document with invalid url
9 years ago
reger 1e8369e18b use a parsed date in Document.toString
9 years ago
luccioman 199b2ce52d Translator refactoring : to simplify locale files writing, process keys
9 years ago
luccioman 4dd9c0d5d9 Merge from main repository
9 years ago
reger 3428b6f13b improve filtering by filetype navigator.
9 years ago
reger e37a4f0b3d prevent metadata records in index w/o valid url
9 years ago
reger 41c4eade51 extract modification date from vCard (vcfParser)
9 years ago
reger 8768896975 extract lastmodified from openoffice doc
9 years ago
Michael Peter Christen c40c302748 when many crawl queues are generated, this NPE can occur; probably
9 years ago
reger 367fe388b9 fix exception throw after sendError in DefaultServlet
9 years ago
luccioman 9752bd5f88 Added utils to help translation without launching full YaCy application
9 years ago
luccioman 2f0f0180e2 Added a function to list files recursively.
9 years ago
luccioman 7e4c1d2282 Translator refactoring :
9 years ago
reger 802ccaead6 fix init of error cache, use latest faildates => load_date_dt
9 years ago
reger dba7f15073 apply same size constraint on result image from doc
9 years ago
reger 4cf875336c complete TODO: getFileExtension handle dot in query part
9 years ago
sixcooler 87e4abe393 fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has
9 years ago
reger eaf0e8ff2c start recording/indexing pixel size for image document
9 years ago
reger c33229fc0c check mime prior to ext for metadata modification for images
9 years ago
reger 19f1308bf0 enforce the result images limit to > 16x16px
9 years ago
reger 0e4ba0360b fix NPE on .yacyh result url of disconnected peer
9 years ago
reger 7ed812a2bf log missing seed.port
9 years ago
reger 206883f80d fix: Preserve protocol in url proxy
9 years ago
reger f7b0b3b7b3 avoid runtime exception by earlier testing for seed.ip=null
9 years ago
Michael Peter Christen 906b5fd742 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 8f90767889 fix for filesystem crawl
9 years ago
sixcooler a3dd4be749 added / corrected charset to be 1.7 compatible.
9 years ago
Michael Peter Christen 8028410ab7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen df3314ac1a added a new facet type based on a probabilistic classifier using
9 years ago
reger 1409cabe8b exclude more default search fields from text copy to text_t
9 years ago
reger e2e73258ca remove obsolete interface SearchAccumulator
9 years ago
Michael Peter Christen dbbad23e12 removed warnings
9 years ago
Michael Peter Christen 500cfa9457 enhanced logging
9 years ago
Michael Peter Christen c14bc8d9b7 revert of fq transformation (recent fix)
9 years ago
Michael Peter Christen 203df5a750 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
reger fa08ca207e ! finish running crawls before applying !
9 years ago
reger ee77f24e52 use some more declared HeaderFramework constants
9 years ago
Michael Peter Christen 11a848da5a Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen b94bd7f20a a collection of search query enhancements:
9 years ago
reger dbe2594c38 replace deprecated myPublicLocalIP() in AbstractRemoteHandler
9 years ago
reger 6d3534e725 remove unused Transmission hit counter
9 years ago
reger cb67eb7baf use more absolute path for config file opening
9 years ago
Michael Peter Christen 1ccbf739b1 added bayes filter from Philipp Nolte, originally taken from
9 years ago
Michael Peter Christen 1bced1ae60 using latest enhanced (un/)gzip methods from loklak for yacy
9 years ago
Michael Peter Christen 3e6657288d Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen de8cfbe1d7 added export option to export the fulltext of the search index text only
9 years ago
reger 2fb6ebe88a move java environment parameter setting disabling SNI (Server Name Indication) support for https connections from code to startup script, allowing admin to easily and transparently alter the YaCy default FALSE setting.
9 years ago
Michael Peter Christen fbeae20b3a try a healing of the cache if the index file is corrupted
9 years ago
Michael Peter Christen 03ea723889 added log lines for query performance profiling
9 years ago
Michael Peter Christen 0e87a99ab8 more fixes for special windows paths
9 years ago
Michael Peter Christen e5b6424eed patch for bad windows file paths
9 years ago
Michael Peter Christen 0aa6fcf259 remove old vocabularies and synonyms before adding new
9 years ago
Michael Peter Christen 289018b559 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 7b412e8c07 added msg (text emails) format; should be handled by html parser.
9 years ago
reger f91298d3b6 fix one implicit Integer/Long type conversion
9 years ago
reger 821262a179 add CommonPattern for multiple spaces
10 years ago
Ryszard Goń 59096935d0 Use language-detection library for increased accuracy
10 years ago
Michael Peter Christen 90f75c8c3d added enrichment of synonyms and vocabularies for imported documents
10 years ago
Michael Peter Christen 7829480b82 refactoring: separated condenser and tokenizer
10 years ago
Michael Peter Christen 593de05922 enhanced surrogate import process speed (dramatically!)
10 years ago
Michael Peter Christen 3c4c69adea fix for
10 years ago
Michael Peter Christen 1fec7fb3c1 suppress access to solr when doing search suggestions in case that the
10 years ago
Michael Peter Christen 694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
10 years ago
sixcooler e427efbe54 Next Try for a fix for upload-connection staying in blocked state.
10 years ago
reger 0fab445b19 Resourceobserver log warning - deleting releases files - only on actual deletes
10 years ago