Commit Graph

8419 Commits (3cedbbd4ed035682f4aab584535d1aa78e799c9a)

Author SHA1 Message Date
reger 96467c5467 remove not needed counter in Tokenizer (completing last changes)
9 years ago
luccioman d66b0f7b7b Fixed french messages encoding in YaCy tray.
9 years ago
reger 7efb66ee10 adjust the WordReference.join wordsintext calc to take the max (instead of sum)
9 years ago
luccioman 0a9ff14d96 Fixed NullPointerException case and added Javadoc
9 years ago
luccioman 06d4f93d03 Merged master into postprocessing branch
9 years ago
Michael Peter Christen b73d2db914 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 25a3c7a6d0 catch exception and write end of object
9 years ago
reger 272cdd496a reactivate sentence counter in WordTokenizer for phrasepos ranking,
9 years ago
Michael Peter Christen 5e165a8150 removed unused imports
9 years ago
Michael Peter Christen c716648c78 enhanced json encoding of strings
9 years ago
Michael Peter Christen 6139bd85a8 fix for broken facet names
9 years ago
Michael Peter Christen 5060f9fee9 fix for too long snippets
9 years ago
Michael Peter Christen 8681cee3f3 fix for bad comma
9 years ago
Michael Peter Christen db6d8fc197 fix for bad json
9 years ago
Michael Peter Christen 8f4a341735 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 9934f546bb added default fl to solr query, removed large texts retrieval and
9 years ago
reger 120bf7e6e2 implemented RWI WordReference to return the word position value (was always left empty)
9 years ago
reger e310ec5f70 fix posInText ranking calculation to score 0 on no position info
9 years ago
luccioman 74f9927ddc Merge remote-tracking branch 'origin/master' into dist_macOS
9 years ago
reger 51c077f493 adjust the getTopics() and getTopicNavigator() to current usage
9 years ago
reger 39dd244693 fix ConcurrentScoreMap.set() calculation of totalCount()
9 years ago
reger ebf818ad95 log an error on aborted news publish (due to duplicate news.id)
9 years ago
reger cc2d9dd3f1 reactivate the use of included-in-topwords boost in postRanking
9 years ago
luccioman 39ea28adfd Merged master to dist_macOS branch.
9 years ago
luccioman 8255e91c99 Fixed serverClassLoader.findClass method
9 years ago
reger 6801673a07 apply postranking media search boost only on media queries
9 years ago
luccioman 1dc4306058 Fixed indentation for better readability.
9 years ago
luccioman 8c49a755da Postprocessing refactoring
9 years ago
luccioman 42f45760ed Refactored postprocessing
9 years ago
reger 4386e84b55 correct NewPool retention calculation
9 years ago
reger 5e72d37f0a TransNews_p: add ad-hoc translation of target file on positive vote (addition to local translation)
9 years ago
reger 9462a32244 Added news service for easy, community driven UI translation support.
9 years ago
reger f8d6543a23 Rename class CreateTranslationMaster to TranslationManager and add
9 years ago
reger 19b4509d54 speed-up reading of xlif language file, by using xmlparser (stax) instead of jaxb
9 years ago
Michael Peter Christen e1fac86f53 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen a9316ceff6 force browser-caching of favicons from search results
9 years ago
Orbiter 503312ca43 Merge pull request #61 from luccioman/heroku_experiments
9 years ago
reger 33bf35d90f missing file for prev commit "Introduction of additional language setting browser"
9 years ago
reger 16e8ed3f01 Introduce additional language setting "browser/Browser Language" for UI internationalization.
9 years ago
reger 3b47a07dd1 change unused servletProperties entry CONNECTION_PROP_CLIENT_REQUEST_HEADER to
9 years ago
reger 036c1dc6ef fix CookieTest_p formatting (output of <br> as text),
9 years ago
Michael Peter Christen bf6709d196 fixed missing browser activation in linux
9 years ago
Michael Peter Christen d8504418b6 enhanced browser-caching of static content
9 years ago
Michael Peter Christen 079112358c Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen efeb592661 don't do solr optimization, this create high IO load. We should leave
9 years ago
luccioman 46b8836548 Copy image resources contained in donation iframe.
9 years ago
reger 4c7a77662a eliminate dependency on file-extension in storeDocument but use supported mime-type
9 years ago
reger ebde21079a refactor xlsParser to include Excel file attribute (like author) in parser result doc.
9 years ago
luccioman 744c9a2615 Opensearch desc : handle https protocol url with default port (443)
9 years ago
luccioman b9c28893ee Merged master to 'heroku' branch.
9 years ago
Michael Peter Christen 103a8348b3 fix for NPE and small performance enhancement
9 years ago
reger 2910fe35c1 add missing scheduler calc of next exec_date (call of calculateAPIScheduler)
9 years ago
reger 70d47ae38a keep scheduler selection by repeat entry from 07311020d4
9 years ago
reger 7c3f932e5d revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)
9 years ago
reger 07311020d4 postpone apicall exec date init until actual call
9 years ago
reger 5e335b32da fix Blacklist.contains() matching path pattern to string
9 years ago
reger 5e9e871192 fix Blacklist.remove by using pattern.toString to find pattern to remove,
9 years ago
reger 1843ea7e69 on Blacklist.add pattern to source file also update internal entry maps
9 years ago
reger bf6ce33da3 Correct use of _htDocsPath config in YaCyDefaultServlet to use servlet config variable
9 years ago
luccioman 480027ec98 Merge remote-tracking branch 'origin/master' into heroku_experiments
9 years ago
reger fcad2d0744 add uses of config constant INDEX_RECEIVE_ALLOW
9 years ago
reger 226f81cfcf declare poison pill url MultiProtocolURL() as protected to make sure not
9 years ago
reger f8632ad292 prevent string index out of bounds MultiProtocolURL.getPaths
9 years ago
reger 35a7d57260 update lucenematchversion to current (5.2.0 -> 5.5.0)
9 years ago
reger 9b07bbf955 deprecate newurl(), not used and already replaced
9 years ago
luccioman 47d486298f Merged changes from master.
9 years ago
reger 774b3906a9 fix GenericFormatter.parse ("time","timeoffset")
9 years ago
reger 27163af0e1 improve detection of referenced links by taking http and https link protocol
9 years ago
reger f89d4eb51d fix MultiProtocolURL init (assign of host) for urls with '/' in query part
9 years ago
reger 87fcfc6d78 Adjusted hash computation and toNormalform for file:// protocol to deliver
9 years ago
luccioman d6bf90803f Merged from main master branch.
9 years ago
luccioman 9b9c112263 Handle more properly local port configuration by system property
9 years ago
reger 3811184abd fix GSA servlet clientIP retrieval
9 years ago
reger 7ab41d4ff1 use directories original lastmodified date in file- & smbloader in response
9 years ago
reger 708bcbb042 one more replacement to use cached hosthash vs. calculated
9 years ago
luccioman b57a06d88e Let Heroku decide which http port to use
9 years ago
reger 22db449f2a to prevent crawler to concurrently access and alter same crawl queue
9 years ago
luccioman 893a40995a Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Orbiter 50c5ddf1a1 Merge pull request #56 from luccioman/LibreJS
9 years ago
Michael Peter Christen 7466d390b2 small refactoring + do not accept too old peers during bootstrap
9 years ago
luccioman 6e96c7341a Merge remote-tracking branch 'origin/master'
9 years ago
reger 8d58a48029 remove wrong log line in CrawlSwitchboard
9 years ago
reger 5aaa057c65 ignore empty input lines in FileUtils.getListArray() to poka-yoke blacklist read.
9 years ago
reger 41c36ffd75 exclude rejected results from result count
9 years ago
reger d4da4805a8 internal wiki code, require header line to start with markup
9 years ago
reger e952e355a2 have Translator servlet adhoc apply added translation by translating a single file
9 years ago
reger b119ff65be clean out not used Switchboard variables
9 years ago
reger 223071337b Translator to take caution of word boundaries to identify text portion to
9 years ago
luccioman 009657791e Merge remote-tracking branch 'origin/master' into LibreJS
9 years ago
luccioman a73c9327a5 JavaScript License fixes for LibreJS compatibility
9 years ago
reger 0c40401d28 fix MessageBoard test for null data
9 years ago
reger 5b22c63030 Adjust TranslatorXliff to load default 1st and merge downloaded or modified local translation.
9 years ago
reger a2e0f00456 optimize Translator
9 years ago
reger a6ba1faa80 introduce a translation edit servlet Translator_p.html YaCy's UI text translation
9 years ago
reger b3c9041f79 remove with localHostNames redundant (but unused) publicIPv4HostNames and publicIPv6HostNames
9 years ago
reger bd8f7c11f5 Use transparent addToCrawler in AutoSearch instead of addToIndex
9 years ago
reger f23d8ab47b fix 2 more servlet RuntimeException in intranet mode thrown due to seed.getIP()
9 years ago
reger bb0076c3dd fix: assure close inputstream in TranslatorXliff after reading xlf file
9 years ago
reger 6384b7d82e fix NPE in Load_MediawikiWiki servlet in intranet mode
9 years ago
Michael Peter Christen 596b5dfa59 add the JRE version in the seed. Purpose: identify if it is possible to
9 years ago
reger 4cc38e979d add InputStream close after reading input file (Vocabulary_p servlet)
9 years ago
reger 6bf9c55584 adjust Solr select servlet to latest bugfix for boostquery (bq param)
9 years ago
Burkhard 9a18e2297b Merge pull request #51 from JeremyRand/multiple-boost-query
9 years ago
reger f0d7b93372 make use and activate autodetect charset in Vocabulary input from file
9 years ago
JeremyRand 433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
9 years ago
JeremyRand 58824dfa6c Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.
9 years ago
reger 9e94989237 upd to PDFBox 2.0.1
9 years ago
reger d0a571bed2 del cytag trail for own index.html (save resource not used by default)
9 years ago
reger de46879637 fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut)
9 years ago
reger 24b0fa2a38 extend snapshot Html2Image.pdf2image to use PDFBox image export capability
9 years ago
reger eb2a00b1d8 fix NPE on missing crawldepth_i
9 years ago
reger efb9f1a8b7 save resource for unused blacklistFiles map
9 years ago
reger 5f113be760 cleanup connectPeer & yacyVersion.latestRelease usage
9 years ago
reger 7097dcbdbd cleanup hack for partial Solr update on multivalued datefields
9 years ago
reger f10ea3c155 clean-out unused SwitchboardConstants
9 years ago
reger ef24593347 delete obsolete SEARCHRESULT busythread constants
9 years ago
reger 125b5e26a5 apply bugfix for ChartPlotter from Pullreq 42
9 years ago
reger 06ce9ae711 prevent "unchecked conversion" compiler message
9 years ago
reger b4a576dbdf exclude unused protocol param "duetime"
9 years ago
reger 3bd6ae8d8b keep addon/Notepad++ keyword marker on lng export
9 years ago
reger 16837d60c7 fix version in locale version file
9 years ago
reger 0fb01e429e fix migration, account for ssl port in config (for auto-disable https)
9 years ago
reger 7be1c7a05a fix logger name
9 years ago
reger 1d940e5a94 upd commons-compress 1.11
9 years ago
reger 7789c32c82 delete crawl queue on init exception
9 years ago
reger f781b9dd47 revert call condition f. migration.installSkins
9 years ago
reger 3adb670f44 remove never used Domains.myHostNames set
9 years ago
reger 6ecc180299 fix rwi doubledom return best (highest) ranking
9 years ago
reger 2343e3f1cd keep and update existing xlf translation master instead of create new
9 years ago
reger a1935f485f Added utility class CreateTranslationMasters to create a language independent
9 years ago
reger acaf51b296 keep ConfigLanguage_p as 1st entry in exported translation file
9 years ago
reger 61c5b6b403 fix empty drop down list in ConfigLanguage after wrong/empty download
9 years ago
reger 4eddabee42 translate Network History screen -> de
9 years ago
reger 90c79014ae remove unused translator routine which also doesn't handle rel path input
9 years ago
reger 902e79e261 Introduce a TranslatorXliff which can read/write xliff from/to internal translation map.
9 years ago
reger d9adc2c255 load handler for Transparent Proxy on startup only if feature is activated
9 years ago
reger ec24a0c85a add test case for optimized toTokens()
9 years ago
reger cada24f918 adjust utility ListNonTranslatedFiles for path compare on windows
9 years ago
reger fb8ae14b21 make migration version safe
9 years ago
reger 258cd41577 reduce logging (EmbeddedSolrConnector.query)
9 years ago
reger 6783ef5540 move example code SearchClient out of yacycore package
9 years ago
Michael Peter Christen b89465d952 0N - basic dump upload servlet infrastructure, to share index dumps
9 years ago
Michael Peter Christen f12a900f3e harmonization of http post of files for one and several files - this had
9 years ago
Michael Peter Christen 849ab671a9 0n: modified the p2p bootstrapping process - rules had been too tight and
9 years ago
reger 764f5100f0 fix delete of temp file after odt % ooxml parser
9 years ago
reger 379e9b330d use supplied url port to get robots.txt in crawlers hostqueue
9 years ago
reger 58a959403d fix mixed logfactory in UrlProxyServlet,
9 years ago
Michael Peter Christen 2494a820c7 0N - added recording of dump exports if given time frame is not negative
9 years ago
Michael Peter Christen ef2cc4f690 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen a6bf0b1649 0N - added option to generate index export files for a specific number
9 years ago
reger 6d56beaed8 fix assertion exception in toString of MultiProtocolURL
9 years ago
reger 42a7bdb2af fix SolrSelectServlet authentication to default to true
9 years ago
reger dbb28bb4f3 del unused statistic parameter (from status servlet)
9 years ago
reger 06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
9 years ago
reger caf9e98f09 put metadata dc_publisher in corresponding schema field
9 years ago
reger 38e2b054d4 remove servlet classloader internal cache map (to save the resources, cache hits marginal)
9 years ago
luc 3f338777f7 Also check and index eventual icon url information from metadata.
9 years ago
luc 9f712146df Display icons in ViewFile "links" mode.
9 years ago
luc 26f1ead57c Created ViewFavicon class specialized in favicon viewing.
9 years ago
reger 6f0b073bf3 override detected language (statistic langdetect) only with TLD determined
9 years ago
reger b65e2b527d include use of condenser's content text for language detection.
9 years ago
luc 07222b3e1a Added favicon url transmission in RWI chunks.
9 years ago
luc 480772c070 Fixed json search results from commit "Improved URLLicence reliability"
9 years ago
reger 937fbb0b9f correct isHidden() for smb from last commit
9 years ago
reger 535d4bf75f respect hidden attribute for file and smb directory listing
9 years ago
luc 3cc5619d93 Improved HTML icons indexing and rendering in search results.
9 years ago
luc edef6cd0dc Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger c28142095a add findClass() to servlet class loader (used in YaCyDefaultServlet)
9 years ago
luc f7b854465b Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
9 years ago
reger 2048b7e057 support scraping start-/enddate from html tag with property "datetime"
9 years ago
reger 900d4584ba complete resource cleanup of lists in contentscraper's close()
9 years ago
luc aa60ad1dbc Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 1f18653de0 pass parsed swf content through htmlscraper
9 years ago
reger 18ecf57792 add support of compressed swf to swfParser
9 years ago
sixcooler 5cb7ba0dc4 fix for connections not getting closed to get favicon.ico during search
9 years ago
luc ef83e34b8a Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger ed3e16e092 apply remote result count config value to Bookmark Autosearch
9 years ago
Ryszard Goń a98c395023 Add the Autocrawl thread
9 years ago
Ryszard Goń 1728cd30c6 Create autocrawl profiles
9 years ago
luc 41767a01c2 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger ff27824964 fix swfParser reading file signature
9 years ago
luc 7aa1a29e33 Return more accurate HTTP status 400 with detail message when some error
9 years ago
luc bd9dc2f32b Corrected NullPointerException cases occurring in YJsonResponseWriter
9 years ago
luc 0076f9f97d Updated documented sample url
9 years ago
luc cfdbc2b487 Improved URLLicence reliability for use by concurrent non authorized
9 years ago
reger c91e712178 further refactor using standard java / (one) utf-8 charset variable
9 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
sixcooler 5a35f9383a bump to solr/lucene 5.4.0
9 years ago
reger a58d34a4e8 check error URL cache before adding errorDoc to index
9 years ago
reger e9539b1086 reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart
9 years ago
reger cd26717ba2 fix low memory status hint (dht-in disabled)
9 years ago
reger a5faf73afa remove obsolete yacy.init entries interaction.*
9 years ago
sixcooler dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger 46ac0867ff fix poison mediawikiimporter output queue also after ExecutionException
9 years ago
reger a7591d3ed0 fix mediawikiimporter number format exception on coordinate parsing
9 years ago
reger 9da1712a31 increase http header EXPIRES for css and images in DefaultServlet
9 years ago
reger 6d54eb3d36 skip loading document on crawl start for YMark bookmarks
9 years ago
reger 80e2c82249 fix NPE on empty blog importfile parameter
9 years ago
reger e84d94f8ca fix mime table for ms office / open office documents
9 years ago
reger 45b9bd8403 adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
9 years ago
reger d5fd031449 fix reading of ippattern config array in URLProxy
9 years ago
reger b7e8358645 make use of header.getContentType where possible (mime is normalized afterwards)
9 years ago
reger 7a8c077838 fix HeaderFramework.mime() to strip charset parameter.
9 years ago
reger b4b6910d60 fix (todo): correct doc.id of remote search result if no match with newly
9 years ago
reger dec3e6ad96 fix: adjust urlstub for mailto links
9 years ago
reger cb83e65f89 drop returning document language "en" if unknown (fix todo)
9 years ago
reger 0c5548a7ff fix (todo) remove redundant holding of email link nameproperty in parser document
9 years ago
reger 71c416f383 show mailto links in ViewFile.html linklist
9 years ago
reger 6b7c10cef8 fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago
reger 14803d58cd let html scraper accept html5 <link rel="icon"> for favicon links
9 years ago
luc b4cdacee76 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc ba0a293f5c Corrected another case of
9 years ago
reger 4d2b934487 prevent mailto links getting into parser result document's in/outbound link collection
9 years ago
luc 8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
9 years ago
luc a2c08402af Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 70595d05d0 Modified MemoryControl.main() test to properly end for better results
9 years ago
sixcooler 1be67d9ab6 CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years
9 years ago
reger 28b8bc290a fix use of NETWORK_SEARCHVERIFY for rwi verification
9 years ago
reger 020630efd8 remove unused network scanner parameter from queryparameter
9 years ago
luc ad5586f8f6 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 8ebefa4233 Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
9 years ago
luc 7736ee5a42 Updated MediaWimporter main() : display usage in console and stop
9 years ago
reger cdb8f3b10d make current ranking score value avail. to search interface / api
9 years ago
luc 27d11f8671 Fixed isSolrDump function : PushBackInputStream was not unread when
9 years ago
Michael Peter Christen 135a123a77 less logging in new language detection
9 years ago
Michael Peter Christen ef8cd80593 fix for npe
9 years ago
reger 0347bfa71f Apply collection query constraint/modifier to rwi result stack.
9 years ago
luc 2a67d2ba6f Corrected error management for unsupported image formats, parsing
9 years ago
Michael Peter Christen d6e9834040 Merge branch 'master' of
9 years ago
Michael Peter Christen d82d311995 Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger b5371ea8c1 read/init crawl queue in a thread
9 years ago
reger 1160b13172 remove unused md5 from ViewFile servlet params
9 years ago
reger e163ea88f6 fix vsdParser (Visio) parser return statement
9 years ago
reger b2c8bc0ae6 remove md5_s from default index fields
9 years ago
luc e40ae0943b - No max dimensions specified : render raw image data when source and
9 years ago
reger 90686a75a2 fix flux factor (additional crawl delay by access count) calculation
9 years ago
luc 4af27289e5 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 297fdb60d3 throw exception if crawler hostqueue can't create hostpath directory.
9 years ago
luc 755efac17d Use same max file size when loading all resource bytes or opening stream
9 years ago
luc bc6c79fc12 Corrected scaling function for non RGB images.
9 years ago
luc 1565559df8 Refactoring : extracted write InputStream method.
9 years ago
luc f0478bb14d BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
9 years ago
luc 07437986e7 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 97cc03ef6a start using a template for urlproxy header
9 years ago
luc f01d49c37a Process large or local file images dealing directly with content
9 years ago
luc 3c4c77099d If available, check content length before downloading. Check also
9 years ago
luc 5bbb2e1730 Ensure resource is closed when reading a full file InputStream
9 years ago
luc 6291a57300 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 0d3c5b223e have psParser cleanup temp file
9 years ago
reger 7d0d19cb8e avoid File.deleteOnExit() on temp files
9 years ago
luc bfe51001e3 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 02e4489a23 set tmpfile.deleteOnExit by default,
9 years ago
reger 2985baaa01 Exclude repetitive protocol part in tokenized url
9 years ago
reger ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
9 years ago
reger 52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
9 years ago
luc 49331dc523 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 47d70732f6 improve locale translator
9 years ago
sixcooler 646afe9183 do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler 194df613de not using 'location' as defaultfacetfield - since we removed it being
9 years ago
sixcooler d3b9349b6f simplification / speedup of GenerationMemoryStrategy
9 years ago
sixcooler 4a905ec134 fix to not let the AccessTracker-Log grow to much, but have enough data
9 years ago
reger 20e18d79f8 harmonize document title for archive parsers
9 years ago
luc f11b5e8309 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 112ae013f4 update bzip and bzip parser process,
9 years ago
reger e76a90837b update zip and tar parser process,
9 years ago
luc 4e673ffc9a Ensure closing of InputStream even when an exception occurs.
9 years ago
luc 10696b53f7 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 8532565c7d optimize order of parsers to try
9 years ago
reger 681889ae64 use current tar library for untar files
9 years ago
reger 5d71fc70e3 fix tarParser early exit on looping content
9 years ago
luc bcc2e7cb5b Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 2fcf6f104c fix bzipParser recognition
9 years ago
luc 745e97a575 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger 11f3666660 increase use of pre.defined CATCHALL_QUERY string
9 years ago
reger a58ee49307 Optimize internal imagequery focus on using content_type to select images
9 years ago
luc fc3294382e Updated javadocs for warning on target encoding format potential errors.
9 years ago
luc aa70ff4ff6 Corrected images alpha channel rendering
9 years ago
reger d223cf0ae4 adjust MediaWiki importer geo coordinate calculation
9 years ago
reger 2b775d5be6 fix typo in WikiCode coordinate calculation
9 years ago
reger bbe9df2bb3 fix MediawikiImporter for bz2 dump
9 years ago
reger c6687dd560 fix a system.out to log.fine
9 years ago
reger e53c6bbd51 fix init of peer flags
9 years ago
Michael Peter Christen ac034db8bc Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger 826f14f37f fix unnecessary set null of peer flags, causing reread
9 years ago
luc 5902ce032e Corrected NullPointerException case when ImageIO reader is not found for
9 years ago
reger c6495a5b62 add a log entry on parsing ajax crawling scheme snapshot
9 years ago
reger 9252e36aeb implement ajax crawling scheme for ajax sites which adhere to the proposed use of hash-bangs to provide html content
9 years ago
Michael Peter Christen d1ae999ef9 replaced HashMap with LinkedHashMap to preserve the object order
9 years ago
Michael Peter Christen 7d075a1d76 added log lines
9 years ago
Michael Peter Christen 092dac086e Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger 7a64bebb86 init Recrawl job chunk size to max crawl loader during job start, to use some system preferences
9 years ago
luc d6522fa4a2 Integrated haraldk/TwelveMonkeys library to first add TIF image format
9 years ago
Michael Peter Christen 9244694e64 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 151ccd50a9 fix for image size field values (must be multi-valued)
9 years ago
reger c9937973e3 unescape MultiProtocolURL getAttributes() return values.
9 years ago
reger 78e8c6f3e5 refactor special handling (static override) of SUPPORTED_EXTENSIONS/MIME_TYPES
9 years ago
reger d54c5d310a add links with image extension not automatically to image links.
9 years ago
reger 851e8f6c8a check jpeg file signature in genericImageParser
9 years ago
reger fb75fea446 use recrawljob w/o sort results by date
9 years ago
reger 43c27aa550 upd to solr/lucene 5.3.1
9 years ago
reger 688f7b2a5c allow/display svg images in image results previews
9 years ago
reger d5330391de remove some unused var allocation in parser
9 years ago
Michael Peter Christen 3d7dd9d3aa follow-up to latest commit: also flush the search cache if all crawls
9 years ago
Michael Peter Christen c737ff235d in case that the include_string contains several entries including
9 years ago
Michael Peter Christen 8e555d79a3 add also 1-character tokens to the token list because that could be also
9 years ago
reger 7c82cd4415 add an end condition to svgParser for wrong content
9 years ago
reger 356d4d1301 remove rdfParser from init (current function identical with genericParser)
10 years ago
reger c647d899e3 add svgParser to parse metadata from svg images
10 years ago
reger bad34804fe optimize parseInt for <img> tag attribute parsing
10 years ago
Michael Peter Christen 6ebc2451a9 Merge pull request #14 from luccioman/master
10 years ago
reger 2f51baff4f check for loading error (includes unsupported formats)
10 years ago
luc 5578886f6f Merge branch 'master' of https://github.com/luccioman/yacy_search_server.git
10 years ago
luc c38d6c1f37 Correction for mantis 535: inurl: parameter doesn't work on URLs with
10 years ago
reger 52e3eb4ce8 harmonize/correct assignment to Ymarkmeta.mime
10 years ago
Michael Peter Christen 87f358058e Fix for index entries which have id's not computed as hash from the url.
10 years ago
reger 3f2b8ab5e5 optionally include mime in p2p url exchange string
10 years ago
reger a3195d78ae add Portuguese month names to date recognition
10 years ago
reger d2cc11ea8f fix html parser taking <style> content as text.
10 years ago
Michael Peter Christen 5f706797cb patch for a bug inside of solr since solr 5.0 when using a boost
10 years ago
reger 7889fc2389 Hack to prevent Solr issue on partial update on a document containing multivalued date field
10 years ago
reger b4cbdea1e7 adapt SolrServerConnector.add to handle error on partial update input document.
10 years ago
reger 98ab655917 on reindex delete index document with invalid url
10 years ago
reger 1e8369e18b use a parsed date in Document.toString
10 years ago
luccioman 199b2ce52d Translator refactoring : to simplify locale files writing, process keys
10 years ago
luccioman 4dd9c0d5d9 Merge from main repository
10 years ago
reger 3428b6f13b improve filtering by filetype navigator.
10 years ago
reger e37a4f0b3d prevent metadata records in index w/o valid url
10 years ago
reger 41c4eade51 extract modification date from vCard (vcfParser)
10 years ago
reger 8768896975 extract lastmodified from openoffice doc
10 years ago
Michael Peter Christen c40c302748 when many crawl queues are generated, this NPE can occur; probably
10 years ago
reger 367fe388b9 fix exception throw after sendError in DefaultServlet
10 years ago
luccioman 9752bd5f88 Added utils to help translation without launching full YaCy application
10 years ago
luccioman 2f0f0180e2 Added a function to list files recursively.
10 years ago
luccioman 7e4c1d2282 Translator refactoring :
10 years ago
reger 802ccaead6 fix init of error cache, use latest faildates => load_date_dt
10 years ago
reger dba7f15073 apply same size constrain on result image from doc
10 years ago
reger 4cf875336c complete TODO: getFileExtension handle dot in query part
10 years ago
sixcooler 87e4abe393 fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has
10 years ago
reger eaf0e8ff2c start recording/indexing pixel size for image document
10 years ago
reger c33229fc0c check mime prior to ext for metadata modification for images
10 years ago
reger 19f1308bf0 enforce the result images limit to > 16x16px
10 years ago
reger 0e4ba0360b fix NPE on .yacyh result url of disconnected peer
10 years ago
reger 7ed812a2bf log missing seed.port
10 years ago
reger 206883f80d fix: Preserve protocol in url proxy
10 years ago
reger f7b0b3b7b3 avoid runtime exception by earlier testing for seed.ip=null
10 years ago
Michael Peter Christen 906b5fd742 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen 8f90767889 fix for filesystem crawl
10 years ago