reger
1160b13172
remove unused md5 from ViewFile servlet params
9 years ago
reger
e163ea88f6
fix vsdParser (Visio) parser return statement
...
(unnecessary throw in final block)
9 years ago
reger
b2c8bc0ae6
remove md5_s from default index fields
...
it is not assigned a value and not used.
Because of this it is also excluded from the transfer protocol.
9 years ago
reger
90686a75a2
fix flux factor (additional crawl delay by access count) calculation
9 years ago
reger
d79fa7fbeb
upd to Jetty v9.2.14.v20151106
9 years ago
reger
297fdb60d3
throw exception if crawler hostqueue can't create hostpath directory.
...
In rare cases the hostname may not be a valid filesystem directory name,
so the directory can't be created (e.g. a name containing a '*' char).
Throw a MalformedURLException to prevent the crawl queue from looping on
this invalid entry.
9 years ago
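A minimal sketch of the check described in the commit above, not the actual hostqueue code. The class name `HostPathSketch`, the method `ensureHostPath` and its parameters are hypothetical; only the idea (fail with a MalformedURLException when the host directory cannot be created) is taken from the commit message.

```java
import java.io.File;
import java.net.MalformedURLException;

// Hypothetical helper, for illustration only.
final class HostPathSketch {

    /**
     * Returns the per-host directory below queuesRoot, creating it if needed.
     * If the directory cannot be created (for example because the hostname is
     * not a valid directory name on this filesystem), a MalformedURLException
     * is thrown so the caller can drop the entry instead of retrying forever.
     */
    static File ensureHostPath(File queuesRoot, String hostName) throws MalformedURLException {
        File hostPath = new File(queuesRoot, hostName);
        if (!hostPath.isDirectory() && !hostPath.mkdirs()) {
            throw new MalformedURLException(
                    "cannot create host queue directory for host '" + hostName + "'");
        }
        return hostPath;
    }
}
```

Whether a given character is rejected depends on the filesystem, so the sketch relies on the result of `mkdirs()` rather than on a list of forbidden characters.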
reger
97cc03ef6a
start using a template for urlproxy header
...
It is included as iframe /proxmsg/urlproxyheader.html
to allow full servlet functionality and flexibility to display some
index/meta data in the future.
9 years ago
reger
d08e421809
fix link to logo (yacysearch.xsl)
9 years ago
reger
0d3c5b223e
have psParser cleanup temp file
9 years ago
reger
7d0d19cb8e
avoid File.deleteOnExit() on temp files
...
The JVM registers each such file in a list, regardless of whether it has
already been deleted, and never cleans up the list during runtime.
This accumulates to a considerable amount of memory during large crawls and/or
long uptime.
To tackle this, all temp files are now created in a subdir of java.io.tmpdir
and the jvm tmpdir property is set to this subdir, which is deleted by
code on shutdown.
Additionally let pdfParser use this tmp subdir too.
9 years ago
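The approach in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: the class `TempDirSketch` and its method names are hypothetical, and the sketch passes the subdirectory explicitly to `File.createTempFile` instead of redirecting the `java.io.tmpdir` property as the commit does.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
final class TempDirSketch {

    /** Creates a private subdirectory below the current java.io.tmpdir. */
    static Path createTempSubdir(String prefix) {
        try {
            Path base = Paths.get(System.getProperty("java.io.tmpdir"));
            return Files.createTempDirectory(base, prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temp file inside dir without calling File.deleteOnExit(). */
    static File newTempFile(Path dir, String prefix, String suffix) {
        try {
            return File.createTempFile(prefix, suffix, dir.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes dir and everything below it, children before parents. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) return;
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would create the subdirectory once at startup, hand it to every parser that needs temp files, and call `deleteRecursively` from its shutdown code, so no per-file bookkeeping is left to the JVM.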
reger
02e4489a23
set tmpfile.deleteOnExit by default,
...
to make sure files are removed on shutdown.
9 years ago
reger
2985baaa01
Exclude repetitive protocol part in tokenized url
...
which is used as description if none is available from the parser.
9 years ago
reger
ca3d26a401
harmonize wordsintitle & CollectionSchema.title_words_val calculation,
...
remove obsolete partial init of wordreference from urimetadata
9 years ago
reger
7bf03856d1
add link to quick select blacklist
...
from title list
9 years ago
reger
440ce6d198
add German translation to re-crawl job
9 years ago
reger
5362a80f1c
upd to httpcore 4.4.4
9 years ago
reger
e90593450c
upd to TwelveMonkeys ImageIO 3.2
9 years ago
reger
b4dbff6a6a
fix yacysearch.json "totalResults"
...
the element "totalResults" is included twice (at beginning & end);
only the element written after performing the search holds a number > 0
see http://mantis.tokeek.de/view.php?id=608
9 years ago
reger
52a9040ae6
Sort out double keywords (dc_subject) early in parsed documents
...
- by using a Set directly instead of a List
- remove the unneeded String[] getter
9 years ago
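As a small illustration of the Set-instead-of-List idea in the commit above (a sketch only; `KeywordSetSketch` and `collectKeywords` are hypothetical names, and trimming and skipping empty entries are assumptions that the commit does not mention):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class KeywordSetSketch {

    /**
     * Collects keywords into a Set so duplicates are dropped as they arrive.
     * A LinkedHashSet keeps the order of first occurrence.
     */
    static Set<String> collectKeywords(String... raw) {
        Set<String> keywords = new LinkedHashSet<>();
        for (String k : raw) {
            if (k == null) continue;
            String trimmed = k.trim();
            if (!trimmed.isEmpty()) keywords.add(trimmed);
        }
        return keywords;
    }
}
```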
reger
47d70732f6
improve locale translator
...
- skip empty lines
- more robust file section detection (space independent)
9 years ago
sixcooler
646afe9183
do not store subfield *_coordinate + make all num-fields docvalues
9 years ago
sixcooler
194df613de
do not use 'location' as defaultfacetfield, since we removed it from the defaults
9 years ago
sixcooler
d3b9349b6f
simplification / speedup of GenerationMemoryStrategy
9 years ago
sixcooler
f5a9948860
do not store subfield *_coordinate
9 years ago
sixcooler
fca353e5eb
set startuptype of most solr handlers to lazy
9 years ago
sixcooler
4a905ec134
fix to not let the AccessTracker-Log grow too much, but have enough data
...
to monitor.
(+gitignore-correction)
9 years ago
sixcooler
209f502f09
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
20e18d79f8
harmonize document title for archive parsers
9 years ago
sixcooler
d481653202
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
112ae013f4
update bzip and bzip parser process,
...
to return one document for the file with the combined parser results of the
contained file, and register it with the supplied url and mime of the archive.
9 years ago
reger
e76a90837b
update zip and tar parser process,
...
to return one document for the file with the combined parser results of the
contained files.
9 years ago
sixcooler
bc610e5382
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
8532565c7d
optimize order of parsers to try
...
- start with a parser matching the remote supplied mime
9 years ago
reger
681889ae64
use current tar library to untar files
...
- remove old source copy
9 years ago
reger
5d71fc70e3
fix tarParser early exit on looping content
...
- adjust check of data available according to doc
- return null if no content is recognized (so TextParser can still try the next parser)
- use commons.compress directly
9 years ago
reger
2fcf6f104c
fix bzipParser recognition
...
- Bzip2Inputstream checks magic byte itself to identify bz2 (leave it in input)
- try to supply a fitting mime for parsing bz2 content
9 years ago
reger
a60b1fb6c2
differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger
02afba730e
fix detection of an https port change after it is set in System Admin
9 years ago
reger
11f3666660
increase use of predefined CATCHALL_QUERY string
9 years ago
reger
a58ee49307
Optimize internal imagequery to focus on using content_type to select images
...
(in favor of url file extension)
9 years ago
sixcooler
b61f91f0d4
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
81f53fc83a
upd readme.mediawiki min java version 1.7
9 years ago
reger
d223cf0ae4
adjust MediaWiki importer geo coordinate calculation
...
- allow lat/long 0.xxx
- south / west assignment
include test class
9 years ago
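The two points in the commit above (values with a 0 degree part, and the sign for south / west) can be illustrated with a generic conversion. This is a sketch under assumptions, not the importer's code: `GeoCoordinateSketch`, `toDecimal` and the degrees/minutes/seconds parameters are hypothetical.

```java
// Hypothetical helper, for illustration only.
final class GeoCoordinateSketch {

    /**
     * Converts degrees/minutes/seconds plus a hemisphere letter to decimal
     * degrees. South and west yield negative values. A degree part of 0 is a
     * valid input and produces results such as 0.5 or -0.5.
     */
    static double toDecimal(int degrees, int minutes, double seconds, char hemisphere) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        switch (Character.toUpperCase(hemisphere)) {
            case 'N':
            case 'E':
                return value;
            case 'S':
            case 'W':
                return -value;
            default:
                throw new IllegalArgumentException("unknown hemisphere: " + hemisphere);
        }
    }
}
```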
sixcooler
7e2723a894
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
2b775d5be6
fix typo in WikiCode coordinate calculation
9 years ago
reger
a2dcf64039
fix IndexImportMediawiki_p servlet's refresh header
...
add url parameter to make sure no parameters are included in refresh url
which could cause unwanted restart of import job
see http://mantis.tokeek.de/view.php?id=591 comments
9 years ago
reger
bbe9df2bb3
fix MediawikiImporter for bz2 dump
...
skip reading the bz2 file magic bytes to identify the bz2 format, as an inputstream reset would be required.
Commons Compress reads and checks the magic bytes internally and throws an IOException if they are wrong, making the pre-read obsolete.
9 years ago
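To show why a pre-read of the magic bytes needs a reset, here is a standalone sketch using only the JDK. It is not the importer's code and it is not what the commit does (the commit drops the pre-read and lets the decompressor check the header itself). `Bz2MagicSketch` and `peekIsBzip2` are hypothetical names; the only fact assumed is that bzip2 streams begin with the bytes `BZh`.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

// Hypothetical helper, for illustration only.
final class Bz2MagicSketch {

    /**
     * Peeks at the first three bytes and rewinds the stream afterwards.
     * Without mark/reset the bytes would be consumed, and a decompressor
     * that validates the header itself would then reject the stream.
     */
    static boolean peekIsBzip2(BufferedInputStream in) throws IOException {
        in.mark(3);
        byte[] magic = new byte[3];
        int n = 0;
        while (n < magic.length) {
            int r = in.read(magic, n, magic.length - n);
            if (r < 0) break; // stream shorter than the magic
            n += r;
        }
        in.reset(); // rewind so the next reader sees the full header
        return n == 3 && magic[0] == 'B' && magic[1] == 'Z' && magic[2] == 'h';
    }
}
```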
reger
c6687dd560
fix a system.out to log.fine
...
in bmpParser
9 years ago
reger
c720b4c249
remove override of dynamicField coordinate_p in solr schema
...
(coordinate_p is not a mandatory field and as such doesn't need to be declared as schema.field)
9 years ago
reger
e53c6bbd51
fix init of peer flags
...
(remove hiding of ssl flag)
9 years ago